Subject: Re: FST and FieldCache?
From: Jason Rutherglen (jaso@gmail.com)
Date: May 19, 2011 6:36:53 am
List: org.apache.lucene.java-dev

This is more about compressing strings in TermsIndex, I think.

Ah, because they're sorted. I think if the string lookup cost degrades then it's not worth it? That's something that needs to be tested in the MMap case as well, e.g., are ByteBuffers somehow slowing everything down by 10% or so?

On Thu, May 19, 2011 at 6:30 AM, Earwin Burrfoot <ear@gmail.com> wrote:

This is more about compressing strings in TermsIndex, I think. And the ability to use said TermsIndex directly in some cases that required FieldCache before. (Maybe FC is still needed, but it can be degraded to a docId->ord map, storing the actual strings in TI.) This yields fat space savings when we, e.g., need to both look up on a field and build facets out of it.
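
A rough sketch of the layout described above, assuming a single-valued string field: the sorted terms are stored once (standing in for the TermsIndex) and the per-document cache shrinks to an int ordinal per doc. The class and method names here are hypothetical and are not Lucene's API; a plain sorted array stands in for the terms index, not an FST.

```java
import java.util.Arrays;

/** Hypothetical illustration only; not Lucene's FieldCache or TermsIndex. */
public class OrdFieldSketch {
    // Sorted, de-duplicated terms: each string is stored exactly once.
    private final String[] sortedTerms;
    // The "degraded" FieldCache: one ordinal per document (docId -> ord).
    private final int[] docToOrd;

    public OrdFieldSketch(String[] valuePerDoc) {
        sortedTerms = Arrays.stream(valuePerDoc).distinct().sorted().toArray(String[]::new);
        docToOrd = new int[valuePerDoc.length];
        for (int doc = 0; doc < valuePerDoc.length; doc++) {
            // Every value is present in sortedTerms, so the result is >= 0.
            docToOrd[doc] = Arrays.binarySearch(sortedTerms, valuePerDoc[doc]);
        }
    }

    /** FieldCache-style access: resolve a document's value through its ordinal. */
    public String value(int docId) {
        return sortedTerms[docToOrd[docId]];
    }

    /** Lookup on the field: term -> ord, negative if the term is absent. */
    public int ordOf(String term) {
        return Arrays.binarySearch(sortedTerms, term);
    }

    /** Faceting: count documents per ordinal without touching any strings. */
    public int[] facetCounts() {
        int[] counts = new int[sortedTerms.length];
        for (int ord : docToOrd) {
            counts[ord]++;
        }
        return counts;
    }
}
```

Both the lookup and the facet counts share one copy of the strings, which is where the space saving would come from under these assumptions.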

mmap is cool :) What I want to see is an FST-based TermsDict that is simply mmapped into memory, without building intermediate indexes, like Lucene does now. And docvalues are orthogonal to that, no?
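
To make the "mmap it and search it in place" idea concrete, here is a minimal sketch. It deliberately swaps the FST for a much simpler structure, a flat file of sorted terms with an offset table, so it only illustrates searching the mapped bytes directly without loading an intermediate index onto the heap. The file layout, class name, and method names are all assumptions made for illustration, not anything Lucene defines.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical sketch: a sorted term file searched directly through mmap. */
public class MmapTermsSketch {
    // Assumed layout: [int count][(count + 1) x int offset][UTF-8 term bytes...]
    // Terms must already be sorted by unsigned UTF-8 byte order.
    static void write(Path file, String[] sortedTerms) {
        byte[][] encoded = new byte[sortedTerms.length][];
        int dataLen = 0;
        for (int i = 0; i < sortedTerms.length; i++) {
            encoded[i] = sortedTerms[i].getBytes(StandardCharsets.UTF_8);
            dataLen += encoded[i].length;
        }
        int headerLen = 4 + 4 * (sortedTerms.length + 1);
        ByteBuffer buf = ByteBuffer.allocate(headerLen + dataLen);
        buf.putInt(sortedTerms.length);
        int offset = headerLen;
        for (byte[] term : encoded) {
            buf.putInt(offset);
            offset += term.length;
        }
        buf.putInt(offset); // end sentinel, so term i spans offset[i]..offset[i + 1]
        for (byte[] term : encoded) {
            buf.put(term);
        }
        try {
            Files.write(file, buf.array());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static MappedByteBuffer map(Path file) {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            // The mapping stays valid after the channel is closed.
            return ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Compares the term at position ord with target, byte by byte, in place.
    static int compare(ByteBuffer buf, int ord, byte[] target) {
        int start = buf.getInt(4 + 4 * ord);
        int len = buf.getInt(4 + 4 * (ord + 1)) - start;
        int n = Math.min(len, target.length);
        for (int i = 0; i < n; i++) {
            int diff = (buf.get(start + i) & 0xFF) - (target[i] & 0xFF);
            if (diff != 0) {
                return diff;
            }
        }
        return len - target.length;
    }

    /** Binary search over the mapped bytes; returns the ord, or -1 if absent. */
    static int lookup(ByteBuffer buf, String term) {
        byte[] target = term.getBytes(StandardCharsets.UTF_8);
        int lo = 0;
        int hi = buf.getInt(0) - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            int cmp = compare(buf, mid, target);
            if (cmp == 0) {
                return mid;
            }
            if (cmp < 0) {
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return -1;
    }
}
```

Every read in `lookup` goes through the `ByteBuffer`, so a sketch like this is also a reasonable place to measure the ByteBuffer overhead asked about earlier in the thread, by running the same search over a heap `byte[]` and comparing.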