Subject:Re: FST and FieldCache?
From:Earwin Burrfoot (ear@gmail.com)
Date:May 19, 2011 6:29:45 am
List:org.apache.lucene.java-dev

This is more about compressing strings in the TermsIndex, I think, and the ability to use said TermsIndex directly in some cases that required FieldCache before. (Maybe FC is still needed, but it can be degraded to a docId->ord map, with the actual strings stored in the TI.) This yields fat space savings when we, e.g., need to both look up on a field and build facets out of it.
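To make the docId->ord idea concrete, here is a minimal sketch in plain Java. It does not use any Lucene API: the class name `OrdFieldCacheSketch` and its fields are hypothetical, and a sorted `String[]` merely stands in for the terms index. Each distinct string is stored once, and the per-document cache shrinks to one int per document.

```java
import java.util.Arrays;

// Hypothetical illustration only; not Lucene's FieldCache or TermsIndex.
class OrdFieldCacheSketch {
    // Sorted, de-duplicated terms: stands in for the terms index (TI).
    final String[] sortedTerms;
    // docId -> ord (position in sortedTerms): the "degraded" field cache.
    final int[] docToOrd;

    OrdFieldCacheSketch(String[] valuePerDoc) {
        sortedTerms = Arrays.stream(valuePerDoc).distinct().sorted().toArray(String[]::new);
        docToOrd = new int[valuePerDoc.length];
        for (int doc = 0; doc < valuePerDoc.length; doc++) {
            // Every value is present in sortedTerms, so the search always hits.
            docToOrd[doc] = Arrays.binarySearch(sortedTerms, valuePerDoc[doc]);
        }
    }

    // Lookup: resolve a document's value through its ord.
    String value(int docId) {
        return sortedTerms[docToOrd[docId]];
    }

    // Faceting: count documents per ord without touching any strings.
    int[] facetCounts() {
        int[] counts = new int[sortedTerms.length];
        for (int ord : docToOrd) {
            counts[ord]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        OrdFieldCacheSketch fc =
            new OrdFieldCacheSketch(new String[] {"red", "blue", "red", "green"});
        System.out.println(fc.value(2));
        System.out.println(Arrays.toString(fc.facetCounts()));
    }
}
```

Both the lookup and the facet count go through the same ord array, which is where the space saving would come from when a field is used for both.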

mmap is cool :) What I want to see is an FST-based TermsDict that is simply mmapped into memory, without building intermediate indexes like Lucene does now. And docvalues are orthogonal to that, no?

On Thu, May 19, 2011 at 17:22, Jason Rutherglen <jaso@gmail.com> wrote:

maybe that's because we have one huge monolithic implementation

Doesn't the DocValues branch solve this?

Also, instead of trying to implement clever ways of compressing strings in the field cache, which probably won't bear fruit, I'd prefer to look at [eventually] MMap'ing (using DV) the field caches to avoid the loading and heap costs, which are significant. I'm not sure if we can easily MMap packed ints and the shared byte[], though it seems fairly doable?
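As a rough illustration of the MMap idea, the sketch below maps a file of per-document ords with the JDK's `FileChannel.map` and reads them without first copying the file onto the heap. The file layout (one 4-byte big-endian int per document) and the class name `MmapOrdsSketch` are assumptions made for the example; real packed ints would use fewer bits per value and need bit-level decoding on top of this.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.IntBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration only; not the DocValues file format.
class MmapOrdsSketch {
    // Write one 4-byte int per document (assumed layout for this sketch).
    static void write(Path file, int[] ords) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(ords.length * Integer.BYTES);
        buf.asIntBuffer().put(ords);
        Files.write(file, buf.array());
    }

    // Map the file read-only. The OS pages data in on demand, and the
    // mapping stays valid after the channel is closed.
    static IntBuffer map(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            return ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size()).asIntBuffer();
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("ords", ".bin");
        file.toFile().deleteOnExit();
        write(file, new int[] {2, 0, 2, 1});
        IntBuffer ords = map(file);
        System.out.println("ord of doc 3 = " + ords.get(3));
    }
}
```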

On Thu, May 19, 2011 at 6:05 AM, Robert Muir <rcm@gmail.com> wrote:

2011/5/19 Michael McCandless <luc@mikemccandless.com>:

Of course, for certain apps that perf hit is justified, so probably we should make this an option when populating the field cache (i.e., an in-memory storage option of using an FST vs. using packed ints/byte[]).

or should we actually try to have different FieldCache impls?

I see all these missions to refactor the thing, which always fail.

maybe that's because we have one huge monolithic implementation.