| Subject: | Re: FST and FieldCache? | |
|---|---|---|
| From: | Jason Rutherglen (jaso...@gmail.com) | |
| Date: | May 19, 2011 6:36:53 am | |
| List: | org.apache.lucene.java-dev | |
> This is more about compressing strings in TermsIndex, I think.
Ah, because they're sorted. I think that if the string lookup cost degrades, then it's not worth it? That needs to be tested in the MMap case as well, e.g., are ByteBuffers somehow slowing everything down by 10%?
On Thu, May 19, 2011 at 6:30 AM, Earwin Burrfoot <ear...@gmail.com> wrote:
This is more about compressing strings in TermsIndex, I think. And the ability to use said TermsIndex directly in some cases that required FieldCache before. (Maybe FC is still needed, but it can be degraded to a docId->ord map, storing the actual strings in TI.) This yields fat space savings when we, e.g., need to both look up on a field and build facets out of it.
mmap is cool :) What I want to see is an FST-based TermsDict that is simply mmapped into memory, without building intermediate indexes, like Lucene does now. And docvalues are orthogonal to that, no?
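To make the "docId->ord map, storing actual strings in TI" idea concrete, here is a minimal sketch. It is not Lucene code: the class `OrdFieldSketch` and all of its methods are hypothetical, a plain sorted `String[]` stands in for the terms index (no FST, no mmap), and it assumes one non-null value per document. It only illustrates why one sorted dictionary can serve both term lookup and faceting, while the per-document structure shrinks to an array of ints.

```java
import java.util.Arrays;
import java.util.TreeSet;

/** Hypothetical sketch: a sorted term dictionary plus a docId -> ord map. Not a Lucene API. */
final class OrdFieldSketch {
    // Stands in for the terms index: each distinct string is stored once, in sorted order.
    // A term's ord is simply its position in this array.
    private final String[] sortedTerms;
    // The "degraded FieldCache": one int per document instead of one string per document.
    private final int[] docToOrd;

    /** Assumes valuePerDoc[docId] holds exactly one non-null value for each document. */
    OrdFieldSketch(String[] valuePerDoc) {
        // TreeSet deduplicates and sorts the values in natural String order.
        sortedTerms = new TreeSet<>(Arrays.asList(valuePerDoc)).toArray(new String[0]);
        docToOrd = new int[valuePerDoc.length];
        for (int doc = 0; doc < valuePerDoc.length; doc++) {
            docToOrd[doc] = Arrays.binarySearch(sortedTerms, valuePerDoc[doc]);
        }
    }

    /** Term lookup: binary search over the sorted dictionary; negative if the term is absent. */
    int ordOf(String term) {
        return Arrays.binarySearch(sortedTerms, term);
    }

    /** Resolves a document's string value through its ord. */
    String valueOf(int docId) {
        return sortedTerms[docToOrd[docId]];
    }

    /** Counts, per ord, how many of the matching documents carry that term. */
    int[] facetCounts(int[] matchingDocs) {
        int[] counts = new int[sortedTerms.length];
        for (int doc : matchingDocs) {
            counts[docToOrd[doc]]++;
        }
        return counts;
    }

    /** Maps an ord back to its term, e.g. to label a facet bucket. */
    String termAt(int ord) {
        return sortedTerms[ord];
    }
}
```

Whether the lookup cost stays acceptable once the sorted array is replaced by a compressed, mmapped structure is exactly the open question raised in the reply above; this sketch says nothing about that.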





