atom feed26 messages in org.apache.lucene.java-devRe: FST and FieldCache?
FromSent OnAttachments
Davi...@MITRE.org)May 18, 2011 9:52 pm 
Dawid WeissMay 18, 2011 11:30 pm 
Earwin BurrfootMay 19, 2011 1:24 am 
Dawid WeissMay 19, 2011 1:42 am 
Michael McCandlessMay 19, 2011 3:08 am 
Dawid WeissMay 19, 2011 3:16 am 
Michael McCandlessMay 19, 2011 3:21 am 
Dawid WeissMay 19, 2011 3:36 am 
Earwin BurrfootMay 19, 2011 5:30 am 
Michael McCandlessMay 19, 2011 5:44 am 
Dawid WeissMay 19, 2011 5:45 am 
Earwin BurrfootMay 19, 2011 6:02 am 
Robert MuirMay 19, 2011 6:04 am 
Dawid WeissMay 19, 2011 6:20 am 
Jason RutherglenMay 19, 2011 6:22 am 
Earwin BurrfootMay 19, 2011 6:29 am 
Jason RutherglenMay 19, 2011 6:36 am 
Michael McCandlessMay 19, 2011 6:40 am 
Jason RutherglenMay 19, 2011 7:08 am 
Davi...@MITRE.org)May 19, 2011 7:53 am 
Davi...@MITRE.org)May 19, 2011 8:58 am 
Michael McCandlessMay 19, 2011 8:59 am 
Jason RutherglenMay 19, 2011 9:35 am 
Michael McCandlessMay 19, 2011 9:42 am 
Earwin BurrfootMay 19, 2011 11:00 am 
Dawid WeissMay 19, 2011 11:48 am 
Subject:Re: FST and FieldCache?
From:Dawid Weiss (dawi@cs.put.poznan.pl)
Date:May 19, 2011 3:16:26 am
List:org.apache.lucene.java-dev

We should add this (lookup by value, when value is guaranteed to monotonically increase as the key increases) to our core FST APIs? It's generically useful in many places ;)  I'll open an issue.

The data structure itself should sort of "build itself" if you create an FST with increasing integers because the "shared suffix" should be pushed towards the root anyway, so the only thing would be to correct values on all outgoing arcs (they need to contain the count of leaves on the subtree).... but then, this may be tricky if arc values are vcoded... I'd have to think how to do this.

While FST w/ lookup-by-monotonic-value would work here, I would be worried about the per hit of that representation vs what

There are actually two things:

a) performance; you need to descend in the automaton and some bookkeeping to maintain the count of nodes; this adds overhead,

b) size; the procedure for storing/ calculating perfect hashes I described requires leaf counts on each arc and these are usually large integers. Even vcoded they bloat the resulting data structure.

Dawid