caml-list - the Caml user's mailing list
* [1/2 OT] Indexing (and mergeable Index-algorithms)
@ 2005-11-16 23:42 Oliver Bandel
  2005-11-17  8:15 ` [Caml-list] " skaller
                   ` (4 more replies)
  0 siblings, 5 replies; 30+ messages in thread
From: Oliver Bandel @ 2005-11-16 23:42 UTC
  To: caml-list

Hello,


I'm looking for indexing algorithms and especially - if
such a thing exists - mergeable/extendable indexing algorithms.

Say we have 10^6 texts that we want to index, so that a text can be
retrieved by some part of its content (keywords, substrings, ...).
We want to build an index from these texts.

After a while we get 10^5 new texts and want to extend
the existing index, so that the whole index does not have to be
created again by running the indexer tool on all files
(10^6 + 10^5). Instead, only the new files should be indexed,
and the big index should be extended with additional smaller indexes.
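
To make the idea concrete, here is a minimal sketch of what such a
mergeable index could look like. It is not taken from any existing tool:
it assumes a plain keyword index (no substring search), a recent OCaml
(4.08 or later) and the standard library's persistent Map and Set. All
names (index, tokenize, add_document, build, merge, lookup) are made up
for illustration.

```ocaml
(* Sketch of a mergeable inverted index; all names are hypothetical. *)
module StrMap = Map.Make (String)
module IntSet = Set.Make (Int)

(* An index maps each keyword to the set of ids of documents containing it. *)
type index = IntSet.t StrMap.t

(* Split a text into lowercase words; anything that is not an ASCII letter
   or digit counts as a separator. *)
let tokenize (text : string) : string list =
  let is_word_char = function
    | 'a' .. 'z' | 'A' .. 'Z' | '0' .. '9' -> true
    | _ -> false
  in
  String.map
    (fun c -> if is_word_char c then Char.lowercase_ascii c else ' ')
    text
  |> String.split_on_char ' '
  |> List.filter (fun w -> w <> "")

(* Record every word of one document in the index. *)
let add_document (idx : index) (doc_id : int) (text : string) : index =
  List.fold_left
    (fun acc word ->
      StrMap.update word
        (function
          | None -> Some (IntSet.singleton doc_id)
          | Some ids -> Some (IntSet.add doc_id ids))
        acc)
    idx (tokenize text)

(* Build an index from a batch of (id, text) pairs. *)
let build (docs : (int * string) list) : index =
  List.fold_left
    (fun acc (id, text) -> add_document acc id text)
    StrMap.empty docs

(* Merge two indexes: the id sets of keywords present in both are unioned. *)
let merge (a : index) (b : index) : index =
  StrMap.union (fun _word ids_a ids_b -> Some (IntSet.union ids_a ids_b)) a b

(* Return the ids of all documents containing the keyword, in ascending order. *)
let lookup (idx : index) (word : string) : int list =
  match StrMap.find_opt (String.lowercase_ascii word) idx with
  | None -> []
  | Some ids -> IntSet.elements ids

(* Usage: index the old texts once, index only the new texts later, then merge. *)
let big = build [ (1, "functional data structures"); (2, "imperative hash tables") ]
let small = build [ (3, "persistent data structures") ]
let combined = merge big small
```

In this sketch merge only looks at the two indexes and never at the
original texts, which is the property asked for above. Whether it is fast
enough at 10^6 texts is a separate question that the sketch does not answer.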

Does something like that already exist?
Or must the new index be created from all files again,
or is a workaround needed with one big and one small index file,
where handling both is something we must provide ourselves?
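
The workaround with a big and a small index file could be sketched
roughly as follows: keep the indexes separate and consult each of them
at query time. This is only an illustration under assumptions. It uses
in-memory Hashtbl values instead of files, assumes OCaml 4.10 or later,
and the names segment, segment_of_pairs and query are hypothetical.

```ocaml
(* Sketch: one large index plus smaller ones, all consulted at query time.
   Names are hypothetical. *)
type segment = (string, int list) Hashtbl.t

(* Build a segment from (keyword, document id) pairs. *)
let segment_of_pairs (pairs : (string * int) list) : segment =
  let tbl = Hashtbl.create 16 in
  List.iter
    (fun (word, id) ->
      let ids = Option.value (Hashtbl.find_opt tbl word) ~default:[] in
      Hashtbl.replace tbl word (id :: ids))
    pairs;
  tbl

(* Look a keyword up in every segment and return the sorted,
   deduplicated document ids. *)
let query (segments : segment list) (word : string) : int list =
  segments
  |> List.concat_map (fun seg ->
         Option.value (Hashtbl.find_opt seg word) ~default:[])
  |> List.sort_uniq compare

(* Usage: the big segment is built once, the small one only from new texts. *)
let big_segment = segment_of_pairs [ ("index", 1); ("merge", 2); ("index", 3) ]
let small_segment = segment_of_pairs [ ("index", 4); ("tree", 4) ]
let hits = query [ big_segment; small_segment ] "index"
```

The cost of this approach is that every query touches every segment, so
the number of small indexes would have to be kept low, for example by
occasionally combining them into the big one.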

It's mainly a question about data structures/algorithms, so this mailing list
may be the wrong place, but the reason to ask here is: are functional
data structures in some way good for implementing such tools?


BTW: The application I intend to write is
     performance critical, so if functional data structures
     are well suited to such extendable indexes but are too slow,
     then this would also be a problem.


Any hints here?
(Maybe OCaml's imperative features would help
 if the functional ones turn out to be too slow?)

Any hint on algorithms/data structures for this would be welcome.


Thanks In Advance,
           Oliver




Thread overview: 30+ messages
2005-11-16 23:42 [1/2 OT] Indexing (and mergeable Index-algorithms) Oliver Bandel
2005-11-17  8:15 ` [Caml-list] " skaller
2005-11-17 15:09   ` Brian Hurt
2005-11-17 17:31     ` skaller
2005-11-17 18:08       ` Brian Hurt
2005-11-17 18:57         ` skaller
2005-11-17 22:15           ` Brian Hurt
2005-11-18  1:49             ` skaller
2005-11-17  8:35 ` Florian Hars
2005-11-17  9:24   ` Oliver Bandel
2005-11-17 12:39     ` Florian Weimer
2005-11-17 20:57       ` Oliver Bandel
2005-11-17 22:02         ` Florian Weimer
2005-11-17 11:49 ` Florian Weimer
2005-11-17 13:55   ` Richard Jones
2005-11-18 14:54   ` Jonathan Bryant
2005-11-18 14:22     ` Oliver Bandel
2005-11-18 14:37       ` Florian Weimer
2005-11-18 15:05         ` Thomas Fischbacher
2005-11-18 15:14           ` Florian Weimer
2005-11-18 16:03             ` Thomas Fischbacher
2005-11-18 20:03               ` Gerd Stolpmann
2005-11-18 20:01             ` Gerd Stolpmann
2005-11-18 21:12               ` Florian Weimer
2005-11-18 16:13         ` Oliver Bandel
2005-11-18 14:45     ` Florian Weimer
     [not found] ` <437CD0E5.8080503@yahoo.fr>
2005-11-17 20:02   ` Oliver Bandel
     [not found]     ` <437CE8EC.1070109@yahoo.fr>
2005-11-17 20:41       ` Oliver Bandel
2005-11-18 15:06         ` Florian Hars
     [not found] ` <437BD5F5.6010307@1969web.com>
2005-11-17 20:10   ` Oliver Bandel
