caml-list - the Caml user's mailing list
* [Caml-list] Simple full-text search dictionary
@ 2002-09-24 17:04 Kontra, Gergely
  2002-09-25  6:37 ` Mattias Waldau
  0 siblings, 1 reply; 2+ messages in thread
From: Kontra, Gergely @ 2002-09-24 17:04 UTC (permalink / raw)
  To: caml-list

Hi!

I want to implement a simple *full-text* search dictionary.
I have a textfile with a list of the words.

My main question is how I can (easily) collect the words that
match a regexp. I've found Str.search_forward, but it seems
complicated to use with List.filter (it raises an exception...).

Another question is whether to store the dictionary in, say, a list or
not. If so, how do I fill the list efficiently?
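
For reference, a minimal sketch of how both points could look, assuming the
Str library is linked and a reasonably recent OCaml (the `exception` match
case needs 4.02 or later). The names `matches`, `read_words` and `search`
are made up for illustration. Str.search_forward raises Not_found when
nothing matches, so wrapping it in a boolean predicate makes it usable with
List.filter:

```ocaml
(* Requires the Str library (link with str.cma / str.cmxa). *)

(* True if [re] matches somewhere in [word]. Str.search_forward raises
   Not_found when there is no match; we turn that into [false]. *)
let matches re word =
  try
    ignore (Str.search_forward re word 0);
    true
  with Not_found -> false

(* Read one word per line from [path] into a list, in file order.
   Words are consed onto an accumulator and reversed once at the end,
   so filling the list is linear in the number of lines. *)
let read_words path =
  let ic = open_in path in
  let rec loop acc =
    match input_line ic with
    | line -> loop (line :: acc)
    | exception End_of_file -> close_in ic; List.rev acc
  in
  loop []

(* Keep only the words matching [pattern] (Str regexp syntax). *)
let search pattern words =
  let re = Str.regexp pattern in
  List.filter (matches re) words
```

With this, `search "^ab.*e$" (read_words "words.txt")` would return the words
that start with "ab" and end with "e"; the file name is only a placeholder.
A plain list is probably fine as long as every query scans the whole
dictionary anyway.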

Gergo

+-[Kontra, Gergely @ Budapest University of Technology and Economics]-+
|         Email: kgergely@mcl.hu,  kgergely@turul.eet.bme.hu          |
|  URL:   turul.eet.bme.hu/~kgergely    Mobile: (+36 20) 356 9656     |
+------"I'm such a bright spark I should carry an extinguisher!"------+
.
Hungarian PHP mirror and Hungarian PHP documentation: http://hu.php.net

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners



* RE: [Caml-list] Simple full-text search dictionary
  2002-09-24 17:04 [Caml-list] Simple full-text search dictionary Kontra, Gergely
@ 2002-09-25  6:37 ` Mattias Waldau
  0 siblings, 0 replies; 2+ messages in thread
From: Mattias Waldau @ 2002-09-25  6:37 UTC (permalink / raw)
  To: 'Kontra, Gergely', caml-list

> I want to implement a simple *full-text* search dictionary.
> I have a textfile with a list of the words.
> 

Suffix arrays are very efficient for fast string searching 
in big texts. 

A year ago I implemented it using O'Caml, which
took 40 s to run on my P4, when applied to the Bible (4.5 MB).
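
To give an idea of the technique, here is a naive sketch of a suffix array
in plain OCaml. It is not the code from the archives linked in this message;
the function names are invented for illustration, and the construction simply
sorts all suffix start positions, which is far from the fastest way to build
one. Lookup is a binary search over the sorted suffixes:

```ocaml
(* Lexicographic comparison of the suffixes of [text] starting at [i], [j]. *)
let compare_suffixes text i j =
  let n = String.length text in
  let rec go i j =
    if i >= n then (if j >= n then 0 else -1)
    else if j >= n then 1
    else
      let c = Char.compare text.[i] text.[j] in
      if c <> 0 then c else go (i + 1) (j + 1)
  in
  go i j

(* Naive construction: sort every start position by the suffix it begins. *)
let build text =
  let sa = Array.init (String.length text) (fun i -> i) in
  Array.sort (compare_suffixes text) sa;
  sa

(* Compare [pat] with the suffix at [pos], looking at no more than
   String.length pat characters. 0 means [pat] is a prefix of the suffix. *)
let compare_prefix text pat pos =
  let n = String.length text and m = String.length pat in
  let rec go k =
    if k >= m then 0
    else if pos + k >= n then 1   (* suffix ended first: pat is greater *)
    else
      let c = Char.compare pat.[k] text.[pos + k] in
      if c <> 0 then c else go (k + 1)
  in
  go 0

(* Binary search for the first suffix that is not smaller than [pat];
   returns the text position of an occurrence of [pat], if there is one. *)
let find text sa pat =
  let lo = ref 0 and hi = ref (Array.length sa) in
  while !lo < !hi do
    let mid = (!lo + !hi) / 2 in
    if compare_prefix text pat sa.(mid) > 0 then lo := mid + 1
    else hi := mid
  done;
  if !lo < Array.length sa && compare_prefix text pat sa.(!lo) = 0
  then Some sa.(!lo)
  else None
```

For example, with `let sa = build "banana"`, the call `find "banana" sa "nan"`
gives `Some 2`. Each lookup needs only a logarithmic number of suffix
comparisons, which is why the structure pays off on big texts once it is
built.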

The nice version is at 
http://www.abc.se/~m10217/download/mans.tar.bz2

The fast version is at
http://www.abc.se/~m10217/download/mans.opt.tar.bz2

/mattias

