caml-list - the Caml user's mailing list
From: Richard Jones <rich@annexia.org>
To: caml-list@inria.fr
Subject: Re: [Caml-list] [1/2 OT] Indexing (and mergeable Index-algorithms)
Date: Thu, 17 Nov 2005 13:55:34 +0000
Message-ID: <20051117135534.GB19578@furbychan.cocan.org>
In-Reply-To: <87r79fjxy4.fsf@mid.deneb.enyo.de>

On Thu, Nov 17, 2005 at 12:49:55PM +0100, Florian Weimer wrote:
> Plenty.  Berkeley DB, SQLite, full-blown SQL database servers like
> PostgreSQL or MySQL.  The list is pretty long.

We use PostgreSQL's tsearch2[1] module to index web pages across our
main site and customer sites.  Today the index holds 38,437 pages,
including old versions.

Pros:

* Extremely easy to use - you just insert pages as rows in the database.
* Very featureful - supports stemming, multiple languages, etc.
* Works from OCaml using, e.g., ocamldbi or the OCaml-PostgreSQL module.

Cons:

* Quite hard to install - you need to read the documentation carefully.
* Slow for lookups - I haven't quite got to the bottom of this, so I
  don't know whether it's inherently slow or whether I haven't set up
  the indexes correctly.

Rich.

[1] http://www.sai.msu.su/~megera/oddmuse/index.cgi/tsearch-v2-intro

-- 
Richard Jones, CTO Merjis Ltd.
Merjis - web marketing and technology - http://merjis.com
Team Notepad - intranets and extranets for business - http://team-notepad.com


