From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Original-To: caml-list@yquem.inria.fr Delivered-To: caml-list@yquem.inria.fr Received: from concorde.inria.fr (concorde.inria.fr [192.93.2.39]) by yquem.inria.fr (Postfix) with ESMTP id 32F95BB9C for ; Thu, 17 Nov 2005 14:35:00 +0100 (CET) Received: from pauillac.inria.fr (pauillac.inria.fr [128.93.11.35]) by concorde.inria.fr (8.13.0/8.13.0) with ESMTP id jAHDYx8V021467 for ; Thu, 17 Nov 2005 14:34:59 +0100 Received: from nez-perce.inria.fr (nez-perce.inria.fr [192.93.2.78]) by pauillac.inria.fr (8.7.6/8.7.3) with ESMTP id OAA17983 for ; Thu, 17 Nov 2005 14:34:59 +0100 (MET) Received: from furbychan.cocan.org (furbychan.cocan.org [80.68.91.176]) by nez-perce.inria.fr (8.13.0/8.13.0) with ESMTP id jAHDYvbx004268 (version=TLSv1/SSLv3 cipher=EDH-RSA-DES-CBC3-SHA bits=168 verify=NO) for ; Thu, 17 Nov 2005 14:34:58 +0100 Received: from rich by furbychan.cocan.org with local (Exim 3.35 #1 (Debian)) id 1EckEg-0006Fg-00 for ; Thu, 17 Nov 2005 13:55:34 +0000 Date: Thu, 17 Nov 2005 13:55:34 +0000 To: caml-list@inria.fr Subject: Re: [Caml-list] [1/2 OT] Indexing (and mergeable Index-algorithms) Message-ID: <20051117135534.GB19578@furbychan.cocan.org> References: <20051116234238.GA5741@first.in-berlin.de> <87r79fjxy4.fsf@mid.deneb.enyo.de> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <87r79fjxy4.fsf@mid.deneb.enyo.de> User-Agent: Mutt/1.3.28i From: Richard Jones X-Miltered: at concorde with ID 437C8703.000 by Joe's j-chkmail (http://j-chkmail.ensmp.fr)! X-Miltered: at nez-perce with ID 437C8701.001 by Joe's j-chkmail (http://j-chkmail.ensmp.fr)! X-Spam: no; 0.00; caml-list:01 indexing:01 ocaml:01 inherently:01 notepad:01 sai:98 wrote:01 florian:02 cons:03 module:03 module:03 berkeley:03 long:05 thu:05 install:05 X-Spam-Checker-Version: SpamAssassin 3.0.3 (2005-04-27) on yquem.inria.fr X-Spam-Level: X-Spam-Status: No, score=0.0 required=5.0 tests=none autolearn=disabled version=3.0.3 On Thu, Nov 17, 2005 at 12:49:55PM +0100, Florian Weimer wrote: > Plenty. Berkeley DB, SQLite, full-blown SQL database servers like > PostgreSQL or MySQL. The list is pretty long. We use PostgreSQL's tsearch2[1] module to index web pages across our main site and customer sites. Today we have 38,437 pages including old versions in the index. Pros: * Extremely easy to use - you just insert pages as rows in the database. * Very featureful - does stemming, multiple language support, etc. * Works from OCaml using, eg., ocamldbi, OCaml-PostgreSQL module. Cons: * Quite hard to install - you need to read the documentation carefully. * Slow for lookups - I haven't quite got to the bottom of this so I don't know if it's inherently slow or if I haven't set up the indexes right. Rich. [1] http://www.sai.msu.su/~megera/oddmuse/index.cgi/tsearch-v2-intro -- Richard Jones, CTO Merjis Ltd. Merjis - web marketing and technology - http://merjis.com Team Notepad - intranets and extranets for business - http://team-notepad.com