From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Original-To: caml-list@yquem.inria.fr Delivered-To: caml-list@yquem.inria.fr Received: from concorde.inria.fr (concorde.inria.fr [192.93.2.39]) by yquem.inria.fr (Postfix) with ESMTP id 387CDBB9C for ; Thu, 17 Nov 2005 21:02:55 +0100 (CET) Received: from pauillac.inria.fr (pauillac.inria.fr [128.93.11.35]) by concorde.inria.fr (8.13.0/8.13.0) with ESMTP id jAHK2sLv001176 for ; Thu, 17 Nov 2005 21:02:54 +0100 Received: from concorde.inria.fr (concorde.inria.fr [192.93.2.39]) by pauillac.inria.fr (8.7.6/8.7.3) with ESMTP id VAA26910 for ; Thu, 17 Nov 2005 21:02:54 +0100 (MET) Received: from einhorn.in-berlin.de (einhorn.in-berlin.de [192.109.42.8]) by concorde.inria.fr (8.13.0/8.13.0) with ESMTP id jAHK2rdG001171 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=FAIL) for ; Thu, 17 Nov 2005 21:02:54 +0100 X-Envelope-From: oliver@first.in-berlin.de X-Envelope-To: Received: from first.in-berlin.de (e178009010.adsl.alicedsl.de [85.178.9.10]) (authenticated bits=0) by einhorn.in-berlin.de (8.12.10/8.12.10/Debian-4) with ESMTP id jAHK2qt4013468 for ; Thu, 17 Nov 2005 21:02:52 +0100 Received: by first.in-berlin.de (Postfix, from userid 501) id 21A7B193BEF; Thu, 17 Nov 2005 21:02:13 +0100 (CET) Date: Thu, 17 Nov 2005 21:02:13 +0100 From: Oliver Bandel To: caml-list@inria.fr Subject: Re: [Caml-list] [1/2 OT] Indexing (and mergeable Index-algorithms) Message-ID: <20051117200213.GC344@first.in-berlin.de> References: <20051116234238.GA5741@first.in-berlin.de> <437CD0E5.8080503@yahoo.fr> Mime-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <437CD0E5.8080503@yahoo.fr> User-Agent: Mutt/1.5.6i X-Scanned-By: MIMEDefang_at_IN-Berlin_e.V. on 192.109.42.8 X-Miltered: at concorde with ID 437CE1EE.000 by Joe's j-chkmail (http://j-chkmail.ensmp.fr)! X-Miltered: at concorde with ID 437CE1ED.000 by Joe's j-chkmail (http://j-chkmail.ensmp.fr)! X-Spam: no; 0.00; oliver:01 bandel:01 oliver:01 in-berlin:01 caml-list:01 indexing:01 bandel:01 ocaml:01 wrote:01 imperative:01 functional:02 python:02 hints:03 langage:03 algorithms:03 X-Spam-Checker-Version: SpamAssassin 3.0.3 (2005-04-27) on yquem.inria.fr X-Spam-Level: X-Spam-Status: No, score=0.1 required=5.0 tests=FORGED_RCVD_HELO autolearn=disabled version=3.0.3 On Thu, Nov 17, 2005 at 07:50:13PM +0100, sejourne_kevin wrote: > Oliver Bandel a écrit : > >Any hints here? > >(Maybe using OCaml, but the imperative features of it would help, > > if the functional features would be too slow?) > > > >Any hint on algorithms/datastructures for this would be fine... > > > > In my work we use "Lucene" (apache.org). Lucene is in java but have been > re-coded in few other langage (c++,python...), the spec of the infex > format is availble for free (online). > > I think the real problem for 10^x (x > 6) docs is the size of the index, > not really the speed of the answer(not for lucene (fast and small)). The size of the index as well as creating time of it. If it's not updateble and must be created new for every update, than this needs to much time. If updates are cheap, then thats the right thing. Ciao, Oliver