From mboxrd@z Thu Jan 1 00:00:00 1970 Received: (from majordomo@localhost) by pauillac.inria.fr (8.7.6/8.7.3) id QAA18518; Thu, 8 Nov 2001 16:25:06 +0100 (MET) X-Authentication-Warning: pauillac.inria.fr: majordomo set sender to owner-caml-list@pauillac.inria.fr using -f Received: from concorde.inria.fr (concorde.inria.fr [192.93.2.39]) by pauillac.inria.fr (8.7.6/8.7.3) with ESMTP id QAA19044 for ; Thu, 8 Nov 2001 16:25:05 +0100 (MET) Received: from pauillac.inria.fr (pauillac.inria.fr [128.93.11.35]) by concorde.inria.fr (8.11.1/8.10.0) with ESMTP id fA8FOgX24748; Thu, 8 Nov 2001 16:24:42 +0100 (MET) Received: (from vouillon@localhost) by pauillac.inria.fr (8.7.6/8.7.3) id QAA18894; Thu, 8 Nov 2001 16:24:41 +0100 (MET) Date: Thu, 8 Nov 2001 16:24:41 +0100 From: Jerome Vouillon To: andrew@absentis.com Cc: caml-list@inria.fr Subject: Re: [Caml-list] Searching large lists Message-ID: <20011108162441.A16848@pauillac.inria.fr> References: <20011108140657.75752.qmail@web12305.mail.yahoo.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 1.0i In-Reply-To: <20011108140657.75752.qmail@web12305.mail.yahoo.com>; from xandrew_lawsonx@yahoo.co.uk on Thu, Nov 08, 2001 at 06:06:57AM -0800 Sender: owner-caml-list@pauillac.inria.fr Precedence: bulk On Thu, Nov 08, 2001 at 06:06:57AM -0800, Andrew Lawson wrote: > I have a list containing up to 100,000 strings > between 10 and 200 characters in length. I want to > produce a list of those that match a regular > expression. It seems that the obvious way is to > List.filter with a predicate returning true if the > string matches, however in my case this can take up to > 15 seconds. Has anyone got any ideas for speeding this > up? The Str library is really slow. For Unison (http://www.cis.upenn.edu/~bcpierce/unison/), we wrote our own regular expression library to get acceptable performances. You should try PCRE (http://sourceforge.net/projects/pcre-ocaml/) or maybe RE (http://sourceforge.net/projects/libre/). If you compile to native code, the RE library should be the fastest in your case (probably about 5 to 10 times faster than PCRE). It is still under development though, so some features are missing. -- Jerome ------------------- Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr