From mboxrd@z Thu Jan 1 00:00:00 1970 Received: (from majordomo@localhost) by pauillac.inria.fr (8.7.6/8.7.3) id PAA15824; Thu, 8 Nov 2001 15:14:44 +0100 (MET) Received: from concorde.inria.fr (concorde.inria.fr [192.93.2.39]) by pauillac.inria.fr (8.7.6/8.7.3) with ESMTP id PAA16377 for ; Thu, 8 Nov 2001 15:14:43 +0100 (MET) Received: from staff.cs.usyd.edu.au (staff.cs.usyd.edu.au [129.78.8.1]) by concorde.inria.fr (8.11.1/8.10.0) with SMTP id fA8EEfX20173 for ; Thu, 8 Nov 2001 15:14:41 +0100 (MET) Received: from hons.cs.usyd.edu.au. by staff.cs.usyd.edu.au.; Fri, 09 Nov 2001 01:14:36 +1100 Received: from localhost (mrak@localhost) by hons.cs.usyd.edu.au (8.9.3/8.9.3) with ESMTP id BAA29917 for ; Fri, 9 Nov 2001 01:14:36 +1100 X-Authentication-Warning: hons.cs.usyd.edu.au: mrak owned process doing -bs Date: Fri, 9 Nov 2001 01:14:36 +1100 (EST) From: Mark Wotton cc: caml-list@inria.fr Subject: Re: [Caml-list] Searching large lists In-Reply-To: <20011108140657.75752.qmail@web12305.mail.yahoo.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-caml-list@pauillac.inria.fr Precedence: bulk On Thu, 8 Nov 2001, Andrew Lawson wrote: > Hi all > I have a list containing up to 100,000 strings > between 10 and 200 characters in length. I want to > produce a list of those that match a regular > expression. It seems that the obvious way is to > List.filter with a predicate returning true if the > string matches, however in my case this can take up to > 15 seconds. Has anyone got any ideas for speeding this > up? > > thanks > > Andrew This would probably require rewriting whatever you're using to do the regexes, but if you use a trie to store all the strings, you could maintain a list of nodes which matched at each stage of the regex. This should be a fair bit faster... mrak ------------------- Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr