caml-list - the Caml user's mailing list
* [Caml-list] Searching large lists
@ 2001-11-08 14:06 Andrew Lawson
  2001-11-08 14:14 ` Mark Wotton
  2001-11-08 15:24 ` Jerome Vouillon
  0 siblings, 2 replies; 3+ messages in thread
From: Andrew Lawson @ 2001-11-08 14:06 UTC (permalink / raw)
  To: caml-list

Hi all
     I have a list containing up to 100,000 strings
between 10 and 200 characters in length. I want to
produce a list of those that match a regular
expression. It seems that the obvious way is to
List.filter with a predicate returning true if the
string matches, however in my case this can take up to
15 seconds. Has anyone got any ideas for speeding this
up?

     thanks

           Andrew

=====
Andrew Lawson
andrew@absentis.com
 www.absentis.com

-------------------
Bug reports: http://caml.inria.fr/bin/caml-bugs  FAQ: http://caml.inria.fr/FAQ/
To unsubscribe, mail caml-list-request@inria.fr  Archives: http://caml.inria.fr



* Re: [Caml-list] Searching large lists
From: Mark Wotton @ 2001-11-08 14:14 UTC (permalink / raw)
  Cc: caml-list

On Thu, 8 Nov 2001, Andrew Lawson wrote:

> Hi all
>      I have a list containing up to 100,000 strings
> between 10 and 200 characters in length. I want to
> produce a list of those that match a regular
> expression. It seems that the obvious way is to
> List.filter with a predicate returning true if the
> string matches, however in my case this can take up to
> 15 seconds. Has anyone got any ideas for speeding this
> up?
> 
>      thanks
> 
>            Andrew

This would probably require rewriting whatever you're using to do the
regexes, but if you use a trie to store all the strings, you could
maintain a list of nodes which matched at each stage of the regex. This
should be a fair bit faster...

mrak
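
A rough sketch of the trie idea, under several assumptions that are not in the original message: the regex type `re`, the helper names (`of_list`, `matches`, and so on), and the use of Brzozowski derivatives to carry the regex state down the trie are all illustrative choices rather than anything Mark specified. It matches whole strings only, and duplicate strings collapse into a single trie entry. The point it tries to show is that strings sharing a prefix share the matching work, and that a whole subtree is skipped as soon as the regex can no longer match.

```ocaml
(* Hypothetical minimal regex AST; not the syntax of Str, PCRE or RE. *)
type re =
  | Empty                 (* matches no string at all *)
  | Eps                   (* matches only the empty string *)
  | Chr of char
  | Any                   (* any single character *)
  | Seq of re * re
  | Alt of re * re
  | Star of re

(* Smart constructors that simplify, so dead states become Empty. *)
let seq a b = match a, b with
  | Empty, _ | _, Empty -> Empty
  | Eps, r | r, Eps -> r
  | _ -> Seq (a, b)

let alt a b = match a, b with
  | Empty, r | r, Empty -> r
  | _ -> Alt (a, b)

(* Does the regex accept the empty string? *)
let rec nullable = function
  | Empty | Chr _ | Any -> false
  | Eps | Star _ -> true
  | Seq (a, b) -> nullable a && nullable b
  | Alt (a, b) -> nullable a || nullable b

(* Regex that remains to be matched after consuming character c. *)
let rec deriv c = function
  | Empty | Eps -> Empty
  | Chr d -> if c = d then Eps else Empty
  | Any -> Eps
  | Seq (a, b) ->
      let left = seq (deriv c a) b in
      if nullable a then alt left (deriv c b) else left
  | Alt (a, b) -> alt (deriv c a) (deriv c b)
  | Star a as r -> seq (deriv c a) r

module CMap = Map.Make (Char)

(* terminal = some stored string ends at this node. *)
type trie = { terminal : bool; children : trie CMap.t }

let empty_trie = { terminal = false; children = CMap.empty }

let rec insert s i t =
  if i = String.length s then { t with terminal = true }
  else
    let c = s.[i] in
    let child =
      match CMap.find_opt c t.children with
      | Some n -> n
      | None -> empty_trie
    in
    { t with children = CMap.add c (insert s (i + 1) child) t.children }

let of_list strings =
  List.fold_left (fun t s -> insert s 0 t) empty_trie strings

(* Walk the trie once, advancing the regex one character per edge and
   pruning every subtree whose regex state has become Empty. *)
let matches re t =
  let buf = Buffer.create 64 in
  let rec go r node acc =
    let acc =
      if node.terminal && nullable r then Buffer.contents buf :: acc
      else acc
    in
    CMap.fold
      (fun c child acc ->
        match deriv c r with
        | Empty -> acc
        | r' ->
            Buffer.add_char buf c;
            let acc = go r' child acc in
            Buffer.truncate buf (Buffer.length buf - 1);
            acc)
      node.children acc
  in
  List.rev (go re t [])
```

For example, `matches (Seq (Chr 'c', Seq (Chr 'a', Star Any))) (of_list words)` visits only the part of the trie below the prefix "ca". Building the trie costs time up front, so this is likely to pay off only when the same list is searched many times.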



* Re: [Caml-list] Searching large lists
From: Jerome Vouillon @ 2001-11-08 15:24 UTC (permalink / raw)
  To: andrew; +Cc: caml-list

On Thu, Nov 08, 2001 at 06:06:57AM -0800, Andrew Lawson wrote:
>      I have a list containing up to 100,000 strings
> between 10 and 200 characters in length. I want to
> produce a list of those that match a regular
> expression. It seems that the obvious way is to
> List.filter with a predicate returning true if the
> string matches, however in my case this can take up to
> 15 seconds. Has anyone got any ideas for speeding this
> up?

The Str library is really slow.

For Unison (http://www.cis.upenn.edu/~bcpierce/unison/), we wrote our
own regular expression library to get acceptable performance.

You should try PCRE (http://sourceforge.net/projects/pcre-ocaml/) or maybe RE
(http://sourceforge.net/projects/libre/).

If you compile to native code, the RE library should be the fastest in
your case (probably about 5 to 10 times faster than PCRE).  It is
still under development though, so some features are missing.

-- Jerome
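
A minimal sketch of the filter with the regex compiled once, outside the predicate. It assumes the API of the present-day `re` package (`Re.Perl.re`, `Re.compile`, `Re.execp`), which may well differ from the RE library as it stood when this message was written. The function name `filter_matching` is made up for illustration.

```ocaml
(* Assumes the third-party `re` package is installed and linked;
   this is the modern API, not necessarily the 2001 one. *)
let filter_matching pattern strings =
  (* Parse and compile the pattern a single time, not once per string. *)
  let compiled = Re.compile (Re.Perl.re pattern) in
  (* Re.execp returns true if the pattern matches anywhere in the string. *)
  List.filter (fun s -> Re.execp compiled s) strings
```

With 100,000 strings, hoisting the compilation out of the predicate avoids recompiling the pattern on every call, whichever regex library is used.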