caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed
From: Julian Assange <proff@iq.org>
To: gerd@gerd-stolpmann.de
Cc: eijiro_sumii@anet.ne.jp, caml-list@inria.fr,
	sumii@venus.is.s.u-tokyo.ac.jp
Cc: proff@iq.org
Subject: Re: substring match like "strstr"
Date: 13 Dec 2000 00:58:52 +1100	[thread overview]
Message-ID: <wxd7exivvn.fsf@suburbia.net> (raw)
In-Reply-To: Gerd Stolpmann's message of "Sun, 10 Dec 2000 16:39:12 +0100"

Gerd Stolpmann <gerd@gerd-stolpmann.de> writes:

> >The results (execution time in seconds) were as follows.
> >
> >  strstr   55.74
> >  regexp  154.37
> >  OCamlA  302.57
> >  OCamlB  129.23
> 
> I did not expect that OCamlB performs so well; so I suggested OcamlA and
> regexp (which are both fast for the bytecode interpreter, too). I think it
> depends also on the problem size (especially on the length of the substring).
> 
> Gerd

If you are really calling strstr 400,000,000 times, I strongly suggest
the use of Boyer-Moore. Is your alphabet amino-acids or base-pairs? If
the latter, I have developed parallel hashing version of Boyer-More
(lets call it Assange-Boyer-More :), which beats the pants off
Knuth-Moris-Pratt. 500 long substrings searches are only about 50%
slower than 1, although for this application, if you have the memory,
you might also want to look at suffix trees or suffix arrays.

--
 Julian Assange        |If you want to build a ship, don't drum up people
                       |together to collect wood or assign them tasks
 proff@iq.org          |and work, but rather teach them to long for the endless
 proff@gnu.ai.mit.edu  |immensity of the sea. -- Antoine de Saint Exupery



  parent reply	other threads:[~2000-12-14 18:11 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2000-12-08  6:04 eijiro_sumii
2000-12-08 12:57 ` Gerd Stolpmann
2000-12-10 13:16   ` eijiro_sumii
2000-12-10 15:39     ` Gerd Stolpmann
2000-12-11  3:57       ` eijiro_sumii
2000-12-12 13:58       ` Julian Assange [this message]
2000-12-11 21:07     ` Chris Hecker
2000-12-11 22:22       ` Gerd Stolpmann
2000-12-12  5:06         ` Chris Hecker
2000-12-12 12:28           ` Jean-Christophe Filliatre
2000-12-13 10:02             ` eijiro_sumii
2000-12-13 10:17               ` Eijiro Sumii
2000-12-13 10:53               ` Julian Assange
2000-12-13 13:28                 ` Eijiro Sumii
2000-12-12  3:28       ` eijiro_sumii
2000-12-13  1:12         ` John Prevost
2000-12-13  2:35           ` Chris Hecker
2000-12-12 10:07       ` Sven LUTHER
2000-12-14  3:36       ` eijiro_sumii
2000-12-14  6:48         ` Chris Hecker
2000-12-14  8:02           ` eijiro_sumii
2000-12-14 21:53             ` Stephan Tolksdorf
2000-12-14 21:12 Ruchira Datta

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=wxd7exivvn.fsf@suburbia.net \
    --to=proff@iq.org \
    --cc=caml-list@inria.fr \
    --cc=eijiro_sumii@anet.ne.jp \
    --cc=gerd@gerd-stolpmann.de \
    --cc=sumii@venus.is.s.u-tokyo.ac.jp \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).