From: Julian Assange <proff@iq.org>
To: gerd@gerd-stolpmann.de
Cc: eijiro_sumii@anet.ne.jp, caml-list@inria.fr,
sumii@venus.is.s.u-tokyo.ac.jp
Cc: proff@iq.org
Subject: Re: substring match like "strstr"
Date: 13 Dec 2000 00:58:52 +1100 [thread overview]
Message-ID: <wxd7exivvn.fsf@suburbia.net> (raw)
In-Reply-To: Gerd Stolpmann's message of "Sun, 10 Dec 2000 16:39:12 +0100"
Gerd Stolpmann <gerd@gerd-stolpmann.de> writes:
> >The results (execution time in seconds) were as follows.
> >
> > strstr 55.74
> > regexp 154.37
> > OCamlA 302.57
> > OCamlB 129.23
>
> I did not expect that OCamlB performs so well; so I suggested OcamlA and
> regexp (which are both fast for the bytecode interpreter, too). I think it
> depends also on the problem size (especially on the length of the substring).
>
> Gerd
If you are really calling strstr 400,000,000 times, I strongly suggest
the use of Boyer-Moore. Is your alphabet amino-acids or base-pairs? If
the latter, I have developed parallel hashing version of Boyer-More
(lets call it Assange-Boyer-More :), which beats the pants off
Knuth-Moris-Pratt. 500 long substrings searches are only about 50%
slower than 1, although for this application, if you have the memory,
you might also want to look at suffix trees or suffix arrays.
--
Julian Assange |If you want to build a ship, don't drum up people
|together to collect wood or assign them tasks
proff@iq.org |and work, but rather teach them to long for the endless
proff@gnu.ai.mit.edu |immensity of the sea. -- Antoine de Saint Exupery
next prev parent reply other threads:[~2000-12-14 18:11 UTC|newest]
Thread overview: 23+ messages / expand[flat|nested] mbox.gz Atom feed top
2000-12-08 6:04 eijiro_sumii
2000-12-08 12:57 ` Gerd Stolpmann
2000-12-10 13:16 ` eijiro_sumii
2000-12-10 15:39 ` Gerd Stolpmann
2000-12-11 3:57 ` eijiro_sumii
2000-12-12 13:58 ` Julian Assange [this message]
2000-12-11 21:07 ` Chris Hecker
2000-12-11 22:22 ` Gerd Stolpmann
2000-12-12 5:06 ` Chris Hecker
2000-12-12 12:28 ` Jean-Christophe Filliatre
2000-12-13 10:02 ` eijiro_sumii
2000-12-13 10:17 ` Eijiro Sumii
2000-12-13 10:53 ` Julian Assange
2000-12-13 13:28 ` Eijiro Sumii
2000-12-12 3:28 ` eijiro_sumii
2000-12-13 1:12 ` John Prevost
2000-12-13 2:35 ` Chris Hecker
2000-12-12 10:07 ` Sven LUTHER
2000-12-14 3:36 ` eijiro_sumii
2000-12-14 6:48 ` Chris Hecker
2000-12-14 8:02 ` eijiro_sumii
2000-12-14 21:53 ` Stephan Tolksdorf
2000-12-14 21:12 Ruchira Datta
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=wxd7exivvn.fsf@suburbia.net \
--to=proff@iq.org \
--cc=caml-list@inria.fr \
--cc=eijiro_sumii@anet.ne.jp \
--cc=gerd@gerd-stolpmann.de \
--cc=sumii@venus.is.s.u-tokyo.ac.jp \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).