From: eijiro_sumii@anet.ne.jp
To: caml-list@inria.fr
Cc: gerd@gerd-stolpmann.de, sumii@venus.is.s.u-tokyo.ac.jp
Subject: Re: substring match like "strstr"
Date: Sun, 10 Dec 2000 22:16:57 +0900 [thread overview]
Message-ID: <20001210221657W.sumii@yl.is.s.u-tokyo.ac.jp> (raw)
In-Reply-To: <00120814135508.00625@ice>
> You can write it yourself:
...
> If speed is important, I recommend
>
> let find_substring s1 s2 =
> Str.search_forward (Str.regexp (Str.quote s2)) s1 0
> ;;
OK, I benchmarked the following 4 implementations for the purpose of
comparison.
strstr
stub to call "strstr" in libc
regexp
combination of "Str.regexp_string" and "Str.search_forward"
OCamlA
the simple implementation in OCaml given in the previous message,
a little tuned for more efficiency
OCamlB
another straightforward implementation in OCaml, attached at the
end of this message
The results (execution time in seconds) were as follows.
strstr 55.74
regexp 154.37
OCamlA 302.57
OCamlB 129.23
The application is a kind of library for data mining of gene
sequences. It calls the substring matching function more than
400,000,000 times. The machine is a Sun UltraEnterprise 450 with 4
UltraSPARC II (480 MHz) processors and 2GB main memory, but the
program itself doesn't make any use of multiple processors or much
space.
Although I haven't checked the reasons for these results in detail,
this might hopefully be of some information for other people, too.
And probably I should also consider implementing more sophisticated,
efficient algorithm in OCaml...
// Eijiro Sumii (http://www.yl.is.s.u-tokyo.ac.jp/~sumii/)
//
// Ph.D. Student at Department of IS, University of Tokyo
// Visiting Scholar at Department of CIS, University of Pennsylvania
----------------------------------------------------------------------
let strstr pat str =
let patlen = String.length pat in
let strlen = String.length str in
let rec is_prefix pos spos epos =
if pos < 0 then true else
if String.unsafe_get pat pos <>
String.unsafe_get str epos then false else
is_prefix (pos - 1) spos (epos - 1) in
let rec search spos =
let epos = spos + patlen - 1 in
if epos >= strlen then raise Not_found else
if is_prefix (patlen - 1) spos epos then spos else
search (spos + 1) in
search 0
----------------------------------------------------------------------
next prev parent reply other threads:[~2000-12-11 17:36 UTC|newest]
Thread overview: 23+ messages / expand[flat|nested] mbox.gz Atom feed top
2000-12-08 6:04 eijiro_sumii
2000-12-08 12:57 ` Gerd Stolpmann
2000-12-10 13:16 ` eijiro_sumii [this message]
2000-12-10 15:39 ` Gerd Stolpmann
2000-12-11 3:57 ` eijiro_sumii
2000-12-12 13:58 ` Julian Assange
2000-12-11 21:07 ` Chris Hecker
2000-12-11 22:22 ` Gerd Stolpmann
2000-12-12 5:06 ` Chris Hecker
2000-12-12 12:28 ` Jean-Christophe Filliatre
2000-12-13 10:02 ` eijiro_sumii
2000-12-13 10:17 ` Eijiro Sumii
2000-12-13 10:53 ` Julian Assange
2000-12-13 13:28 ` Eijiro Sumii
2000-12-12 3:28 ` eijiro_sumii
2000-12-13 1:12 ` John Prevost
2000-12-13 2:35 ` Chris Hecker
2000-12-12 10:07 ` Sven LUTHER
2000-12-14 3:36 ` eijiro_sumii
2000-12-14 6:48 ` Chris Hecker
2000-12-14 8:02 ` eijiro_sumii
2000-12-14 21:53 ` Stephan Tolksdorf
2000-12-14 21:12 Ruchira Datta
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20001210221657W.sumii@yl.is.s.u-tokyo.ac.jp \
--to=eijiro_sumii@anet.ne.jp \
--cc=caml-list@inria.fr \
--cc=gerd@gerd-stolpmann.de \
--cc=sumii@venus.is.s.u-tokyo.ac.jp \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).