caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed
From: eijiro_sumii@anet.ne.jp
To: caml-list@inria.fr
Cc: gerd@gerd-stolpmann.de, sumii@venus.is.s.u-tokyo.ac.jp
Subject: Re: substring match like "strstr"
Date: Sun, 10 Dec 2000 22:16:57 +0900	[thread overview]
Message-ID: <20001210221657W.sumii@yl.is.s.u-tokyo.ac.jp> (raw)
In-Reply-To: <00120814135508.00625@ice>

> You can write it yourself:
...
> If speed is important, I recommend 
> 
> let find_substring s1 s2 =
>   Str.search_forward (Str.regexp (Str.quote s2)) s1 0
> ;;

OK, I benchmarked the following 4 implementations for the purpose of
comparison.

  strstr
    stub to call "strstr" in libc

  regexp
    combination of "Str.regexp_string" and "Str.search_forward"

  OCamlA
    the simple implementation in OCaml given in the previous message,
    a little tuned for more efficiency

  OCamlB
    another straightforward implementation in OCaml, attached at the
    end of this message

The results (execution time in seconds) were as follows.

  strstr   55.74
  regexp  154.37
  OCamlA  302.57
  OCamlB  129.23

The application is a kind of library for data mining of gene
sequences.  It calls the substring matching function more than
400,000,000 times.  The machine is a Sun UltraEnterprise 450 with 4
UltraSPARC II (480 MHz) processors and 2GB main memory, but the
program itself doesn't make any use of multiple processors or much
space.

Although I haven't checked the reasons for these results in detail,
this might hopefully be of some information for other people, too.

And probably I should also consider implementing more sophisticated,
efficient algorithm in OCaml...

// Eijiro Sumii (http://www.yl.is.s.u-tokyo.ac.jp/~sumii/)
// 
// Ph.D. Student at Department of IS, University of Tokyo
// Visiting Scholar at Department of CIS, University of Pennsylvania

----------------------------------------------------------------------
let strstr pat str =
  let patlen = String.length pat in
  let strlen = String.length str in
  let rec is_prefix pos spos epos =
    if pos < 0 then true else
    if String.unsafe_get pat pos <>
      String.unsafe_get str epos then false else
    is_prefix (pos - 1) spos (epos - 1) in
  let rec search spos =
    let epos = spos + patlen - 1 in
    if epos >= strlen then raise Not_found else
    if is_prefix (patlen - 1) spos epos then spos else
    search (spos + 1) in
  search 0
----------------------------------------------------------------------



  reply	other threads:[~2000-12-11 17:36 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2000-12-08  6:04 eijiro_sumii
2000-12-08 12:57 ` Gerd Stolpmann
2000-12-10 13:16   ` eijiro_sumii [this message]
2000-12-10 15:39     ` Gerd Stolpmann
2000-12-11  3:57       ` eijiro_sumii
2000-12-12 13:58       ` Julian Assange
2000-12-11 21:07     ` Chris Hecker
2000-12-11 22:22       ` Gerd Stolpmann
2000-12-12  5:06         ` Chris Hecker
2000-12-12 12:28           ` Jean-Christophe Filliatre
2000-12-13 10:02             ` eijiro_sumii
2000-12-13 10:17               ` Eijiro Sumii
2000-12-13 10:53               ` Julian Assange
2000-12-13 13:28                 ` Eijiro Sumii
2000-12-12  3:28       ` eijiro_sumii
2000-12-13  1:12         ` John Prevost
2000-12-13  2:35           ` Chris Hecker
2000-12-12 10:07       ` Sven LUTHER
2000-12-14  3:36       ` eijiro_sumii
2000-12-14  6:48         ` Chris Hecker
2000-12-14  8:02           ` eijiro_sumii
2000-12-14 21:53             ` Stephan Tolksdorf
2000-12-14 21:12 Ruchira Datta

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20001210221657W.sumii@yl.is.s.u-tokyo.ac.jp \
    --to=eijiro_sumii@anet.ne.jp \
    --cc=caml-list@inria.fr \
    --cc=gerd@gerd-stolpmann.de \
    --cc=sumii@venus.is.s.u-tokyo.ac.jp \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).