caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed
From: Ruchira Datta <datta@math.berkeley.edu>
To: caml-list@inria.fr
Subject: Re: substring match like "strstr"
Date: Thu, 14 Dec 2000 13:12:04 -0800	[thread overview]
Message-ID: <20001214131204.A29664@blue1.berkeley.edu> (raw)

John Prevost wrote:
>I just grabbed the glibc version, and found the following comment at
>the head of it:

>/*
> * My personal strstr() implementation that beats most other algorithms.
> * Until someone tells me otherwise, I assume that this is the
> * fastest implementation of strstr() in C.
> * I deliberately chose not to comment it.  You should have at least
> * as much fun trying to understand it, as I had to write it :-).
> *
> * Stephen R. van den Berg, berg@pool.informatik.rwth-aachen.de */

>Since the text of the function involves not only nasty gotos
>(particularly frightening is "goto shloop;" but also pointer tricks
>and register declarations, I'd say it's to be expected that it'll
>outperform O'Caml.  I am, however, seeing if I can figure out the
>essential algorithm used (if there's really anything smart to it) and
>will let you know if I come up with anything extra special.

The frightening tricks are shocking to us functional programmers :-) but
once you get over your shock and read through it, you won't have any
problem understanding it.  At each point, don't ask yourself ``why is
this statement written in this obfuscated way?''  Just assume as an
article of faith that this incantation is invoked to propitiate the
Almighty Assembler :-).  Instead ask yourself ``what does this statement
do?''  That is fairly simple...

There is absolutely nothing smart about the algorithm; it is exactly
the brute-force one.  For example, the first character of the needle
is b, and the second character is c.  Suppose we have matched b and
c and then find that we don't match the whole needle.  Then we back
up our haystack pointer all the way back to the character after the
last potential match, i.e., to the character that matched c, and start 
from the beginning again by checking if it matches b.  We do this every 
time regardless of whether b and c are different.  I guess this is to 
save one register variable.

The speed comes from the low-level assembly language optimization; C is
being used as portable assembler.  I am sure it took a lot of ingenuity
and very intimate knowledge of the gcc compiler to do such a low-level
optimization while remaining in portable C.  But it is not something for
OCaml users to emulate.

Ruchira Datta
datta@math.berkeley.edu



             reply	other threads:[~2000-12-15 12:52 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2000-12-14 21:12 Ruchira Datta [this message]
  -- strict thread matches above, loose matches on Subject: below --
2000-12-08  6:04 eijiro_sumii
2000-12-08 12:57 ` Gerd Stolpmann
2000-12-10 13:16   ` eijiro_sumii
2000-12-10 15:39     ` Gerd Stolpmann
2000-12-11  3:57       ` eijiro_sumii
2000-12-12 13:58       ` Julian Assange
2000-12-11 21:07     ` Chris Hecker
2000-12-11 22:22       ` Gerd Stolpmann
2000-12-12  5:06         ` Chris Hecker
2000-12-12 12:28           ` Jean-Christophe Filliatre
2000-12-13 10:02             ` eijiro_sumii
2000-12-13 10:17               ` Eijiro Sumii
2000-12-13 10:53               ` Julian Assange
2000-12-13 13:28                 ` Eijiro Sumii
2000-12-12  3:28       ` eijiro_sumii
2000-12-13  1:12         ` John Prevost
2000-12-13  2:35           ` Chris Hecker
2000-12-12 10:07       ` Sven LUTHER
2000-12-14  3:36       ` eijiro_sumii
2000-12-14  6:48         ` Chris Hecker
2000-12-14  8:02           ` eijiro_sumii
2000-12-14 21:53             ` Stephan Tolksdorf

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20001214131204.A29664@blue1.berkeley.edu \
    --to=datta@math.berkeley.edu \
    --cc=caml-list@inria.fr \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).