caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed
* substring match like "strstr"
@ 2000-12-08  6:04 eijiro_sumii
  2000-12-08 12:57 ` Gerd Stolpmann
  0 siblings, 1 reply; 23+ messages in thread
From: eijiro_sumii @ 2000-12-08  6:04 UTC (permalink / raw)
  To: caml-list; +Cc: sumii

Hi all,

Is there a substring matching function (like "strstr" in libc) in the
standard OCaml library?  Of course there are many ways to implement it
(by writing it from scratch, using the Str library, interfacing
"strstr" in libc, etc.), but they are overkill for my purpose.  So I'm
wondering whether such a function already exists, but I couldn't find
it in the manual...

Eijiro



^ permalink raw reply	[flat|nested] 23+ messages in thread
* Re: substring match like "strstr"
@ 2000-12-14 21:12 Ruchira Datta
  0 siblings, 0 replies; 23+ messages in thread
From: Ruchira Datta @ 2000-12-14 21:12 UTC (permalink / raw)
  To: caml-list

John Prevost wrote:
>I just grabbed the glibc version, and found the following comment at
>the head of it:

>/*
> * My personal strstr() implementation that beats most other algorithms.
> * Until someone tells me otherwise, I assume that this is the
> * fastest implementation of strstr() in C.
> * I deliberately chose not to comment it.  You should have at least
> * as much fun trying to understand it, as I had to write it :-).
> *
> * Stephen R. van den Berg, berg@pool.informatik.rwth-aachen.de */

>Since the text of the function involves not only nasty gotos
>(particularly frightening is "goto shloop;" but also pointer tricks
>and register declarations, I'd say it's to be expected that it'll
>outperform O'Caml.  I am, however, seeing if I can figure out the
>essential algorithm used (if there's really anything smart to it) and
>will let you know if I come up with anything extra special.

The frightening tricks are shocking to us functional programmers :-) but
once you get over your shock and read through it, you won't have any
problem understanding it.  At each point, don't ask yourself ``why is
this statement written in this obfuscated way?''  Just assume as an
article of faith that this incantation is invoked to propitiate the
Almighty Assembler :-).  Instead ask yourself ``what does this statement
do?''  That is fairly simple...

There is absolutely nothing smart about the algorithm; it is exactly
the brute-force one.  For example, the first character of the needle
is b, and the second character is c.  Suppose we have matched b and
c and then find that we don't match the whole needle.  Then we back
up our haystack pointer all the way back to the character after the
last potential match, i.e., to the character that matched c, and start 
from the beginning again by checking if it matches b.  We do this every 
time regardless of whether b and c are different.  I guess this is to 
save one register variable.

The speed comes from the low-level assembly language optimization; C is
being used as portable assembler.  I am sure it took a lot of ingenuity
and very intimate knowledge of the gcc compiler to do such a low-level
optimization while remaining in portable C.  But it is not something for
OCaml users to emulate.

Ruchira Datta
datta@math.berkeley.edu



^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2000-12-15 12:53 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2000-12-08  6:04 substring match like "strstr" eijiro_sumii
2000-12-08 12:57 ` Gerd Stolpmann
2000-12-10 13:16   ` eijiro_sumii
2000-12-10 15:39     ` Gerd Stolpmann
2000-12-11  3:57       ` eijiro_sumii
2000-12-12 13:58       ` Julian Assange
2000-12-11 21:07     ` Chris Hecker
2000-12-11 22:22       ` Gerd Stolpmann
2000-12-12  5:06         ` Chris Hecker
2000-12-12 12:28           ` Jean-Christophe Filliatre
2000-12-13 10:02             ` eijiro_sumii
2000-12-13 10:17               ` Eijiro Sumii
2000-12-13 10:53               ` Julian Assange
2000-12-13 13:28                 ` Eijiro Sumii
2000-12-12  3:28       ` eijiro_sumii
2000-12-13  1:12         ` John Prevost
2000-12-13  2:35           ` Chris Hecker
2000-12-12 10:07       ` Sven LUTHER
2000-12-14  3:36       ` eijiro_sumii
2000-12-14  6:48         ` Chris Hecker
2000-12-14  8:02           ` eijiro_sumii
2000-12-14 21:53             ` Stephan Tolksdorf
2000-12-14 21:12 Ruchira Datta

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).