From mboxrd@z Thu Jan 1 00:00:00 1970 Received: (from weis@localhost) by pauillac.inria.fr (8.7.6/8.7.3) id TAA04791 for caml-red; Thu, 14 Dec 2000 19:21:22 +0100 (MET) Received: from nez-perce.inria.fr (nez-perce.inria.fr [192.93.2.78]) by pauillac.inria.fr (8.7.6/8.7.3) with ESMTP id LAA19353 for ; Wed, 13 Dec 2000 11:02:19 +0100 (MET) Received: from dynabook ([133.11.12.224]) by nez-perce.inria.fr (8.11.1/8.10.0) with ESMTP id eBDA2IL00112 for ; Wed, 13 Dec 2000 11:02:18 +0100 (MET) Received: from dynabook ([127.0.0.1] helo=localhost ident=sumii) by dynabook with esmtp (Exim 3.12 #1 (Debian)) id 1468jg-0000ru-00; Wed, 13 Dec 2000 19:02:08 +0900 To: caml-list@inria.fr From: eijiro_sumii@anet.ne.jp Cc: Jean-Christophe.Filliatre@lri.fr, proff@iq.org, jmp@arsdigita.com, sumii@venus.is.s.u-tokyo.ac.jp Subject: Re: substring match like "strstr" In-Reply-To: <14902.6640.301542.507040@pc803> References: <0012112341150J.00625@ice> <4.3.2.7.2.20001211202018.00b3f510@shell16.ba.best.com> <14902.6640.301542.507040@pc803> X-Mailer: Mew version 1.94.2 on Emacs 20.7 / Mule 4.1 (AOI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-Id: <20001213190207A.sumii@yl.is.s.u-tokyo.ac.jp> Date: Wed, 13 Dec 2000 19:02:07 +0900 X-Dispatcher: imput version 991025(IM133) Sender: weis@pauillac.inria.fr Hi everyone, Thank you very much for a lot of kind advice! Jean-Christophe Filliatre wrote: > I missed the beginning of the discussion, but implementing > Knuth-Morris-Pratt is quite easy. The code is given below (26 lines), Actually, I found the (almost) same code on the web and tried it. The results (again, execution time in seconds) were: SPARC Pentium strstr 52.68 57.050 kmp 111.52 143.490 The SPARC machine is the same as before. The Pentium machine is running Linux 2.2.17 inside VMWare 2.0.3 (with 48MB virtual main memory) on Windows 2000 with a Pentium II 400MHz processor and 128MB main memory. Since my program calls the function more than 3000 times for _each_ pattern, I did partial application, of course. Even so, kmp in OCaml still performed much worse than strstr in libc (both Sun's and GNU's), at least in this program. If you'd like to check it yourself, the source code is available at the same URL as before (http://www.yl.is.s.u-tokyo.ac.jp/~sumii/tmp/hc.tar.gz). Julian Assange wrote: > I'm interested in this library. Can you tell me a little more? What's > your ETA? It is explained a little at: http://www.hypothesiscreator.net/ It has been developed in C++, but we began to port it to OCaml for clarity and convenience (and fun:-). Julian Assange wrote: > If you are really calling strstr 400,000,000 times, I strongly suggest > the use of Boyer-Moore. Is your alphabet amino-acids or base-pairs? If Can Boyer-Moore (in OCaml) be _much_ faster than KMP, so that it beats strstr in libc? John Prevost wrote: > I just grabbed the glibc version, and found the following comment at > the head of it: That was exactly what I found too (and mentioned a bit in my previous message). Maybe I should also disassemble and check Sun's strstr...? Regards, Eijiro