From mboxrd@z Thu Jan 1 00:00:00 1970 Received: (from weis@localhost) by pauillac.inria.fr (8.7.6/8.7.3) id KAA24143 for caml-red; Tue, 12 Dec 2000 10:29:00 +0100 (MET) Received: from concorde.inria.fr (concorde.inria.fr [192.93.2.39]) by pauillac.inria.fr (8.7.6/8.7.3) with ESMTP id EAA18078 for ; Tue, 12 Dec 2000 04:28:45 +0100 (MET) Received: from dynabook (h12-001.tokyu-net.catv.ne.jp [202.221.12.1]) by concorde.inria.fr (8.11.1/8.10.0) with ESMTP id eBC3ShD10645 for ; Tue, 12 Dec 2000 04:28:44 +0100 (MET) Received: from dynabook ([127.0.0.1] helo=localhost ident=sumii) by dynabook with esmtp (Exim 3.12 #1 (Debian)) id 145g6p-0004xi-00; Tue, 12 Dec 2000 12:28:07 +0900 To: checker@d6.com From: eijiro_sumii@anet.ne.jp Cc: caml-list@inria.fr, gerd@gerd-stolpmann.de, sumii@venus.is.s.u-tokyo.ac.jp Subject: Re: substring match like "strstr" In-Reply-To: <4.3.2.7.2.20001211103237.00c12100@shell16.ba.best.com> References: <00120814135508.00625@ice> <20001210221657W.sumii@yl.is.s.u-tokyo.ac.jp> <4.3.2.7.2.20001211103237.00c12100@shell16.ba.best.com> X-Mailer: Mew version 1.94.2 on Emacs 20.7 / Mule 4.1 (AOI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-Id: <20001212122807K.sumii@yl.is.s.u-tokyo.ac.jp> Date: Tue, 12 Dec 2000 12:28:07 +0900 X-Dispatcher: imput version 991025(IM133) Sender: weis@pauillac.inria.fr > Any ideas why strstr blows the others away? What's the libc strstr > look like? I have no idea, unfortunately... > I just looked in the MSVC source and it's a braindead while loop > (copied below), so it's not like it's doing a fancy Boyer-Moore or > anything. I don't know anything about strstr in Sun's strstr, but I checked strstr in GNU libc. It is a quite complicated program, but look like a brute-force algorithm (that is, no Knuth-Morris-Pratt or anything like that). > This is exactly the kind of problem on which I'd expect caml to come > within 10% of c. That was what I expected, too. > I'd say I'd do the tests myself, but I don't have a bunch of gene > sequences laying around. :) I've put the current version of my application program at: http://www.yl.is.s.u-tokyo.ac.jp/~sumii/tmp/hc.tar.gz If you like, you're welcome to check it yourself, of course.:) You can run it as "make ; time ./hc". The output should look like: score = 0.348672 score = 0.391356 (snip) score = 0.630415 The OCaml function "strstr" is in the file "strstr.ml". > Okay, I'm curious, so I'll port the code to caml and include it > below as well (as practice for myself). Can you try it in your test > harness? Sure, but please give me a little time. This is a kind of part-time, weekend job for me, and I can't tell when I have time to do it next... Eijiro