From mboxrd@z Thu Jan 1 00:00:00 1970 Received: (from weis@localhost) by pauillac.inria.fr (8.7.6/8.7.3) id TAA25607 for caml-red; Thu, 14 Dec 2000 19:11:45 +0100 (MET) Received: from concorde.inria.fr (concorde.inria.fr [192.93.2.39]) by pauillac.inria.fr (8.7.6/8.7.3) with ESMTP id OAA21628 for ; Tue, 12 Dec 2000 14:58:58 +0100 (MET) Received: from suburbia.net (suburbia.net [203.4.184.1]) by concorde.inria.fr (8.11.1/8.10.0) with ESMTP id eBCDwtf06337 for ; Tue, 12 Dec 2000 14:58:56 +0100 (MET) Received: by suburbia.net (Postfix, from userid 110) id 5AFAB6C521; Wed, 13 Dec 2000 00:58:52 +1100 (EST) To: gerd@gerd-stolpmann.de Cc: eijiro_sumii@anet.ne.jp, caml-list@inria.fr, sumii@venus.is.s.u-tokyo.ac.jp Subject: Re: substring match like "strstr" References: <20001208150443I.sumii@yl.is.s.u-tokyo.ac.jp> <00120814135508.00625@ice> <20001210221657W.sumii@yl.is.s.u-tokyo.ac.jp> <0012101646090G.00625@ice> Cc: proff@iq.org From: Julian Assange Date: 13 Dec 2000 00:58:52 +1100 In-Reply-To: Gerd Stolpmann's message of "Sun, 10 Dec 2000 16:39:12 +0100" Message-ID: User-Agent: Gnus/5.0802 (Gnus v5.8.2) XEmacs/21.1 (Big Bend) MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Sender: weis@pauillac.inria.fr Gerd Stolpmann writes: > >The results (execution time in seconds) were as follows. > > > > strstr 55.74 > > regexp 154.37 > > OCamlA 302.57 > > OCamlB 129.23 > > I did not expect that OCamlB performs so well; so I suggested OcamlA and > regexp (which are both fast for the bytecode interpreter, too). I think it > depends also on the problem size (especially on the length of the substring). > > Gerd If you are really calling strstr 400,000,000 times, I strongly suggest the use of Boyer-Moore. Is your alphabet amino-acids or base-pairs? If the latter, I have developed parallel hashing version of Boyer-More (lets call it Assange-Boyer-More :), which beats the pants off Knuth-Moris-Pratt. 500 long substrings searches are only about 50% slower than 1, although for this application, if you have the memory, you might also want to look at suffix trees or suffix arrays. -- Julian Assange |If you want to build a ship, don't drum up people |together to collect wood or assign them tasks proff@iq.org |and work, but rather teach them to long for the endless proff@gnu.ai.mit.edu |immensity of the sea. -- Antoine de Saint Exupery