From mboxrd@z Thu Jan 1 00:00:00 1970 Received: (from majordomo@localhost) by pauillac.inria.fr (8.7.6/8.7.3) id MAA18727; Thu, 18 Apr 2002 12:56:07 +0200 (MET DST) X-Authentication-Warning: pauillac.inria.fr: majordomo set sender to owner-caml-list@pauillac.inria.fr using -f Received: from concorde.inria.fr (concorde.inria.fr [192.93.2.39]) by pauillac.inria.fr (8.7.6/8.7.3) with ESMTP id MAA18723 for ; Thu, 18 Apr 2002 12:55:57 +0200 (MET DST) Received: from lri.lri.fr (lri.lri.fr [129.175.15.1]) by concorde.inria.fr (8.11.1/8.11.1) with ESMTP id g3IArtb22223 for ; Thu, 18 Apr 2002 12:53:55 +0200 (MET DST) Received: from pc803.lri.fr (mail@pc803 [129.175.8.114]) by lri.lri.fr (8.11.6/jtpda-5.3.2) with ESMTP id g3IArre21166 for ; Thu, 18 Apr 2002 12:53:53 +0200 (MEST) Received: from filliatr by pc803.lri.fr with local (Exim 3.33 #1 (Debian)) id 16y9Y1-0002XY-00 for ; Thu, 18 Apr 2002 12:53:53 +0200 MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Transfer-Encoding: 8bit Message-ID: <15550.36266.927283.406730@pc803.lri.fr> In-Reply-To: <20020417.202615.87280210.debian00@tiscalinet.be> References: <20020417.202615.87280210.debian00@tiscalinet.be> X-Mailer: VM 6.49 under Emacs 20.7.2 Reply-To: Jean-Christophe.Filliatre@lri.fr (Jean-Christophe Filliatre) From: Jean-Christophe Filliatre To: Christophe TROESTLER Subject: Re: [Caml-list] The invert Benchmark Date: Thu, 18 Apr 2002 11:11:06 +0200 (MEST) Sender: owner-caml-list@pauillac.inria.fr Precedence: bulk Christophe TROESTLER writes: > Dear Caml riders, > > I found by chance the "The invert Benchmark" > (http://www.lib.uchicago.edu/keith/crisis/benchmarks/invert/). As you > will notice the Caml code (even compiled) performs poorly. I guess > part of the problem is due to using Map when Hashtbl is more suited. > So I tried to rewrite the code using Hashtbl (attached to this mail). > What I got some trouble to figure out is how to get a list of the keys > where each of the keys appears only once. I eventually went the easy > way. Anybody got better ideas to improve efficiency? Could a "keys" > function be an interesting addition to Hashtbl??? As suggested by Xavier, String.concat substituted for the join function really helps. A simple profiling with gprof also showed that both in your code and in the original one, a lot of time was spent building regexps! Factorizing out Str.{regexp,regexp_string} "\t" (i.e. doing it only once in a global variable) really improves performances. Anyway, using Str.split to cut a string in two is a bit costly; doing it directly as Markus did also improves performances. -- Jean-Christophe Filliātre (http://www.lri.fr/~filliatr) ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners