From mboxrd@z Thu Jan 1 00:00:00 1970 Received: (from majordomo@localhost) by pauillac.inria.fr (8.7.6/8.7.3) id WAA05256; Wed, 17 Apr 2002 22:30:43 +0200 (MET DST) X-Authentication-Warning: pauillac.inria.fr: majordomo set sender to owner-caml-list@pauillac.inria.fr using -f Received: from concorde.inria.fr (concorde.inria.fr [192.93.2.39]) by pauillac.inria.fr (8.7.6/8.7.3) with ESMTP id WAA05252 for ; Wed, 17 Apr 2002 22:30:42 +0200 (MET DST) Received: from fichte.ai.univie.ac.at (fichte.ai.univie.ac.at [131.130.174.156]) by concorde.inria.fr (8.11.1/8.11.1) with ESMTP id g3HKUfb28177 for ; Wed, 17 Apr 2002 22:30:41 +0200 (MET DST) Received: (from markus@localhost) by fichte.ai.univie.ac.at (8.9.3/8.9.3/Debian 8.9.3-21) id WAA03455; Wed, 17 Apr 2002 22:30:39 +0200 Date: Wed, 17 Apr 2002 22:30:39 +0200 From: Markus Mottl To: Christophe TROESTLER Cc: "O'Caml Mailing List" Subject: Re: [Caml-list] The invert Benchmark Message-ID: <20020417203039.GA3339@fichte.ai.univie.ac.at> Mail-Followup-To: Christophe TROESTLER , O'Caml Mailing List References: <20020417.202615.87280210.debian00@tiscalinet.be> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20020417.202615.87280210.debian00@tiscalinet.be> User-Agent: Mutt/1.3.26i Organization: Austrian Research Institute for Artificial Intelligence Sender: owner-caml-list@pauillac.inria.fr Precedence: bulk On Wed, 17 Apr 2002, Christophe TROESTLER wrote: > I found by chance the "The invert Benchmark" > (http://www.lib.uchicago.edu/keith/crisis/benchmarks/invert/). As you > will notice the Caml code (even compiled) performs poorly. Looking at the code this does not surprise me... > Anybody got better ideas to improve efficiency? Could a "keys" > function be an interesting addition to Hashtbl??? This should be about 10x faster than the initial version: --------------------------------------------------------------------------- let report_url url = print_string ("\t" ^ url) let report (file, url_lst) = let url_ar = Array.of_list url_lst in Array.sort compare url_ar; print_string file; Array.iter report_url url_ar; print_char '\n' let _ = let table = Hashtbl.create 10000 in let cnt_ref = ref 0 in try while true do let line = input_line stdin in let ix = String.index line '\t' in let url = String.sub line 0 ix in let file = String.sub line (ix + 1) (String.length line - ix - 1) in try let urls_ref = Hashtbl.find table file in urls_ref := url :: !urls_ref with Not_found -> incr cnt_ref; Hashtbl.add table file (ref [url]) done with | End_of_file -> let ar = Array.make !cnt_ref ("", []) in let coll file urls_ref cnt = ar.(cnt) <- file, !urls_ref; cnt - 1 in ignore (Hashtbl.fold coll table (!cnt_ref - 1)); Array.sort compare ar; Array.iter report ar | Not_found -> failwith "bad data" --------------------------------------------------------------------------- > Another related question that popped up is: how to _efficiently_ > implement a join operation (join : string -> string list -> string is > defined by: join c [s1;...;sn] = s1 ^ c ^ ... ^ c ^ sn) ? Look up the function "String.concat" in the standard library... Regards, Markus Mottl -- Markus Mottl markus@oefai.at Austrian Research Institute for Artificial Intelligence http://www.oefai.at/~markus ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners