From mboxrd@z Thu Jan 1 00:00:00 1970 Received: (from majordomo@localhost) by pauillac.inria.fr (8.7.6/8.7.3) id UAA02887; Wed, 17 Apr 2002 20:25:02 +0200 (MET DST) X-Authentication-Warning: pauillac.inria.fr: majordomo set sender to owner-caml-list@pauillac.inria.fr using -f Received: from nez-perce.inria.fr (nez-perce.inria.fr [192.93.2.78]) by pauillac.inria.fr (8.7.6/8.7.3) with ESMTP id UAA02883 for ; Wed, 17 Apr 2002 20:25:01 +0200 (MET DST) Received: from abel (nat.umh.ac.be [193.190.193.1]) by nez-perce.inria.fr (8.11.1/8.11.1) with ESMTP id g3HIP0X16748 for ; Wed, 17 Apr 2002 20:25:01 +0200 (MET DST) Received: from abel ([127.0.0.1] helo=localhost ident=trch) by abel with esmtp (Exim 3.35 #1 (Debian)) id 16xu8F-0001GL-00; Wed, 17 Apr 2002 20:26:15 +0200 Date: Wed, 17 Apr 2002 20:26:15 +0200 (CEST) Message-Id: <20020417.202615.87280210.debian00@tiscalinet.be> To: "O'Caml Mailing List" Subject: [Caml-list] The invert Benchmark From: Christophe TROESTLER X-Spook: basement lock picking Aladdin anarchy USCODE David John Oates nitrate COSCO number key UFO X-Mailer-URL: http://www.mew.org/ X-Operating-System: GNU/Linux (http://www.linux.org/) X-Blessing: Om Ah Hum Vajra Guru Pema Siddhi Hum X-Mailer: Mew version 2.2 on Emacs 21.2 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Multipart/Mixed; boundary="--Next_Part(Wed_Apr_17_20:26:15_2002_871)--" Content-Transfer-Encoding: 7bit Sender: owner-caml-list@pauillac.inria.fr Precedence: bulk ----Next_Part(Wed_Apr_17_20:26:15_2002_871)-- Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit Dear Caml riders, I found by chance the "The invert Benchmark" (http://www.lib.uchicago.edu/keith/crisis/benchmarks/invert/). As you will notice the Caml code (even compiled) performs poorly. I guess part of the problem is due to using Map when Hashtbl is more suited. So I tried to rewrite the code using Hashtbl (attached to this mail). What I got some trouble to figure out is how to get a list of the keys where each of the keys appears only once. I eventually went the easy way. Anybody got better ideas to improve efficiency? Could a "keys" function be an interesting addition to Hashtbl??? Another related question that popped up is: how to _efficiently_ implement a join operation (join : string -> string list -> string is defined by: join c [s1;...;sn] = s1 ^ c ^ ... ^ c ^ sn) ? Cheers, ChriS ----Next_Part(Wed_Apr_17_20:26:15_2002_871)-- Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit Content-Description: Code for "The invert Benchmark" Content-Disposition: inline; filename="invert2.ml" open Printf let process_input init_size = let table = Hashtbl.create init_size and split line = match Str.split (Str.regexp_string "\t") line with | [a; b] -> (b, a) | _ -> failwith "Bad file format" in let rec loop() = let line = read_line() in let b, a = split line in (Hashtbl.add table b a; loop()) in try loop() with | End_of_file -> table let get_keys t = let module S = Set.Make (struct type t = string let compare = compare end) in S.elements(Hashtbl.fold (fun k d l -> S.add k l) t S.empty) let print table = let keys = List.sort compare (get_keys table) in List.iter (fun b -> printf "%s" b; let l = List.sort compare (Hashtbl.find_all table b) in List.iter (fun s -> printf "\t%s" s) l; print_newline() ) keys let () = print(process_input 2000) ----Next_Part(Wed_Apr_17_20:26:15_2002_871)---- ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners