From mboxrd@z Thu Jan 1 00:00:00 1970 Received: (from majordomo@localhost) by pauillac.inria.fr (8.7.6/8.7.3) id SAA20345; Fri, 23 Apr 2004 18:01:43 +0200 (MET DST) X-Authentication-Warning: pauillac.inria.fr: majordomo set sender to owner-caml-list@pauillac.inria.fr using -f Received: from concorde.inria.fr (concorde.inria.fr [192.93.2.39]) by pauillac.inria.fr (8.7.6/8.7.3) with ESMTP id SAA20335; Fri, 23 Apr 2004 18:01:42 +0200 (MET DST) Received: from herd.plethora.net (herd.plethora.net [205.166.146.1]) by concorde.inria.fr (8.12.10/8.12.10) with ESMTP id i3NG1dYM031390; Fri, 23 Apr 2004 18:01:41 +0200 Received: from bhurt.plethora.net (bhurt.plethora.net [205.166.146.49]) by herd.plethora.net (8.11.6/8.10.1) with ESMTP id i3NG1TR14447; Fri, 23 Apr 2004 11:01:30 -0500 (CDT) Date: Fri, 23 Apr 2004 11:06:06 -0500 (CDT) From: Brian Hurt X-X-Sender: bhurt@localhost.localdomain To: Xavier Leroy cc: Oliver Bandel , Subject: Re: [Caml-list] Should be INSIDE STANDARD-LIB: Hashtbl.keys In-Reply-To: <20040423145141.B3686@pauillac.inria.fr> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-Miltered: at concorde by Joe's j-chkmail ("http://j-chkmail.ensmp.fr")! X-Loop: caml-list@inria.fr X-Spam: no; 0.00; caml-list:01 hashtbl:01 hash:01 quadratic:01 hash:01 accum:01 accum:01 uniqueness:01 hashtables:01 rec:01 wrote:03 algorithm:03 bunch:03 data:03 data:03 Sender: owner-caml-list@pauillac.inria.fr Precedence: bulk On Fri, 23 Apr 2004, Xavier Leroy wrote: > > I think a good addition to the Hashtbl-module > > would be a function, that gives back a list of keys > > that are in the hash. > > With your specification (no repetitions in the list), that function > would run in quadratic time, which is a sure sign that lists aren't > the right data structure here. (More generally speaking, "lists > without repetitions" is almost always the wrong data structure.) No, I think creating such a list would take O(n log n) time. OK, we're starting with a hash table. That means we have a set of buckets, each bucket is a set of key/data pairs. Assume the same key can be inserted multiple times (can it?)- in this case, all duplicate keys should be in the same bucket. So, for each bucket, I sort all entries in the bucket by key (worst case I only have one bucket and sorting is O(n log n)). Once sorted, I go throught and eliminate duplicates, which is now an O(n) algorithm: let uniq lst = let rec loop accum = function | [] -> List.rev accum | x :: [] -> List.rev (x :: accum) | x :: y :: t -> if (x = y) then loop accum (x :: t) else loop (x :: accum) (y :: t) in loop [] lst ;; (You can do it more efficiently that this, but this gets the idea across) Viola- uniqueness in subquadratic time. And in practice, approaching linear time- hashtables with lots of elements in a single bucket are computationally expensive, so you're likely to be sorting a whole bunch of short (1 and 2 element) lists. -- "Usenet is like a herd of performing elephants with diarrhea -- massive, difficult to redirect, awe-inspiring, entertaining, and a source of mind-boggling amounts of excrement when you least expect it." - Gene Spafford Brian ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners