From mboxrd@z Thu Jan  1 00:00:00 1970
Received: (from majordomo@localhost) by pauillac.inria.fr (8.7.6/8.7.3) id SAA20345; Fri, 23 Apr 2004 18:01:43 +0200 (MET DST)
X-Authentication-Warning: pauillac.inria.fr: majordomo set sender to owner-caml-list@pauillac.inria.fr using -f
Received: from concorde.inria.fr (concorde.inria.fr [192.93.2.39]) by pauillac.inria.fr (8.7.6/8.7.3) with ESMTP id SAA20335; Fri, 23 Apr 2004 18:01:42 +0200 (MET DST)
Received: from herd.plethora.net (herd.plethora.net [205.166.146.1])
	by concorde.inria.fr (8.12.10/8.12.10) with ESMTP id i3NG1dYM031390;
	Fri, 23 Apr 2004 18:01:41 +0200
Received: from bhurt.plethora.net (bhurt.plethora.net [205.166.146.49])
	by herd.plethora.net (8.11.6/8.10.1) with ESMTP id i3NG1TR14447;
	Fri, 23 Apr 2004 11:01:30 -0500 (CDT)
Date: Fri, 23 Apr 2004 11:06:06 -0500 (CDT)
From: Brian Hurt <bhurt@spnz.org>
X-X-Sender: bhurt@localhost.localdomain
To: Xavier Leroy <xavier.leroy@inria.fr>
cc: Oliver Bandel <oliver@first.in-berlin.de>, <caml-list@inria.fr>
Subject: Re: [Caml-list] Should be INSIDE STANDARD-LIB: Hashtbl.keys
In-Reply-To: <20040423145141.B3686@pauillac.inria.fr>
Message-ID: <Pine.LNX.4.44.0404231054160.9460-100000@localhost.localdomain>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
X-Miltered: at concorde by Joe's j-chkmail ("http://j-chkmail.ensmp.fr")!
X-Loop: caml-list@inria.fr
X-Spam: no; 0.00; caml-list:01 hashtbl:01 hash:01 quadratic:01 hash:01 accum:01 accum:01 uniqueness:01 hashtables:01 rec:01 wrote:03 algorithm:03 bunch:03 data:03 data:03 
Sender: owner-caml-list@pauillac.inria.fr
Precedence: bulk

On Fri, 23 Apr 2004, Xavier Leroy wrote:

> > I think a good addition to the Hashtbl-module
> > would be a function, that gives back a list of keys
> > that are in the hash.
> 
> With your specification (no repetitions in the list), that function
> would run in quadratic time, which is a sure sign that lists aren't
> the right data structure here.  (More generally speaking, "lists
> without repetitions" is almost always the wrong data structure.)

No, I think creating such a list would take O(n log n) time.

OK, we're starting with a hash table.  That means we have a set of 
buckets, each bucket is a set of key/data pairs.  Assume the same key can 
be inserted multiple times (can it?)- in this case, all duplicate keys 
should be in the same bucket.  So, for each bucket, I sort all entries in 
the bucket by key (worst case I only have one bucket and sorting is O(n 
log n)).  Once sorted, I go throught and eliminate duplicates, which is 
now an O(n) algorithm:

let uniq lst =
   let rec loop accum = function
       | [] -> List.rev accum
       | x :: [] -> List.rev (x :: accum)
       | x :: y :: t ->
           if (x = y) then
               loop accum (x :: t)
           else
               loop (x :: accum) (y :: t)
    in
    loop [] lst
;;

(You can do it more efficiently that this, but this gets the idea across)
Viola- uniqueness in subquadratic time.  And in practice, approaching 
linear time- hashtables with lots of elements in a single bucket are 
computationally expensive, so you're likely to be sorting a whole bunch of 
short (1 and 2 element) lists.

-- 
"Usenet is like a herd of performing elephants with diarrhea -- massive,
difficult to redirect, awe-inspiring, entertaining, and a source of
mind-boggling amounts of excrement when you least expect it."
                                - Gene Spafford 
Brian

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners