From mboxrd@z Thu Jan 1 00:00:00 1970 Received: (from majordomo@localhost) by pauillac.inria.fr (8.7.6/8.7.3) id WAA06319; Fri, 23 Apr 2004 22:20:27 +0200 (MET DST) X-Authentication-Warning: pauillac.inria.fr: majordomo set sender to owner-caml-list@pauillac.inria.fr using -f Received: from concorde.inria.fr (concorde.inria.fr [192.93.2.39]) by pauillac.inria.fr (8.7.6/8.7.3) with ESMTP id WAA06385 for ; Fri, 23 Apr 2004 22:20:25 +0200 (MET DST) Received: from hirsch.in-berlin.de (hirsch.in-berlin.de [192.109.42.6]) by concorde.inria.fr (8.12.10/8.12.10) with ESMTP id i3NKKOYM003354 for ; Fri, 23 Apr 2004 22:20:25 +0200 X-Envelope-From: oliver@first.in-berlin.de X-Envelope-To: Received: from hirsch.in-berlin.de (localhost [127.0.0.1]) by hirsch.in-berlin.de (8.12.11/8.12.11/Debian-3) with ESMTP id i3NKK1Ig012403 for ; Fri, 23 Apr 2004 22:20:24 +0200 Received: from first.UUCP (uucp@localhost) by hirsch.in-berlin.de (8.12.11/8.12.11/Debian-3) with UUCP id i3NKF3MP012015 for inria.fr!caml-list; Fri, 23 Apr 2004 22:15:03 +0200 Received: by first.in-berlin.de via sendmail from stdin id (Debian Smail3.2.0.114) Fri, 23 Apr 2004 22:09:23 +0200 (CEST) From: oliver@first.in-berlin.de (Oliver Bandel) Date: Fri, 23 Apr 2004 22:09:23 +0200 To: caml-list@inria.fr Subject: [oliver: Re: [Caml-list] Should be INSIDE STANDARD-LIB: Hashtbl.keys] Message-ID: <20040423200923.GA271@first.in-berlin.de> Mime-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline User-Agent: Mutt/1.3.28i X-Miltered: at concorde by Joe's j-chkmail ("http://j-chkmail.ensmp.fr")! X-Loop: caml-list@inria.fr X-Spam: no; 0.00; oliver:01 in-berlin:01 oliver:01 bandel:01 caml-list:01 hashtbl:01 caml-list:01 hashtbl:01 2004:99 hash:01 hash:01 stupid:01 stupid:01 hastbl:01 estimating:01 Sender: owner-caml-list@pauillac.inria.fr Precedence: bulk ----- Forwarded message from oliver ----- To: Xavier Leroy Subject: Re: [Caml-list] Should be INSIDE STANDARD-LIB: Hashtbl.keys On Fri, Apr 23, 2004 at 06:04:07PM +0200, Xavier Leroy wrote: > > Perhaps I'm all wrong, but when I have to get rid of repetitions in a > > list, I first sort it in O(n log n), then remove the repetitions in O(n). > > Jean-Baptiste. > > OK, fair enough :-) The point I was trying to make (not very well, I > agree) is that "list without repetition" or "sorted list without > repetition" is often not the best data structure for the job. [...] NOT the datastructure FOR THE JOB BUT FOR THE RESULT! So, ok, because people now are posting their code, I send one of me. I had many different implementations, maybe one ore the other is more suited to do the task. The one that I invented some minutes ago looks like this: let keys_of_hash hash = let keys = Hashtbl.create 10000 in (* stupid hard coded value *) let sample k v = Hashtbl.replace keys k true in Hashtbl.iter sample hash; Hashtbl.fold (fun k v init -> k::init) keys [] Because there is no Hashtbl.size in the standard lib, which gives back the number of over-all bindings in the hash, I had to insert some stupid value. (I was to lazy to write it by myself. It also does not make sense to walk through the whole hash to get the Hashtbl.size and then to allocate the keys-.Hash and then To iterate and then to fold... ...there will be more efficient solutions than that... btw: I don't know how inefficient it would be to use Hashtbl.create 1 and then let grow the hash itself...) OK, it's possible to calculate Hashtbl.size with Hastbl.iter, but when Hashtbl.size would be part of the standard Hashtbl-lib, this would make sense. I hope we can agree on that Hashtbl.size would return an int?! ;-) Hashtbl.create needs an int for the initial size, and doing things like the above stuff, would be more consistently when there would be a Hashtbl.size inside the standard Hastbl-module. a) The return type is clear (isn't it?!) b) Estimating initial size of hashtables can be predicted in similar cases like the above one. But nevertheless I insist in Hashtbl.keys as a very useful function. You then can use Hastbl.find as well as Hashtbl.find_all for retrieval of all contents, depending on your needs. Why I think Hashtbl.keys makes more sense than Hashtbl.values? Because it reflects the sense of a Hashtable, to have direct access to the values VIA THE KEYS. And that also is the reason, why I think Hastbl.keys should necessarily be part of the lib. Also I think it will be (much) more efficient to do it inside the Hashtbl-lib instead of doing it with such outside-approaches. The direct way inside the lib seems to make more sense to me here. Ciao, Oliver ----- End forwarded message ----- ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners