caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed
* [oliver: Re: [Caml-list] Should be INSIDE STANDARD-LIB: Hashtbl.keys]
@ 2004-04-23 20:09 Oliver Bandel
  2004-04-24  3:00 ` skaller
  0 siblings, 1 reply; 16+ messages in thread
From: Oliver Bandel @ 2004-04-23 20:09 UTC (permalink / raw)
  To: caml-list

----- Forwarded message from oliver -----

To: Xavier Leroy <xavier.leroy@inria.fr>
Subject: Re: [Caml-list] Should be INSIDE STANDARD-LIB: Hashtbl.keys

On Fri, Apr 23, 2004 at 06:04:07PM +0200, Xavier Leroy wrote:
> > Perhaps I'm all wrong, but when I have to get rid of repetitions in a 
> > list, I first sort it in O(n log n), then remove the repetitions in O(n).
> > Jean-Baptiste.
> 
> OK, fair enough :-)  The point I was trying to make (not very well, I
> agree) is that "list without repetition" or "sorted list without
> repetition" is often not the best data structure for the job.
[...]

NOT the datastructure FOR THE JOB BUT FOR THE RESULT!


So, ok, because people now are posting their code, I send
one of me. I had many different implementations, maybe
one ore the other is more suited to do the task.
The one that I invented some minutes ago looks like this:



let keys_of_hash hash =
     let keys = Hashtbl.create 10000 in (* stupid hard coded value *)
     let sample k v = Hashtbl.replace keys k true in
     Hashtbl.iter sample hash;
     Hashtbl.fold (fun k v init -> k::init) keys []


Because there is no Hashtbl.size in the standard lib,
which gives back the number of over-all bindings in
the hash, I had to insert some stupid value.
(I was to lazy to write it by myself. It also
 does not make sense to walk through the whole
 hash to get the Hashtbl.size and then to allocate
 the keys-.Hash and then To iterate and then to fold...
 ...there will be more efficient solutions than that...
 btw: I don't know how inefficient it would be to use
 Hashtbl.create 1  and then let grow the hash itself...)


OK, it's possible to calculate Hashtbl.size with
Hastbl.iter, but when Hashtbl.size would be part
of the standard Hashtbl-lib, this would make sense.

I hope we can agree on that Hashtbl.size would return
an int?! ;-)

Hashtbl.create needs an int for the initial size, and
doing things like the above stuff, would be more
consistently when there would be a Hashtbl.size
inside the standard Hastbl-module.
a) The return type is clear (isn't it?!)
b) Estimating initial size of hashtables can be predicted in
   similar cases like the above one.

But nevertheless I insist in Hashtbl.keys as a very useful
function.
You then can use Hastbl.find as well as Hashtbl.find_all
for retrieval of all contents, depending on your needs.

Why I think Hashtbl.keys makes more sense than Hashtbl.values?
Because it reflects the sense of a Hashtable, to have direct
access to the values VIA THE KEYS.
And that also is the reason, why I think Hastbl.keys should
necessarily be part of the lib.

Also I think it will be (much) more efficient to do it inside the
Hashtbl-lib instead of doing it with such outside-approaches.
The direct way inside the lib seems to make more sense to me here.


Ciao,
   Oliver

----- End forwarded message -----

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 16+ messages in thread
* [oliver: Re: [Caml-list] Should be INSIDE STANDARD-LIB: Hashtbl.keys]
@ 2004-04-23 20:09 Oliver Bandel
  0 siblings, 0 replies; 16+ messages in thread
From: Oliver Bandel @ 2004-04-23 20:09 UTC (permalink / raw)
  To: caml-list

----- Forwarded message from oliver -----

To: Richard Jones <rich@annexia.org>
Subject: Re: [Caml-list] Should be INSIDE STANDARD-LIB: Hashtbl.keys

On Fri, Apr 23, 2004 at 04:03:22PM +0100, Richard Jones wrote:
> Maybe there's no good behaviour, but I keep writing a 'keys' function
> in almost every program I've written which uses Hashtbl, which
> certainly suggests that something is missing.

FULL ACK

Hashtables are an ideal tool for many applications,
where retrieving BY KEYs is necessary.
I mean it seems like they were imvented to do this.
So, why should a person write a keys-function by him/herself,
if this is the MAIN work for what Hashtables are useful?

If it only makes sense to collect some data, no Hashtbl
would be necessary and instead a list could be used.

But Hashtables are there to do jobs, where we have
keys and get values.

Hashtbl.keys gives a complete access to all valid
Hastbl.find/Hashtbl.find_all retrievals.
So, when there is a Hashtbl.keys not trying stochastical
"let's look if there is something inside" must be done,
but certain retrieves for data.


Ciao,
   Oliver

----- End forwarded message -----

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 16+ messages in thread
* [oliver: Re: [Caml-list] Should be INSIDE STANDARD-LIB: Hashtbl.keys]
@ 2004-04-23 20:09 Oliver Bandel
  0 siblings, 0 replies; 16+ messages in thread
From: Oliver Bandel @ 2004-04-23 20:09 UTC (permalink / raw)
  To: caml-list

----- Forwarded message from oliver -----

To: John Goerzen <jgoerzen@complete.org>
Subject: Re: [Caml-list] Should be INSIDE STANDARD-LIB: Hashtbl.keys

On Fri, Apr 23, 2004 at 01:29:35PM -0500, John Goerzen wrote:
> On Fri, Apr 23, 2004 at 06:04:07PM +0200, Xavier Leroy wrote:
> > > Perhaps I'm all wrong, but when I have to get rid of repetitions in a 
> > > list, I first sort it in O(n log n), then remove the repetitions in O(n).
> > > Jean-Baptiste.
> > 
> > OK, fair enough :-)  The point I was trying to make (not very well, I
> > agree) is that "list without repetition" or "sorted list without
> > repetition" is often not the best data structure for the job.
> 
> And a keys function need not necessarily avoid repetition anyway.
> 
[...]

It ONLY makes sense to have a keys function that avoids repitition!


Anything else can be done with List.iter and List.fold
from the outside of the Hashtbl-module implementation!

But explicitly ONLY a keys-function that reports keys only ONCE
is a valid candidate for going into the standard lib!

Ciao,
   Oliver

P.S.: If you want to have the keys reported as often as there are
      bindings to them, you can use
         List.length (Hashtbl.find_all ht key)
      to get the number of bindings for the key.

      ( And then you can print several times that key_n is
        used inside the hash. IMHO this is called redundance. :) )

      This is above not much code and does not looks like a performance
      problem, when doing outside the Hash-implementation.

      But to retrieve a non-repitional keys-list, it is much more
      work to do and it also seems to be much more efficient implemented
      when done inside the Hashtbl-lib's code.
      
      What keys to use in the above code-snippet?

      Well, you could do it like this:

      Get the keys with Hashtbl.keys and for each of
      the keys use the above code snippet to get the
      number of entries and report this ONCE. :)
      (or many times, if you really want to ;-))


----- End forwarded message -----

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 16+ messages in thread
* [oliver: Re: [Caml-list] Should be INSIDE STANDARD-LIB: Hashtbl.keys]
@ 2004-04-23 20:10 Oliver Bandel
  0 siblings, 0 replies; 16+ messages in thread
From: Oliver Bandel @ 2004-04-23 20:10 UTC (permalink / raw)
  To: caml-list

----- Forwarded message from oliver -----

To: John Goerzen <jgoerzen@complete.org>
Subject: Re: [Caml-list] Should be INSIDE STANDARD-LIB: Hashtbl.keys

On Fri, Apr 23, 2004 at 01:29:06PM -0500, John Goerzen wrote:
> On Fri, Apr 23, 2004 at 02:51:41PM +0200, Xavier Leroy wrote:
> > With your specification (no repetitions in the list), that function
> > would run in quadratic time, which is a sure sign that lists aren't
> > the right data structure here.  (More generally speaking, "lists
> > without repetitions" is almost always the wrong data structure.)
> 
> But perhaps you don't really care how fast it runs, since your entire
> program consumes about 1/10 of a CPU second anyway?
> 

Depends on number of keys...

n ^ 2  is less when n is low,
n = 10 and n = 10000 seems not to matter much...
but 10^2 and 10000^2 are a differrence that matters a lot!

Ciao,
   Oliver

----- End forwarded message -----

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 16+ messages in thread
* [oliver: Re: [Caml-list] Should be INSIDE STANDARD-LIB: Hashtbl.keys]
@ 2004-04-23 20:10 Oliver Bandel
  2004-04-24  7:30 ` Martin Jambon
  0 siblings, 1 reply; 16+ messages in thread
From: Oliver Bandel @ 2004-04-23 20:10 UTC (permalink / raw)
  To: caml-list

----- Forwarded message from oliver -----

To: Richard Jones <rich@annexia.org>
Subject: Re: [Caml-list] Should be INSIDE STANDARD-LIB: Hashtbl.keys

On Wed, Apr 21, 2004 at 09:39:47AM +0100, Richard Jones wrote:
> On Wed, Apr 21, 2004 at 03:19:04AM +0200, Oliver Bandel wrote:
> > I think a good addition to the Hashtbl-module
> > would be a function, that gives back a list of keys
> > that are in the hash.
> 
> And also a 'values' function which returns all the values.

Well, ok, here is a problem then: for a keys-function
it makes sense to only give back them once, so that means
multiple occurrences (because of more than one binding)
of a key will given back only once.

But to have a values-function, it is not clear, if they
should be reported once or as often as they appear.

This is individually different, depending on what you want to get.
You can use Hashtbl.iter and Hashtbl.fold for creating
each version of them.

But it does not makes sense to have a keys-function that
reports keys as often as there are bindings to it in the
hashtable, because you easily can get all bindings of it
with find_all and the the current binding with find.

So, wheras the keys function makes only sense to be implemented
in a certain way (report each key only once), there are
at least two kinds of a value-function possible.
So, IMHO this is a reason, why a Hashtbl.values function
does not necessarily makes sense in the standard lib itself!



> 
> Both functions, BTW, are easily written using the Hashtbl's fold
> function; but they ought to be part of the standard library.

When you want to report each key only once, you need a temporarily
hash to sort out multiple stuff. This, because the fold-function
reports EACH binding (and it good that it does it), and therefore
multiple bindings to a key will occur more than once.
The function has to sort them out.

I'm shure it's more efficient to implement that inside the
hashtbl-module. (I've not looked into the sources, but suppose
that it will be more performant, when a Hashtbl.keys will
be implemented inside the Hashtbl-module.)

WHICH values function would make more sense than the other,
and why should it be used instead the other one?

(Or maybe it makes sense to have a boolean parameter to decide,
 which of both values-function should be used?!)

Ciao,
   Oliver

----- End forwarded message -----

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2004-04-26 17:17 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2004-04-23 20:09 [oliver: Re: [Caml-list] Should be INSIDE STANDARD-LIB: Hashtbl.keys] Oliver Bandel
2004-04-24  3:00 ` skaller
2004-04-24  4:46   ` Jon Harrop
2004-04-24 10:39     ` Oliver Bandel
2004-04-25 18:54     ` Marcin 'Qrczak' Kowalczyk
2004-04-25 20:06       ` Jon Harrop
2004-04-26 10:14         ` Richard Jones
2004-04-26 17:13           ` Jon Harrop
2004-04-24  6:42   ` Basile STARYNKEVITCH
2004-04-24 19:12     ` skaller
2004-04-24  8:56   ` Oliver Bandel
2004-04-23 20:09 Oliver Bandel
2004-04-23 20:09 Oliver Bandel
2004-04-23 20:10 Oliver Bandel
2004-04-23 20:10 Oliver Bandel
2004-04-24  7:30 ` Martin Jambon

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).