caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed
* [oliver: Re: [Caml-list] Should be INSIDE STANDARD-LIB: Hashtbl.keys]
@ 2004-04-23 20:09 Oliver Bandel
  2004-04-24  3:00 ` skaller
  0 siblings, 1 reply; 16+ messages in thread
From: Oliver Bandel @ 2004-04-23 20:09 UTC (permalink / raw)
  To: caml-list

----- Forwarded message from oliver -----

To: Xavier Leroy <xavier.leroy@inria.fr>
Subject: Re: [Caml-list] Should be INSIDE STANDARD-LIB: Hashtbl.keys

On Fri, Apr 23, 2004 at 06:04:07PM +0200, Xavier Leroy wrote:
> > Perhaps I'm all wrong, but when I have to get rid of repetitions in a 
> > list, I first sort it in O(n log n), then remove the repetitions in O(n).
> > Jean-Baptiste.
> 
> OK, fair enough :-)  The point I was trying to make (not very well, I
> agree) is that "list without repetition" or "sorted list without
> repetition" is often not the best data structure for the job.
[...]

NOT the datastructure FOR THE JOB BUT FOR THE RESULT!


So, ok, because people now are posting their code, I send
one of me. I had many different implementations, maybe
one ore the other is more suited to do the task.
The one that I invented some minutes ago looks like this:



let keys_of_hash hash =
     let keys = Hashtbl.create 10000 in (* stupid hard coded value *)
     let sample k v = Hashtbl.replace keys k true in
     Hashtbl.iter sample hash;
     Hashtbl.fold (fun k v init -> k::init) keys []


Because there is no Hashtbl.size in the standard lib,
which gives back the number of over-all bindings in
the hash, I had to insert some stupid value.
(I was to lazy to write it by myself. It also
 does not make sense to walk through the whole
 hash to get the Hashtbl.size and then to allocate
 the keys-.Hash and then To iterate and then to fold...
 ...there will be more efficient solutions than that...
 btw: I don't know how inefficient it would be to use
 Hashtbl.create 1  and then let grow the hash itself...)


OK, it's possible to calculate Hashtbl.size with
Hastbl.iter, but when Hashtbl.size would be part
of the standard Hashtbl-lib, this would make sense.

I hope we can agree on that Hashtbl.size would return
an int?! ;-)

Hashtbl.create needs an int for the initial size, and
doing things like the above stuff, would be more
consistently when there would be a Hashtbl.size
inside the standard Hastbl-module.
a) The return type is clear (isn't it?!)
b) Estimating initial size of hashtables can be predicted in
   similar cases like the above one.

But nevertheless I insist in Hashtbl.keys as a very useful
function.
You then can use Hastbl.find as well as Hashtbl.find_all
for retrieval of all contents, depending on your needs.

Why I think Hashtbl.keys makes more sense than Hashtbl.values?
Because it reflects the sense of a Hashtable, to have direct
access to the values VIA THE KEYS.
And that also is the reason, why I think Hastbl.keys should
necessarily be part of the lib.

Also I think it will be (much) more efficient to do it inside the
Hashtbl-lib instead of doing it with such outside-approaches.
The direct way inside the lib seems to make more sense to me here.


Ciao,
   Oliver

----- End forwarded message -----

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [oliver: Re: [Caml-list] Should be INSIDE STANDARD-LIB: Hashtbl.keys]
  2004-04-23 20:09 [oliver: Re: [Caml-list] Should be INSIDE STANDARD-LIB: Hashtbl.keys] Oliver Bandel
@ 2004-04-24  3:00 ` skaller
  2004-04-24  4:46   ` Jon Harrop
                     ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: skaller @ 2004-04-24  3:00 UTC (permalink / raw)
  To: Oliver Bandel; +Cc: caml-list

On Sat, 2004-04-24 at 06:09, Oliver Bandel wrote:

> Because there is no Hashtbl.size in the standard lib,

which sux because O(n) is a high price to pay
for an integer the Hashtbl could keep track of
easily.

-- 
John Skaller, mailto:skaller@users.sf.net
voice: 061-2-9660-0850, 
snail: PO BOX 401 Glebe NSW 2037 Australia
Checkout the Felix programming language http://felix.sf.net



-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [oliver: Re: [Caml-list] Should be INSIDE STANDARD-LIB: Hashtbl.keys]
  2004-04-24  3:00 ` skaller
@ 2004-04-24  4:46   ` Jon Harrop
  2004-04-24 10:39     ` Oliver Bandel
  2004-04-25 18:54     ` Marcin 'Qrczak' Kowalczyk
  2004-04-24  6:42   ` Basile STARYNKEVITCH
  2004-04-24  8:56   ` Oliver Bandel
  2 siblings, 2 replies; 16+ messages in thread
From: Jon Harrop @ 2004-04-24  4:46 UTC (permalink / raw)
  To: caml-list

On Saturday 24 April 2004 4:00 am, skaller wrote:
> On Sat, 2004-04-24 at 06:09, Oliver Bandel wrote:
> > Because there is no Hashtbl.size in the standard lib,
>
> which sux because O(n) is a high price to pay
> for an integer the Hashtbl could keep track of
> easily.

I don't want the overhead of hash tables carrying around and updating an int. 
I was miffed when they put an O(1) criterion on length() in the STL "list" in 
C++...

Cheers,
Jon.

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [oliver: Re: [Caml-list] Should be INSIDE STANDARD-LIB: Hashtbl.keys]
  2004-04-24  3:00 ` skaller
  2004-04-24  4:46   ` Jon Harrop
@ 2004-04-24  6:42   ` Basile STARYNKEVITCH
  2004-04-24 19:12     ` skaller
  2004-04-24  8:56   ` Oliver Bandel
  2 siblings, 1 reply; 16+ messages in thread
From: Basile STARYNKEVITCH @ 2004-04-24  6:42 UTC (permalink / raw)
  To: skaller, caml-list

On Sat, Apr 24, 2004 at 01:00:39PM +1000, skaller wrote:
> On Sat, 2004-04-24 at 06:09, Oliver Bandel wrote:
> 
> > Because there is no Hashtbl.size in the standard lib,
> 

In the CVS of Ocaml,
http://camlcvs.inria.fr/cgi-bin/cvsweb.cgi/ocaml/stdlib/hashtbl.mli?rev=1.35&content-type=text/x-cvsweb-markup
 Hashtbl does have a 

val length : ('a, 'b) t -> int
(** [Hashtbl.length tbl] returns the number of bindings in [tbl].
   Multiple bindings are counted multiply, so [Hashtbl.length]
   gives the number of times [Hashtbl.iter] calls it first argument. *)

which I suppose you would have wanted to be  Hashtbl.size 




-- 
Basile STARYNKEVITCH         http://starynkevitch.net/Basile/ 
email: basile<at>starynkevitch<dot>net 
aliases: basile<at>tunes<dot>org = bstarynk<at>nerim<dot>net
8, rue de la Faïencerie, 92340 Bourg La Reine, France

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [oliver: Re: [Caml-list] Should be INSIDE STANDARD-LIB: Hashtbl.keys]
  2004-04-24  3:00 ` skaller
  2004-04-24  4:46   ` Jon Harrop
  2004-04-24  6:42   ` Basile STARYNKEVITCH
@ 2004-04-24  8:56   ` Oliver Bandel
  2 siblings, 0 replies; 16+ messages in thread
From: Oliver Bandel @ 2004-04-24  8:56 UTC (permalink / raw)
  To: caml-list; +Cc: caml-list

On Sat, Apr 24, 2004 at 01:00:39PM +1000, skaller wrote:
> On Sat, 2004-04-24 at 06:09, Oliver Bandel wrote:
> 
> > Because there is no Hashtbl.size in the standard lib,
> 
> which sux because O(n) is a high price to pay
> for an integer the Hashtbl could keep track of
> easily.

What is the difference between both?

Do I understand you correctly?
You say, keeping track of a count does decrement the
performance dramaticaly?

How big is the performance difference between using a
count and using no count?

Ciao,
   Oliver

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [oliver: Re: [Caml-list] Should be INSIDE STANDARD-LIB: Hashtbl.keys]
  2004-04-24  4:46   ` Jon Harrop
@ 2004-04-24 10:39     ` Oliver Bandel
  2004-04-25 18:54     ` Marcin 'Qrczak' Kowalczyk
  1 sibling, 0 replies; 16+ messages in thread
From: Oliver Bandel @ 2004-04-24 10:39 UTC (permalink / raw)
  To: caml-list

On Sat, Apr 24, 2004 at 05:46:17AM +0100, Jon Harrop wrote:
> On Saturday 24 April 2004 4:00 am, skaller wrote:
> > On Sat, 2004-04-24 at 06:09, Oliver Bandel wrote:
> > > Because there is no Hashtbl.size in the standard lib,
> >
> > which sux because O(n) is a high price to pay
> > for an integer the Hashtbl could keep track of
> > easily.
> 
> I don't want the overhead of hash tables carrying around and updating an int. 
> I was miffed when they put an O(1) criterion on length() in the STL "list" in 
> C++...
> 

OK, if it is too much performance critical, then forget the idea with
the counter.

So Hashtbl.size should be done with Hashtbl.iter and incrementing
a counter-ref then.

But maybe it nevertheless makes sense to put it inside the
Hashtbl-module. (But not necissarily.)

Ciao,
   Oliver

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [oliver: Re: [Caml-list] Should be INSIDE STANDARD-LIB: Hashtbl.keys]
  2004-04-24  6:42   ` Basile STARYNKEVITCH
@ 2004-04-24 19:12     ` skaller
  0 siblings, 0 replies; 16+ messages in thread
From: skaller @ 2004-04-24 19:12 UTC (permalink / raw)
  To: Basile STARYNKEVITCH; +Cc: caml-list

On Sat, 2004-04-24 at 16:42, Basile STARYNKEVITCH wrote:

> In the CVS of Ocaml,
> http://camlcvs.inria.fr/cgi-bin/cvsweb.cgi/ocaml/stdlib/hashtbl.mli?rev=1.35&content-type=text/x-cvsweb-markup
>  Hashtbl does have a 
> 
> val length : ('a, 'b) t -> int
> (** [Hashtbl.length tbl] returns the number of bindings in [tbl].
>    Multiple bindings are counted multiply, so [Hashtbl.length]
>    gives the number of times [Hashtbl.iter] calls it first argument. *)
> 
> which I suppose you would have wanted to be  Hashtbl.size 

The name 'length' agrees with other Ocaml libraries.

I didn't notice this was added: I use CVS Ocaml 
-- and am willing to demand my clients do too at the moment --
and missed that this has been added. 

Thanks for this info -- and thanks for adding this
function!

-- 
John Skaller, mailto:skaller@users.sf.net
voice: 061-2-9660-0850, 
snail: PO BOX 401 Glebe NSW 2037 Australia
Checkout the Felix programming language http://felix.sf.net



-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [oliver: Re: [Caml-list] Should be INSIDE STANDARD-LIB: Hashtbl.keys]
  2004-04-24  4:46   ` Jon Harrop
  2004-04-24 10:39     ` Oliver Bandel
@ 2004-04-25 18:54     ` Marcin 'Qrczak' Kowalczyk
  2004-04-25 20:06       ` Jon Harrop
  1 sibling, 1 reply; 16+ messages in thread
From: Marcin 'Qrczak' Kowalczyk @ 2004-04-25 18:54 UTC (permalink / raw)
  To: caml-list

W liście z sob, 24-04-2004, godz. 05:46 +0100, Jon Harrop napisał:

> I don't want the overhead of hash tables carrying around and updating an int.

But it already does it! (to know when to rehash)

-- 
   __("<         Marcin Kowalczyk
   \__/       qrczak@knm.org.pl
    ^^     http://qrnik.knm.org.pl/~qrczak/


-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [oliver: Re: [Caml-list] Should be INSIDE STANDARD-LIB: Hashtbl.keys]
  2004-04-25 18:54     ` Marcin 'Qrczak' Kowalczyk
@ 2004-04-25 20:06       ` Jon Harrop
  2004-04-26 10:14         ` Richard Jones
  0 siblings, 1 reply; 16+ messages in thread
From: Jon Harrop @ 2004-04-25 20:06 UTC (permalink / raw)
  To: caml-list

On Sunday 25 April 2004 7:54 pm, Marcin 'Qrczak' Kowalczyk wrote:
> W liście z sob, 24-04-2004, godz. 05:46 +0100, Jon Harrop napisał:
> > I don't want the overhead of hash tables carrying around and updating an
> > int.
>
> But it already does it! (to know when to rehash)

Another int, then. ;-)

In case anyone is interested, my peeve about carrying the length of a list 
around inside the list in the case of the STL came from my use of a data 
structure for molecular dynamics: The atomic structure is described by 
particle coordinates within a cubic box under periodic boundary conditions. 
Conventionally, the box is partitioned into voxels so that you can lookup 
neighbouring atoms efficiently. Each voxel contains a list of atoms within 
the voxel. I was often using ~10^6 voxels, so the STL list was carrying 4Mb 
of unecessary data with my data structure.

Thus, in the context of ocaml, I'd like to see "low-level" libraries which 
carry as little as possible. This isn't just for cache coherence but also 
because I think it is good design. If someone wants a Hashtbl which carries 
(and updates) baggage around with it to save you from having to record things 
yourself then they can create a "higher-level" implementation from the 
original. So, in summary, I like Hashtbl just the way it is... lean and 
mean. :-)

Cheers,
Jon.

PS: I tried to do the same thing in Mathematica but it can only pack the whole 
(4-dimensions of my) nested list into an array, or none of it. :(

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [oliver: Re: [Caml-list] Should be INSIDE STANDARD-LIB: Hashtbl.keys]
  2004-04-25 20:06       ` Jon Harrop
@ 2004-04-26 10:14         ` Richard Jones
  2004-04-26 17:13           ` Jon Harrop
  0 siblings, 1 reply; 16+ messages in thread
From: Richard Jones @ 2004-04-26 10:14 UTC (permalink / raw)
  Cc: caml-list

On Sun, Apr 25, 2004 at 09:06:33PM +0100, Jon Harrop wrote:
> In case anyone is interested, my peeve about carrying the length of a list 
> around inside the list in the case of the STL came from my use of a data 
> structure for molecular dynamics: The atomic structure is described by 
> particle coordinates within a cubic box under periodic boundary conditions. 
> Conventionally, the box is partitioned into voxels so that you can lookup 
> neighbouring atoms efficiently. Each voxel contains a list of atoms within 
> the voxel. I was often using ~10^6 voxels, so the STL list was carrying 4Mb 
> of unecessary data with my data structure.

Don't you think that this is a rather special case, and probably you
should have built your own data structure?

For general purpose programming having the extra int (in STL) probably
made sense [not that I've used C++ for years, and I don't think I'll
be using it again].

Rich.

-- 
Richard Jones. http://www.annexia.org/ http://www.j-london.com/
Merjis Ltd. http://www.merjis.com/ - improving website return on investment
Learning Objective CAML for C, C++, Perl and Java programmers:
http://www.merjis.com/richj/computers/ocaml/tutorial/

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [oliver: Re: [Caml-list] Should be INSIDE STANDARD-LIB: Hashtbl.keys]
  2004-04-26 10:14         ` Richard Jones
@ 2004-04-26 17:13           ` Jon Harrop
  0 siblings, 0 replies; 16+ messages in thread
From: Jon Harrop @ 2004-04-26 17:13 UTC (permalink / raw)
  To: caml-list

On Monday 26 April 2004 11:14 am, Richard Jones wrote:
> > ...
> Don't you think that this is a rather special case, and probably you
> should have built your own data structure?

No, that applies whenever you have a program which handles a lot of 
containers. I think containers should be "light" for that reason.

> For general purpose programming having the extra int (in STL) probably
> made sense [not that I've used C++ for years, and I don't think I'll
> be using it again].

It is of very little advantage, IMHO. If it were for a trait which required a 
similar amount of storage but which could not be easily or efficiently 
maintained from the outside then yes, sure.

IIRC, the committee's explanation for making lists carry around and update 
their own length was that newbies might not expect "size()" to be O(n). I 
don't think such arguments are constructive when designing a standard 
library...

Cheers,
Jon.

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [oliver: Re: [Caml-list] Should be INSIDE STANDARD-LIB: Hashtbl.keys]
  2004-04-23 20:10 Oliver Bandel
@ 2004-04-24  7:30 ` Martin Jambon
  0 siblings, 0 replies; 16+ messages in thread
From: Martin Jambon @ 2004-04-24  7:30 UTC (permalink / raw)
  To: Oliver Bandel; +Cc: caml-list

On Fri, 23 Apr 2004, Oliver Bandel wrote:

> When you want to report each key only once, you need a temporarily
> hash to sort out multiple stuff. This, because the fold-function
> reports EACH binding (and it good that it does it), and therefore
> multiple bindings to a key will occur more than once.
> The function has to sort them out.
>
> I'm shure it's more efficient to implement that inside the
> hashtbl-module. (I've not looked into the sources, but suppose
> that it will be more performant, when a Hashtbl.keys will
> be implemented inside the Hashtbl-module.)

Please, tell me why you don't like my solution:
  http://caml.inria.fr/archives/200404/msg00527.html

Best regards,

Martin

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [oliver: Re: [Caml-list] Should be INSIDE STANDARD-LIB: Hashtbl.keys]
@ 2004-04-23 20:10 Oliver Bandel
  2004-04-24  7:30 ` Martin Jambon
  0 siblings, 1 reply; 16+ messages in thread
From: Oliver Bandel @ 2004-04-23 20:10 UTC (permalink / raw)
  To: caml-list

----- Forwarded message from oliver -----

To: Richard Jones <rich@annexia.org>
Subject: Re: [Caml-list] Should be INSIDE STANDARD-LIB: Hashtbl.keys

On Wed, Apr 21, 2004 at 09:39:47AM +0100, Richard Jones wrote:
> On Wed, Apr 21, 2004 at 03:19:04AM +0200, Oliver Bandel wrote:
> > I think a good addition to the Hashtbl-module
> > would be a function, that gives back a list of keys
> > that are in the hash.
> 
> And also a 'values' function which returns all the values.

Well, ok, here is a problem then: for a keys-function
it makes sense to only give back them once, so that means
multiple occurrences (because of more than one binding)
of a key will given back only once.

But to have a values-function, it is not clear, if they
should be reported once or as often as they appear.

This is individually different, depending on what you want to get.
You can use Hashtbl.iter and Hashtbl.fold for creating
each version of them.

But it does not makes sense to have a keys-function that
reports keys as often as there are bindings to it in the
hashtable, because you easily can get all bindings of it
with find_all and the the current binding with find.

So, wheras the keys function makes only sense to be implemented
in a certain way (report each key only once), there are
at least two kinds of a value-function possible.
So, IMHO this is a reason, why a Hashtbl.values function
does not necessarily makes sense in the standard lib itself!



> 
> Both functions, BTW, are easily written using the Hashtbl's fold
> function; but they ought to be part of the standard library.

When you want to report each key only once, you need a temporarily
hash to sort out multiple stuff. This, because the fold-function
reports EACH binding (and it good that it does it), and therefore
multiple bindings to a key will occur more than once.
The function has to sort them out.

I'm shure it's more efficient to implement that inside the
hashtbl-module. (I've not looked into the sources, but suppose
that it will be more performant, when a Hashtbl.keys will
be implemented inside the Hashtbl-module.)

WHICH values function would make more sense than the other,
and why should it be used instead the other one?

(Or maybe it makes sense to have a boolean parameter to decide,
 which of both values-function should be used?!)

Ciao,
   Oliver

----- End forwarded message -----

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [oliver: Re: [Caml-list] Should be INSIDE STANDARD-LIB: Hashtbl.keys]
@ 2004-04-23 20:10 Oliver Bandel
  0 siblings, 0 replies; 16+ messages in thread
From: Oliver Bandel @ 2004-04-23 20:10 UTC (permalink / raw)
  To: caml-list

----- Forwarded message from oliver -----

To: John Goerzen <jgoerzen@complete.org>
Subject: Re: [Caml-list] Should be INSIDE STANDARD-LIB: Hashtbl.keys

On Fri, Apr 23, 2004 at 01:29:06PM -0500, John Goerzen wrote:
> On Fri, Apr 23, 2004 at 02:51:41PM +0200, Xavier Leroy wrote:
> > With your specification (no repetitions in the list), that function
> > would run in quadratic time, which is a sure sign that lists aren't
> > the right data structure here.  (More generally speaking, "lists
> > without repetitions" is almost always the wrong data structure.)
> 
> But perhaps you don't really care how fast it runs, since your entire
> program consumes about 1/10 of a CPU second anyway?
> 

Depends on number of keys...

n ^ 2  is less when n is low,
n = 10 and n = 10000 seems not to matter much...
but 10^2 and 10000^2 are a differrence that matters a lot!

Ciao,
   Oliver

----- End forwarded message -----

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [oliver: Re: [Caml-list] Should be INSIDE STANDARD-LIB: Hashtbl.keys]
@ 2004-04-23 20:09 Oliver Bandel
  0 siblings, 0 replies; 16+ messages in thread
From: Oliver Bandel @ 2004-04-23 20:09 UTC (permalink / raw)
  To: caml-list

----- Forwarded message from oliver -----

To: John Goerzen <jgoerzen@complete.org>
Subject: Re: [Caml-list] Should be INSIDE STANDARD-LIB: Hashtbl.keys

On Fri, Apr 23, 2004 at 01:29:35PM -0500, John Goerzen wrote:
> On Fri, Apr 23, 2004 at 06:04:07PM +0200, Xavier Leroy wrote:
> > > Perhaps I'm all wrong, but when I have to get rid of repetitions in a 
> > > list, I first sort it in O(n log n), then remove the repetitions in O(n).
> > > Jean-Baptiste.
> > 
> > OK, fair enough :-)  The point I was trying to make (not very well, I
> > agree) is that "list without repetition" or "sorted list without
> > repetition" is often not the best data structure for the job.
> 
> And a keys function need not necessarily avoid repetition anyway.
> 
[...]

It ONLY makes sense to have a keys function that avoids repitition!


Anything else can be done with List.iter and List.fold
from the outside of the Hashtbl-module implementation!

But explicitly ONLY a keys-function that reports keys only ONCE
is a valid candidate for going into the standard lib!

Ciao,
   Oliver

P.S.: If you want to have the keys reported as often as there are
      bindings to them, you can use
         List.length (Hashtbl.find_all ht key)
      to get the number of bindings for the key.

      ( And then you can print several times that key_n is
        used inside the hash. IMHO this is called redundance. :) )

      This is above not much code and does not looks like a performance
      problem, when doing outside the Hash-implementation.

      But to retrieve a non-repitional keys-list, it is much more
      work to do and it also seems to be much more efficient implemented
      when done inside the Hashtbl-lib's code.
      
      What keys to use in the above code-snippet?

      Well, you could do it like this:

      Get the keys with Hashtbl.keys and for each of
      the keys use the above code snippet to get the
      number of entries and report this ONCE. :)
      (or many times, if you really want to ;-))


----- End forwarded message -----

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [oliver: Re: [Caml-list] Should be INSIDE STANDARD-LIB: Hashtbl.keys]
@ 2004-04-23 20:09 Oliver Bandel
  0 siblings, 0 replies; 16+ messages in thread
From: Oliver Bandel @ 2004-04-23 20:09 UTC (permalink / raw)
  To: caml-list

----- Forwarded message from oliver -----

To: Richard Jones <rich@annexia.org>
Subject: Re: [Caml-list] Should be INSIDE STANDARD-LIB: Hashtbl.keys

On Fri, Apr 23, 2004 at 04:03:22PM +0100, Richard Jones wrote:
> Maybe there's no good behaviour, but I keep writing a 'keys' function
> in almost every program I've written which uses Hashtbl, which
> certainly suggests that something is missing.

FULL ACK

Hashtables are an ideal tool for many applications,
where retrieving BY KEYs is necessary.
I mean it seems like they were imvented to do this.
So, why should a person write a keys-function by him/herself,
if this is the MAIN work for what Hashtables are useful?

If it only makes sense to collect some data, no Hashtbl
would be necessary and instead a list could be used.

But Hashtables are there to do jobs, where we have
keys and get values.

Hashtbl.keys gives a complete access to all valid
Hastbl.find/Hashtbl.find_all retrievals.
So, when there is a Hashtbl.keys not trying stochastical
"let's look if there is something inside" must be done,
but certain retrieves for data.


Ciao,
   Oliver

----- End forwarded message -----

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2004-04-26 17:17 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2004-04-23 20:09 [oliver: Re: [Caml-list] Should be INSIDE STANDARD-LIB: Hashtbl.keys] Oliver Bandel
2004-04-24  3:00 ` skaller
2004-04-24  4:46   ` Jon Harrop
2004-04-24 10:39     ` Oliver Bandel
2004-04-25 18:54     ` Marcin 'Qrczak' Kowalczyk
2004-04-25 20:06       ` Jon Harrop
2004-04-26 10:14         ` Richard Jones
2004-04-26 17:13           ` Jon Harrop
2004-04-24  6:42   ` Basile STARYNKEVITCH
2004-04-24 19:12     ` skaller
2004-04-24  8:56   ` Oliver Bandel
2004-04-23 20:09 Oliver Bandel
2004-04-23 20:09 Oliver Bandel
2004-04-23 20:10 Oliver Bandel
2004-04-23 20:10 Oliver Bandel
2004-04-24  7:30 ` Martin Jambon

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).