caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed
* [Caml-list] Should be INSIDE STANDARD-LIB: Hashtbl.keys
@ 2004-04-21  1:19 Oliver Bandel
  2004-04-21  8:39 ` Richard Jones
                   ` (3 more replies)
  0 siblings, 4 replies; 25+ messages in thread
From: Oliver Bandel @ 2004-04-21  1:19 UTC (permalink / raw)
  To: caml-list

Hi,

I think a good addition to the Hashtbl-module
would be a function, that gives back a list of keys
that are in the hash.

Well, ok, I've written code for doing this task,
before I've mailed this idea to the list,
so I'm not looking for code / for help with sources here.


I only think, this is such an often used function, that
it should be integrated into the Hastbl-module.

Well, should be something like this one:


           val keys : ('a, 'b) t -> 'a list

  usage:
      Hashtbl.keys <hash>  returns a list of all keys of the
      hash "hash", that exists and therefore have values bound
      to it. If a key has several bindings, it will occur only
      once in this list.
   

IMHO it should be possible to implement a Hashtbl.keys function
inside the Hashtbl-module-sources much more efficient than
doing it via the already given Hashtbl-functions of that module,
that are exported.

But even if this would not be true, I think, giving back a list of
all keys in a hash (with each key reported only once, even if it has
more than one binding) is a rudimentary and necessary function,
that should be integrated in the standard-Hashtbl-Lib-module.

Best regards to all OCaml core developers (and all other readers too) here.

Ciao,
   Oliver Bandel

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Caml-list] Should be INSIDE STANDARD-LIB: Hashtbl.keys
  2004-04-21  1:19 [Caml-list] Should be INSIDE STANDARD-LIB: Hashtbl.keys Oliver Bandel
@ 2004-04-21  8:39 ` Richard Jones
  2004-04-21  9:13 ` Martin Jambon
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 25+ messages in thread
From: Richard Jones @ 2004-04-21  8:39 UTC (permalink / raw)
  Cc: caml-list

On Wed, Apr 21, 2004 at 03:19:04AM +0200, Oliver Bandel wrote:
> I think a good addition to the Hashtbl-module
> would be a function, that gives back a list of keys
> that are in the hash.

And also a 'values' function which returns all the values.

Both functions, BTW, are easily written using the Hashtbl's fold
function; but they ought to be part of the standard library.

Rich.

-- 
Richard Jones. http://www.annexia.org/ http://www.j-london.com/
Merjis Ltd. http://www.merjis.com/ - improving website return on investment
http://www.winwinsales.co.uk/ - CRM improvement consultancy

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Caml-list] Should be INSIDE STANDARD-LIB: Hashtbl.keys
  2004-04-21  1:19 [Caml-list] Should be INSIDE STANDARD-LIB: Hashtbl.keys Oliver Bandel
  2004-04-21  8:39 ` Richard Jones
@ 2004-04-21  9:13 ` Martin Jambon
  2004-04-23 12:51 ` Xavier Leroy
  2004-04-23 18:28 ` John Goerzen
  3 siblings, 0 replies; 25+ messages in thread
From: Martin Jambon @ 2004-04-21  9:13 UTC (permalink / raw)
  To: Oliver Bandel; +Cc: caml-list

On Wed, 21 Apr 2004, Oliver Bandel wrote:

>       Hashtbl.keys <hash>  returns a list of all keys of the
>       hash "hash", that exists and therefore have values bound
>       to it. If a key has several bindings, it will occur only
>       once in this list.

For that purpose, it is possible instead of a (key, data) Hashtbl.t to use
a (key, data list ref) Hashtbl.t:

let add tbl key data =
  let r =
    try Hashtbl.find tbl key
    with Not_found ->
      let r = ref [] in
      Hashtbl.add tbl key r;
      r in
  r := data :: !r

let fold f tbl init =
  Hashtbl.fold (fun key r accu -> f key (List.hd !r) accu) tbl init

let keys tbl =
  fold (fun key _ accu -> key :: accu) tbl []


Martin

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Caml-list] Should be INSIDE STANDARD-LIB: Hashtbl.keys
  2004-04-21  1:19 [Caml-list] Should be INSIDE STANDARD-LIB: Hashtbl.keys Oliver Bandel
  2004-04-21  8:39 ` Richard Jones
  2004-04-21  9:13 ` Martin Jambon
@ 2004-04-23 12:51 ` Xavier Leroy
  2004-04-23 13:05   ` Jean-Baptiste Rouquier
                     ` (3 more replies)
  2004-04-23 18:28 ` John Goerzen
  3 siblings, 4 replies; 25+ messages in thread
From: Xavier Leroy @ 2004-04-23 12:51 UTC (permalink / raw)
  To: Oliver Bandel; +Cc: caml-list

> I think a good addition to the Hashtbl-module
> would be a function, that gives back a list of keys
> that are in the hash.

With your specification (no repetitions in the list), that function
would run in quadratic time, which is a sure sign that lists aren't
the right data structure here.  (More generally speaking, "lists
without repetitions" is almost always the wrong data structure.)

If you shoot for decent efficiency, you have two choices: 1- a list of
keys with possible repetitions, and 2- a set of keys.
1- is linear-time but rarely useful.  2- requires a suitable set
module over the keys, and thus can't be done inside the Hashtbl library.

As you see, there is no single "good" behavior and interface for the
function you outline.  (For another example, refer to the discussion
about String.map earlier on this list; every poster had their own
ideas about what type it should have.)  That's a sign that it doesn't
belong in the standard library, and you should program the behavior
that you need in your application using Hashtbl.fold.

- Xavier Leroy

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Caml-list] Should be INSIDE STANDARD-LIB: Hashtbl.keys
  2004-04-23 12:51 ` Xavier Leroy
@ 2004-04-23 13:05   ` Jean-Baptiste Rouquier
  2004-04-23 16:04     ` Xavier Leroy
  2004-04-23 15:03   ` Richard Jones
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 25+ messages in thread
From: Jean-Baptiste Rouquier @ 2004-04-23 13:05 UTC (permalink / raw)
  Cc: caml-list


>With your specification (no repetitions in the list), that function
>would run in quadratic time,
>
Perhaps I'm all wrong, but when I have to get rid of repetitions in a 
list, I first sort it in O(n log n), then remove the repetitions in O(n).
Jean-Baptiste.

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Caml-list] Should be INSIDE STANDARD-LIB: Hashtbl.keys
  2004-04-23 12:51 ` Xavier Leroy
  2004-04-23 13:05   ` Jean-Baptiste Rouquier
@ 2004-04-23 15:03   ` Richard Jones
  2004-04-24  1:58     ` skaller
  2004-04-23 16:06   ` Brian Hurt
  2004-04-23 18:29   ` John Goerzen
  3 siblings, 1 reply; 25+ messages in thread
From: Richard Jones @ 2004-04-23 15:03 UTC (permalink / raw)
  To: caml-list

Maybe there's no good behaviour, but I keep writing a 'keys' function
in almost every program I've written which uses Hashtbl, which
certainly suggests that something is missing.

Rich.

-- 
Richard Jones. http://www.annexia.org/ http://www.j-london.com/
Merjis Ltd. http://www.merjis.com/ - improving website return on investment
http://www.YouUnlimited.co.uk/ - management courses

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Caml-list] Should be INSIDE STANDARD-LIB: Hashtbl.keys
  2004-04-23 13:05   ` Jean-Baptiste Rouquier
@ 2004-04-23 16:04     ` Xavier Leroy
  2004-04-23 18:21       ` Jon Harrop
  2004-04-23 18:29       ` John Goerzen
  0 siblings, 2 replies; 25+ messages in thread
From: Xavier Leroy @ 2004-04-23 16:04 UTC (permalink / raw)
  To: jrouquie; +Cc: caml-list

> Perhaps I'm all wrong, but when I have to get rid of repetitions in a 
> list, I first sort it in O(n log n), then remove the repetitions in O(n).
> Jean-Baptiste.

OK, fair enough :-)  The point I was trying to make (not very well, I
agree) is that "list without repetition" or "sorted list without
repetition" is often not the best data structure for the job.

- Xavier Leroy

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Caml-list] Should be INSIDE STANDARD-LIB: Hashtbl.keys
  2004-04-23 12:51 ` Xavier Leroy
  2004-04-23 13:05   ` Jean-Baptiste Rouquier
  2004-04-23 15:03   ` Richard Jones
@ 2004-04-23 16:06   ` Brian Hurt
  2004-04-23 16:31     ` Martin Jambon
  2004-04-23 17:27     ` Christoph Bauer
  2004-04-23 18:29   ` John Goerzen
  3 siblings, 2 replies; 25+ messages in thread
From: Brian Hurt @ 2004-04-23 16:06 UTC (permalink / raw)
  To: Xavier Leroy; +Cc: Oliver Bandel, caml-list

On Fri, 23 Apr 2004, Xavier Leroy wrote:

> > I think a good addition to the Hashtbl-module
> > would be a function, that gives back a list of keys
> > that are in the hash.
> 
> With your specification (no repetitions in the list), that function
> would run in quadratic time, which is a sure sign that lists aren't
> the right data structure here.  (More generally speaking, "lists
> without repetitions" is almost always the wrong data structure.)

No, I think creating such a list would take O(n log n) time.

OK, we're starting with a hash table.  That means we have a set of 
buckets, each bucket is a set of key/data pairs.  Assume the same key can 
be inserted multiple times (can it?)- in this case, all duplicate keys 
should be in the same bucket.  So, for each bucket, I sort all entries in 
the bucket by key (worst case I only have one bucket and sorting is O(n 
log n)).  Once sorted, I go throught and eliminate duplicates, which is 
now an O(n) algorithm:

let uniq lst =
   let rec loop accum = function
       | [] -> List.rev accum
       | x :: [] -> List.rev (x :: accum)
       | x :: y :: t ->
           if (x = y) then
               loop accum (x :: t)
           else
               loop (x :: accum) (y :: t)
    in
    loop [] lst
;;

(You can do it more efficiently that this, but this gets the idea across)
Viola- uniqueness in subquadratic time.  And in practice, approaching 
linear time- hashtables with lots of elements in a single bucket are 
computationally expensive, so you're likely to be sorting a whole bunch of 
short (1 and 2 element) lists.

-- 
"Usenet is like a herd of performing elephants with diarrhea -- massive,
difficult to redirect, awe-inspiring, entertaining, and a source of
mind-boggling amounts of excrement when you least expect it."
                                - Gene Spafford 
Brian

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Caml-list] Should be INSIDE STANDARD-LIB: Hashtbl.keys
  2004-04-23 16:06   ` Brian Hurt
@ 2004-04-23 16:31     ` Martin Jambon
  2004-04-23 17:27     ` Christoph Bauer
  1 sibling, 0 replies; 25+ messages in thread
From: Martin Jambon @ 2004-04-23 16:31 UTC (permalink / raw)
  To: Brian Hurt; +Cc: Xavier Leroy, Oliver Bandel, caml-list

On Fri, 23 Apr 2004, Brian Hurt wrote:

> On Fri, 23 Apr 2004, Xavier Leroy wrote:
>
> > > I think a good addition to the Hashtbl-module
> > > would be a function, that gives back a list of keys
> > > that are in the hash.
> >
> > With your specification (no repetitions in the list), that function
> > would run in quadratic time, which is a sure sign that lists aren't
> > the right data structure here.  (More generally speaking, "lists
> > without repetitions" is almost always the wrong data structure.)
>
> No, I think creating such a list would take O(n log n) time.
>
> OK, we're starting with a hash table.  That means we have a set of
> buckets, each bucket is a set of key/data pairs.  Assume the same key can
> be inserted multiple times (can it?)

Yes it can.
Wouldn't it be easier to avoid duplicated keys???
See http://martin.jambon.free.fr/hashtbl2/Hashtbl2.html
and http://martin.jambon.free.fr/hashtbl2.ml


Martin

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Caml-list] Should be INSIDE STANDARD-LIB: Hashtbl.keys
  2004-04-23 16:06   ` Brian Hurt
  2004-04-23 16:31     ` Martin Jambon
@ 2004-04-23 17:27     ` Christoph Bauer
  1 sibling, 0 replies; 25+ messages in thread
From: Christoph Bauer @ 2004-04-23 17:27 UTC (permalink / raw)
  To: OCaml List

Brian Hurt <bhurt@spnz.org> writes:

> let uniq lst =
>    let rec loop accum = function
>        | [] -> List.rev accum
>        | x :: [] -> List.rev (x :: accum)
>        | x :: y :: t ->
>            if (x = y) then
>                loop accum (x :: t)
>            else
>                loop (x :: accum) (y :: t)
>     in
>     loop [] lst
> ;;

I found these lines in my code:

let unique list =
  let h = Hashtbl.create (2*List.length list) in
    List.filter
      (fun a -> 
           let known =  Hashtbl.mem h a in
           Hashtbl.add h a true;     
           not known) list

On average Hashtbl.mem should take 2 accesses. So this algorithm
runs in O(n). (But the worst case is very bad.)

Regards,
Christoph Bauer	   

-- 
beginfig(1)u=3cm;draw fullcircle scaled 2u;x0=x1=y1=x2=y3=0;-y0=y2=x3=1u;
filldraw z0..{left}z1{left}..z2{curl 1}..z3..z0..cycle;def t(expr p)=fullcircle
scaled .25u shifted(0,p*u);enddef;unfill t(.5);fill t(-.5);endfig;bye

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Caml-list] Should be INSIDE STANDARD-LIB: Hashtbl.keys
  2004-04-23 16:04     ` Xavier Leroy
@ 2004-04-23 18:21       ` Jon Harrop
  2004-04-23 21:31         ` Jon Harrop
  2004-04-23 18:29       ` John Goerzen
  1 sibling, 1 reply; 25+ messages in thread
From: Jon Harrop @ 2004-04-23 18:21 UTC (permalink / raw)
  To: caml-list


On Friday 23 April 2004 2:05 pm, Jean-Baptiste Rouquier wrote:
> >With your specification (no repetitions in the list), that function
> >would run in quadratic time,
>
> Perhaps I'm all wrong, but when I have to get rid of repetitions in a
> list, I first sort it in O(n log n), then remove the repetitions in O(n).
> Jean-Baptiste.

I have no idea what I'm talking about, but does this function not do what you 
want in O(n):

let hashtbl_keys tbl =
  let rec helper k d l = match l with
    [] -> [k]
  | ok::t -> if k=ok then l else k::l
  in
  Hashtbl.fold helper tbl []

because Hashtbl's iter and fold present duplicate keys subsequently?

Cheers,
Jon.

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Caml-list] Should be INSIDE STANDARD-LIB: Hashtbl.keys
  2004-04-21  1:19 [Caml-list] Should be INSIDE STANDARD-LIB: Hashtbl.keys Oliver Bandel
                   ` (2 preceding siblings ...)
  2004-04-23 12:51 ` Xavier Leroy
@ 2004-04-23 18:28 ` John Goerzen
  3 siblings, 0 replies; 25+ messages in thread
From: John Goerzen @ 2004-04-23 18:28 UTC (permalink / raw)
  To: Oliver Bandel; +Cc: caml-list

On Wed, Apr 21, 2004 at 03:19:04AM +0200, Oliver Bandel wrote:
> I only think, this is such an often used function, that
> it should be integrated into the Hastbl-module.

You may be interested that Missinglib, which I will announce here
shortly, has this in its Hashtblutil module:

let map f hash = Hashtbl.fold (fun key value l -> (f key value) :: l) hash [];;
let keys hash = map (fun key value -> key) hash;;
let values hash = map (fun key value -> value) hash;;
let items hash = map (fun key value -> key, value)  hash;;
let length hash = List.length (keys hash);;

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Caml-list] Should be INSIDE STANDARD-LIB: Hashtbl.keys
  2004-04-23 12:51 ` Xavier Leroy
                     ` (2 preceding siblings ...)
  2004-04-23 16:06   ` Brian Hurt
@ 2004-04-23 18:29   ` John Goerzen
       [not found]     ` <20040423191010.GB1506@first.in-berlin.de>
  3 siblings, 1 reply; 25+ messages in thread
From: John Goerzen @ 2004-04-23 18:29 UTC (permalink / raw)
  To: Xavier Leroy; +Cc: Oliver Bandel, caml-list

On Fri, Apr 23, 2004 at 02:51:41PM +0200, Xavier Leroy wrote:
> With your specification (no repetitions in the list), that function
> would run in quadratic time, which is a sure sign that lists aren't
> the right data structure here.  (More generally speaking, "lists
> without repetitions" is almost always the wrong data structure.)

But perhaps you don't really care how fast it runs, since your entire
program consumes about 1/10 of a CPU second anyway?

-- John

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Caml-list] Should be INSIDE STANDARD-LIB: Hashtbl.keys
  2004-04-23 16:04     ` Xavier Leroy
  2004-04-23 18:21       ` Jon Harrop
@ 2004-04-23 18:29       ` John Goerzen
       [not found]         ` <20040423190710.GA1506@first.in-berlin.de>
  1 sibling, 1 reply; 25+ messages in thread
From: John Goerzen @ 2004-04-23 18:29 UTC (permalink / raw)
  To: Xavier Leroy; +Cc: jrouquie, caml-list

On Fri, Apr 23, 2004 at 06:04:07PM +0200, Xavier Leroy wrote:
> > Perhaps I'm all wrong, but when I have to get rid of repetitions in a 
> > list, I first sort it in O(n log n), then remove the repetitions in O(n).
> > Jean-Baptiste.
> 
> OK, fair enough :-)  The point I was trying to make (not very well, I
> agree) is that "list without repetition" or "sorted list without
> repetition" is often not the best data structure for the job.

And a keys function need not necessarily avoid repetition anyway.

-- John

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Caml-list] Should be INSIDE STANDARD-LIB: Hashtbl.keys
       [not found]     ` <20040423191010.GB1506@first.in-berlin.de>
@ 2004-04-23 20:41       ` John Goerzen
       [not found]         ` <20040424080904.GA821@first.in-berlin.de>
  0 siblings, 1 reply; 25+ messages in thread
From: John Goerzen @ 2004-04-23 20:41 UTC (permalink / raw)
  To: Oliver Bandel; +Cc: caml-list

On Fri, Apr 23, 2004 at 09:10:11PM +0200, Oliver Bandel wrote:
> On Fri, Apr 23, 2004 at 01:29:06PM -0500, John Goerzen wrote:
> > On Fri, Apr 23, 2004 at 02:51:41PM +0200, Xavier Leroy wrote:
> > > With your specification (no repetitions in the list), that function
> > > would run in quadratic time, which is a sure sign that lists aren't
> > > the right data structure here.  (More generally speaking, "lists
> > > without repetitions" is almost always the wrong data structure.)
> > 
> > But perhaps you don't really care how fast it runs, since your entire
> > program consumes about 1/10 of a CPU second anyway?
> 
> Depends on number of keys...
> 
> n ^ 2  is less when n is low,
> n = 10 and n = 10000 seems not to matter much...
> but 10^2 and 10000^2 are a differrence that matters a lot!

OK, but why should we eliminate a useful function because 1% of uses
of it will be slow?

Make a note in the docs (like that notes that are there already that say
"this function is not tail-recursive") and put it in for those that will
find it useful.

On another note, I really don't understand why hashtables permit
duplicate keys anyway; it just doesn't seem useful to me.  All my
Hashtbl code uses Hashtbl.replace....

-- John

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Caml-list] Should be INSIDE STANDARD-LIB: Hashtbl.keys
       [not found]         ` <20040423190710.GA1506@first.in-berlin.de>
@ 2004-04-23 20:42           ` John Goerzen
  0 siblings, 0 replies; 25+ messages in thread
From: John Goerzen @ 2004-04-23 20:42 UTC (permalink / raw)
  To: Oliver Bandel; +Cc: caml-list

On Fri, Apr 23, 2004 at 09:07:10PM +0200, Oliver Bandel wrote:
> On Fri, Apr 23, 2004 at 01:29:35PM -0500, John Goerzen wrote:
> > On Fri, Apr 23, 2004 at 06:04:07PM +0200, Xavier Leroy wrote:
> > > repetition" is often not the best data structure for the job.
> > 
> > And a keys function need not necessarily avoid repetition anyway.
> > 
> [...]
> 
> It ONLY makes sense to have a keys function that avoids repitition!

I don't buy that.

> Anything else can be done with List.iter and List.fold
> from the outside of the Hashtbl-module implementation!

And that doesn't mean that having the function is useless.

Hell, Array.sub can be done entirely from pure OCaml.  Ditto for
String.sub.  That doesn't mean that having it in the standard library is
useless.

It's a lot easier to read code that says "keys hash" than a fold
incarnation.  Not to mention being easier to write, maintain, and learn.

-- John

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Caml-list] Should be INSIDE STANDARD-LIB: Hashtbl.keys
  2004-04-23 18:21       ` Jon Harrop
@ 2004-04-23 21:31         ` Jon Harrop
  2004-04-23 21:53           ` John Goerzen
  2004-04-26  6:28           ` Florian Hars
  0 siblings, 2 replies; 25+ messages in thread
From: Jon Harrop @ 2004-04-23 21:31 UTC (permalink / raw)
  To: caml-list


I just realised that we may all be using different defintions for the notion 
of time-complexity. My function performed "n" operations where "n" is the 
number of keys including duplicates. It may be possible to write a routine 
(which would need to be in Hashtbl, I think) which could do this in "n" 
operations where "n" is the number of unique keys (i.e. the size of the 
output list). I guess that was what the original poster meant. I suspect 
Hashtbl allows different keys in the same bin, so the actual number of 
operations will be somewhere between the two but could be made considerably 
faster than my external function.

I should also say that I have never had any need for this function. What do 
you guys use it for?

Cheers,
Jon.

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Caml-list] Should be INSIDE STANDARD-LIB: Hashtbl.keys
  2004-04-23 21:31         ` Jon Harrop
@ 2004-04-23 21:53           ` John Goerzen
  2004-04-26  6:28           ` Florian Hars
  1 sibling, 0 replies; 25+ messages in thread
From: John Goerzen @ 2004-04-23 21:53 UTC (permalink / raw)
  To: Jon Harrop; +Cc: caml-list

On Fri, Apr 23, 2004 at 10:31:36PM +0100, Jon Harrop wrote:
> I should also say that I have never had any need for this function. What do 
> you guys use it for?

One example is the sections function at
http://gopher.quux.org:70/devel/missinglib/html/ConfigParser.rawConfigParser.html.
Internally, a configuration file is represented as a hash of section
names to a hash of option names to values.  (A hash of a hash of
strings, basically.)  Sometimes, you might not know what sections to
expect up-front.  Getting a list of the keys in the top-level hash will
return the needed information.  This could occur for many reasons:
configuring multiple ISP accounts in a network tool, configuring
multiple database sources in an ODBC configuration, etc.

Here's a contrived example, but one I could imagine encountering: Having
a hash mapping filenames to data about them (say, from stat(2).  Perhaps
you'd have a need sometimes to just get a list of files (maybe to copy
them elsewhere or display a list for the user).

Yes, one could use fold to do all of this.  Point is, it's often easier
to just call "keys" and get a list.  The raw performance may not always
matter.  (Just witness the popularity of Ruby <g>)

-- John

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Caml-list] Should be INSIDE STANDARD-LIB: Hashtbl.keys
  2004-04-23 15:03   ` Richard Jones
@ 2004-04-24  1:58     ` skaller
  2004-04-24  9:20       ` Nicolas Cannasse
  2004-04-26  7:29       ` Jean-Christophe Filliatre
  0 siblings, 2 replies; 25+ messages in thread
From: skaller @ 2004-04-24  1:58 UTC (permalink / raw)
  To: Richard Jones; +Cc: caml-list

On Sat, 2004-04-24 at 01:03, Richard Jones wrote:
> Maybe there's no good behaviour, but I keep writing a 'keys' function
> in almost every program I've written which uses Hashtbl, which
> certainly suggests that something is missing.

The problem is that the Ocaml hashtbl isn't a conventional
hash table. It allows multiple bindings which means it
cannot be treated as a simple container.

I spend some time making sure I don't use that
feature by mistake. 

Similar 'too rich' functionality exists for
many data types, for example mutable strings and arrays.
On the other hand some functionality is missing,
for example variable length arrays.

The former can be fixed easily by the client
building a wrapper library with restricted functionality.

The latter *needs* to be added to the standard
library urgently because it cannot be implemented
at all in Ocaml without magic, and it is a very
useful and important data structure to have.

-- 
John Skaller, mailto:skaller@users.sf.net
voice: 061-2-9660-0850, 
snail: PO BOX 401 Glebe NSW 2037 Australia
Checkout the Felix programming language http://felix.sf.net



-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Caml-list] Should be INSIDE STANDARD-LIB: Hashtbl.keys
  2004-04-24  1:58     ` skaller
@ 2004-04-24  9:20       ` Nicolas Cannasse
  2004-04-24 19:26         ` skaller
  2004-04-26  7:29       ` Jean-Christophe Filliatre
  1 sibling, 1 reply; 25+ messages in thread
From: Nicolas Cannasse @ 2004-04-24  9:20 UTC (permalink / raw)
  To: skaller, Richard Jones; +Cc: caml-list

> On Sat, 2004-04-24 at 01:03, Richard Jones wrote:
> > Maybe there's no good behaviour, but I keep writing a 'keys' function
> > in almost every program I've written which uses Hashtbl, which
> > certainly suggests that something is missing.
>
> The problem is that the Ocaml hashtbl isn't a conventional
> hash table. It allows multiple bindings which means it
> cannot be treated as a simple container.
>
> I spend some time making sure I don't use that
> feature by mistake.
>
> Similar 'too rich' functionality exists for
> many data types, for example mutable strings and arrays.
> On the other hand some functionality is missing,
> for example variable length arrays.
>
> The former can be fixed easily by the client
> building a wrapper library with restricted functionality.
>
> The latter *needs* to be added to the standard
> library urgently because it cannot be implemented
> at all in Ocaml without magic, and it is a very
> useful and important data structure to have.

You can have a look at the nice DynArray module from the ExtLib :
http://ocaml-lib.sf.net

Regards,
Nicolas Cannasse

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Caml-list] Should be INSIDE STANDARD-LIB: Hashtbl.keys
  2004-04-24  9:20       ` Nicolas Cannasse
@ 2004-04-24 19:26         ` skaller
  0 siblings, 0 replies; 25+ messages in thread
From: skaller @ 2004-04-24 19:26 UTC (permalink / raw)
  To: Nicolas Cannasse; +Cc: Richard Jones, caml-list

On Sat, 2004-04-24 at 19:20, Nicolas Cannasse wrote:

> You can have a look at the nice DynArray module from the ExtLib :
> http://ocaml-lib.sf.net

I am (or at least was) one of the ExtLib developers..

I think the DynArray is good. I wish INRIA would
take it.

But I'm not happy with the lazy iteration,
which is a central feature of that library.

-- 
John Skaller, mailto:skaller@users.sf.net
voice: 061-2-9660-0850, 
snail: PO BOX 401 Glebe NSW 2037 Australia
Checkout the Felix programming language http://felix.sf.net



-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Caml-list] Should be INSIDE STANDARD-LIB: Hashtbl.keys
       [not found]         ` <20040424080904.GA821@first.in-berlin.de>
@ 2004-04-24 20:59           ` John Goerzen
  2004-04-25  8:12             ` Oliver Bandel
  0 siblings, 1 reply; 25+ messages in thread
From: John Goerzen @ 2004-04-24 20:59 UTC (permalink / raw)
  To: Oliver Bandel; +Cc: caml-list

On Sat, Apr 24, 2004 at 10:09:04AM +0200, Oliver Bandel wrote:
> On Fri, Apr 23, 2004 at 03:41:49PM -0500, John Goerzen wrote:
> > OK, but why should we eliminate a useful function because 1% of uses
> > of it will be slow?
> > 
> > Make a note in the docs (like that notes that are there already that say
> > "this function is not tail-recursive") and put it in for those that will
> > find it useful.
> 
> 
> What does the O()-notation have to do with tail recursiveness?
> IMHO nothing. But I'm not a computer scientist. Maybe there
> is a linkage between. But the O()-notation says something about the

What I'm saying is this: a known problem with a function, whether it is
excessive stack use or slow performance, is not necessarily a reason to
keep it out of the standard library.  The flaw should be noted in the
documentation.  And that's just what has been done with the
non-tail-recursive functions.

-- John

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Caml-list] Should be INSIDE STANDARD-LIB: Hashtbl.keys
  2004-04-24 20:59           ` John Goerzen
@ 2004-04-25  8:12             ` Oliver Bandel
  0 siblings, 0 replies; 25+ messages in thread
From: Oliver Bandel @ 2004-04-25  8:12 UTC (permalink / raw)
  To: caml-list

On Sat, Apr 24, 2004 at 03:59:36PM -0500, John Goerzen wrote:
> On Sat, Apr 24, 2004 at 10:09:04AM +0200, Oliver Bandel wrote:
> > On Fri, Apr 23, 2004 at 03:41:49PM -0500, John Goerzen wrote:
> > > OK, but why should we eliminate a useful function because 1% of uses
> > > of it will be slow?
> > > 
> > > Make a note in the docs (like that notes that are there already that say
> > > "this function is not tail-recursive") and put it in for those that will
> > > find it useful.
> > 
> > 
> > What does the O()-notation have to do with tail recursiveness?
> > IMHO nothing. But I'm not a computer scientist. Maybe there
> > is a linkage between. But the O()-notation says something about the
> 
> What I'm saying is this: a known problem with a function, whether it is
> excessive stack use or slow performance, is not necessarily a reason to
> keep it out of the standard library.  The flaw should be noted in the
> documentation.  And that's just what has been done with the
> non-tail-recursive functions.

OK, I see and I agree with that.

Nevertheless everybody is interested to have best stuff inside
the lib, which is possible.

Ciao,
   Oliver

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Caml-list] Should be INSIDE STANDARD-LIB: Hashtbl.keys
  2004-04-23 21:31         ` Jon Harrop
  2004-04-23 21:53           ` John Goerzen
@ 2004-04-26  6:28           ` Florian Hars
  1 sibling, 0 replies; 25+ messages in thread
From: Florian Hars @ 2004-04-26  6:28 UTC (permalink / raw)
  To: caml-list

Jon Harrop wrote:
> I should also say that I have never had any need for this function. What do 
> you guys use it for?

Porting from perl (or the perl solution in your head).

Yours, Florian.

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Caml-list] Should be INSIDE STANDARD-LIB: Hashtbl.keys
  2004-04-24  1:58     ` skaller
  2004-04-24  9:20       ` Nicolas Cannasse
@ 2004-04-26  7:29       ` Jean-Christophe Filliatre
  1 sibling, 0 replies; 25+ messages in thread
From: Jean-Christophe Filliatre @ 2004-04-26  7:29 UTC (permalink / raw)
  To: skaller; +Cc: Richard Jones, caml-list


skaller writes:
 > 
 > The problem is that the Ocaml hashtbl isn't a conventional
 > hash table. It allows multiple bindings which means it
 > cannot be treated as a simple container.

It's  probably done  on  purpose.  Indeed, it  is  very convenient  to
implement symbol tables --- in an imperative way --- because with this
semantics for Hashtbl.add a new  binding really hides any previous one
until the end of its  scope where Hashtbl.remove restores the previous
binding.

This has nothing to do with Ocaml; I'm pretty sure you can find C hash
tables implementations with the same feature.

-- 
Jean-Christophe

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2004-04-26  7:49 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2004-04-21  1:19 [Caml-list] Should be INSIDE STANDARD-LIB: Hashtbl.keys Oliver Bandel
2004-04-21  8:39 ` Richard Jones
2004-04-21  9:13 ` Martin Jambon
2004-04-23 12:51 ` Xavier Leroy
2004-04-23 13:05   ` Jean-Baptiste Rouquier
2004-04-23 16:04     ` Xavier Leroy
2004-04-23 18:21       ` Jon Harrop
2004-04-23 21:31         ` Jon Harrop
2004-04-23 21:53           ` John Goerzen
2004-04-26  6:28           ` Florian Hars
2004-04-23 18:29       ` John Goerzen
     [not found]         ` <20040423190710.GA1506@first.in-berlin.de>
2004-04-23 20:42           ` John Goerzen
2004-04-23 15:03   ` Richard Jones
2004-04-24  1:58     ` skaller
2004-04-24  9:20       ` Nicolas Cannasse
2004-04-24 19:26         ` skaller
2004-04-26  7:29       ` Jean-Christophe Filliatre
2004-04-23 16:06   ` Brian Hurt
2004-04-23 16:31     ` Martin Jambon
2004-04-23 17:27     ` Christoph Bauer
2004-04-23 18:29   ` John Goerzen
     [not found]     ` <20040423191010.GB1506@first.in-berlin.de>
2004-04-23 20:41       ` John Goerzen
     [not found]         ` <20040424080904.GA821@first.in-berlin.de>
2004-04-24 20:59           ` John Goerzen
2004-04-25  8:12             ` Oliver Bandel
2004-04-23 18:28 ` John Goerzen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).