Re: [Caml-list] Hashtbl iter semantics

caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed

* Re: [Caml-list] Hashtbl iter semantics
@ 2002-05-27 19:04 Damien Doligez
  2002-05-29 19:01 ` John Max Skaller
  0 siblings, 1 reply; 9+ messages in thread
From: Damien Doligez @ 2002-05-27 19:04 UTC (permalink / raw)
  To: skaller, xavier.leroy; +Cc: caml-list

>From: John Max Skaller <skaller@ozemail.com.au>

>For example: delete: if  a binding is deleted it won't be presented 
>subsequently.
>Add: the binding may or may not be presented.
>Replace:  if the binding is presented, it will have the data value used 
>in replace.
>If the binding was already presented, it will not be represented.
>Otherwise: the binding will be presented with its initial data value.

AFAICT from a quick look at the source, it's more like this:

delete: the binding may or may not be presented subsequently
add: the binding may or may not be presented
replace: the binding may be presented with the old or the new data
value; if it was already presented, it will not be represented
otherwise: the binding will be presented with its initial data value.

-- Damien
-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Caml-list] Hashtbl iter semantics
  2002-05-27 19:04 [Caml-list] Hashtbl iter semantics Damien Doligez
@ 2002-05-29 19:01 ` John Max Skaller
  2002-05-30  6:53   ` Francois Pottier
  0 siblings, 1 reply; 9+ messages in thread
From: John Max Skaller @ 2002-05-29 19:01 UTC (permalink / raw)
  To: Damien Doligez; +Cc: xavier.leroy, caml-list

Damien Doligez wrote:

>AFAICT from a quick look at the source, it's more like this:
>
>delete: the binding may or may not be presented subsequently
>add: the binding may or may not be presented
>replace: the binding may be presented with the old or the new data
>value; if it was already presented, it will not be represented
>otherwise: the binding will be presented with its initial data value.
>
Ok: that's perfectly reasonable, it just needs to be documented.

[The replace semantics you describe are unfortunate for me: my unification
algorithm keeps a hashtable of variable -> term bindings into which
I have to incrementally subtitute while iterating through the table.
At present, I'm building a fresh table, but then I have to clear
the original one and copy the whole contents up. I was hoping
to just modify the table in place .. ~2.5 times faster ..]


-- 
John Max Skaller, mailto:skaller@ozemail.com.au
snail:10/1 Toxteth Rd, Glebe, NSW 2037, Australia.
voice:61-2-9660-0850




-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Caml-list] Hashtbl iter semantics
  2002-05-29 19:01 ` John Max Skaller
@ 2002-05-30  6:53   ` Francois Pottier
  2002-05-31 17:29     ` John Max Skaller
  0 siblings, 1 reply; 9+ messages in thread
From: Francois Pottier @ 2002-05-30  6:53 UTC (permalink / raw)
  To: caml-list

[-- Attachment #1: Type: text/plain, Size: 970 bytes --]

On Thu, May 30, 2002 at 05:01:32AM +1000, John Max Skaller wrote:
> 
> [The replace semantics you describe are unfortunate for me: my unification
> algorithm keeps a hashtable of variable -> term bindings into which
> I have to incrementally subtitute while iterating through the table.

Do you need to maintain *several* such substitutions (i.e. tables) with
the same domain (i.e. the same set of variables)? If not (that is, if
you only have one hash table), then it is customary to use a mutable
field, within each variable record, to store the associated term. It's
simpler and the overhead for access is very small.

One nice way of doing so is to use a generic union/find algorithm, such
as the one I am attaching to this message. Simply define

  type term =
    descriptor UnionFind.point

  and descriptor =
  | TInteger
  | TArrow of term * term
  | ...

and you're all set.

-- 
François Pottier
Francois.Pottier@inria.fr
http://pauillac.inria.fr/~fpottier/

[-- Attachment #2: unionFind.mli --]
[-- Type: text/plain, Size: 1676 bytes --]

(* $Header: /home/pauillac/formel1/fpottier/cvs/modulo/unionFind.mli,v 1.1 2002/04/08 09:55:51 fpottier Exp $ *)

(* This module provides support for a basic union/find algorithm. *)

(* The abstraction defined by this module is a set of points,
   partitioned into equivalence classes. With each equivalence class,
   a piece of information, of abstract type ['a], is associated; we
   call it a descriptor. *)

type 'a point

(* [equivalent point1 point2] tells whether [point1] and [point2] belong to
   the same equivalence class. *)

val equivalent: 'a point -> 'a point -> bool

(* [describe point] returns the descriptor associated with [point]'s
   equivalence class. *)

val describe: 'a point -> 'a

(* [fresh desc] creates a fresh point and returns it. It forms an
   equivalence class of its own, whose descriptor is [desc]. *)

val fresh: 'a -> 'a point

(* [self desc] creates a fresh point [point] and returns it. It forms an
   equivalence class of its own, whose descriptor is [desc point]. In
   other words, the descriptor is allowed to recursively refer to
   the point that is being created. Use with caution -- the function
   [desc] is allowed to store a pointer to [point], but must not use it
   in any other way. *)

val self: ('a point -> 'a) -> 'a point

(* [alias point1 point2] merges the equivalence classes associated
   with [point1] and [point2], which must be distinct, into a single
   class, whose descriptor is that originally associated with
   [point2]'s equivalence class. *)

val alias: 'a point -> 'a point -> unit

(* [redundant] maps all members of an equivalence class, but one,
   to [true]. *)

val redundant: 'a point -> bool

[-- Attachment #3: unionFind.ml --]
[-- Type: text/plain, Size: 3809 bytes --]

(* $Header: /home/pauillac/formel1/fpottier/cvs/modulo/unionFind.ml,v 1.1 2002/04/08 09:55:50 fpottier Exp $ *)

(* This module provides support for a basic union/find algorithm. *)

(* The abstraction defined by this module is a set of points,
   partitioned into equivalence classes. With each equivalence class,
   a piece of information, of abstract type ['a], is associated; we
   call it a descriptor. *)

(* A point is implemented as a cell, whose (mutable) contents consist
   of a single link to either a descriptor, or another point. Thus,
   points form a graph, which must be acyclic, and whose connected
   components are the equivalence classes. In every equivalence class,
   exactly one point has no outgoing edge, and carries the class's
   descriptor instead. It is the class's representative element. *)

type 'a point = {
    mutable link: 'a link
  } 

and 'a link =
  | Immediate of 'a
  | Link of 'a point

(* [repr point] returns the representative element of [point]'s
   equivalence class. It is found by starting at [point] and following
   the links. For efficiency, the function performs path compression
   at the same time. *)

let rec repr point =
  match point.link with
  | Link point' ->
      let point'' = repr point' in
      if point'' != point' then

	(* [point''] is [point']'s representative element. Because we
	   just invoked [repr point'], [point'.link] must be [Link
	   point'']. We write this value into [point.link], thus
	   performing path compression. Note that this function never
	   performs memory allocation. *)

	point.link <- point'.link;
      point''
  | Immediate _ ->
      point

(* [equivalent point1 point2] tells whether [point1] and [point2] belong to
   the same equivalence class. *)

let equivalent point1 point2 =
  repr point1 == repr point2

(* [describe point] returns the descriptor associated with [point]'s
   equivalence class. *)

let rec describe point =

  (* By not calling [repr] immediately, we optimize the common cases
     where the path starting at [point] has length 0 or 1, at the
     expense of the general case. *)

  match point.link with
  | Immediate desc ->
      desc
  | Link { link = Immediate desc } ->
      desc
  | Link { link = Link _ } ->
      describe (repr point)

(* [fresh desc] creates a fresh point and returns it. It forms an
   equivalence class of its own, whose descriptor is [desc]. *)

let fresh desc = {
  link = Immediate desc
} 

(* [self desc] creates a fresh point [point] and returns it. It forms an
   equivalence class of its own, whose descriptor is [desc point]. In
   other words, the descriptor is allowed to recursively refer to
   the point that is being created. Use with caution -- the function
   [desc] is allowed to store a pointer to [point], but must not use it
   in any other way. *)

let self desc =

  (* Create a dangling point. We cannot (yet) create its descriptor. *)

  let rec point = {
    link = Link point (* this is the only placeholder available *)
  } in

  (* Create the descriptor and assign the point to it. *)

  point.link <- Immediate (desc point);
  point

(* [alias point1 point2] merges the equivalence classes associated
   with [point1] and [point2], which must be distinct, into a single
   class, whose descriptor is that originally associated with
   [point2]'s equivalence class.

   The fact that [point1] and [point2] do not originally belong to the
   same class guarantees that we do not create a cycle in the
   graph. *)

let alias point1 point2 =
  let point1 = repr point1 in
  assert (point1 != repr point2);
  point1.link <- Link point2

(* [redundant] maps all members of an equivalence class, but one,
   to [true]. *)

let redundant = function
  | { link = Link _ } ->
      true
  | { link = Immediate _ } ->
      false

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Caml-list] Hashtbl iter semantics
  2002-05-30  6:53   ` Francois Pottier
@ 2002-05-31 17:29     ` John Max Skaller
  2002-06-03  7:03       ` Francois Pottier
  0 siblings, 1 reply; 9+ messages in thread
From: John Max Skaller @ 2002-05-31 17:29 UTC (permalink / raw)
  To: Francois.Pottier; +Cc: caml-list

Francois Pottier wrote:

>On Thu, May 30, 2002 at 05:01:32AM +1000, John Max Skaller wrote:
>
>>[The replace semantics you describe are unfortunate for me: my unification
>>algorithm keeps a hashtable of variable -> term bindings into which
>>I have to incrementally subtitute while iterating through the table.
>>
>
>Do you need to maintain *several* such substitutions (i.e. tables) with
>the same domain (i.e. the same set of variables)? 
>

Don't know. At present, I have a single global table, and the variables 
are 'unknowns' rather than
variables: they represent function return types (or components thereof). 
 So at present,
all the variables must be eliminated, and have a universal value. (and 
this is proving
much harder than I thought to get right ..)

However, I am just introducing parametric polymorphism ..  in that case 
a 'variable'
takes on a new value for each instantiation. On the other hand,  the client
will initially have to provide the values explicitly. I will still need 
subsumption,
however, to resolve overload ambiguity in favour of the most specialised
matching function.

[Things get really interesting when subtyping is introduced as well ..]

>If not (that is, if
>you only have one hash table), then it is customary to use a mutable
>field, within each variable record, to store the associated term. It's
>simpler and the overhead for access is very small.
>
Yes, I could do that (though it doesn't handle deletion so well .)

>One nice way of doing so is to use a generic union/find algorithm, such
>as the one I am attaching to this message.
>

Thx. I'm using the Robinson algorithm at present: simple, but not efficient.
[And it doesn't match equi-recursive terms correctly .. I can't figure out
a fast way to minimise such a term .. hmmm.. linearise and use a
string matching algorithm ..?]

-- 
John Max Skaller, mailto:skaller@ozemail.com.au
snail:10/1 Toxteth Rd, Glebe, NSW 2037, Australia.
voice:61-2-9660-0850

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Caml-list] Hashtbl iter semantics
  2002-05-31 17:29     ` John Max Skaller
@ 2002-06-03  7:03       ` Francois Pottier
  2002-06-03 17:07         ` John Max Skaller
  0 siblings, 1 reply; 9+ messages in thread
From: Francois Pottier @ 2002-06-03  7:03 UTC (permalink / raw)
  To: John Max Skaller; +Cc: François Pottier, caml-list

> Thx. I'm using the Robinson algorithm at present: simple, but not efficient.
> [And it doesn't match equi-recursive terms correctly .. I can't figure out
> a fast way to minimise such a term .. hmmm.. linearise and use a
> string matching algorithm ..?]

I don't know Robinson's algorithm, but the usual way to perform unification
in the presence of cyclic (equi-recursive) terms is to physically unify two
nodes *before* looking at their sub-terms (instead of *after*, or not at all,
in the non-cyclic case). This way, if you run into a cycle and you end up
attempting to unify the same two nodes, you will succeed immediately, provided
your first step is to check for physical equality. I can provide some sample
code if that helps.

That said, type inference in the presence of equi-recursive types is considered
difficult for the user to figure out... many intuitively `meaningless' programs
admit (weird) recursive types.

-- 
François Pottier
Francois.Pottier@inria.fr
http://pauillac.inria.fr/~fpottier/
-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Caml-list] Hashtbl iter semantics
  2002-06-03  7:03       ` Francois Pottier
@ 2002-06-03 17:07         ` John Max Skaller
  0 siblings, 0 replies; 9+ messages in thread
From: John Max Skaller @ 2002-06-03 17:07 UTC (permalink / raw)
  To: Francois.Pottier; +Cc: caml-list

Francois Pottier wrote:

>but the usual way to perform unification
>in the presence of cyclic (equi-recursive) terms is to physically unify two
>nodes *before* looking at their sub-terms (instead of *after*, or not at all,
>in the non-cyclic case). This way, if you run into a cycle and you end up
>attempting to unify the same two nodes, you will succeed immediately, provided
>your first step is to check for physical equality. I can provide some sample
>code if that helps.
>
Should be ok: that's what Cardelli's comparison does (which I have 
implemented).

>
>That said, type inference in the presence of equi-recursive types is considered
>difficult for the user to figure out... many intuitively `meaningless' programs
>admit (weird) recursive types.
>
Sure, but I'm not doing "type inference", but "type deduction".
The types of function arguments *must* be declared.
Felix deduces the type of expressions, and thus variables,
and also function return types (in most cases :-))

I chose that strategy for two reasons: (1) full scale type inference
is difficult to implement, difficult to understand, and very difficult to
give good error messages. (2) Felix supports overloading.
If type inference with overloading is tractible, it is surely even
more difficult to implement and understand.

The current rule for generic functions is that type arguments
must be explicitly given. (I hope to relax that later ..)

Actually, I uave two kinds of type variables;

    1) initially unknown function return types
    2) variables in generics

The first kind aren't 'variable'  (they have single type, it just isn't
known yet). The second kind are just parameters which
have to be explicitly bound to types by the client.

So the scope of the 'unification' algorithm is (1) deducing
function return types, and (2) comparing two generic
types to see if one matches the other.

(1) is implemented, but the algorithm is very unsatisfying.
The problem is that during deduction it is necessary to
perform overload resoliution: I can't handle partial matching
here, so all the type variables have to be eliminated at that point.

-- 
John Max Skaller, mailto:skaller@ozemail.com.au
snail:10/1 Toxteth Rd, Glebe, NSW 2037, Australia.
voice:61-2-9660-0850

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Caml-list] Hashtbl iter semantics
  2002-05-27 14:38 ` Xavier Leroy
@ 2002-05-27 18:16   ` John Max Skaller
  0 siblings, 0 replies; 9+ messages in thread
From: John Max Skaller @ 2002-05-27 18:16 UTC (permalink / raw)
  To: Xavier Leroy; +Cc: caml-list

Xavier Leroy wrote:

>>Could the ocaml team please clarify the semantics of Hashtbl.iter?
>>In particular, can I use Hashtbl.replace during iteration?
>>[What about add, removed, etc?]
>>
>
>You can use Hashtbl.{replace,add,remove} while iterating over the same
>hash table, in the sense that it will not corrupt the table.  However,
>there is no guarantee that the iteration will "see" the modification
>or not.  E.g. if you add an element, the function being iterated can
>be called on the newly added element, or not, depending on the
>relative positions of the new element and of the iteration point
>inside the hash table.
>
>If you need stronger guarantees 
>
No, I'd just like some text in the manual please, that I can use to
reason about the correctness of my algorithms.

For example, it isn't clear if I add a key, if the table might not
be reorganised so that 50 keys from the table are never presented.

The table isn't 'corrupted, though the scan pointer might be :-)
I guess nothing of the sort will happen with Hashtbl.iter.  

For example: delete: if  a binding is deleted it won't be presented 
subsequently.
Add: the binding may or may not be presented.
Replace:  if the binding is presented, it will have the data value used 
in replace.
If the binding was already presented, it will not be represented.
Otherwise: the binding will be presented with its initial data value.

-- 
John Max Skaller, mailto:skaller@ozemail.com.au
snail:10/1 Toxteth Rd, Glebe, NSW 2037, Australia.
voice:61-2-9660-0850

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Caml-list] Hashtbl iter semantics
  2002-05-26 15:43 John Max Skaller
@ 2002-05-27 14:38 ` Xavier Leroy
  2002-05-27 18:16   ` John Max Skaller
  0 siblings, 1 reply; 9+ messages in thread
From: Xavier Leroy @ 2002-05-27 14:38 UTC (permalink / raw)
  To: John Max Skaller; +Cc: caml-list

> Could the ocaml team please clarify the semantics of Hashtbl.iter?
> In particular, can I use Hashtbl.replace during iteration?
> [What about add, removed, etc?]

You can use Hashtbl.{replace,add,remove} while iterating over the same
hash table, in the sense that it will not corrupt the table.  However,
there is no guarantee that the iteration will "see" the modification
or not.  E.g. if you add an element, the function being iterated can
be called on the newly added element, or not, depending on the
relative positions of the new element and of the iteration point
inside the hash table.

If you need stronger guarantees (e.g. the iteration must operate on
the initial state of the table), consider replacing the hash table by
a reference to a binary tree (module Map).

- Xavier Leroy
-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Caml-list] Hashtbl iter semantics
@ 2002-05-26 15:43 John Max Skaller
  2002-05-27 14:38 ` Xavier Leroy
  0 siblings, 1 reply; 9+ messages in thread
From: John Max Skaller @ 2002-05-26 15:43 UTC (permalink / raw)
  To: caml-list

Could the ocaml team please clarify the semantics of Hashtbl.iter?
In particular, can I use Hashtbl.replace during iteration?
[What about add, removed, etc?]

-- 
John Max Skaller, mailto:skaller@ozemail.com.au
snail:10/1 Toxteth Rd, Glebe, NSW 2037, Australia.
voice:61-2-9660-0850



-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2002-06-03 17:07 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2002-05-27 19:04 [Caml-list] Hashtbl iter semantics Damien Doligez
2002-05-29 19:01 ` John Max Skaller
2002-05-30  6:53   ` Francois Pottier
2002-05-31 17:29     ` John Max Skaller
2002-06-03  7:03       ` Francois Pottier
2002-06-03 17:07         ` John Max Skaller
  -- strict thread matches above, loose matches on Subject: below --
2002-05-26 15:43 John Max Skaller
2002-05-27 14:38 ` Xavier Leroy
2002-05-27 18:16   ` John Max Skaller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).