Alternative generic hash function

caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed

* Alternative generic hash function
@ 2000-05-24 22:04 Manuel Fahndrich
  2000-05-25  7:00 ` Pierre Weis
                   ` (3 more replies)
  0 siblings, 4 replies; 12+ messages in thread
From: Manuel Fahndrich @ 2000-05-24 22:04 UTC (permalink / raw)
  To: caml-list

The generic hash function Hashtbl.hash traverses into mutable data
structures. So I get different hash values depending on updates of the data
structure. 

# let x = ref None;;
val x : '_a option ref = {contents=None}
# Hashtbl.hash x;;
- : int = 0
# x := Some 5;;
- : unit = ()
# Hashtbl.hash x;;
- : int = 5

As was pointed out times before on this list, Hashtbl.hash is thus
unsuitable for hashing on data structures that contain mutable data.

I'm wondering what the design rational is for this choice. I'd rather have a
generic hash function (Hashtbl.hash_pure) that never looks into mutable
structures (skips refs and mutable fields of records and classes).
The reason being that if I want to hash on data that contains references,
I'm more likely to want to find the structure independent of the contents of
any mutable subparts.

So yes, a hashtable containing int refs would pointless, because it maps
them all to the same bucket. But I frequently use hashing with equality
being ==. Whenever I do this, I need to actually add some unique identifier
to the data structure just for the hashing operation. Having an alternate
hash function (Hashtbl.hash_pure) would make such id's (and their managment)
unnecessary.

I see how the current working of Hashtbl.hash is useful for data structures
that contain mutable parts in the case where such structures don't change
between the time they are stored and later looked up in the hash table. But
I'm wondering how often this actually arises.

Furthermore, using Hashtbl.hash_pure in place of Hashtbl.hash would not
break any old code (except for performance), since all it does is make the
equivalence classes bigger. On the other hand, using Hashtbl.hash with the
misconception that it acts as Hashtbl.hash_pure breaks lots of code.

I rest my case.

	Manuel Fahndrich

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Alternative generic hash function
  2000-05-24 22:04 Alternative generic hash function Manuel Fahndrich
@ 2000-05-25  7:00 ` Pierre Weis
  2000-05-25  7:14 ` STARYNKEVITCH Basile
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 12+ messages in thread
From: Pierre Weis @ 2000-05-25  7:00 UTC (permalink / raw)
  To: Manuel Fahndrich; +Cc: caml-list

> The generic hash function Hashtbl.hash traverses into mutable data
> structures. So I get different hash values depending on updates of the data
> structure. 
[...]
> As was pointed out times before on this list, Hashtbl.hash is thus
> unsuitable for hashing on data structures that contain mutable data.
> 
> I'm wondering what the design rational is for this choice. I'd rather have a
> generic hash function (Hashtbl.hash_pure) that never looks into mutable
> structures (skips refs and mutable fields of records and classes).
[...]

The rationale is simple: there is no way to distinguish between
mutable and immutable data structures at runtime. So, there is no
possibility to implement the suggested hash_pure function in a
polymorphic way. If you really want to use keys that belong to a
mutable data structure (which is not a good idea anyway), you should
use the functorial interface to hash tables to specify your own
hashing function, that would implement hash_pure on your particular
key type.

Best regards,

-- 
Pierre Weis

INRIA, Projet Cristal, http://pauillac.inria.fr/~weis

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Alternative generic hash function
  2000-05-24 22:04 Alternative generic hash function Manuel Fahndrich
  2000-05-25  7:00 ` Pierre Weis
@ 2000-05-25  7:14 ` STARYNKEVITCH Basile
  2000-05-25  7:38 ` Dmitri Lomov
  2000-05-25  8:24 ` Francois Pottier
  3 siblings, 0 replies; 12+ messages in thread
From: STARYNKEVITCH Basile @ 2000-05-25  7:14 UTC (permalink / raw)
  To: Manuel Fahndrich; +Cc: caml-list

>>>>> "Manuel" == Manuel Fahndrich <maf@microsoft.com> writes:

    Manuel> The generic hash function Hashtbl.hash traverses into
    Manuel> mutable data structures. [...]

    Manuel> As was pointed out times before on this list, Hashtbl.hash
    Manuel> is thus unsuitable for hashing on data structures that
    Manuel> contain mutable data.

    Manuel> I'm wondering what the design rational is for this
    Manuel> choice. I'd rather have a generic hash function
    Manuel> (Hashtbl.hash_pure) that never looks into mutable
    Manuel> structures (skips refs and mutable fields of records and
    Manuel> classes).  

I disagree. And it is probably difficult to implement in current Ocaml
(which lacks any reflective facilities). The runtime does not know
about mutable fields! If I want to hash mutable data structure, I will
provide a specific hashing function.

What I am lacking of is an hashing function accepting parametric
type. I would like to define

type 'a symbol_t = { sym_name: string; sym_hash: int; sym_val: 'a option}


Then have

let make_symbol s = { sym_name= s; sym_hash= Hashtbl.hash s; sym_val= None}

let hash_symbol { sym_hash= h } = h 

Unfortunately I cannot make an hashtable using hash_symbol as hashing
function 

Actually, hashing is a difficult subject. Perhaps there should be
several hashing modules in Ocaml stdlib?

Regards

N.B. Any opinions expressed here are only mine, and not of my organization.
N.B. Les opinions exprimees ici me sont personnelles et n engagent pas le CEA.

---------------------------------------------------------------------
Basile STARYNKEVITCH   ----  Commissariat à l Energie Atomique 
DTA/LETI/DEIN/SLA * CEA/Saclay b.528 (p111f) * 91191 GIF/YVETTE CEDEX * France
phone: 1,69.08.60.55; fax: 1.69.08.83.95 home: 1,46.65.45.53
email: Basile point Starynkevitch at cea point fr 




^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Alternative generic hash function
  2000-05-24 22:04 Alternative generic hash function Manuel Fahndrich
  2000-05-25  7:00 ` Pierre Weis
  2000-05-25  7:14 ` STARYNKEVITCH Basile
@ 2000-05-25  7:38 ` Dmitri Lomov
  2000-05-25  8:24 ` Francois Pottier
  3 siblings, 0 replies; 12+ messages in thread
From: Dmitri Lomov @ 2000-05-25  7:38 UTC (permalink / raw)
  To: caml-list; +Cc: Manuel Fahndrich

AFA I understand, Hastable.hash_pure will be hard to implement in run-time,
as now at run-time it is not known whether given data structure, or 
given data structure field is mutable or not.
Such information is not propagated to run-time by ocaml compiler.

Possibly your option here is to divide your data structure into mutable
and immutable parts, and than use Hashtable.hash on immutable part.
Then all will work well.

This, however, does not cover the example you present. 
But actually the nature of hashing assumes the hashed object has
some immutable part. For example, when hashing strings you do not assume 
to get the same hash value for mutable string if you change some character of it.
In your example everything in an object changes - it does not have any 
immutable properties/elements/fields to "hook" hash to.

By the way, a rough hashing function for all O'Caml objects can be based
on block length and/or tag (they are accesible via Obj module).
Such hash function will tend to be very biased, due to biased
distribution of lengths and tags, but in very bad situations it 
may help.

I use such function in my (forthcoming :-)) dynamic code generation library 
for O'Caml to keep external references from code block. 

Regards,
Dmitri

Manuel Fahndrich wrote:
> 
> The generic hash function Hashtbl.hash traverses into mutable data
> structures. So I get different hash values depending on updates of the data
> structure.
> 
> # let x = ref None;;
> val x : '_a option ref = {contents=None}
> # Hashtbl.hash x;;
> - : int = 0
> # x := Some 5;;
> - : unit = ()
> # Hashtbl.hash x;;
> - : int = 5
> 
> As was pointed out times before on this list, Hashtbl.hash is thus
> unsuitable for hashing on data structures that contain mutable data.
> 
> I'm wondering what the design rational is for this choice. I'd rather have a
> generic hash function (Hashtbl.hash_pure) that never looks into mutable
> structures (skips refs and mutable fields of records and classes).
> The reason being that if I want to hash on data that contains references,
> I'm more likely to want to find the structure independent of the contents of
> any mutable subparts.
> 
> So yes, a hashtable containing int refs would pointless, because it maps
> them all to the same bucket. But I frequently use hashing with equality
> being ==. Whenever I do this, I need to actually add some unique identifier
> to the data structure just for the hashing operation. Having an alternate
> hash function (Hashtbl.hash_pure) would make such id's (and their managment)
> unnecessary.
> 
> I see how the current working of Hashtbl.hash is useful for data structures
> that contain mutable parts in the case where such structures don't change
> between the time they are stored and later looked up in the hash table. But
> I'm wondering how often this actually arises.
> 
> Furthermore, using Hashtbl.hash_pure in place of Hashtbl.hash would not
> break any old code (except for performance), since all it does is make the
> equivalence classes bigger. On the other hand, using Hashtbl.hash with the
> misconception that it acts as Hashtbl.hash_pure breaks lots of code.
> 
> I rest my case.
> 
>         Manuel Fahndrich

_________________________________________________________________
Dmitri S. Lomov
mailto:dsl@tepkom.ru    ICQ#: 20524819 (Rusty)
http://users.tepkom.ru/dsl, http://young.tepkom.ru
ftp://young.tepkom.ru
+7 (812) 428-46-57 (b)   +7 (812) 295-94-15 (h)

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Alternative generic hash function
  2000-05-24 22:04 Alternative generic hash function Manuel Fahndrich
                   ` (2 preceding siblings ...)
  2000-05-25  7:38 ` Dmitri Lomov
@ 2000-05-25  8:24 ` Francois Pottier
  2000-05-26 19:35   ` Dan Grossman
  3 siblings, 1 reply; 12+ messages in thread
From: Francois Pottier @ 2000-05-25  8:24 UTC (permalink / raw)
  To: caml-list

Hello,

Manuel Fähndrich wrote:

> I'd rather have a generic hash function (Hashtbl.hash_pure) that never
> looks into mutable structures

Currently, mutable and immutable structures cannot be distinguished at
runtime. This would require maintaining additional type information,
and would probably introduce a lot of overhead.

> I frequently use hashing with equality being ==. Whenever I do this, I
> need to actually add some unique identifier to the data structure just
> for the hashing operation. Having an alternate hash function
> (Hashtbl.hash_pure) would make such id's (and their managment)
> unnecessary.

This unique identifier trick is a common one, and, unfortunately, it
appears to be necessary. How would you expect this alternate hash
function to be implemented? Clearly, it will also have to add unique
tags to every value. (A value's address cannot be used as a hash code,
since it may change during garbage collection. Physical addresses can
only be compared for (dis)equality, but their numeric value is
meaningless.) Better let the programmer do this on a case-by-case basis
than add this overhead to every value creation.

Basile Starynkevitch wrote:

> Unfortunately I cannot make an hashtable using hash_symbol as hashing
> function

You can, but you have to have to choose a fixed (monomorphic) value
for the 'a parameter, e.g.

  module M = Hashtbl.Make(struct
        type t = int symbol_t
        let equal x1 x2 = x1.sym_name = x2.sym_name
        let hash = hash_symbol
      end);;

'a cannot remain unspecified, otherwise you would be allowed to create
a table with heterogeneous keys.

In fact, in your particular example, I don't see why you would want
to use values of type symbol_t as keys: obviously, you are only using
the symbol's name as a key. Why not use a hash table whose keys are
symbol names, and store the sym_val information someplace else?

-- 
François Pottier
Francois.Pottier@inria.fr
http://pauillac.inria.fr/~fpottier/

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Alternative generic hash function
  2000-05-25  8:24 ` Francois Pottier
@ 2000-05-26 19:35   ` Dan Grossman
  2000-05-29 13:13     ` Francois Pottier
  0 siblings, 1 reply; 12+ messages in thread
From: Dan Grossman @ 2000-05-26 19:35 UTC (permalink / raw)
  To: caml-list

[Veuillez me pardonner; je ne donne pas une traduction en français parce
que j'ai peur de petites fautes qui peut changer sévèrement la
signification.]

Francois Pottier wrote:

> Currently, mutable and immutable structures cannot be distinguished at
> runtime. This would require maintaining additional type information,
> and would probably introduce a lot of overhead.

Why do you think the overhead would be high?

I agree that:
* Changing the compiler and runtime system is a fair amount of work.
* A slightly different hash function does not justify the cost.
* The cost for recording mutability __per-field__ is probably large.

But I do not see why the __per-object__ overhead would be large.  From
my reading of the OCaml sources, it seems that:
* It is trivial for the compiler to maintain this information all the
way down to an allocation site.  At that point, it suffices to use a
different bytecode or native-code sequence to record that the object 
has at least one mutable field.
* Recording this information would only require 1 header bit.  (I
personally could live with 21 size bits.)  Getting this bit into the
header word should not be hard.
* Checking mutability dynamically would also be cheap.

I ask because there are plenty of clever tricks you can put in the
runtime system that exploit the immutability of objects.  Is there a
deeper problem I am missing?

Thanks.
--Dan

-- 
| Dan Grossman     www.cs.cornell.edu/home/danieljg H:607 256 0724 |
| 5157 Upson Hall  danieljg@cs.cornell.edu          O:607 255 9834 |

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Alternative generic hash function
  2000-05-26 19:35   ` Dan Grossman
@ 2000-05-29 13:13     ` Francois Pottier
  2000-05-29 23:45       ` Ken Wakita
  2000-05-30 17:25       ` Dan Grossman
  0 siblings, 2 replies; 12+ messages in thread
From: Francois Pottier @ 2000-05-29 13:13 UTC (permalink / raw)
  To: Dan Grossman; +Cc: caml-list

Dan,

> Why do you think the overhead would be high?

I was thinking of a recording mutability per-field, rather than
per-object. I agree that your proposal seems economical enough.

> there are plenty of clever tricks you can put in the runtime system
> that exploit the immutability of objects.

Would you care to elaborate? I am no compiler expert, but I'm not sure
why it would be so interesting to have this information at runtime,
where it is already too late to do code optimization.  That would allow
implementing the `pure' hash function proposed by Manuel Fähndrich, but
with a cost: the user would have to separate the mutable fields into a
sub-object, so as to allow the root object to be tagged as immutable.
Which other tricks do you have in mind?

-- 
François Pottier
Francois.Pottier@inria.fr
http://pauillac.inria.fr/~fpottier/

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Alternative generic hash function
  2000-05-29 13:13     ` Francois Pottier
@ 2000-05-29 23:45       ` Ken Wakita
  2000-05-30 17:25       ` Dan Grossman
  1 sibling, 0 replies; 12+ messages in thread
From: Ken Wakita @ 2000-05-29 23:45 UTC (permalink / raw)
  To: Francois.Pottier; +Cc: danieljg, caml-list

In message (<20000529151357.24569@pauillac.inria.fr>)
from Francois Pottier <Francois.Pottier@inria.fr>,
talking about "Re: Alternative generic hash function",
on Mon, 29 May 2000 15:13:57 +0200

> I was thinking of a recording mutability per-field, rather than
> per-object. I agree that your proposal seems economical enough.
> 
> > there are plenty of clever tricks you can put in the runtime system
> > that exploit the immutability of objects.
> 
> Would you care to elaborate? I am no compiler expert, but I'm not sure
> why it would be so interesting to have this information at runtime,
> where it is already too late to do code optimization.

Mutability information is useful in distributed computing.  We have
been implementing a distributed Caml, which comes with distributed
shared memory.  To preserve assignment semantics over the network
(basically) we need to replace references to mutables with remote
pointers when they are remotely referenced.  In general, it is
impossible to tell remote/local reference at compile time.  Therefore
we need mutability information at runtime.

We modified the runtime representations for Caml mutable objects so
that the communication substrate can detect mutables (vectors and
mutable record fields).  For this purpose, our modifed compiler
assigns Mutable tag to vectors and arrays.  As for mutable record
field unboxed representation was replaced by boxed representation.
With the latter design, access to record fields can be slower but it
is negligible.

Ken Wakita

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Alternative generic hash function
  2000-05-29 13:13     ` Francois Pottier
  2000-05-29 23:45       ` Ken Wakita
@ 2000-05-30 17:25       ` Dan Grossman
  1 sibling, 0 replies; 12+ messages in thread
From: Dan Grossman @ 2000-05-30 17:25 UTC (permalink / raw)
  To: Francois.Pottier; +Cc: caml-list

Francois Pottier wrote:

> > there are plenty of clever tricks you can put in the runtime system
> > that exploit the immutability of objects.
> 
> Would you care to elaborate? I am no compiler expert, but I'm not sure
> why it would be so interesting to have this information at runtime,
> where it is already too late to do code optimization.  That would allow
> implementing the `pure' hash function proposed by Manuel Fähndrich, but
> with a cost: the user would have to separate the mutable fields into a
> sub-object, so as to allow the root object to be tagged as immutable.
> Which other tricks do you have in mind?

I think the tricks fall into two categories but none of the tricks may
apply to the current OCaml system.

The first category is optimizing code by dynamically choosing between 
a version that only works for immutable data and one that does not.  The
basic idea is that it is not "too late" so long as you have both
versions available.  The dynamic check is inexpensive, especially if it
is hoisted out of a loop.  Now with OCaml as your source language, I
believe having dynamic checks is worthless because the mutability of an
object is always determined for a program point.  That is, this
information is available statically.  But from a sufficiently low-level
viewpoint, we could consider this an artifact of the source language.

The second category is in the runtime system.  Here, most of the code
does not know whether the object it is manipulating is mutable.  How
could we use this information?  I could imagine having one nursery
(allocation area) for mutable and immutable objects, but then have the
minor collector segregate on mutability.  Segregation is particularly
useful in conjunction with "card marking" which the OCaml system does
not use.  But it could help the memory system performance of the OCaml
major collector as well.

Some more radical ideas I have been playing around with involve using
runtime value information to optimize data representation.  Knowing that
some values cannot change (due to the immutability bit) can enable the
run-time system to perform some transformations, perhaps during garbage
collection.

A final idea is that an incremental garbage collector could optimize its
treatment of immutable objects.  Assuming mutable objects are uncommon,
incremental collection costs could decrease significantly.

--Dan

-- 
| Dan Grossman     www.cs.cornell.edu/home/danieljg H:607 256 0724 |
| 5157 Upson Hall  danieljg@cs.cornell.edu          O:607 255 9834 |

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Alternative generic hash function
  2000-05-29 15:00 Damien Doligez
@ 2000-05-30 18:04 ` Daniel Ortmann
  0 siblings, 0 replies; 12+ messages in thread
From: Daniel Ortmann @ 2000-05-30 18:04 UTC (permalink / raw)
  To: Damien Doligez; +Cc: caml-list

Damien Doligez <Damien.Doligez@inria.fr> writes:

> Some people are already complaining about the maximum size of arrays.  And
> it's true that 16 megabytes is quite small.

Microprocessor and ASIC engineering frequently needs to handle sets of data
larger than 20GB ... I've heard of some data requirements of 50GB.

And that is RAM, not just virtual memory.  Without the that much RAM the
programs thrash to death.

Perl hashes have been indispensable.

-- 
Daniel Ortmann, IBM Circuit Technology, Rochester, MN 55901-7829
ortmann@vnet.ibm.com or ortmann@us.ibm.com and 507.253.6795 (external)
ortmann@rchland.ibm.com and tieline 8.553.6795 (internal)
ortmann@isl.net and 507.288.7732 (home)

"The answers are so simple, and we all know where to look,
but it's easier just to avoid the question." -- Kansas




^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Alternative generic hash function
@ 2000-05-29 15:00 Damien Doligez
  2000-05-30 18:04 ` Daniel Ortmann
  0 siblings, 1 reply; 12+ messages in thread
From: Damien Doligez @ 2000-05-29 15:00 UTC (permalink / raw)
  To: caml-list

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain, Size: 457 bytes --]

>From: Dan Grossman <danieljg@cs.cornell.edu>

>[Veuillez me pardonner; je ne donne pas une traduction en français parce

No need to apologize, really.


>* Recording this information would only require 1 header bit.  (I
>personally could live with 21 size bits.)  Getting this bit into the
>header word should not be hard.

Some people are already complaining about the maximum size of arrays.
And it's true that 16 megabytes is quite small.

-- Damien




^ permalink raw reply	[flat|nested] 12+ messages in thread

* RE: Alternative generic hash function
@ 2000-05-25 17:08 Manuel Fahndrich
  0 siblings, 0 replies; 12+ messages in thread
From: Manuel Fahndrich @ 2000-05-25 17:08 UTC (permalink / raw)
  To: 'dsl@tepkom.ru', caml-list

Thanks for all the answers. Yes, one clearly would need information
about mutable data fields at runtime in order to implement hash_pure.

Dmitri's suggestion on using block length and tag for computing the hash is
essentially the
0th approximation of hash_pure, since we know that the length and tag of a
block are immutable.

Noone has made any comments about the managing of explicit id's in order to
get around the problem I sketched. 
I've written a lot of code with such id generation and it is not an ideal
solution. First, there needs to be an id generator, which introduces a
global variable into your program. Second, what happens if you wrap around
in the id space (e.g. ints). I can speak from experience that wrap around
happens, namely when generating structures with id's repeatedly, where most
of them are garbage collected away. At any point, there are only few
structures live, but whenever we create a new one, we still need a fresh id.
In general, one would have to be able to reuse id's, and thus connect with
the garbage collector through weak pointers etc... It gets pretty
complicated.

My point is that the memory address of an object is an ideal id for equality
comparisons. The space of addresses is already managed by the language, such
that reusable id's can be generated cheaply. With copying garbage
collection, id's change over time, which is not a problem for equality
testing, but breaks hashing on them. The only reliable part of a
datastructure to use in computing a hash is the immutable part.

-Manuel

BTW, the same problem arises with the polymorphic comparison functions:

        Objective Caml version 3.00

# let x = ref 1;;
val x : int ref = {contents=1}
# let y = ref 5;;
val y : int ref = {contents=5}
# x < y;;
- : bool = true
# x := 10;;
- : unit = ()
# x < y;;
- : bool = false

Of course, using an analogous pure comparison operator would result in a
preorder, not a total order.

-----Original Message-----
From: Dmitri Lomov [mailto:dsl@tepkom.ru]
Sent: Thursday, May 25, 2000 12:38 AM
To: caml-list@pauillac.inria.fr
Cc: Manuel Fahndrich
Subject: Re: Alternative generic hash function

AFA I understand, Hastable.hash_pure will be hard to implement in run-time,
as now at run-time it is not known whether given data structure, or 
given data structure field is mutable or not.
Such information is not propagated to run-time by ocaml compiler.

Possibly your option here is to divide your data structure into mutable
and immutable parts, and than use Hashtable.hash on immutable part.
Then all will work well.

This, however, does not cover the example you present. 
But actually the nature of hashing assumes the hashed object has
some immutable part. For example, when hashing strings you do not assume 
to get the same hash value for mutable string if you change some character
of it.
In your example everything in an object changes - it does not have any 
immutable properties/elements/fields to "hook" hash to.

By the way, a rough hashing function for all O'Caml objects can be based
on block length and/or tag (they are accesible via Obj module).
Such hash function will tend to be very biased, due to biased
distribution of lengths and tags, but in very bad situations it 
may help.

I use such function in my (forthcoming :-)) dynamic code generation library 
for O'Caml to keep external references from code block. 

Regards,
Dmitri

Manuel Fahndrich wrote:
> 
> The generic hash function Hashtbl.hash traverses into mutable data
> structures. So I get different hash values depending on updates of the
data
> structure.
> 
> # let x = ref None;;
> val x : '_a option ref = {contents=None}
> # Hashtbl.hash x;;
> - : int = 0
> # x := Some 5;;
> - : unit = ()
> # Hashtbl.hash x;;
> - : int = 5
> 
> As was pointed out times before on this list, Hashtbl.hash is thus
> unsuitable for hashing on data structures that contain mutable data.
> 
> I'm wondering what the design rational is for this choice. I'd rather have
a
> generic hash function (Hashtbl.hash_pure) that never looks into mutable
> structures (skips refs and mutable fields of records and classes).
> The reason being that if I want to hash on data that contains references,
> I'm more likely to want to find the structure independent of the contents
of
> any mutable subparts.
> 
> So yes, a hashtable containing int refs would pointless, because it maps
> them all to the same bucket. But I frequently use hashing with equality
> being ==. Whenever I do this, I need to actually add some unique
identifier
> to the data structure just for the hashing operation. Having an alternate
> hash function (Hashtbl.hash_pure) would make such id's (and their
managment)
> unnecessary.
> 
> I see how the current working of Hashtbl.hash is useful for data
structures
> that contain mutable parts in the case where such structures don't change
> between the time they are stored and later looked up in the hash table.
But
> I'm wondering how often this actually arises.
> 
> Furthermore, using Hashtbl.hash_pure in place of Hashtbl.hash would not
> break any old code (except for performance), since all it does is make the
> equivalence classes bigger. On the other hand, using Hashtbl.hash with the
> misconception that it acts as Hashtbl.hash_pure breaks lots of code.
> 
> I rest my case.
> 
>         Manuel Fahndrich

_________________________________________________________________
Dmitri S. Lomov
mailto:dsl@tepkom.ru    ICQ#: 20524819 (Rusty)
http://users.tepkom.ru/dsl, http://young.tepkom.ru
ftp://young.tepkom.ru
+7 (812) 428-46-57 (b)   +7 (812) 295-94-15 (h)

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2000-05-30 20:35 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2000-05-24 22:04 Alternative generic hash function Manuel Fahndrich
2000-05-25  7:00 ` Pierre Weis
2000-05-25  7:14 ` STARYNKEVITCH Basile
2000-05-25  7:38 ` Dmitri Lomov
2000-05-25  8:24 ` Francois Pottier
2000-05-26 19:35   ` Dan Grossman
2000-05-29 13:13     ` Francois Pottier
2000-05-29 23:45       ` Ken Wakita
2000-05-30 17:25       ` Dan Grossman
2000-05-25 17:08 Manuel Fahndrich
2000-05-29 15:00 Damien Doligez
2000-05-30 18:04 ` Daniel Ortmann

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).