caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed
* [Caml-list] Hashtbl.hash and Hashtbl.hash_param
@ 2002-08-23 15:24 sebastien FURIC
  2002-08-23 15:45 ` sebastien FURIC
  2002-08-27  8:24 ` Xavier Leroy
  0 siblings, 2 replies; 7+ messages in thread
From: sebastien FURIC @ 2002-08-23 15:24 UTC (permalink / raw)
  To: OCaml Mailing list


 What kind of algorithm is used to compute the hash code of objects in
O'Caml ?

 Hashtbl.hash (List.map (fun x -> Random.int 100)
[1;2;3;4;5;6;7;8;9;10]);;
 always returns 0 (Hashtbl.hash_param has the same properties) which is
a poor result !

 What can I do if I don't know the size of the list in advance ?

 Cheers,

 Sebastien.
-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Caml-list] Hashtbl.hash and Hashtbl.hash_param
  2002-08-23 15:24 [Caml-list] Hashtbl.hash and Hashtbl.hash_param sebastien FURIC
@ 2002-08-23 15:45 ` sebastien FURIC
  2002-08-23 16:27   ` Florian Douetteau
  2002-08-27  8:24 ` Xavier Leroy
  1 sibling, 1 reply; 7+ messages in thread
From: sebastien FURIC @ 2002-08-23 15:45 UTC (permalink / raw)
  To: OCaml Mailing list



sebastien FURIC a écrit :
> 
>  What kind of algorithm is used to compute the hash code of objects in
> O'Caml ?
> 
>  Hashtbl.hash (List.map (fun x -> Random.int 100)
> [1;2;3;4;5;6;7;8;9;10]);;
>  always returns 0 (Hashtbl.hash_param has the same properties) which is
> a poor result !
> 
>  What can I do if I don't know the size of the list in advance ?

 Of course, I know I could use functors to specify my own hash function.
 While prototyping I just would prefer an acceptable Hashtbl.hash
function to avoid writing "boring" code.

 Sebastien.
-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Caml-list] Hashtbl.hash and Hashtbl.hash_param
  2002-08-23 15:45 ` sebastien FURIC
@ 2002-08-23 16:27   ` Florian Douetteau
  0 siblings, 0 replies; 7+ messages in thread
From: Florian Douetteau @ 2002-08-23 16:27 UTC (permalink / raw)
  To: sebastien FURIC; +Cc: OCaml Mailing list

On Fri, 23 Aug 2002, sebastien FURIC wrote:

After a quick inspection in the source byterun/hash.c,
it appears that hash_param C implementation
decrements a C static variable 'counter' whenever an object
is considered and  returns 0 whenever this counter reach 0
(that's a way to handle cycles).

As a consequence (..details omitted..), in the case of the list type,
if the length of your list
is greater that the initial value of the counter,
hash will always
return 0.

A quick fix is to use:
hash_param <number_bigger_than_the_length_of_any_list> 100
instead of
hash (== hash_param 10 100)

By the way, beware that hash_param is not thread-safe,
since it uses a C global variable.


--
Florian Douetteau



>
>
> sebastien FURIC a écrit :
> >
> >  What kind of algorithm is used to compute the hash code of objects in
> > O'Caml ?
> >
> >  Hashtbl.hash (List.map (fun x -> Random.int 100)
> > [1;2;3;4;5;6;7;8;9;10]);;
> >  always returns 0 (Hashtbl.hash_param has the same properties) which is
> > a poor result !
> >
> >  What can I do if I don't know the size of the list in advance ?
>
>  Of course, I know I could use functors to specify my own hash function.
>  While prototyping I just would prefer an acceptable Hashtbl.hash
> function to avoid writing "boring" code.
>
>  Sebastien.
> -------------------
> To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
> Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
> Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
>


-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Caml-list] Hashtbl.hash and Hashtbl.hash_param
  2002-08-23 15:24 [Caml-list] Hashtbl.hash and Hashtbl.hash_param sebastien FURIC
  2002-08-23 15:45 ` sebastien FURIC
@ 2002-08-27  8:24 ` Xavier Leroy
  2002-08-27  9:59   ` jeanmarc.eber
  2002-08-27 16:12   ` Blair Zajac
  1 sibling, 2 replies; 7+ messages in thread
From: Xavier Leroy @ 2002-08-27  8:24 UTC (permalink / raw)
  To: sebastien FURIC; +Cc: caml-list

>  What kind of algorithm is used to compute the hash code of objects in
> O'Caml ?
> 
>  Hashtbl.hash (List.map (fun x -> Random.int 100)
> [1;2;3;4;5;6;7;8;9;10]);;
>  always returns 0 (Hashtbl.hash_param has the same properties) which is
> a poor result !

Yes, this is disappointing.  To understand what's going on, here is
how "Hashtbl.hash_param v count limit" works:

- v is traversed, depth-first.
- "Interesting" information found at each node is hashed, e.g.
      string node -> hash code of string
      integer     -> the integer itself
      constructor block -> integer tag of constructor
  (Some nodes have no interesting information, e.g. certain custom blocks.)  
- The hash values for each node are combined with a simple linear congruence.

Moreover, to prevent infinite descent in cyclic values, and ensure
that hashing doesn't take too long, the traversal is stopped when either
- "count" interesting nodes were found, or
- "limit" nodes (interesting or not) were traversed.

Now, for your example [1;2;3;4;5;6;7;8;9;10], the interesting nodes
and their associated hash values are
- the integers 1 to 10, with same hash values;
- and the 10 occurrences of the "::" constructor, which correspond to
  0-tagged blocks, with hash values 0.

The fly in the ointment is that the traversal is done right-to-left,
hence the hash values of interest are encountered in the following order:

  0 ......... 0 10 9 8 7 6 5 4 3 2 1

  ------------- --------------------
  the :: cells    the list contents

Hence, with count = 10, the traversal stops at the cons cells, and
doesn't even look at the list contents!  Result: a 0 hash value.

There are several ways to remedy this behavior, such as ignoring
zero-tagged blocks, or doing breadth-first traversal.

However, we need to think twice before changing the hashing function,
because this would cause trouble to users that store hashtables in
files using output_value/input_value: if the hash function changes
before writing and reading, the hashtable read becomes unusable.

Hence, a request for OCaml users: if you use hashtables whose keys are
structured data (not just strings or integers), *and* your program
stores hashtables to files, *and* it's important for you that these
persistent hashtables can be read back with future versions of OCaml,
then please drop me a line.

- Xavier Leroy
-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Caml-list] Hashtbl.hash and Hashtbl.hash_param
  2002-08-27  8:24 ` Xavier Leroy
@ 2002-08-27  9:59   ` jeanmarc.eber
  2002-08-27 10:58     ` Alain Frisch
  2002-08-27 16:12   ` Blair Zajac
  1 sibling, 1 reply; 7+ messages in thread
From: jeanmarc.eber @ 2002-08-27  9:59 UTC (permalink / raw)
  To: Xavier Leroy; +Cc: sebastien FURIC, caml-list

Quoting Xavier Leroy <xavier.leroy@inria.fr>:

> 
> However, we need to think twice before changing the hashing function,
> because this would cause trouble to users that store hashtables in
> files using output_value/input_value: if the hash function changes
> before writing and reading, the hashtable read becomes unusable.
> 
> Hence, a request for OCaml users: if you use hashtables whose keys are
> structured data (not just strings or integers), *and* your program
> stores hashtables to files, *and* it's important for you that these
> persistent hashtables can be read back with future versions of OCaml,
> then please drop me a line.
> 

That is an important point that should, I think, at least be clearly said.
Personally, I always thought that values written with output_value (more
generally marshaled ocaml values) were only guaranteed to be compatible for a
given version of ocaml. I never considered output_value as a "long term" storing
solution, but only a "short term" one (good example: *.cmo ans *.cmi files
generated by the ocaml compiler), not to speak about calculated hash values...

Personally, I *want* the ocaml team to be able to change internal
representation, optimize hash functions etc in the hope that this produces an
even better system! (BTW, I may be wrong, but didn’t some tags change between
3.04 and 3.05, but maybe this didn’t change marshaled values ?)

More generally, the concept and importance of 100% backward compatibility
should be discussed. I can not hope for 100% backward compatibility and hope
for big progresses on the ocaml compiler... no ? If people really want 100%
compatibilty, they should stay with an ocaml version.

Conclusion: personally, I don’t want progress of the compiler made difficult by
a 100% backward compatibility "religion". What do other users of ocaml think
about it? (I agree that this is of course a question that is as old as the
existence of computer languages: its more a question about what stage of
development we think ocaml has reached now)

Jean-Marc Eber
LexiFi
-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Caml-list] Hashtbl.hash and Hashtbl.hash_param
  2002-08-27  9:59   ` jeanmarc.eber
@ 2002-08-27 10:58     ` Alain Frisch
  0 siblings, 0 replies; 7+ messages in thread
From: Alain Frisch @ 2002-08-27 10:58 UTC (permalink / raw)
  To: jeanmarc.eber; +Cc: Caml list

On Tue, 27 Aug 2002 jeanmarc.eber@lexifi.com wrote:

> Conclusion: personally, I don’t want progress of the compiler made difficult by
> a 100% backward compatibility "religion". What do other users of ocaml think
> about it?

I strongly agree with you, especially concerning the situation at issue
(improving Hashtbl.hash).

You're right that it is explicitely stated in the manual, for Marshal:
<<
The format for the byte sequences is compatible across all machines for
a given version of Objective Caml.
>>


-- Alain

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Caml-list] Hashtbl.hash and Hashtbl.hash_param
  2002-08-27  8:24 ` Xavier Leroy
  2002-08-27  9:59   ` jeanmarc.eber
@ 2002-08-27 16:12   ` Blair Zajac
  1 sibling, 0 replies; 7+ messages in thread
From: Blair Zajac @ 2002-08-27 16:12 UTC (permalink / raw)
  To: Xavier Leroy; +Cc: sebastien FURIC, caml-list

Xavier Leroy wrote:
> 
> However, we need to think twice before changing the hashing function,
> because this would cause trouble to users that store hashtables in
> files using output_value/input_value: if the hash function changes
> before writing and reading, the hashtable read becomes unusable.
> 
> Hence, a request for OCaml users: if you use hashtables whose keys are
> structured data (not just strings or integers), *and* your program
> stores hashtables to files, *and* it's important for you that these
> persistent hashtables can be read back with future versions of OCaml,
> then please drop me a line.

I agree with the sentiment that OCaml should not be held up by backwards
compatibility.  But I think adding a version header to the files that
indicates the particular hashing function used would allow newer OCaml's
to read older files.  The newer OCaml's would only write the new version
of the files.  This would require that OCaml binary keep the old hash
functions around, but this could be made a configure option to enable
this feature.

I'm reminded of Perl's Storable module which serves much the same function
and had a method to read old Storable files but only write new versioned
files.

Best,
Blair

-- 
Blair Zajac <blair@orcaware.com>
Web and OS performance plots - http://www.orcaware.com/orca/
-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2002-08-27 16:12 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2002-08-23 15:24 [Caml-list] Hashtbl.hash and Hashtbl.hash_param sebastien FURIC
2002-08-23 15:45 ` sebastien FURIC
2002-08-23 16:27   ` Florian Douetteau
2002-08-27  8:24 ` Xavier Leroy
2002-08-27  9:59   ` jeanmarc.eber
2002-08-27 10:58     ` Alain Frisch
2002-08-27 16:12   ` Blair Zajac

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).