caml-list - the Caml user's mailing list
From: Xavier Leroy <xavier.leroy@inria.fr>
To: sebastien FURIC <sebastien.furic@tni-valiosys.com>
Cc: caml-list@inria.fr
Subject: Re: [Caml-list] Hashtbl.hash and Hashtbl.hash_param
Date: Tue, 27 Aug 2002 10:24:35 +0200
Message-ID: <20020827102435.A17823@pauillac.inria.fr>
In-Reply-To: <3D6653C0.F895EC59@tni.fr>; from sebastien.furic@tni-valiosys.com on Fri, Aug 23, 2002 at 05:24:48PM +0200

>  What kind of algorithm is used to compute the hash code of objects in
> O'Caml ?
> 
>  Hashtbl.hash (List.map (fun x -> Random.int 100)
> [1;2;3;4;5;6;7;8;9;10]);;
>  always returns 0 (Hashtbl.hash_param has the same properties) which is
> a poor result !

Yes, this is disappointing.  To understand what's going on, here is
how "Hashtbl.hash_param count limit v" works:

- v is traversed, depth-first.
- "Interesting" information found at each node is hashed, e.g.
      string node -> hash code of string
      integer     -> the integer itself
      constructor block -> integer tag of constructor
  (Some nodes have no interesting information, e.g. certain custom blocks.)  
- The hash values for each node are combined with a simple linear congruence.

Moreover, to prevent infinite descent in cyclic values, and ensure
that hashing doesn't take too long, the traversal is stopped when either
- "count" interesting nodes were found, or
- "limit" nodes (interesting or not) were traversed.
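
To make the steps above concrete, here is a minimal sketch over a toy
value type.  It is a model for illustration only, not the runtime's
code: the type "value", the names "model_hash", "combine" and
"string_hash", and the multipliers 65599 and 19 are all assumptions.
Fields are visited last-to-first.

```ocaml
(* Toy model of a heap value; not OCaml's real runtime representation. *)
type value =
  | Int of int                 (* immediate integer *)
  | Str of string              (* string node *)
  | Block of int * value list  (* constructor tag and its fields *)
  | Opaque                     (* node with nothing interesting to hash *)

(* Linear-congruence step; the multiplier is a placeholder (assumption). *)
let combine acc h = (acc * 65599 + h) land max_int

(* Stand-in string hash; the real one is not described in the message. *)
let string_hash s =
  let h = ref 0 in
  String.iter (fun c -> h := (!h * 19 + Char.code c) land max_int) s;
  !h

(* Hypothetical model of hash_param: stop once [count] interesting
   nodes were hashed or [limit] nodes of any kind were traversed. *)
let model_hash count limit v =
  let count = ref count and limit = ref limit and acc = ref 0 in
  let rec visit v =
    if !count > 0 && !limit > 0 then begin
      decr limit;
      match v with
      | Int n -> decr count; acc := combine !acc n
      | Str s -> decr count; acc := combine !acc (string_hash s)
      | Block (tag, fields) ->
          decr count;
          acc := combine !acc tag;
          (* depth-first, visiting the last field first *)
          List.iter visit (List.rev fields)
      | Opaque -> ()
    end
  in
  visit v;
  !acc
```

Both budgets are checked before every node, so a cyclic or very large
value still terminates after at most "limit" visits.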

Now, for your example [1;2;3;4;5;6;7;8;9;10], the interesting nodes
and their associated hash values are
- the integers 1 to 10, each of which hashes to itself;
- and the 10 occurrences of the "::" constructor, which correspond to
  0-tagged blocks, with hash values 0.

The fly in the ointment is that the traversal is done right-to-left,
hence the hash values of interest are encountered in the following order:

  0 ......... 0 10 9 8 7 6 5 4 3 2 1
  ------------- --------------------
  the :: cells    the list contents

Hence, with count = 10 (the value that Hashtbl.hash passes to
hash_param), the traversal stops at the cons cells, and doesn't even
look at the list contents!  Result: a 0 hash value.

There are several ways to remedy this behavior, such as ignoring
zero-tagged blocks, or doing breadth-first traversal.
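
The failure and the first remedy can be traced with a small model
specialised to integer lists.  This is only a sketch: the names
"encountered", "take", "combine" and "model_list_hash" are hypothetical,
the multiplier 65599 is a placeholder, and the "limit" budget and the
final [] are left out for brevity.

```ocaml
(* Hash values in the order described for a list: one 0 per "::" cell,
   then the elements from last to first.  With [skip_cons] set, the
   cons cells are ignored, modelling "ignore zero-tagged blocks". *)
let encountered ~skip_cons l =
  let cells = if skip_cons then [] else List.map (fun _ -> 0) l in
  cells @ List.rev l

(* Keep at most [n] leading elements. *)
let rec take n = function
  | x :: tl when n > 0 -> x :: take (n - 1) tl
  | _ -> []

(* Placeholder combining step (assumption, not the runtime's constants). *)
let combine acc h = (acc * 65599 + h) land max_int

(* Combine only the first [count] values that the traversal meets. *)
let model_list_hash ~skip_cons count l =
  List.fold_left combine 0 (take count (encountered ~skip_cons l))

let l = [1; 2; 3; 4; 5; 6; 7; 8; 9; 10]

let () =
  (* All ten budgeted nodes are cons cells, so this prints 0. *)
  Printf.printf "as described: %d\n"
    (model_list_hash ~skip_cons:false 10 l);
  (* With cons cells skipped, the budget is spent on the elements. *)
  Printf.printf "skipping cons cells: %d\n"
    (model_list_hash ~skip_cons:true 10 l)
```

In this model the second result depends on the list contents, which is
the property the original function lacks for lists of ten or more
elements.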

However, we need to think twice before changing the hashing function,
because this would cause trouble for users who store hashtables in
files using output_value/input_value: if the hash function changes
between writing and reading, the hashtable read back becomes unusable.
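
For reference, the usage pattern at stake looks roughly like the sketch
below.  The function name "roundtrip", the key and the temporary file
are illustrative assumptions.  The table's buckets are laid out using
the hash function of the program that writes it, while lookups after
reading recompute the hash in the program that reads it; here both are
the same program, so the lookup succeeds.

```ocaml
(* Write a hashtable with structured keys to a file, read it back,
   and look a key up in the copy. *)
let roundtrip () =
  let tbl : (int list, string) Hashtbl.t = Hashtbl.create 16 in
  Hashtbl.add tbl [1; 2; 3] "abc";
  let path = Filename.temp_file "hashtbl" ".bin" in
  (* Writing side: the bucket layout is marshalled as-is. *)
  let oc = open_out_bin path in
  output_value oc tbl;
  close_out oc;
  (* Reading side: input_value is untyped, so annotate the result. *)
  let ic = open_in_bin path in
  let tbl' : (int list, string) Hashtbl.t = input_value ic in
  close_in ic;
  Sys.remove path;
  (* The lookup rehashes the key with the reader's hash function. *)
  Hashtbl.find tbl' [1; 2; 3]
```

If the reader hashed [1; 2; 3] differently from the writer, the lookup
could search the wrong bucket and miss a binding that is present in the
file.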

Hence, a request for OCaml users: if you use hashtables whose keys are
structured data (not just strings or integers), *and* your program
stores hashtables to files, *and* it's important for you that these
persistent hashtables can be read back with future versions of OCaml,
then please drop me a line.

- Xavier Leroy