caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed
From: Xavier Leroy <Xavier.Leroy@inria.fr>
To: caml-list@inria.fr
Subject: Re: [Caml-list] Hashtbl performance
Date: Thu, 28 Jul 2011 16:18:10 +0200	[thread overview]
Message-ID: <4E316FA2.7040108@inria.fr> (raw)
In-Reply-To: <20110728142529.ksntzfow5wo84404@www.recherche.enac.fr>

Fabrice Le Fessant <Fabrice.Le_fessant@inria.fr>:

> Thus, you should consider using your own hash function, probably only
> considering the ints in the list and its length. Then, use the functor
> in the Hashtbl module.

On 07/28/2011 02:25 PM, barnier@recherche.enac.fr wrote:

> You could also use the hash_param : int -> int -> 'a -> int
> function in the functor which allows to specify the number
> of nodes traversed to compute the key :

Both Fabrice's and Nicolas's suggestions are excellent.

Let me just add that this problem with lists as hashtable keys is one
of the known issues with OCaml's current generic hash function: the
other is that the mixing functions used are simplistic and exhibit
some statistical bias, even for strings.

The SVN trunk for OCaml contains a complete reimplementation of the
generic hash function that addresses both issues: lists and other
complex keys are traversed breadth-first in a more cautious manner
than before, and the mixing functions are based on MurmurHash 3, which
exhibits very good statistical properties.  All this should go in the
next major release 3.13.  So, everyone with an interest in efficient
hash tables is welcome to try the trunk sources and let me know of any
problems encountered.

- Xavier Leroy

  reply	other threads:[~2011-07-28 14:18 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-07-27 21:07 Toby Kelsey
2011-07-28  9:48 ` Fabrice Le Fessant
2011-07-28 12:25   ` barnier
2011-07-28 14:18     ` Xavier Leroy [this message]
2011-07-28 16:29       ` Toby Kelsey

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4E316FA2.7040108@inria.fr \
    --to=xavier.leroy@inria.fr \
    --cc=caml-list@inria.fr \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).