From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from mail1-relais-roc.national.inria.fr (mail1-relais-roc.national.inria.fr [192.134.164.82]) by walapai.inria.fr (8.13.6/8.13.6) with ESMTP id p6SEIFde008293 for ; Thu, 28 Jul 2011 16:18:15 +0200 X-IronPort-AV: E=Sophos;i="4.67,282,1309730400"; d="scan'208";a="114336120" Received: from estephe.inria.fr (HELO [128.93.11.95]) ([128.93.11.95]) by mail1-relais-roc.national.inria.fr with ESMTP; 28 Jul 2011 16:18:09 +0200 Message-ID: <4E316FA2.7040108@inria.fr> Date: Thu, 28 Jul 2011 16:18:10 +0200 From: Xavier Leroy User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.17) Gecko/20110424 Thunderbird/3.1.10 MIME-Version: 1.0 To: caml-list@inria.fr References: <4E307DFE.3040502@gmail.com> <4E313071.7090309@inria.fr> <20110728142529.ksntzfow5wo84404@www.recherche.enac.fr> In-Reply-To: <20110728142529.ksntzfow5wo84404@www.recherche.enac.fr> X-Enigmail-Version: 1.1.2 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Subject: Re: [Caml-list] Hashtbl performance Fabrice Le Fessant : > Thus, you should consider using your own hash function, probably only > considering the ints in the list and its length. Then, use the functor > in the Hashtbl module. On 07/28/2011 02:25 PM, barnier@recherche.enac.fr wrote: > You could also use the hash_param : int -> int -> 'a -> int > function in the functor which allows to specify the number > of nodes traversed to compute the key : Both Fabrice's and Nicolas's suggestions are excellent. Let me just add that this problem with lists as hashtable keys is one of the known issues with OCaml's current generic hash function: the other is that the mixing functions used are simplistic and exhibit some statistical bias, even for strings. The SVN trunk for OCaml contains a complete reimplementation of the generic hash function that addresses both issues: lists and other complex keys are traversed breadth-first in a more cautious manner than before, and the mixing functions are based on MurmurHash 3, which exhibits very good statistical properties. All this should go in the next major release 3.13. So, everyone with an interest in efficient hash tables is welcome to try the trunk sources and let me know of any problems encountered. - Xavier Leroy