Dear OCaml users, to close this topic, here are some statistics for my use case with different hash functions. All hash methods I tried lead to identical statistics as random numbers. Simply using Hashtbl.hash on lists of hashes of substructures (Method 2 below) worked very well, is easy to write down and offers sufficient flexibility. NItems = 3277598 NBuckets = 16777216 | Theoretical | Method1 | Method2 | Method3 --------+-------------+----------+----------+--------- Empty | 13799905.0 | 13800062 | 13799864 | 13801452 Single | 2695950.5 | 2695830 | 2696131 | 2693130 2 items | 263340.5 | 263101 | 263042 | 264318 3 items | 17148.7 | 17350 | 17368 | 17463 4 items | 837.5 | 849 | 779 | 822 5 items | 32.7 | 24 | 29 | 31 6 items | 1.1 | 0 | 3 | 3 --------+-------------+----------+----------+--------- Runtime | | 119s | 122s | 110s Note 1: the expected random deviation of the counts is sqrt(count). Note 2: The speed differences between Method 1 and 2 are random – Method 2 should be faster. Note 3: using Hashtbl.hash on the complete structure doesn't work for me for two reasons: - parts of the structure should be ignored - I have maps as substructures and require that logically equal maps result in the same hash Theoretical: collision numbers for a random process with same parameters (used exact Bernoulli statistics, not Poisson approximation) Method1: Hashtbl.hash used on structures recursively (create list of hash values for substructures and use Hashtbl.hash on list). Maps (<10 elements) are converted to lists and the list is hashed with Hashtbl.hash. Multiply each hash list entry with a distinct prime. Use prime values for variants and prime multipliers for variant arguments. Method2: Hashtbl.hash used on structures recursively (create list of hash values for substructures and use Hashtbl.hash on list). Maps (<10 elements) are converted to lists and the list is hashed with Hashtbl.hash. Use prime values for variants and prime multipliers for variant arguments. Method3: ( ( ( (v1 * p1 + v2) * p2) + v3 ) *p3 + v4) * p4 where vi are values and pi are distinct prime numbers. For lists I use 4 primes alternatingly and 4 different primes for empty cases Btw.: I added a function “worst_bucketlist” to Hashtbl, which returns the keys of the largest bucket. This is useful for optimizing (or bug fixing) hash and equality functions – try to make a feature request for it. Best regards, Michael Intel Deutschland GmbH Registered Address: Am Campeon 10-12, 85579 Neubiberg, Germany Tel: +49 89 99 8853-0, www.intel.de Managing Directors: Christin Eisenschmid, Christian Lamprechter Chairperson of the Supervisory Board: Nicole Lau Registered Office: Munich Commercial Register: Amtsgericht Muenchen HRB 186928