From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Original-To: caml-list@sympa.inria.fr Delivered-To: caml-list@sympa.inria.fr Received: from mail2-relais-roc.national.inria.fr (mail2-relais-roc.national.inria.fr [192.134.164.83]) by sympa.inria.fr (Postfix) with ESMTPS id B2EDF7FA5C for ; Fri, 13 May 2016 18:17:55 +0200 (CEST) IronPort-PHdr: 9a23:izDQXBCOxL+62bEmNafDUyQJP3N1i/DPJgcQr6AfoPdwSP/zpMbcNUDSrc9gkEXOFd2CrakU2qyM6uuxBCQp2tWojjMrSNR0TRgLiMEbzUQLIfWuLgnFFsPsdDEwB89YVVVorDmROElRH9viNRWJ+iXhpQAbFhi3DwdpPOO9QteU1JTmkbrrsMyOKyxzxxODIppKZC2sqgvQssREyaBDEY0WjiXzn31TZu5NznlpL1/A1zz158O34YIxu38I46Fp34d6XK77Z6U1S6BDRHRjajhtpZ6jiR6WZA+G531UfH8XiRFIS1zM6Bj7WNH/qCrhvepV3CSKPMP3C7szXGLmp59qRQXyhW8sNzc8+mjNloQklKNWugis4Rd/yoveaZuJHP11d6bZZckdA2FGW5AVH2ZKC4a4Ko8OFPYpPOBCroC7qUFE5U+1DAyoQefu0SNgh3ns3KR83f53Qi/c2wl1VekJvXvIttLtcO83UOu1xaTMh32XavJd2T7w7M7TdR0uveuLRZpxd9bczQ8kEAaT3QbYkpDsIz7AjrdFiGOc9ec1ELv302M= Authentication-Results: mail2-smtp-roc.national.inria.fr; spf=None smtp.pra=michael.soegtrop@intel.com; spf=Pass smtp.mailfrom=michael.soegtrop@intel.com; spf=None smtp.helo=postmaster@mga09.intel.com Received-SPF: None (mail2-smtp-roc.national.inria.fr: no sender authenticity information available from domain of michael.soegtrop@intel.com) identity=pra; client-ip=134.134.136.24; receiver=mail2-smtp-roc.national.inria.fr; envelope-from="michael.soegtrop@intel.com"; x-sender="michael.soegtrop@intel.com"; x-conformance=sidf_compatible Received-SPF: Pass (mail2-smtp-roc.national.inria.fr: domain of michael.soegtrop@intel.com designates 134.134.136.24 as permitted sender) identity=mailfrom; client-ip=134.134.136.24; receiver=mail2-smtp-roc.national.inria.fr; envelope-from="michael.soegtrop@intel.com"; x-sender="michael.soegtrop@intel.com"; x-conformance=sidf_compatible; x-record-type="v=spf1" Received-SPF: None (mail2-smtp-roc.national.inria.fr: no sender authenticity information available from domain of postmaster@mga09.intel.com) identity=helo; client-ip=134.134.136.24; receiver=mail2-smtp-roc.national.inria.fr; envelope-from="michael.soegtrop@intel.com"; x-sender="postmaster@mga09.intel.com"; x-conformance=sidf_compatible X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: A0A2AAAf/TVXkBiIhoZbA4QMfga5UAENgXYahXcCgSw4FAEBAQEBAQEBEQEBAQEHDQkJIS+CLYIWAQEEOg9AAgEIIhQQMiUBAQQBGogmAcEzAQEBBwEBAQEBAQEghiaDSYEDhEEfJoJkgi4FmCIFAZ02j0EeAQGCRxIKgUtuh1sBfgEBAQ X-IPAS-Result: A0A2AAAf/TVXkBiIhoZbA4QMfga5UAENgXYahXcCgSw4FAEBAQEBAQEBEQEBAQEHDQkJIS+CLYIWAQEEOg9AAgEIIhQQMiUBAQQBGogmAcEzAQEBBwEBAQEBAQEghiaDSYEDhEEfJoJkgi4FmCIFAZ02j0EeAQGCRxIKgUtuh1sBfgEBAQ X-IronPort-AV: E=Sophos;i="5.24,614,1454972400"; d="scan'208";a="218213074" Received: from mga09.intel.com ([134.134.136.24]) by mail2-smtp-roc.national.inria.fr with ESMTP; 13 May 2016 18:17:53 +0200 Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by orsmga102.jf.intel.com with ESMTP; 13 May 2016 09:17:52 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.24,614,1455004800"; d="scan'208";a="979891705" Received: from irsmsx108.ger.corp.intel.com ([163.33.3.3]) by fmsmga002.fm.intel.com with ESMTP; 13 May 2016 09:17:50 -0700 Received: from irsmsx102.ger.corp.intel.com ([169.254.2.35]) by IRSMSX108.ger.corp.intel.com ([169.254.11.241]) with mapi id 14.03.0248.002; Fri, 13 May 2016 17:17:48 +0100 From: "Soegtrop, Michael" To: Alain Frisch , Pierre Chambart , "caml-list@inria.fr" Thread-Topic: [Caml-list] Specify the default hash function for a type Thread-Index: AdGs7HCGm9QdbZDXQnOXFTEowCMUzwAAdzOAAAnxa4AAADMcAAAFt8eA Date: Fri, 13 May 2016 16:17:47 +0000 Message-ID: <0F7D3B1B3C4B894D824F5B822E3E5A172CEF2D2F@IRSMSX102.ger.corp.intel.com> References: <0F7D3B1B3C4B894D824F5B822E3E5A172CEF1630@IRSMSX102.ger.corp.intel.com> <850765ff-25f1-ecb9-9707-74d3dc47675c@lexifi.com> <5735DBB9.5050101@ocamlpro.com> <44df5d5e-8ad9-1a86-7fda-bfc203bfb479@lexifi.com> In-Reply-To: <44df5d5e-8ad9-1a86-7fda-bfc203bfb479@lexifi.com> Accept-Language: de-DE, en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-originating-ip: [163.33.239.182] Content-Type: text/plain; charset="us-ascii" MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Subject: RE: [Caml-list] Specify the default hash function for a type Dear Ocaml Users, a possibly interesting data point in the discussion on polymorphic hashes: I replaced the polymorphic hash function for a record, which consists of sm= allish integers, bools, lists of integers and short strings, with my own ha= sh function which looks like: ( ( ( (v1 * p1 + v2) * p2) + v3 ) *p3 + v4) * p4=20=20=20=20 where vi are values and pi are distinct prime numbers. For lists I use 4 pr= imes alternatingly and 4 different primes for empty cases (implemented as 4= mutual recursive functions). The primes are random primes between 2^28 and= 2^29, avoiding those primes where larger powers of 2 are close to 0 modulo= the prime. All arithmetic is done with plain ocaml ints and corresponding = overflows. For strings I used the polymorphic hash function. The hash histogram for this hash function looks like this: num_bindings=3D1803454 num_buckets=3D16777216 max_bucket_length=3D24=20 (the histogram is "number of items in bucket=3Dcount of buckets with this n= umber of items") 0=3D15356268 (empty buckets) 1=3D1185329 (non-colliding buckets) 2=3D135231 3=3D80194 4=3D11344 5=3D2367 6=3D2836 7=3D2630 8=3D244 9=3D49 10=3D20 11=3D25 12=3D64 13=3D39=20 14=3D100=20 15=3D75=20 16=3D33=20 17=3D5=20 18=3D149=20 19=3D15=20 20=3D3=20 21=3D180=20 22=3D14=20 24=3D2 This is roughly in line with random numbers: fill rate =3D 1803454/16777216 =3D 10.7% 2 collisions =3D 0.8% ~ fill rate ^2 3 collisions =3D 0.48% ~ 4 * fill rate ^3 4 collisions =3D 0.068% ~ 5 * fill rate ^4 The hash histogram for the same data and the polymorphic hash function look= s like this: num_bindings=3D1803454 num_buckets=3D16777216 max_bucket_length=3D3343 0=3D16730926 1=3D5520 2=3D3201 3=3D2779 4=3D1633 5=3D1079 6=3D3701 7=3D672 8=3D828 9=3D1584 10=3D600 11=3D384 12=3D2774 13=3D404 14=3D525 15=3D500 16=3D435 17=3D358 18=3D1406 19=3D244 20=3D504 21=3D427 22=3D316 23=3D369 24=3D837 25=3D153 26=3D250 27=3D417 28=3D191 29=3D222 30=3D501 31=3D76 32=3D196 33=3D530 34=3D142 35=3D153 36=3D859 37=3D88 38=3D178 39=3D310 40=3D147 41=3D173 42=3D313 43=3D102 44=3D235 45=3D123 46=3D149 47=3D152 48=3D244 49=3D107 50=3D173 : : 1001=3D1 1002=3D1 1005=3D1 1006=3D3 1009=3D1 1010=3D1 1013=3D1 1014=3D2 1019=3D1 1020=3D1 1029=3D1 1031=3D1 1034=3D1 1035=3D1 1036=3D1 1064=3D1 1082=3D1 1184=3D1 1191=3D1 1196=3D1 1197=3D1 1200=3D1 1206=3D1 1207=3D1 1219=3D1 1225=3D1 1228=3D1 1232=3D1 1566=3D1 1581=3D1 1636=3D1 1698=3D1 1720=3D2 1737=3D2 1740=3D1 1744=3D1 1752=3D1 1762=3D1 1789=3D1 2255=3D1 2271=3D1 2287=3D1 2319=3D1 2332=3D1 2345=3D1 3296=3D1 3325=3D1 3343=3D1 So there are many buckets with 1000s of collisions and only a few 1000 coll= ision free buckets. It might make sense to come up with a better polymorphic hashing function. Best regards, Michael Intel Deutschland GmbH Registered Address: Am Campeon 10-12, 85579 Neubiberg, Germany Tel: +49 89 99 8853-0, www.intel.de Managing Directors: Christin Eisenschmid, Christian Lamprechter Chairperson of the Supervisory Board: Nicole Lau Registered Office: Munich Commercial Register: Amtsgericht Muenchen HRB 186928