From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.1.3 (2006-06-01) on yquem.inria.fr X-Spam-Level: X-Spam-Status: No, score=0.4 required=5.0 tests=AWL,DNS_FROM_RFC_ABUSE autolearn=disabled version=3.1.3 X-Original-To: caml-list@yquem.inria.fr Delivered-To: caml-list@yquem.inria.fr Received: from mail4-relais-sop.national.inria.fr (mail4-relais-sop.national.inria.fr [192.134.164.105]) by yquem.inria.fr (Postfix) with ESMTP id A5B45BC6B for ; Fri, 11 Jan 2008 17:14:11 +0100 (CET) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: Aq4HABclh0dDWxLC/2dsb2JhbACBVqkN X-IronPort-AV: E=Sophos;i="4.24,272,1196636400"; d="scan'208";a="21134929" Received: from ip67-91-18-194.z18-91-67.customer.algx.net (HELO server1.bertec.net) ([67.91.18.194]) by mail4-smtp-sop.national.inria.fr with ESMTP; 11 Jan 2008 17:14:05 +0100 Received: from kuba.bertec.net (kuba.bertec.net [192.168.2.16]) by server1.bertec.net (Postfix) with ESMTP id A224C105830 for ; Fri, 11 Jan 2008 11:14:04 -0500 (EST) From: Kuba Ober To: caml-list@yquem.inria.fr Subject: Re: [Caml-list] Hash clash in polymorphic variants Date: Fri, 11 Jan 2008 11:14:03 -0500 User-Agent: KMail/1.9.6 (enterprise 0.20071123.740460) References: <200801101709.14110.jon@ffconsultancy.com> <200801110830.29988.ober.14@osu.edu> <200801111348.15452.jon@ffconsultancy.com> In-Reply-To: <200801111348.15452.jon@ffconsultancy.com> MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Content-Disposition: inline Message-Id: <200801111114.03876.ober.14@osu.edu> X-Spam: no; 0.00; hash:01 variants:01 non-issue:01 show-stopper:01 hash:01 hashes:01 computed:01 runtime:01 hashing:01 amortized:01 hashing:01 bytecode:01 toplevel:01 polymorphic:01 wrote:01 On Friday 11 January 2008, Jon Harrop wrote: > On Friday 11 January 2008 13:30:29 Kuba Ober wrote: > > Are those collisions of any real importance? I mean, do they break > > anything? If all they do is imply linearly searching a list of a few > > elements, for the colliding entry, then it's a non-issue? > > It would prevent code from compiling so it would be a complete > show-stopper. So what you're saying is that the implementation uses the hash with bucket size of 1? That's kinda poor decision, methinks. Maybe perfect hashes should be used, computed at link time (and at runtime whenever a module is linked in). The pefect hashing function could probably implement some sort of a table, so that no real code would need to be generated, just recomputing of decision tree table. Gperf code could be adapted for that. The benefit is that there would be no collisions, the hashed data structure would be very compact, and the cost to regenerate the hash is amortized. Ideally, one would generate the actual perfect hashing function, but this is currently only possible in bytecode, right? I mean, toplevel won't run in native code? Or am I mistaken? Kuba