From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Original-To: caml-list@yquem.inria.fr Delivered-To: caml-list@yquem.inria.fr Received: from mail4-relais-sop.national.inria.fr (mail4-relais-sop.national.inria.fr [192.134.164.105]) by yquem.inria.fr (Postfix) with ESMTP id 5466BBBAF for ; Sat, 20 Feb 2010 13:39:12 +0100 (CET) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: ApoBADtmf0vRVdzgkGdsb2JhbACDBY4kghOHRwgVAQEBAQkJDAcTAx+rQjKBVYRHiHIBAQMFB4ElglNkBA X-IronPort-AV: E=Sophos;i="4.49,508,1262559600"; d="scan'208";a="57412999" Received: from mail-fx0-f224.google.com ([209.85.220.224]) by mail4-smtp-sop.national.inria.fr with ESMTP; 20 Feb 2010 13:39:11 +0100 Received: by fxm24 with SMTP id 24so1021759fxm.17 for ; Sat, 20 Feb 2010 04:39:11 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:received:received:date:from:to:subject :message-id:in-reply-to:references:x-mailer:mime-version :content-type:content-transfer-encoding; bh=mYEl/j0+M0Gpi33IcRzqebZWaoKyCFbRVFjqogJe5J8=; b=JyXBtV/+79Y9luswDrY4rE/gQ1qDoXf8gr/8RTbzikwQ6bBV4VN/sTvrTrbjMYNOjc 5nrs/ZZDCqzI5ENzPRW+g84VjXDzHe5HIFgolTZC76VLUzMGq9CdGzn0vt1Bu1INil3+ EDzjqgijczI3I4RpFl2Q3Sw8yX6+1L7nk9SWc= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=date:from:to:subject:message-id:in-reply-to:references:x-mailer :mime-version:content-type:content-transfer-encoding; b=e+sXpQ/LEGBahWJ4GfHG76dbJHCVR9wIWbJfGltkhYzLuxwLExFP5MNQB1LnXQbK5X xrLeofhiXvqB9IlZzZg1EHtnGj0W34Vre8ddkC/VjVbAgmn/PFNTK7zbFISFhShSkpCw Jc+GIdoT7BH4djVC748pC4CT3qmPz7iE7P6fA= Received: by 10.103.85.28 with SMTP id n28mr1907545mul.121.1266669551105; Sat, 20 Feb 2010 04:39:11 -0800 (PST) Received: from lemon (144-66-133-95.pool.ukrtel.net [95.133.66.144]) by mx.google.com with ESMTPS id e10sm5552778muf.53.2010.02.20.04.39.09 (version=TLSv1/SSLv3 cipher=RC4-MD5); Sat, 20 Feb 2010 04:39:10 -0800 (PST) Date: Sat, 20 Feb 2010 14:39:04 +0200 From: ygrek To: caml-list Subject: Re: [Caml-list] range of hash function Message-Id: <20100220143904.dcffbf61.ygrekheretix@gmail.com> In-Reply-To: <87fx4w6uy2.fsf@frosties.localdomain> References: <1ae8fe881002200054s2e9752d8x620695c3c4e27939@mail.gmail.com> <87fx4w6uy2.fsf@frosties.localdomain> X-Mailer: Sylpheed 2.5.0 (GTK+ 2.12.12; i486-pc-linux-gnu) Mime-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-Spam: no; 0.00; hash:01 hash:01 hashes:01 hashtbl:01 hashtbl:01 uniformly:01 2009:98 wrote:01 caml-list:01 unsafe:01 probability:01 int:01 int:01 writes:01 strings:01 On Sat, 20 Feb 2010 11:52:05 +0100 Goswin von Brederlow wrote: > Grégoire Seux writes: > > > hello ! > > > > i would like to use the polymorhpic hash function on strings. But i would like > > to know what is the probability of a collision between two hashes.  > > > > my first question is about the range of the Hashtbl.hash function: what is its > > range ? ( string -> [1..N] ?) > > It is range [min_int..max_int]. Meaning 31 or 63 bits. > > > the second question is : can i assume that the result is a uniform distribution > > over [1..N] ? (for 10? words which is an estimation of the english vocabulary > > size) The best way is to test yourself. Or rely on someone else's tests :) http://alaska-kamtchatka.blogspot.com/2009/06/hash-n-mix.html > You can read the source, analyze the math and try to figure out a > collision. From past experience every widely used hash function so far > has been shown to be vulnerable. Like md5 is now considered compromised > due to the ease of creating a collision. Sha1 is deemed unsafe too. It > is just a matter of how much brainpower and cpu time you want to throw > at it. Or pure bad luck. Hashtbl.hash function is not required to be cryptographically strong, but rather uniformly random on "usual" input data and fast. -- ygrek http://ygrek.org.ua