caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed
From: Goswin von Brederlow <goswin-v-b@web.de>
To: Grgoire Seux <kamaradclimber@gmail.com>
Cc: caml-list <caml-list@inria.fr>
Subject: Re: [Caml-list] range of hash function
Date: Sat, 20 Feb 2010 11:52:05 +0100	[thread overview]
Message-ID: <87fx4w6uy2.fsf@frosties.localdomain> (raw)
In-Reply-To: <1ae8fe881002200054s2e9752d8x620695c3c4e27939@mail.gmail.com> ("Grgoire Seux"'s message of "Sat, 20 Feb 2010 14:24:23 +0530")

Grégoire Seux <kamaradclimber@gmail.com> writes:

> hello !
>
> i would like to use the polymorhpic hash function on strings. But i would like
> to know what is the probability of a collision between two hashes. 
>
> my first question is about the range of the Hashtbl.hash function: what is its
> range ? ( string -> [1..N] ?)

It is range [min_int..max_int]. Meaning 31 or 63 bits.

> the second question is : can i assume that the result is a uniform distribution
> over [1..N] ? (for 10? words which is an estimation of the english vocabulary
> size)

If the function is any good it will be reasonably uniform. And if it is
crap then I bet someone have replaced it already.

> the third one is : is it possible to predict which will be the collision ? I
> mean collisions are between words which are very 'similar' (for ex: "boy" and
> "boys") or are completely unpredictable.

You can read the source, analyze the math and try to figure out a
collision. From past experience every widely used hash function so far
has been shown to be vulnerable. Like md5 is now considered compromised
due to the ease of creating a collision. Sha1 is deemed unsafe too. It
is just a matter of how much brainpower and cpu time you want to throw
at it. Or pure bad luck.

Is the Hashtbl.hash the same function used to hash polymorphic variant
types? If so then there are collisions between words which are verry
similar. I knew of an example once but google isn't verry helpfull
finding it again.

MfG
        Goswin


  reply	other threads:[~2010-02-20 10:52 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-02-20  8:54 Grégoire Seux
2010-02-20 10:52 ` Goswin von Brederlow [this message]
2010-02-20 12:39   ` [Caml-list] " ygrek
2010-02-20 13:54 ` Jacques Garrigue
2010-02-21 23:10   ` Richard Jones

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87fx4w6uy2.fsf@frosties.localdomain \
    --to=goswin-v-b@web.de \
    --cc=caml-list@inria.fr \
    --cc=kamaradclimber@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).