caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed
From: Goswin von Brederlow <goswin-v-b@web.de>
To: Damien Pous <Damien.Pous@inria.fr>
Cc: caml-list <caml-list@inria.fr>
Subject: Re: [Caml-list] hashing big ints
Date: Tue, 06 Dec 2011 09:52:34 +0100	[thread overview]
Message-ID: <874nxersj1.fsf@frosties.localnet> (raw)
In-Reply-To: <CAMy6byWvDnvJj-Onxq1CZU09mA0BuqgR9=9gapp4FyXA0vkj9A@mail.gmail.com> (Damien Pous's message of "Tue, 6 Dec 2011 08:22:39 +0100")

Damien Pous <Damien.Pous@inria.fr> writes:

> Dear caml-list,
>
> Let me re-raise an apparently old and possibly dumb question : how to
> hash big ints ?
>
> Hashtbl.hash doesn't work (ocaml 3.12.1):
>
> # Hashtbl.hash (Big_int.big_int_of_int 42);;
> - : int = 1
> # Hashtbl.hash (Big_int.big_int_of_int 33);;
> - : int = 1
>
> (Certainly because big ints hide their content in custom blocks.)
>
> I found several discussions about this kind of problem (comparison of
> nums, etc...), which were more focused on how to let
> Pervasive.compare/equal and Hashtbl.hash behave correctly in a uniform
> way.
>
> My question is simpler : I just need to write my own hashing function,
> so that I can call Hashtbl.Make. I currently use something like:
>
> let rec hash x =
>   try int_of_big_int x
>   with _ -> hash (shift_right_big_int x Sys.word_size)
>
> Some of you certainly know how to get a better behaved function. (I
> don't know anything about hashing theory, except that my function is
> certainly not a good one: it doesn't contain any prime number...)
>
> Thanks for your help,
> Damien

A usualy good hash function for integers is to take the integer modulo a
prime number. This usualy results in an uniform distribution. So:

let big_prime = (big_int_of_int prime)
let hash x = mod_big_int x big_prime

We also want the output range of the hash function to be as large as
possible to minimize the number of collision. That means we want the
prime to be as big as possible. And there we hit a little snag. For best
results you want a different prime number for 32bit and 64bit systems.
Lookup the closest prime number smaller than 2^32 and 2^64 and use
those.

Note: It is too bad there isn't a mod_big_int_with_int or even better a
Big_int.hash function directly. Maybe you could write one that takes
advantage of knowing the internal structure of Big_int?

MfG
        Goswin

  parent reply	other threads:[~2011-12-06  8:52 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-12-06  7:22 Damien Pous
2011-12-06  7:44 ` Jérémie Dimino
2011-12-06  8:52 ` Goswin von Brederlow [this message]
2011-12-06 15:18   ` Damien Pous

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=874nxersj1.fsf@frosties.localnet \
    --to=goswin-v-b@web.de \
    --cc=Damien.Pous@inria.fr \
    --cc=caml-list@inria.fr \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).