caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed
* [Caml-list] hashing big ints
@ 2011-12-06  7:22 Damien Pous
  2011-12-06  7:44 ` Jérémie Dimino
  2011-12-06  8:52 ` Goswin von Brederlow
  0 siblings, 2 replies; 4+ messages in thread
From: Damien Pous @ 2011-12-06  7:22 UTC (permalink / raw)
  To: caml-list

Dear caml-list,

Let me re-raise an apparently old and possibly dumb question : how to
hash big ints ?

Hashtbl.hash doesn't work (ocaml 3.12.1):

# Hashtbl.hash (Big_int.big_int_of_int 42);;
- : int = 1
# Hashtbl.hash (Big_int.big_int_of_int 33);;
- : int = 1

(Certainly because big ints hide their content in custom blocks.)

I found several discussions about this kind of problem (comparison of
nums, etc...), which were more focused on how to let
Pervasive.compare/equal and Hashtbl.hash behave correctly in a uniform
way.

My question is simpler : I just need to write my own hashing function,
so that I can call Hashtbl.Make. I currently use something like:

let rec hash x =
  try int_of_big_int x
  with _ -> hash (shift_right_big_int x Sys.word_size)

Some of you certainly know how to get a better behaved function. (I
don't know anything about hashing theory, except that my function is
certainly not a good one: it doesn't contain any prime number...)

Thanks for your help,
Damien

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [Caml-list] hashing big ints
  2011-12-06  7:22 [Caml-list] hashing big ints Damien Pous
@ 2011-12-06  7:44 ` Jérémie Dimino
  2011-12-06  8:52 ` Goswin von Brederlow
  1 sibling, 0 replies; 4+ messages in thread
From: Jérémie Dimino @ 2011-12-06  7:44 UTC (permalink / raw)
  To: Damien Pous; +Cc: caml-list

Hi,

Le mardi 06 décembre 2011 à 08:22 +0100, Damien Pous a écrit : 
> Let me re-raise an apparently old and possibly dumb question : how to
> hash big ints ?

You can use the zarith library:

  http://forge.ocamlcore.org/projects/zarith/

It provides hashable big integers.

Cheers,

-- 
Jérémie



^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [Caml-list] hashing big ints
  2011-12-06  7:22 [Caml-list] hashing big ints Damien Pous
  2011-12-06  7:44 ` Jérémie Dimino
@ 2011-12-06  8:52 ` Goswin von Brederlow
  2011-12-06 15:18   ` Damien Pous
  1 sibling, 1 reply; 4+ messages in thread
From: Goswin von Brederlow @ 2011-12-06  8:52 UTC (permalink / raw)
  To: Damien Pous; +Cc: caml-list

Damien Pous <Damien.Pous@inria.fr> writes:

> Dear caml-list,
>
> Let me re-raise an apparently old and possibly dumb question : how to
> hash big ints ?
>
> Hashtbl.hash doesn't work (ocaml 3.12.1):
>
> # Hashtbl.hash (Big_int.big_int_of_int 42);;
> - : int = 1
> # Hashtbl.hash (Big_int.big_int_of_int 33);;
> - : int = 1
>
> (Certainly because big ints hide their content in custom blocks.)
>
> I found several discussions about this kind of problem (comparison of
> nums, etc...), which were more focused on how to let
> Pervasive.compare/equal and Hashtbl.hash behave correctly in a uniform
> way.
>
> My question is simpler : I just need to write my own hashing function,
> so that I can call Hashtbl.Make. I currently use something like:
>
> let rec hash x =
>   try int_of_big_int x
>   with _ -> hash (shift_right_big_int x Sys.word_size)
>
> Some of you certainly know how to get a better behaved function. (I
> don't know anything about hashing theory, except that my function is
> certainly not a good one: it doesn't contain any prime number...)
>
> Thanks for your help,
> Damien

A usualy good hash function for integers is to take the integer modulo a
prime number. This usualy results in an uniform distribution. So:

let big_prime = (big_int_of_int prime)
let hash x = mod_big_int x big_prime

We also want the output range of the hash function to be as large as
possible to minimize the number of collision. That means we want the
prime to be as big as possible. And there we hit a little snag. For best
results you want a different prime number for 32bit and 64bit systems.
Lookup the closest prime number smaller than 2^32 and 2^64 and use
those.

Note: It is too bad there isn't a mod_big_int_with_int or even better a
Big_int.hash function directly. Maybe you could write one that takes
advantage of knowing the internal structure of Big_int?

MfG
        Goswin

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [Caml-list] hashing big ints
  2011-12-06  8:52 ` Goswin von Brederlow
@ 2011-12-06 15:18   ` Damien Pous
  0 siblings, 0 replies; 4+ messages in thread
From: Damien Pous @ 2011-12-06 15:18 UTC (permalink / raw)
  To: caml-list

Thanks Jérémie, Goswin, and Boris.

I downloaded zarith and I'll switch to it, it looks really nice.

Just in case, I resume part of Boris' private answer: the generic hash
function is fixed for Big_int values in the trunk (bug 5290), it
should be ok from OCaml 3.13.

Damien


^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2011-12-06 15:18 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-12-06  7:22 [Caml-list] hashing big ints Damien Pous
2011-12-06  7:44 ` Jérémie Dimino
2011-12-06  8:52 ` Goswin von Brederlow
2011-12-06 15:18   ` Damien Pous

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).