caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed
From: "Gerd Stolpmann" <info@gerd-stolpmann.de>
To: "Goswin von Brederlow" <goswin-v-b@web.de>
Cc: "AUGER Cdric" <cedric.auger@lri.fr>, caml-list@inria.fr
Subject: Re: [Caml-list] Hashtbl and security
Date: Thu, 9 Feb 2012 15:48:55 +0100	[thread overview]
Message-ID: <be6eee4cb5a18d0145f2553b39f6ba67.squirrel@gps.dynxs.de> (raw)
In-Reply-To: <87bop8qgzn.fsf@frosties.localnet>


> AUGER Cédric <Cedric.Auger@lri.fr> writes:
>
>> More seriously, hashtables are here to store values, so you don't want
>> want to have a hash value bigger than (2^30)-1.
>>
>> In fact even such a value is too big for that purpouse IMHO, since on a
>> 4Gio RAM computer, it will take half it to store the table, assuming
>> each entry only takes 4 octets, that is it would be ok only for storing
>> just the structure of the table (each entry has to be a pointer),
>> so that in practice, either your table is ridiculously big to store
>> too few values either it is relevant to have such a size and in this
>> case your computer will spend all its time to swap since you won't
>> have enough RAM.
>
> But I have 128GB ram. :) That allows for a hashtable of size 2^33
> entries pointing at simple values (e.g. ints) without swapping. And ram
> sizes get bigger and bigger.

Fully agreed. The limitation to 30 bits is already unpractical for such
large computers. Maybe we need a second hash function

val Hashtbl.wide_hash : 'a -> int

without this constraint (on 64 bit systems).

Btw, you can already define your own hash functions, just use the
functorized interface. And you are not limited to 30 bits. It's a bit
inconvenient only.

Gerd



>
> [Doesn't a Hashtbl of ints store the ints directly? That would allow
> 2^34 entries.]
>
>>  Clearly you cannot a bijection from 63 bits
>> integers to 31 bits integers; so there are a lot of collisions. I do
>> not have a 64 arch, but I guesse that if you hash "2^48+168" you will
>> get "168". I guess such a function is ways to be ideal, since when
>> playing with bitvectors having two integers which differs by only one
>> bit is very common, but at least it is a fast function, and has a good
>> repartition (you have 2^32 collisions exactly per bucket).
>
> Ignoring the upper bits makes for a verry verry poor hash function. Also
> means it is 100% trivial to create a DOS by tuneing your input values to
> only differ in the upper bits.
>
> MfG
>         Goswin
>
> --
> Caml-list mailing list.  Subscription management and archives:
> https://sympa-roc.inria.fr/wws/info/caml-list
> Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
> Bug reports: http://caml.inria.fr/bin/caml-bugs
>
>
>


-- 
Gerd Stolpmann, Darmstadt, Germany    gerd@gerd-stolpmann.de
Creator of GODI and camlcity.org.
Contact details:        http://www.camlcity.org/contact.html
Company homepage:       http://www.gerd-stolpmann.de
*** Searching for new projects! Need consulting for system
*** programming in Ocaml? Gerd Stolpmann can help you.



  reply	other threads:[~2012-02-09 14:49 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-12-30 16:44 Gerd Stolpmann
2011-12-30 16:48 ` Yaron Minsky
2011-12-30 19:01   ` David Allsopp
2011-12-30 20:52     ` Yaron Minsky
2011-12-30 21:54       ` Gerd Stolpmann
2011-12-30 17:06 ` Xavier Leroy
2011-12-30 21:16   ` Gerd Stolpmann
2011-12-31  0:57   ` oliver
2011-12-31  0:59     ` oliver
2012-01-01 12:52   ` Richard W.M. Jones
2012-01-01 17:29     ` Xavier Leroy
2012-01-01 21:04       ` Gerd Stolpmann
2012-01-01 23:24         ` oliver
2012-01-01 23:58           ` Gerd Stolpmann
2012-01-02  1:43             ` oliver
2012-01-04 17:56               ` Damien Doligez
2012-01-04 21:52                 ` oliver
2012-01-02  9:34         ` David MENTRE
2012-01-30 10:54       ` Goswin von Brederlow
2011-12-30 17:40 ` rixed
2011-12-30 17:52   ` Edgar Friendly
2011-12-31  1:02   ` oliver
2011-12-31  0:33 ` oliver
2012-01-02  0:21 ` Shawn Wagner
2012-01-02 14:52   ` Gerd Stolpmann
2012-01-30 10:51 ` Goswin von Brederlow
2012-01-31 14:16   ` Gerd Stolpmann
2012-02-08  9:41     ` Goswin von Brederlow
2012-02-08 10:43       ` Philippe Wang
2012-02-08 10:46       ` AUGER Cédric
2012-02-09 13:22         ` Goswin von Brederlow
2012-02-09 14:48           ` Gerd Stolpmann [this message]
2012-02-08 11:12       ` Gerd Stolpmann
2012-02-09 13:11         ` Goswin von Brederlow

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=be6eee4cb5a18d0145f2553b39f6ba67.squirrel@gps.dynxs.de \
    --to=info@gerd-stolpmann.de \
    --cc=caml-list@inria.fr \
    --cc=cedric.auger@lri.fr \
    --cc=goswin-v-b@web.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).