caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed
From: "Gerd Stolpmann" <info@gerd-stolpmann.de>
To: "Goswin von Brederlow" <goswin-v-b@web.de>
Cc: "Gerd Stolpmann" <info@gerd-stolpmann.de>,
	"Goswin von Brederlow" <goswin-v-b@web.de>,
	caml-list@inria.fr
Subject: Re: [Caml-list] Hashtbl and security
Date: Wed, 8 Feb 2012 12:12:08 +0100	[thread overview]
Message-ID: <94bcc9a89c5835c1436752b9e59410ad.squirrel@gps.dynxs.de> (raw)
In-Reply-To: <8739alfys0.fsf@frosties.localnet>

Well, looks like that the upper 32 bits aren't considered at all for
computing the hash value. This is for sure surprising - I would have
expected that these bits somehow go in to the final value (e.g. that they
are xor-ed with the lower half).

My guess is that this was done to ensure that an int-to-something
hashtable can be marshalled from 64 to 32 bit systems. Not sure for what
this is important, but it looks like a nice symmetry.

In trunk the method for ensuring this seems to have changed (see function
caml_hash_mix_intnat in
http://caml.inria.fr/cgi-bin/viewvc.cgi/ocaml/trunk/byterun/hash.c?revision=12072&view=markup).

Gerd

> "Gerd Stolpmann" <info@gerd-stolpmann.de> writes:
>
>>> Gerd Stolpmann <info@gerd-stolpmann.de> writes:
>>> You are forgetting a variable in this. To create a DOS in the hashtable
>>> you need to hit the same bucket again and again. And bucket = hash %
>>> size.
>>>
>>> You focused on the hash but lets talk about the size for a moment.
>>>
>>> The size is rather limited and large hashtabels simply won't increate
>>> size anymore and allways have lots of collisions. So even if someone
>>> isn't actively trying to create DOS the performace still tanks as you
>>> add more items.
>>
>> I cannot completely follow here. If you add more elements, the bucket
>> array is resized. The ocaml Hashtbl does this when the number of
>> elements
>> exceeds half of the array size. The argument of Hashtbl.create is only
>> the
>> initial size of the bucket array.
>>
>> The upper bound is Sys.max_array_length, of course. Granted, on 32 bit
>> platforms it is low (4M). But nobody of sound mind would run ocaml
>> programs on 32 bit for production purposes. (If you do, consider to
>> switch. You can use more memory and get a little performance boost, not
>> only for ocaml programs.)
>>
>> Gerd
>
> In practice I see a serious performance degradation with large
> hashtables. So much so that I had to use an array of hashtables or
> hashtable of hashtables and split the values across them.
>
> Maybe the size of the hashtable indeed isn't the problem but the hash
> function. Here is something odd:
>
> # for i = 1 to 63 do Printf.printf "%d %d\n" (1 lsl i) (Hashtbl.hash (1
> lsl i)) done;;
> 2 2
> 4 4
> 8 8
> 16 16
> 32 32
> 64 64
> 128 128
> 256 256
> 512 512
> 1024 1024
> 2048 2048
> 4096 4096
> 8192 8192
> 16384 16384
> 32768 32768
> 65536 65536
> 131072 131072
> 262144 262144
> 524288 524288
> 1048576 1048576
> 2097152 2097152
> 4194304 4194304
> 8388608 8388608
> 16777216 16777216
> 33554432 33554432
> 67108864 67108864
> 134217728 134217728
> 268435456 268435456
> 536870912 536870912
> 1073741824 0
> 2147483648 0
> 4294967296 0
> 8589934592 0
> 17179869184 0
> 34359738368 0
> 68719476736 0
> 137438953472 0
> 274877906944 0
> 549755813888 0
> 1099511627776 0
> 2199023255552 0
> 4398046511104 0
> 8796093022208 0
> 17592186044416 0
> 35184372088832 0
> 70368744177664 0
> 140737488355328 0
> 281474976710656 0
> 562949953421312 0
> 1125899906842624 0
> 2251799813685248 0
> 4503599627370496 0
> 9007199254740992 0
> 18014398509481984 0
> 36028797018963968 0
> 72057594037927936 0
> 144115188075855872 0
> 288230376151711744 0
> 576460752303423488 0
> 1152921504606846976 0
> 2305843009213693952 0
> -4611686018427387904 0
> 0 0
> - : unit = ()
>
> Is every value over a billion hashed to 0?
>
> MfG
>         Goswin
>
> --
> Caml-list mailing list.  Subscription management and archives:
> https://sympa-roc.inria.fr/wws/info/caml-list
> Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
> Bug reports: http://caml.inria.fr/bin/caml-bugs
>
>
>


-- 
Gerd Stolpmann, Darmstadt, Germany    gerd@gerd-stolpmann.de
Creator of GODI and camlcity.org.
Contact details:        http://www.camlcity.org/contact.html
Company homepage:       http://www.gerd-stolpmann.de
*** Searching for new projects! Need consulting for system
*** programming in Ocaml? Gerd Stolpmann can help you.



  parent reply	other threads:[~2012-02-08 11:12 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-12-30 16:44 Gerd Stolpmann
2011-12-30 16:48 ` Yaron Minsky
2011-12-30 19:01   ` David Allsopp
2011-12-30 20:52     ` Yaron Minsky
2011-12-30 21:54       ` Gerd Stolpmann
2011-12-30 17:06 ` Xavier Leroy
2011-12-30 21:16   ` Gerd Stolpmann
2011-12-31  0:57   ` oliver
2011-12-31  0:59     ` oliver
2012-01-01 12:52   ` Richard W.M. Jones
2012-01-01 17:29     ` Xavier Leroy
2012-01-01 21:04       ` Gerd Stolpmann
2012-01-01 23:24         ` oliver
2012-01-01 23:58           ` Gerd Stolpmann
2012-01-02  1:43             ` oliver
2012-01-04 17:56               ` Damien Doligez
2012-01-04 21:52                 ` oliver
2012-01-02  9:34         ` David MENTRE
2012-01-30 10:54       ` Goswin von Brederlow
2011-12-30 17:40 ` rixed
2011-12-30 17:52   ` Edgar Friendly
2011-12-31  1:02   ` oliver
2011-12-31  0:33 ` oliver
2012-01-02  0:21 ` Shawn Wagner
2012-01-02 14:52   ` Gerd Stolpmann
2012-01-30 10:51 ` Goswin von Brederlow
2012-01-31 14:16   ` Gerd Stolpmann
2012-02-08  9:41     ` Goswin von Brederlow
2012-02-08 10:43       ` Philippe Wang
2012-02-08 10:46       ` AUGER Cédric
2012-02-09 13:22         ` Goswin von Brederlow
2012-02-09 14:48           ` Gerd Stolpmann
2012-02-08 11:12       ` Gerd Stolpmann [this message]
2012-02-09 13:11         ` Goswin von Brederlow

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=94bcc9a89c5835c1436752b9e59410ad.squirrel@gps.dynxs.de \
    --to=info@gerd-stolpmann.de \
    --cc=caml-list@inria.fr \
    --cc=goswin-v-b@web.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).