caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed
From: Goswin von Brederlow <goswin-v-b@web.de>
To: "Gerd Stolpmann" <info@gerd-stolpmann.de>
Cc: "Goswin von Brederlow" <goswin-v-b@web.de>, caml-list@inria.fr
Subject: Re: [Caml-list] Hashtbl and security
Date: Wed, 08 Feb 2012 10:41:03 +0100	[thread overview]
Message-ID: <8739alfys0.fsf@frosties.localnet> (raw)
In-Reply-To: <403c4e4bb2cfce801aad217c80365879.squirrel@gps.dynxs.de> (Gerd Stolpmann's message of "Tue, 31 Jan 2012 15:16:37 +0100")

"Gerd Stolpmann" <info@gerd-stolpmann.de> writes:

>> Gerd Stolpmann <info@gerd-stolpmann.de> writes:
>> You are forgetting a variable in this. To create a DOS in the hashtable
>> you need to hit the same bucket again and again. And bucket = hash % size.
>>
>> You focused on the hash but lets talk about the size for a moment.
>>
>> The size is rather limited and large hashtabels simply won't increate
>> size anymore and allways have lots of collisions. So even if someone
>> isn't actively trying to create DOS the performace still tanks as you
>> add more items.
>
> I cannot completely follow here. If you add more elements, the bucket
> array is resized. The ocaml Hashtbl does this when the number of elements
> exceeds half of the array size. The argument of Hashtbl.create is only the
> initial size of the bucket array.
>
> The upper bound is Sys.max_array_length, of course. Granted, on 32 bit
> platforms it is low (4M). But nobody of sound mind would run ocaml
> programs on 32 bit for production purposes. (If you do, consider to
> switch. You can use more memory and get a little performance boost, not
> only for ocaml programs.)
>
> Gerd

In practice I see a serious performance degradation with large
hashtables. So much so that I had to use an array of hashtables or
hashtable of hashtables and split the values across them.

Maybe the size of the hashtable indeed isn't the problem but the hash
function. Here is something odd:

# for i = 1 to 63 do Printf.printf "%d %d\n" (1 lsl i) (Hashtbl.hash (1 lsl i)) done;;
2 2
4 4
8 8
16 16
32 32
64 64
128 128
256 256
512 512
1024 1024
2048 2048
4096 4096
8192 8192
16384 16384
32768 32768
65536 65536
131072 131072
262144 262144
524288 524288
1048576 1048576
2097152 2097152
4194304 4194304
8388608 8388608
16777216 16777216
33554432 33554432
67108864 67108864
134217728 134217728
268435456 268435456
536870912 536870912
1073741824 0
2147483648 0
4294967296 0
8589934592 0
17179869184 0
34359738368 0
68719476736 0
137438953472 0
274877906944 0
549755813888 0
1099511627776 0
2199023255552 0
4398046511104 0
8796093022208 0
17592186044416 0
35184372088832 0
70368744177664 0
140737488355328 0
281474976710656 0
562949953421312 0
1125899906842624 0
2251799813685248 0
4503599627370496 0
9007199254740992 0
18014398509481984 0
36028797018963968 0
72057594037927936 0
144115188075855872 0
288230376151711744 0
576460752303423488 0
1152921504606846976 0
2305843009213693952 0
-4611686018427387904 0
0 0
- : unit = ()

Is every value over a billion hashed to 0?

MfG
        Goswin

  reply	other threads:[~2012-02-08  9:41 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-12-30 16:44 Gerd Stolpmann
2011-12-30 16:48 ` Yaron Minsky
2011-12-30 19:01   ` David Allsopp
2011-12-30 20:52     ` Yaron Minsky
2011-12-30 21:54       ` Gerd Stolpmann
2011-12-30 17:06 ` Xavier Leroy
2011-12-30 21:16   ` Gerd Stolpmann
2011-12-31  0:57   ` oliver
2011-12-31  0:59     ` oliver
2012-01-01 12:52   ` Richard W.M. Jones
2012-01-01 17:29     ` Xavier Leroy
2012-01-01 21:04       ` Gerd Stolpmann
2012-01-01 23:24         ` oliver
2012-01-01 23:58           ` Gerd Stolpmann
2012-01-02  1:43             ` oliver
2012-01-04 17:56               ` Damien Doligez
2012-01-04 21:52                 ` oliver
2012-01-02  9:34         ` David MENTRE
2012-01-30 10:54       ` Goswin von Brederlow
2011-12-30 17:40 ` rixed
2011-12-30 17:52   ` Edgar Friendly
2011-12-31  1:02   ` oliver
2011-12-31  0:33 ` oliver
2012-01-02  0:21 ` Shawn Wagner
2012-01-02 14:52   ` Gerd Stolpmann
2012-01-30 10:51 ` Goswin von Brederlow
2012-01-31 14:16   ` Gerd Stolpmann
2012-02-08  9:41     ` Goswin von Brederlow [this message]
2012-02-08 10:43       ` Philippe Wang
2012-02-08 10:46       ` AUGER Cédric
2012-02-09 13:22         ` Goswin von Brederlow
2012-02-09 14:48           ` Gerd Stolpmann
2012-02-08 11:12       ` Gerd Stolpmann
2012-02-09 13:11         ` Goswin von Brederlow

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=8739alfys0.fsf@frosties.localnet \
    --to=goswin-v-b@web.de \
    --cc=caml-list@inria.fr \
    --cc=info@gerd-stolpmann.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).