From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from mail1-relais-roc.national.inria.fr (mail1-relais-roc.national.inria.fr [192.134.164.82]) by walapai.inria.fr (8.13.6/8.13.6) with ESMTP id q18BCFQB022570 for ; Wed, 8 Feb 2012 12:12:15 +0100 X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: Ag4FABJXMk/U4xEJk2dsb2JhbABAA58LkF8iAQEBAQkJCwkUAySBcwEFbgsQBQYlIUUSBhMJCAGHbQa5GotGAQMsBgEqEwKCfgsCAgEEEwsEBlkKCAMyC4MiBI01F5o/ X-IronPort-AV: E=Sophos;i="4.73,383,1325458800"; d="scan'208";a="143370483" Received: from moutng.kundenserver.de ([212.227.17.9]) by mail1-smtp-roc.national.inria.fr with ESMTP; 08 Feb 2012 12:12:09 +0100 Received: from office1.lan.sumadev.de (dslb-084-059-064-038.pools.arcor-ip.net [84.59.64.38]) by mrelayeu.kundenserver.de (node=mreu2) with ESMTP (Nemesis) id 0MflMO-1S9Iwn0L8G-00NcSj; Wed, 08 Feb 2012 12:12:09 +0100 Received: from gps.dynxs.de (localhost [127.0.0.1]) by office1.lan.sumadev.de (Postfix) with ESMTP id 4644EC0714; Wed, 8 Feb 2012 12:12:08 +0100 (CET) Received: from 84.58.19.218 (SquirrelMail authenticated user gerd) by gps.dynxs.de with HTTP; Wed, 8 Feb 2012 12:12:08 +0100 Message-ID: <94bcc9a89c5835c1436752b9e59410ad.squirrel@gps.dynxs.de> In-Reply-To: <8739alfys0.fsf@frosties.localnet> References: <1325263446.5036.104.camel@samsung> <87zkd531kl.fsf@frosties.localnet> <403c4e4bb2cfce801aad217c80365879.squirrel@gps.dynxs.de> <8739alfys0.fsf@frosties.localnet> Date: Wed, 8 Feb 2012 12:12:08 +0100 From: "Gerd Stolpmann" To: "Goswin von Brederlow" Cc: "Gerd Stolpmann" , "Goswin von Brederlow" , caml-list@inria.fr User-Agent: SquirrelMail/1.4.21 MIME-Version: 1.0 Content-Type: text/plain;charset=iso-8859-1 X-Priority: 3 (Normal) Importance: Normal X-Provags-ID: V02:K0:uUydpX9OmhLCnBPx+EvXrJe+oi0vWvt/fqyZ9KrO9WY iKRj5pHyyfytTvAiFyqXFSDspaqFLxPhdkx1UHhjs9T9ek1+4U cMlgeKBQrLcaafdAsQN4wylv24XfwFm3ZRrrLn8pwvUBT4uvuZ Jfp98lsfQi2hH4DX6YzjuT+pc8FNOCfHtJEfGkm11oxBBC+G+n MHYaDxD51an29PqOcihaIENGne8mmpZXzvlGbQBHRiAhPc7hWh Wh64Fyml7lO9tFGJ6wuFIN0QjF8+6bFZlZTB7NbQwWKKAhyXPO Hs4i1maY7AkPyY59ctxjNGxazyCf+uGrSTh0kKEr7VnjrMOimu FVXm4WzFEISPogi/yjvFvMl+Mk7h8gH4iUcAPbrwuW4XQnsBzq x3lNbWlt6dKSw== Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from quoted-printable to 8bit by walapai.inria.fr id q18BCFQB022570 Subject: Re: [Caml-list] Hashtbl and security Well, looks like that the upper 32 bits aren't considered at all for computing the hash value. This is for sure surprising - I would have expected that these bits somehow go in to the final value (e.g. that they are xor-ed with the lower half). My guess is that this was done to ensure that an int-to-something hashtable can be marshalled from 64 to 32 bit systems. Not sure for what this is important, but it looks like a nice symmetry. In trunk the method for ensuring this seems to have changed (see function caml_hash_mix_intnat in http://caml.inria.fr/cgi-bin/viewvc.cgi/ocaml/trunk/byterun/hash.c?revision=12072&view=markup). Gerd > "Gerd Stolpmann" writes: > >>> Gerd Stolpmann writes: >>> You are forgetting a variable in this. To create a DOS in the hashtable >>> you need to hit the same bucket again and again. And bucket = hash % >>> size. >>> >>> You focused on the hash but lets talk about the size for a moment. >>> >>> The size is rather limited and large hashtabels simply won't increate >>> size anymore and allways have lots of collisions. So even if someone >>> isn't actively trying to create DOS the performace still tanks as you >>> add more items. >> >> I cannot completely follow here. If you add more elements, the bucket >> array is resized. The ocaml Hashtbl does this when the number of >> elements >> exceeds half of the array size. The argument of Hashtbl.create is only >> the >> initial size of the bucket array. >> >> The upper bound is Sys.max_array_length, of course. Granted, on 32 bit >> platforms it is low (4M). But nobody of sound mind would run ocaml >> programs on 32 bit for production purposes. (If you do, consider to >> switch. You can use more memory and get a little performance boost, not >> only for ocaml programs.) >> >> Gerd > > In practice I see a serious performance degradation with large > hashtables. So much so that I had to use an array of hashtables or > hashtable of hashtables and split the values across them. > > Maybe the size of the hashtable indeed isn't the problem but the hash > function. Here is something odd: > > # for i = 1 to 63 do Printf.printf "%d %d\n" (1 lsl i) (Hashtbl.hash (1 > lsl i)) done;; > 2 2 > 4 4 > 8 8 > 16 16 > 32 32 > 64 64 > 128 128 > 256 256 > 512 512 > 1024 1024 > 2048 2048 > 4096 4096 > 8192 8192 > 16384 16384 > 32768 32768 > 65536 65536 > 131072 131072 > 262144 262144 > 524288 524288 > 1048576 1048576 > 2097152 2097152 > 4194304 4194304 > 8388608 8388608 > 16777216 16777216 > 33554432 33554432 > 67108864 67108864 > 134217728 134217728 > 268435456 268435456 > 536870912 536870912 > 1073741824 0 > 2147483648 0 > 4294967296 0 > 8589934592 0 > 17179869184 0 > 34359738368 0 > 68719476736 0 > 137438953472 0 > 274877906944 0 > 549755813888 0 > 1099511627776 0 > 2199023255552 0 > 4398046511104 0 > 8796093022208 0 > 17592186044416 0 > 35184372088832 0 > 70368744177664 0 > 140737488355328 0 > 281474976710656 0 > 562949953421312 0 > 1125899906842624 0 > 2251799813685248 0 > 4503599627370496 0 > 9007199254740992 0 > 18014398509481984 0 > 36028797018963968 0 > 72057594037927936 0 > 144115188075855872 0 > 288230376151711744 0 > 576460752303423488 0 > 1152921504606846976 0 > 2305843009213693952 0 > -4611686018427387904 0 > 0 0 > - : unit = () > > Is every value over a billion hashed to 0? > > MfG > Goswin > > -- > Caml-list mailing list. Subscription management and archives: > https://sympa-roc.inria.fr/wws/info/caml-list > Beginner's list: http://groups.yahoo.com/group/ocaml_beginners > Bug reports: http://caml.inria.fr/bin/caml-bugs > > > -- Gerd Stolpmann, Darmstadt, Germany gerd@gerd-stolpmann.de Creator of GODI and camlcity.org. Contact details: http://www.camlcity.org/contact.html Company homepage: http://www.gerd-stolpmann.de *** Searching for new projects! Need consulting for system *** programming in Ocaml? Gerd Stolpmann can help you.