caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed
* [Caml-list] Hashtbl and security
@ 2011-12-30 16:44 Gerd Stolpmann
  2011-12-30 16:48 ` Yaron Minsky
                   ` (5 more replies)
  0 siblings, 6 replies; 34+ messages in thread
From: Gerd Stolpmann @ 2011-12-30 16:44 UTC (permalink / raw)
  To: caml-list

Hi,

there was recently a security alert for web services that use hash
tables to store web form parameters sent via POST (so that millions of
such parameters can be sent in a single request). It is possible to keep
the web service busy for hours with such a DoS (denial of service)
attack. The type of attack boils down to a problem in most hash table
implementations, namely that the hash functions are invertible, and it
is possible for a malicious user to construct lots of keys that all map
to the same bucket of the hash table, creating a mass collision.

The text of the alert: 
http://www.nruns.com/_downloads/advisory28122011.pdf

I'd like to discuss this issue, because it is not restricted to the
processing of web requests, but may also occur for all other data coming
from untrusted sources. The web is only the most exposed area where this
issue exists.

So how is Ocaml affected? The hash functions used in recent Ocaml
releases are also insecure in the above mentioned sense (currently
MurmurHash3, and even a simpler hash function in previous releases). A
quick survey of the Internet revealed at least one site that tries to
break it. Probably a good cryptographer could do it in minutes. 

A pure Hashtbl.add of the constructed keys does not yet lead to the
performance degradation, but a Hashtbl.replace, and of course
Hashtbl.find after the table is built up will. So it depends very much
of the details of the programs whether they are affected or not.

I've just checked that Ocamlnet uses only Hashtbl.add to collect POST
parameters, so it is not directly vulnerable. But if the crafted request
is actually served by a handler, the handler would get a degraded table,
and could show in turn bad performance (again leading to DoS).

What are possible fixes?

1) Avoid hash tables in contexts where security is relevant. The
alternative is Set (actually a balanced binary tree), which does not
show this problem.

2) Use cryptographically secure hash functions.

3) Use "randomized" hash tables. The trick here is that there is not a
single hash function h anymore, but a family h(1)...h(n). When the hash
table is created, one of the functions is picked randomly. This makes it
impossible to craft an attack request, because you cannot predict the
function.

I don't think 1) is viable - hash tables are too wide spread, and are
loved by programmers because of their good performance. 2) would be good
in extremely critical contexts - although it is then again questionable,
because it is likely to have worse performance than 1).

So, the question is how to do 3). I see two problems here:

a) how to define the family of hash functions. Is it e.g. sufficient to
introduce an initialization vector for the Murmurhash algorithm, and
fill it randomly? How to get a random number that is good enough?

b) the Hashtbl in the standard library does not allow it to set the hash
function dynamically. Maybe one can get this effect by using first-class
modules (haven't checked).

Anyway, I think these problems are difficult enough to deserve some
discussion here. At least I cannot solve them immediately, and this
problem may exist for lots of Ocaml applications.

Gerd
-- 
------------------------------------------------------------
Gerd Stolpmann, Darmstadt, Germany    gerd@gerd-stolpmann.de
Creator of GODI and camlcity.org.
Contact details:        http://www.camlcity.org/contact.html
Company homepage:       http://www.gerd-stolpmann.de
------------------------------------------------------------


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Caml-list] Hashtbl and security
  2011-12-30 16:44 [Caml-list] Hashtbl and security Gerd Stolpmann
@ 2011-12-30 16:48 ` Yaron Minsky
  2011-12-30 19:01   ` David Allsopp
  2011-12-30 17:06 ` Xavier Leroy
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 34+ messages in thread
From: Yaron Minsky @ 2011-12-30 16:48 UTC (permalink / raw)
  To: Gerd Stolpmann; +Cc: caml-list

[-- Attachment #1: Type: text/plain, Size: 4257 bytes --]

For just this reason, the hashtables in Core have been reimplemented to use
an AVL tree in the buckets.  That way, even when you have pathological
collisions, you degrade gracefully to O(log n) per operation, instead of
O(n), where n is the number of keys in the hashtable.

y

On Fri, Dec 30, 2011 at 11:44 AM, Gerd Stolpmann <info@gerd-stolpmann.de>wrote:

> Hi,
>
> there was recently a security alert for web services that use hash
> tables to store web form parameters sent via POST (so that millions of
> such parameters can be sent in a single request). It is possible to keep
> the web service busy for hours with such a DoS (denial of service)
> attack. The type of attack boils down to a problem in most hash table
> implementations, namely that the hash functions are invertible, and it
> is possible for a malicious user to construct lots of keys that all map
> to the same bucket of the hash table, creating a mass collision.
>
> The text of the alert:
> http://www.nruns.com/_downloads/advisory28122011.pdf
>
> I'd like to discuss this issue, because it is not restricted to the
> processing of web requests, but may also occur for all other data coming
> from untrusted sources. The web is only the most exposed area where this
> issue exists.
>
> So how is Ocaml affected? The hash functions used in recent Ocaml
> releases are also insecure in the above mentioned sense (currently
> MurmurHash3, and even a simpler hash function in previous releases). A
> quick survey of the Internet revealed at least one site that tries to
> break it. Probably a good cryptographer could do it in minutes.
>
> A pure Hashtbl.add of the constructed keys does not yet lead to the
> performance degradation, but a Hashtbl.replace, and of course
> Hashtbl.find after the table is built up will. So it depends very much
> of the details of the programs whether they are affected or not.
>
> I've just checked that Ocamlnet uses only Hashtbl.add to collect POST
> parameters, so it is not directly vulnerable. But if the crafted request
> is actually served by a handler, the handler would get a degraded table,
> and could show in turn bad performance (again leading to DoS).
>
> What are possible fixes?
>
> 1) Avoid hash tables in contexts where security is relevant. The
> alternative is Set (actually a balanced binary tree), which does not
> show this problem.
>
> 2) Use cryptographically secure hash functions.
>
> 3) Use "randomized" hash tables. The trick here is that there is not a
> single hash function h anymore, but a family h(1)...h(n). When the hash
> table is created, one of the functions is picked randomly. This makes it
> impossible to craft an attack request, because you cannot predict the
> function.
>
> I don't think 1) is viable - hash tables are too wide spread, and are
> loved by programmers because of their good performance. 2) would be good
> in extremely critical contexts - although it is then again questionable,
> because it is likely to have worse performance than 1).
>
> So, the question is how to do 3). I see two problems here:
>
> a) how to define the family of hash functions. Is it e.g. sufficient to
> introduce an initialization vector for the Murmurhash algorithm, and
> fill it randomly? How to get a random number that is good enough?
>
> b) the Hashtbl in the standard library does not allow it to set the hash
> function dynamically. Maybe one can get this effect by using first-class
> modules (haven't checked).
>
> Anyway, I think these problems are difficult enough to deserve some
> discussion here. At least I cannot solve them immediately, and this
> problem may exist for lots of Ocaml applications.
>
> Gerd
> --
> ------------------------------------------------------------
> Gerd Stolpmann, Darmstadt, Germany    gerd@gerd-stolpmann.de
> Creator of GODI and camlcity.org.
> Contact details:        http://www.camlcity.org/contact.html
> Company homepage:       http://www.gerd-stolpmann.de
> ------------------------------------------------------------
>
>
> --
> Caml-list mailing list.  Subscription management and archives:
> https://sympa-roc.inria.fr/wws/info/caml-list
> Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
> Bug reports: http://caml.inria.fr/bin/caml-bugs
>
>

[-- Attachment #2: Type: text/html, Size: 5395 bytes --]

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Caml-list] Hashtbl and security
  2011-12-30 16:44 [Caml-list] Hashtbl and security Gerd Stolpmann
  2011-12-30 16:48 ` Yaron Minsky
@ 2011-12-30 17:06 ` Xavier Leroy
  2011-12-30 21:16   ` Gerd Stolpmann
                     ` (2 more replies)
  2011-12-30 17:40 ` rixed
                   ` (3 subsequent siblings)
  5 siblings, 3 replies; 34+ messages in thread
From: Xavier Leroy @ 2011-12-30 17:06 UTC (permalink / raw)
  To: caml-list

On 12/30/2011 05:44 PM, Gerd Stolpmann wrote:

> 1) Avoid hash tables in contexts where security is relevant. The
> alternative is Set (actually a balanced binary tree), which does not
> show this problem.

Highly recommended.  Nothing beats guaranteed O(log n) operations.

> 2) Use cryptographically secure hash functions.

Hopeless: with a hash size of 30 bits, as in Caml, or even 64 bits,
there are no cryptographically secure hash functions.

> 3) Use "randomized" hash tables. The trick here is that there is not a
> single hash function h anymore, but a family h(1)...h(n). When the hash
> table is created, one of the functions is picked randomly. This makes it
> impossible to craft an attack request, because you cannot predict the
> function. 

Indeed.  The optional "seed" parameter to Hashtbl.create does exactly
this in the new implementation of Hashtbl (the one based on Murmur3).

> So, the question is how to do 3). I see two problems here:
> 
> a) how to define the family of hash functions. Is it e.g. sufficient to
> introduce an initialization vector for the Murmurhash algorithm, and
> fill it randomly?

IIRC, the Web pages for the Murmur family of hashes gives some
statistical evidence that this approach works.

> How to get a random number that is good enough?

Hmm.  /dev/random is your friend on the platforms that support it.
Otherwise, there's always the Random module, but Random.self_init
isn't very strong.

- Xavier Leroy

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Caml-list] Hashtbl and security
  2011-12-30 16:44 [Caml-list] Hashtbl and security Gerd Stolpmann
  2011-12-30 16:48 ` Yaron Minsky
  2011-12-30 17:06 ` Xavier Leroy
@ 2011-12-30 17:40 ` rixed
  2011-12-30 17:52   ` Edgar Friendly
  2011-12-31  1:02   ` oliver
  2011-12-31  0:33 ` oliver
                   ` (2 subsequent siblings)
  5 siblings, 2 replies; 34+ messages in thread
From: rixed @ 2011-12-30 17:40 UTC (permalink / raw)
  To: caml-list

I will probably tell something very stupid, but HTML specs
do not prevent a client to post 1M values with the same name,
so whatever your hash function you cannot do much, can you?

The simplest solution I can think of that prevents all attacks
of this kind (but could reject some valid POST in theory) would
be to store the bucket lengths and use it to detect and reject
"obviously biaised" insertions.


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Caml-list] Hashtbl and security
  2011-12-30 17:40 ` rixed
@ 2011-12-30 17:52   ` Edgar Friendly
  2011-12-31  1:02   ` oliver
  1 sibling, 0 replies; 34+ messages in thread
From: Edgar Friendly @ 2011-12-30 17:52 UTC (permalink / raw)
  To: rixed; +Cc: caml-list

[-- Attachment #1: Type: text/plain, Size: 1020 bytes --]

Not a problem, other than memory waste (which doesn't matter the key).
Hashtbl.add prepends to the front of the list, so it'll be fast.  And
Hashtbl.find will match the head of the list, and return quickly too.
Hashtbl.find_all will have to go through the whole list, but that's
expected.

E.

On Fri, Dec 30, 2011 at 12:40 PM, <rixed@happyleptic.org> wrote:

> I will probably tell something very stupid, but HTML specs
> do not prevent a client to post 1M values with the same name,
> so whatever your hash function you cannot do much, can you?
>
> The simplest solution I can think of that prevents all attacks
> of this kind (but could reject some valid POST in theory) would
> be to store the bucket lengths and use it to detect and reject
> "obviously biaised" insertions.
>
>
> --
> Caml-list mailing list.  Subscription management and archives:
> https://sympa-roc.inria.fr/wws/info/caml-list
> Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
> Bug reports: http://caml.inria.fr/bin/caml-bugs
>
>

[-- Attachment #2: Type: text/html, Size: 1610 bytes --]

^ permalink raw reply	[flat|nested] 34+ messages in thread

* RE: [Caml-list] Hashtbl and security
  2011-12-30 16:48 ` Yaron Minsky
@ 2011-12-30 19:01   ` David Allsopp
  2011-12-30 20:52     ` Yaron Minsky
  0 siblings, 1 reply; 34+ messages in thread
From: David Allsopp @ 2011-12-30 19:01 UTC (permalink / raw)
  To: Yaron Minsky, Gerd Stolpmann; +Cc: caml-list

Yaron Minsky wrote:
> For just this reason, the hashtables in Core have been reimplemented to use an
> AVL tree in the buckets.  That way, even when you have pathological collisions,
> you degrade gracefully to O(log n) per operation, instead of O(n), where n is
> the number of keys in the hashtable.

I'm resisting the temptation to hack-it-and-see: does your implementation do anything clever to maintain Hashtbl's O(1) insertion time (e.g. Hashtbl.add updates a list and then the first call to Hashtbl.find or Hashtbl.mem "moves" any items from the list to the AVL). Or does doing that impact "general" performance too much?

In the POST web scenario (or processing HTTP request headers), for example, "degrading" Hashtbl.add from O(1) to O(log n) is only acceptable if you know that you'll query all the fields in the POST (which isn't necessarily true).


David


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Caml-list] Hashtbl and security
  2011-12-30 19:01   ` David Allsopp
@ 2011-12-30 20:52     ` Yaron Minsky
  2011-12-30 21:54       ` Gerd Stolpmann
  0 siblings, 1 reply; 34+ messages in thread
From: Yaron Minsky @ 2011-12-30 20:52 UTC (permalink / raw)
  To: David Allsopp; +Cc: Gerd Stolpmann, caml-list

[-- Attachment #1: Type: text/plain, Size: 1551 bytes --]

It's not clever in that way.  It does try to do a good job of keeping the
memory impact of the tree low, but you maintain O(1) by having a low load
factor, and therefore trees of constant size.  You can take a look at the
code here:

https://bitbucket.org/yminsky/ocaml-core/src/8e757d8f7309/base/core/lib/core_hashtbl.ml

(Don't rely on that repo too much yet, btw.  We're probably going to blow
it away and create a new one in the next couple of days.  But going
forward, we plan on using bitbucket as a place to work together with the
community on Core.)

y

On Fri, Dec 30, 2011 at 2:01 PM, David Allsopp <dra-news@metastack.com>wrote:

> Yaron Minsky wrote:
> > For just this reason, the hashtables in Core have been reimplemented to
> use an
> > AVL tree in the buckets.  That way, even when you have pathological
> collisions,
> > you degrade gracefully to O(log n) per operation, instead of O(n), where
> n is
> > the number of keys in the hashtable.
>
> I'm resisting the temptation to hack-it-and-see: does your implementation
> do anything clever to maintain Hashtbl's O(1) insertion time (e.g.
> Hashtbl.add updates a list and then the first call to Hashtbl.find or
> Hashtbl.mem "moves" any items from the list to the AVL). Or does doing that
> impact "general" performance too much?
>
> In the POST web scenario (or processing HTTP request headers), for
> example, "degrading" Hashtbl.add from O(1) to O(log n) is only acceptable
> if you know that you'll query all the fields in the POST (which isn't
> necessarily true).
>
>
> David
>

[-- Attachment #2: Type: text/html, Size: 2118 bytes --]

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Caml-list] Hashtbl and security
  2011-12-30 17:06 ` Xavier Leroy
@ 2011-12-30 21:16   ` Gerd Stolpmann
  2011-12-31  0:57   ` oliver
  2012-01-01 12:52   ` Richard W.M. Jones
  2 siblings, 0 replies; 34+ messages in thread
From: Gerd Stolpmann @ 2011-12-30 21:16 UTC (permalink / raw)
  To: Xavier Leroy; +Cc: caml-list

Am Freitag, den 30.12.2011, 18:06 +0100 schrieb Xavier Leroy:
> > 3) Use "randomized" hash tables. The trick here is that there is not a
> > single hash function h anymore, but a family h(1)...h(n). When the hash
> > table is created, one of the functions is picked randomly. This makes it
> > impossible to craft an attack request, because you cannot predict the
> > function. 
> 
> Indeed.  The optional "seed" parameter to Hashtbl.create does exactly
> this in the new implementation of Hashtbl (the one based on Murmur3).

I see. It will be available in 3.13:

val create : ?seed:int -> int -> ('a, 'b) t

There is also an additional functorized interface where this seed
argument exists (Hashtbl.MakeSeeded), and the hash functions seeded_hash
and seeded_hash_param. Well done!

Nevertheless, as we all don't know when 3.13 is ready, I'll have to find
a temporary fix for Ocamlnet. Maybe just a limit for the number of POST
parameters.

> > So, the question is how to do 3). I see two problems here:
> > 
> > a) how to define the family of hash functions. Is it e.g. sufficient to
> > introduce an initialization vector for the Murmurhash algorithm, and
> > fill it randomly?
> 
> IIRC, the Web pages for the Murmur family of hashes gives some
> statistical evidence that this approach works.
> 
> > How to get a random number that is good enough?
> 
> Hmm.  /dev/random is your friend on the platforms that support it.
> Otherwise, there's always the Random module, but Random.self_init
> isn't very strong.

Well, /dev/(u)random covers most Unix platforms nowadays. If you are
interested, I have a wrapper for Win32:

https://godirepo.camlcity.org/svn/lib-ocamlnet2/trunk/code/src/netsys/netsys_c_win32.c

Scroll down until netsys_fill_random.

Gerd
-- 
------------------------------------------------------------
Gerd Stolpmann, Darmstadt, Germany    gerd@gerd-stolpmann.de
Creator of GODI and camlcity.org.
Contact details:        http://www.camlcity.org/contact.html
Company homepage:       http://www.gerd-stolpmann.de
------------------------------------------------------------


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Caml-list] Hashtbl and security
  2011-12-30 20:52     ` Yaron Minsky
@ 2011-12-30 21:54       ` Gerd Stolpmann
  0 siblings, 0 replies; 34+ messages in thread
From: Gerd Stolpmann @ 2011-12-30 21:54 UTC (permalink / raw)
  To: Yaron Minsky; +Cc: David Allsopp, caml-list

Am Freitag, den 30.12.2011, 15:52 -0500 schrieb Yaron Minsky:
> It's not clever in that way.  It does try to do a good job of keeping
> the memory impact of the tree low, but you maintain O(1) by having a
> low load factor, and therefore trees of constant size.  You can take a
> look at the code here:
> 
> 
> https://bitbucket.org/yminsky/ocaml-core/src/8e757d8f7309/base/core/lib/core_hashtbl.ml
> 
> (Don't rely on that repo too much yet, btw.  We're probably going to
> blow it away and create a new one in the next couple of days.  But
> going forward, we plan on using bitbucket as a place to work together
> with the community on Core.)

Interesting solution, at least. When I see it right, the Avltree has
special representations for empty and 1-node trees, so with some luck
this covers 99% of the array cells. So, the memory footprint will
usually not be higher than for conventional hash tables.

So, I'd consider Core_hashtbl as a way when you need high protection
against pathological cases, but want to keep close to the performance
pattern of Hashtbl.

However, we are in an area where we need to make compromises. I guess
Core_hashtbl is (for the non-pathological cases) by a factor slower than
the more lightweight Hashtbl. This raises the question how it competes
to other solutions, e.g. a Map where the keys are first compared by
their hash before cmp is called, for instance

let (++) x f = if x<>0 then x else f()

module HString = struct
  type t = int * string
  let compare (h1,s1) (h2,s2) =
    compare h1 h2 ++ (fun () -> compare s1 s2)
end

module HStringMap = Map.Make(HString)

now use it as: HStringMap.add (Hashtbl.hash key, key) value map

This eliminates one of the drawbacks of the normal Map, namely that many
keys need to be compared (which can be costly).

Gerd

> 
> y
> 
> On Fri, Dec 30, 2011 at 2:01 PM, David Allsopp
> <dra-news@metastack.com> wrote:
>         Yaron Minsky wrote:
>         > For just this reason, the hashtables in Core have been
>         reimplemented to use an
>         > AVL tree in the buckets.  That way, even when you have
>         pathological collisions,
>         > you degrade gracefully to O(log n) per operation, instead of
>         O(n), where n is
>         > the number of keys in the hashtable.
>         
>         
>         I'm resisting the temptation to hack-it-and-see: does your
>         implementation do anything clever to maintain Hashtbl's O(1)
>         insertion time (e.g. Hashtbl.add updates a list and then the
>         first call to Hashtbl.find or Hashtbl.mem "moves" any items
>         from the list to the AVL). Or does doing that impact "general"
>         performance too much?
>         
>         In the POST web scenario (or processing HTTP request headers),
>         for example, "degrading" Hashtbl.add from O(1) to O(log n) is
>         only acceptable if you know that you'll query all the fields
>         in the POST (which isn't necessarily true).
>         
>         
>         David
> 
> 



^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Caml-list] Hashtbl and security
  2011-12-30 16:44 [Caml-list] Hashtbl and security Gerd Stolpmann
                   ` (2 preceding siblings ...)
  2011-12-30 17:40 ` rixed
@ 2011-12-31  0:33 ` oliver
  2012-01-02  0:21 ` Shawn Wagner
  2012-01-30 10:51 ` Goswin von Brederlow
  5 siblings, 0 replies; 34+ messages in thread
From: oliver @ 2011-12-31  0:33 UTC (permalink / raw)
  To: Gerd Stolpmann; +Cc: caml-list

On Fri, Dec 30, 2011 at 05:44:06PM +0100, Gerd Stolpmann wrote:
> Hi,
> 
> there was recently a security alert for web services that use hash
> tables to store web form parameters sent via POST
[...]


Fixed in Perl in 2003.


28C3-Talk:
  http://events.ccc.de/congress/2011/Fahrplan/events/4680.en.html

LQ Video of the talk:
  http://mirror.fem-net.de/CCC/28C3/mp4-h264-LQ/28c3-4680-en-effective_dos_attacks_against_web_application_platforms_h264-iprod.mp4
  http://mirror.fem-net.de/CCC/28C3/mp4-h264-LQ/28c3-4680-en-effective_dos_attacks_against_web_application_platforms_h264-iprod.mp4.sha1
  http://mirror.fem-net.de/CCC/28C3/mp4-h264-LQ/28c3-4680-en-effective_dos_attacks_against_web_application_platforms_h264-iprod.mp4.torrent


HQ Video of the talk:
  http://mirror.fem-net.de/CCC/28C3/mp4-h264-HQ/28c3-4680-en-effective_dos_attacks_against_web_application_platforms_h264.mp4
  http://mirror.fem-net.de/CCC/28C3/mp4-h264-HQ/28c3-4680-en-effective_dos_attacks_against_web_application_platforms_h264.mp4.sha1
  http://mirror.fem-net.de/CCC/28C3/mp4-h264-HQ/28c3-4680-en-effective_dos_attacks_against_web_application_platforms_h264.mp4.torrent



Ciao,
   Oliver

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Caml-list] Hashtbl and security
  2011-12-30 17:06 ` Xavier Leroy
  2011-12-30 21:16   ` Gerd Stolpmann
@ 2011-12-31  0:57   ` oliver
  2011-12-31  0:59     ` oliver
  2012-01-01 12:52   ` Richard W.M. Jones
  2 siblings, 1 reply; 34+ messages in thread
From: oliver @ 2011-12-31  0:57 UTC (permalink / raw)
  To: Xavier Leroy; +Cc: caml-list

On Fri, Dec 30, 2011 at 06:06:26PM +0100, Xavier Leroy wrote:
> On 12/30/2011 05:44 PM, Gerd Stolpmann wrote:
> 
> > 1) Avoid hash tables in contexts where security is relevant. The
> > alternative is Set (actually a balanced binary tree), which does not
> > show this problem.
> 
> Highly recommended.  Nothing beats guaranteed O(log n) operations.
> 
> > 2) Use cryptographically secure hash functions.
> 
> Hopeless: with a hash size of 30 bits, as in Caml, or even 64 bits,
> there are no cryptographically secure hash functions.
> 
> > 3) Use "randomized" hash tables. The trick here is that there is not a
> > single hash function h anymore, but a family h(1)...h(n). When the hash
> > table is created, one of the functions is picked randomly. This makes it
> > impossible to craft an attack request, because you cannot predict the
> > function. 
> 
> Indeed.  The optional "seed" parameter to Hashtbl.create does exactly
> this in the new implementation of Hashtbl (the one based on Murmur3).
[...]


Where is this?

I found  Hashtbl.HashedType.hash  with type t -> int.

But I did not found an optional parameter to Hashtbl.create

Maybe that is part of the next release?

Ciao,
   Oliver

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Caml-list] Hashtbl and security
  2011-12-31  0:57   ` oliver
@ 2011-12-31  0:59     ` oliver
  0 siblings, 0 replies; 34+ messages in thread
From: oliver @ 2011-12-31  0:59 UTC (permalink / raw)
  To: Xavier Leroy; +Cc: caml-list

On Sat, Dec 31, 2011 at 01:57:21AM +0100, oliver wrote:
> On Fri, Dec 30, 2011 at 06:06:26PM +0100, Xavier Leroy wrote:
> > On 12/30/2011 05:44 PM, Gerd Stolpmann wrote:
> > 
> > > 1) Avoid hash tables in contexts where security is relevant. The
> > > alternative is Set (actually a balanced binary tree), which does not
> > > show this problem.
> > 
> > Highly recommended.  Nothing beats guaranteed O(log n) operations.
> > 
> > > 2) Use cryptographically secure hash functions.
> > 
> > Hopeless: with a hash size of 30 bits, as in Caml, or even 64 bits,
> > there are no cryptographically secure hash functions.
> > 
> > > 3) Use "randomized" hash tables. The trick here is that there is not a
> > > single hash function h anymore, but a family h(1)...h(n). When the hash
> > > table is created, one of the functions is picked randomly. This makes it
> > > impossible to craft an attack request, because you cannot predict the
> > > function. 
> > 
> > Indeed.  The optional "seed" parameter to Hashtbl.create does exactly
> > this in the new implementation of Hashtbl (the one based on Murmur3).
> [...]
> 
> 
> Where is this?
> 
> I found  Hashtbl.HashedType.hash  with type t -> int.

And there is "val hash_param : int -> int -> 'a -> int"
but I'm not sure if this adresses the issue.

Ciao,
   Oliver

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Caml-list] Hashtbl and security
  2011-12-30 17:40 ` rixed
  2011-12-30 17:52   ` Edgar Friendly
@ 2011-12-31  1:02   ` oliver
  1 sibling, 0 replies; 34+ messages in thread
From: oliver @ 2011-12-31  1:02 UTC (permalink / raw)
  To: rixed; +Cc: caml-list

On Fri, Dec 30, 2011 at 06:40:30PM +0100, rixed@happyleptic.org wrote:
> I will probably tell something very stupid, but HTML specs
> do not prevent a client to post 1M values with the same name,
> so whatever your hash function you cannot do much, can you?
[...]

That's a feature.


> 
> The simplest solution I can think of that prevents all attacks
> of this kind (but could reject some valid POST in theory) would
> be to store the bucket lengths and use it to detect and reject
> "obviously biaised" insertions.
[...]

How do you define "obvious" and "biased"?

Sometimes, the distinction between feature and bug
depends on the context...

Ciao,
   Oliver

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Caml-list] Hashtbl and security
  2011-12-30 17:06 ` Xavier Leroy
  2011-12-30 21:16   ` Gerd Stolpmann
  2011-12-31  0:57   ` oliver
@ 2012-01-01 12:52   ` Richard W.M. Jones
  2012-01-01 17:29     ` Xavier Leroy
  2 siblings, 1 reply; 34+ messages in thread
From: Richard W.M. Jones @ 2012-01-01 12:52 UTC (permalink / raw)
  Cc: caml-list

On Fri, Dec 30, 2011 at 06:06:26PM +0100, Xavier Leroy wrote:
> Indeed.  The optional "seed" parameter to Hashtbl.create does exactly
> this in the new implementation of Hashtbl (the one based on Murmur3).

It may be worth noting that Perl solved this problem (back in 2003) by
unconditionally using a seed which is a global set to a random number
during interpreter initialization.  Each run of the interpreter
results in a different seed, and (it is supposed therefore) users of
Perl simply don't need to worry about algorithmic complexity attacks.

http://perl5.git.perl.org/perl.git/blob/HEAD:/hv.h#l104
http://dev.perl.org/perl5/news/2003/perl-5.8.1.html#Hash_Randomisation

Rich.

-- 
Richard Jones
Red Hat

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Caml-list] Hashtbl and security
  2012-01-01 12:52   ` Richard W.M. Jones
@ 2012-01-01 17:29     ` Xavier Leroy
  2012-01-01 21:04       ` Gerd Stolpmann
  2012-01-30 10:54       ` Goswin von Brederlow
  0 siblings, 2 replies; 34+ messages in thread
From: Xavier Leroy @ 2012-01-01 17:29 UTC (permalink / raw)
  To: caml-list

On 01/01/2012 01:52 PM, Richard W.M. Jones wrote:
> On Fri, Dec 30, 2011 at 06:06:26PM +0100, Xavier Leroy wrote:
>> Indeed.  The optional "seed" parameter to Hashtbl.create does exactly
>> this in the new implementation of Hashtbl (the one based on Murmur3).
> 
> It may be worth noting that Perl solved this problem (back in 2003) by
> unconditionally using a seed which is a global set to a random number
> during interpreter initialization.  

That's how my initial reimplementation of Hashtbl worked, using the
Random module to produce seeds, but I was told (correctly) that in
security-sensitive applications it's better to leave the generation of
random numbers under control of the programmer.  For some applications
Random.self_init might be good enough and for others stronger
randomness is needed.

Of course, you can trivially emulate Perl's behavior using the new
Hashtbl implementation + the Random module.

- Xavier Leroy

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Caml-list] Hashtbl and security
  2012-01-01 17:29     ` Xavier Leroy
@ 2012-01-01 21:04       ` Gerd Stolpmann
  2012-01-01 23:24         ` oliver
  2012-01-02  9:34         ` David MENTRE
  2012-01-30 10:54       ` Goswin von Brederlow
  1 sibling, 2 replies; 34+ messages in thread
From: Gerd Stolpmann @ 2012-01-01 21:04 UTC (permalink / raw)
  To: Xavier Leroy; +Cc: caml-list

Am Sonntag, den 01.01.2012, 18:29 +0100 schrieb Xavier Leroy:
> On 01/01/2012 01:52 PM, Richard W.M. Jones wrote:
> > On Fri, Dec 30, 2011 at 06:06:26PM +0100, Xavier Leroy wrote:
> >> Indeed.  The optional "seed" parameter to Hashtbl.create does exactly
> >> this in the new implementation of Hashtbl (the one based on Murmur3).
> > 
> > It may be worth noting that Perl solved this problem (back in 2003) by
> > unconditionally using a seed which is a global set to a random number
> > during interpreter initialization.  
> 
> That's how my initial reimplementation of Hashtbl worked, using the
> Random module to produce seeds, but I was told (correctly) that in
> security-sensitive applications it's better to leave the generation of
> random numbers under control of the programmer.  For some applications
> Random.self_init might be good enough and for others stronger
> randomness is needed.
> 
> Of course, you can trivially emulate Perl's behavior using the new
> Hashtbl implementation + the Random module.

I understand it very well that adding support for cryptographically
secure random numbers to core Ocaml is a challenge. There is no POSIX
API, and /dev/random is, although widely available, still non-standard.
And it is certainly true that there are various levels of security, and
for some purposes the programmer should be free to install even better
generators. Nevertheless, Ocaml is now widely used in environments where
a certain minimum of security is demanded, and I think Ocaml should
provide this minimum at least, and use it for things like an
automatically chosen seed for hash tables.

My argument is: Even providing a half solution is in this area better
than leaving the unwary programmer completely alone. Because in the
latter case, nothing will be done to address the problems, and apps
would be easier to attack.

What I could imagine is a module Sys.Security where all security
features are accessible and configurable, e.g.

val set_default_seed : int -> unit
val fill_randomly : [`Fast|`Safe] -> string -> unit
val set_fill_randomly : ([`Fast|`Safe] -> string -> unit) -> unit

The configure script could check what is available on the system. The
`Fast mode is guaranteed to always exist, and would essentially
be /dev/urandom if available and otherwise Random-based. The `Safe mode
would require /dev/random (or comparable), and fail otherwise. (See
Wikipedia http://en.wikipedia.org/wiki/dev/random for a list of OS, and
for weaknesses.)

The user can override the RNG if the defaults are not ok, or if exact
control is required.

Of course, one could criticize such an interface in various ways. For
example, it only allows a global security profile, and not different
profiles for different parts of the program. Also, the distinction
between a fast and a safe generator sounds a bit arbitrary. My response
is simply: this is the price for the broad effect, and IMHO this counts
more than having a more elaborated but also more difficult solution, or
lots of user-specific solutions (as we would have to find today).

Gerd
-- 
------------------------------------------------------------
Gerd Stolpmann, Darmstadt, Germany    gerd@gerd-stolpmann.de
Creator of GODI and camlcity.org.
Contact details:        http://www.camlcity.org/contact.html
Company homepage:       http://www.gerd-stolpmann.de
------------------------------------------------------------


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Caml-list] Hashtbl and security
  2012-01-01 21:04       ` Gerd Stolpmann
@ 2012-01-01 23:24         ` oliver
  2012-01-01 23:58           ` Gerd Stolpmann
  2012-01-02  9:34         ` David MENTRE
  1 sibling, 1 reply; 34+ messages in thread
From: oliver @ 2012-01-01 23:24 UTC (permalink / raw)
  To: Gerd Stolpmann; +Cc: Xavier Leroy, caml-list

On Sun, Jan 01, 2012 at 10:04:03PM +0100, Gerd Stolpmann wrote:
> Am Sonntag, den 01.01.2012, 18:29 +0100 schrieb Xavier Leroy:
> > On 01/01/2012 01:52 PM, Richard W.M. Jones wrote:
> > > On Fri, Dec 30, 2011 at 06:06:26PM +0100, Xavier Leroy wrote:
> > >> Indeed.  The optional "seed" parameter to Hashtbl.create does exactly
> > >> this in the new implementation of Hashtbl (the one based on Murmur3).
> > > 
> > > It may be worth noting that Perl solved this problem (back in 2003) by
> > > unconditionally using a seed which is a global set to a random number
> > > during interpreter initialization.  
> > 
> > That's how my initial reimplementation of Hashtbl worked, using the
> > Random module to produce seeds, but I was told (correctly) that in
> > security-sensitive applications it's better to leave the generation of
> > random numbers under control of the programmer.  For some applications
> > Random.self_init might be good enough and for others stronger
> > randomness is needed.
> > 
> > Of course, you can trivially emulate Perl's behavior using the new
> > Hashtbl implementation + the Random module.
> 
> I understand it very well that adding support for cryptographically
> secure random numbers to core Ocaml is a challenge. There is no POSIX
> API, and /dev/random is, although widely available, still non-standard.
[...]

And also might not be good enough for some certain areas.

val Hashtbl.HashedType.hash: t -> int

allows at least providing your own hashing.function,
but that function then must be sophisticated enough
to provide some dynamic re-seeding.
Not sure if this does not rather conflict with referential transparency?!
Under the hood such a function of course good do some reading from
a random source... but that looks dirty to me.

So such a function with optional seed-parameters (as mentioned by Xavier leroy) might
make sense; when using the imperative features from my understanding it seems
to be much easier to address that hash-collision problem.


> And it is certainly true that there are various levels of security, and
> for some purposes the programmer should be free to install even better
> generators. Nevertheless, Ocaml is now widely used in environments where
> a certain minimum of security is demanded, and I think Ocaml should
> provide this minimum at least, and use it for things like an
> automatically chosen seed for hash tables.

That's already planned and even implemented, as was mentioned in this thread.
So urging for a new official release would make sense.

> 
> My argument is: Even providing a half solution is in this area better
> than leaving the unwary programmer completely alone. Because in the
> latter case, nothing will be done to address the problems, and apps
> would be easier to attack.

Maybe a "half solution" is, what already is done.
I doubt that hash collisions are a new topic,
so I wonder why such things were not already implemented.
Only that this might be used in attacks seems to be "new".


> 
> What I could imagine is a module Sys.Security where all security
> features are accessible and configurable, e.g.

I doubt that this makes sense.
Nearly anything that can be programmed can become a security
issue, if done wrong; especially things done on the
operating system level (like Unix module) could become
a security issue, if things are handled without care.

A mandatory (not optional) hash-function-parameter
that must be passed (plus some default hash functions
with elaborated documentation on the properties of those)
would make more sense to me.

Putting things that need tp be part of the Hash-module
aside into a Sys.security-module makes these things less
apparent to the programmer who just wants to use hashes,
and don't thinks about using hashes might be a security problem.

So, this solution IMHO would be counterproductive
and non-intuitive.


Ciao,
   Oliver

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Caml-list] Hashtbl and security
  2012-01-01 23:24         ` oliver
@ 2012-01-01 23:58           ` Gerd Stolpmann
  2012-01-02  1:43             ` oliver
  0 siblings, 1 reply; 34+ messages in thread
From: Gerd Stolpmann @ 2012-01-01 23:58 UTC (permalink / raw)
  To: oliver; +Cc: Xavier Leroy, caml-list


Am Montag, den 02.01.2012, 00:24 +0100 schrieb oliver:
> > I understand it very well that adding support for cryptographically
> > secure random numbers to core Ocaml is a challenge. There is no POSIX
> > API, and /dev/random is, although widely available, still non-standard.
> [...]
> 
> And also might not be good enough for some certain areas.
> 
> val Hashtbl.HashedType.hash: t -> int
> 
> allows at least providing your own hashing.function,
> but that function then must be sophisticated enough
> to provide some dynamic re-seeding.

It does not allow this. Because of this, the new Hashtbl (only in svn so
far) has the additional seed parameter.

> > And it is certainly true that there are various levels of security, and
> > for some purposes the programmer should be free to install even better
> > generators. Nevertheless, Ocaml is now widely used in environments where
> > a certain minimum of security is demanded, and I think Ocaml should
> > provide this minimum at least, and use it for things like an
> > automatically chosen seed for hash tables.
> 
> That's already planned and even implemented, as was mentioned in this thread.
> So urging for a new official release would make sense.

Not exactly. Xavier mentioned that he withdrew the automatic seeding. If
nothing changes, you will have to provide a seed parameter to every
Hashtbl.create, coming from a rng of your taste. This is what I don't
like - programmers will in most cases simply ignore this possibility,
and I predict it will even be ignored when there are strong indications
for security threats. Also, there is no good access to rng's. Random is
not designed with security in mind. The OS typically provides a
generator which is much better (like /dev/random), but there is no
uniform interface to it that would make it easy to use it.

My concern is that the security options will simply be ignored unless
the standard library includes some half-way secure defaults.
 
> > What I could imagine is a module Sys.Security where all security
> > features are accessible and configurable, e.g.
> 
> I doubt that this makes sense.
> Nearly anything that can be programmed can become a security
> issue, if done wrong; especially things done on the
> operating system level (like Unix module) could become
> a security issue, if things are handled without care.

Thanks for your argumentative help - by being ignorant you prove my
thesis that typical programmers won't do anything by themselves. The
world is so much trouble, better put the head into the sand, and hope
the attacker won't pick you. (Well, sorry, maybe a bit too harsh, but I
hope you get my point that it is no excuse that also other security
issues may exist).

The dangerous thing here is that it is not always apparent which hash
tables are exposed in areas with security demands. So, it would be good
if all hash tables had some basic protection built-in.

> A mandatory (not optional) hash-function-parameter
> that must be passed (plus some default hash functions
> with elaborated documentation on the properties of those)
> would make more sense to me.

The seed is so far optional. Sure, requiring it mandatorily would ensure
it is passed, but I guess programmers would just become used to the
idiom "Hashtbl.create ~seed:0 ()". I'd like to rather see that at least
a default seed is automatically chosen at program startup (there might
even be better schemes like selecting a new default seed after every GC
cycle or so).

> Putting things that need tp be part of the Hash-module
> aside into a Sys.security-module makes these things less
> apparent to the programmer who just wants to use hashes,
> and don't thinks about using hashes might be a security problem.

If done right, this won't be a problem anymore for the typical user (who
e.g. programs a web page, and where nobody would spend millions to hack
it). Of course, if you want to run a trust center, you'll have higher
demands, but I guess you'll check then your code base thoroughly for
issues, and won't forget to pass seed parameters wherever needed.

Gerd

> So, this solution IMHO would be counterproductive
> and non-intuitive.
> 
> 
> Ciao,
>    Oliver
> 
> -- 
> Caml-list mailing list.  Subscription management and archives:
> https://sympa-roc.inria.fr/wws/info/caml-list
> Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
> Bug reports: http://caml.inria.fr/bin/caml-bugs
> 
> 



^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Caml-list] Hashtbl and security
  2011-12-30 16:44 [Caml-list] Hashtbl and security Gerd Stolpmann
                   ` (3 preceding siblings ...)
  2011-12-31  0:33 ` oliver
@ 2012-01-02  0:21 ` Shawn Wagner
  2012-01-02 14:52   ` Gerd Stolpmann
  2012-01-30 10:51 ` Goswin von Brederlow
  5 siblings, 1 reply; 34+ messages in thread
From: Shawn Wagner @ 2012-01-02  0:21 UTC (permalink / raw)
  To: caml-list

On Fri, 30 Dec 2011 17:44:06 +0100
Gerd Stolpmann <info@gerd-stolpmann.de> wrote:
> 
> What are possible fixes?
> 
> 1) Avoid hash tables in contexts where security is relevant. The
> alternative is Set (actually a balanced binary tree), which does not
> show this problem.
> 
> 2) Use cryptographically secure hash functions.
> 
> 3) Use "randomized" hash tables. The trick here is that there is not a
> single hash function h anymore, but a family h(1)...h(n). When the
> hash table is created, one of the functions is picked randomly. This
> makes it impossible to craft an attack request, because you cannot
> predict the function.
> 

There's also an option 4 that's barely been mentioned in any
discussion of this issue I've seen: Use a hash table implementation that
handles collisions in another way than having each bucket be a linked
list. Double hashing and cuckoo hashing come to mind, where an attacker
would have to find keys that map to the same value for not one, but two
or more different hash functions. 

-- 
Shawn Wagner
shawnw@speakeasy.org


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Caml-list] Hashtbl and security
  2012-01-01 23:58           ` Gerd Stolpmann
@ 2012-01-02  1:43             ` oliver
  2012-01-04 17:56               ` Damien Doligez
  0 siblings, 1 reply; 34+ messages in thread
From: oliver @ 2012-01-02  1:43 UTC (permalink / raw)
  To: Gerd Stolpmann; +Cc: Xavier Leroy, caml-list

On Mon, Jan 02, 2012 at 12:58:03AM +0100, Gerd Stolpmann wrote:
[...]
> > > What I could imagine is a module Sys.Security where all security
> > > features are accessible and configurable, e.g.
> > 
> > I doubt that this makes sense.
> > Nearly anything that can be programmed can become a security
> > issue, if done wrong; especially things done on the
> > operating system level (like Unix module) could become
> > a security issue, if things are handled without care.
> 
> Thanks for your argumentative help - by being ignorant you prove my
> thesis that typical programmers won't do anything by themselves. The
> world is so much trouble, better put the head into the sand, and hope
> the attacker won't pick you. (Well, sorry, maybe a bit too harsh, but I
> hope you get my point that it is no excuse that also other security
> issues may exist).
[...]

You completely did NOT see what I talked about.

I did not say, this problem should be ignored.
At the day when I heard from this bug I also was tempted to
post to this list here; just someone was faster then me.

What I meant was not to ignore the problenm, because there are other
problems; I meant, that there is no reason to ship the functionality
into Sys.Security, if it belongs to Hashtbl.

As you said, it is necessary to point the attention of the programmer
to crucial points, this means, that the Hashtbl-Initialization
needs to be in the Hashtbl module.
The docs for the Hashtbl are the place, where a programmer looks for 
information on Hashtbles of course, and not in some module in any other place.

Especially if the progranmmer is not aware of the problem, it's a necessity
to place the information at the place of the module that you want to use.
This is good style of documentation and also is rather FPL like.
Exporting the Seeds-functionality to some other module is very similar
to using global variables in a C program.

Put the things where they belong to.

And Hashtbl-init belongs to Hashtbl-module.



[...]
> > A mandatory (not optional) hash-function-parameter
> > that must be passed (plus some default hash functions
> > with elaborated documentation on the properties of those)
> > would make more sense to me.
> 
> The seed is so far optional. Sure, requiring it mandatorily would ensure
> it is passed, but I guess programmers would just become used to the
> idiom "Hashtbl.create ~seed:0 ()".

If the type is an abstract type, which comes from something like
Hashtbl.Randomseed
and has type t, not type int, this problem would vanish.

[...]
> > Putting things that need tp be part of the Hash-module
> > aside into a Sys.security-module makes these things less
> > apparent to the programmer who just wants to use hashes,
> > and don't thinks about using hashes might be a security problem.
> 
> If done right, this won't be a problem anymore for the typical user (who

You also mentioned, that programmers might be too lazy or are not aware of the
problem. Then putting the initialization to some unrelated module is not
supporting the programmer to become aware of the problem.
Even if the person already knows that there might be some security issues with the
hashtables, then I doubt, that the first idea of most programmers would be to look
out for something like Sys.Security.
Instead I would assume higher frequency of asking mails on this list,
about this problem.


Ciao,
   Oliver

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Caml-list] Hashtbl and security
  2012-01-01 21:04       ` Gerd Stolpmann
  2012-01-01 23:24         ` oliver
@ 2012-01-02  9:34         ` David MENTRE
  1 sibling, 0 replies; 34+ messages in thread
From: David MENTRE @ 2012-01-02  9:34 UTC (permalink / raw)
  To: Gerd Stolpmann; +Cc: Xavier Leroy, caml-list

Hello,

2012/1/1 Gerd Stolpmann <info@gerd-stolpmann.de>:
> Am Sonntag, den 01.01.2012, 18:29 +0100 schrieb Xavier Leroy:
>> On 01/01/2012 01:52 PM, Richard W.M. Jones wrote:
[...]
>> > It may be worth noting that Perl solved this problem (back in 2003) by
>> > unconditionally using a seed which is a global set to a random number
>> > during interpreter initialization.
>>
>> That's how my initial reimplementation of Hashtbl worked, using the
>> Random module to produce seeds, but I was told (correctly) that in
>> security-sensitive applications it's better to leave the generation of
>> random numbers under control of the programmer.  For some applications
>> Random.self_init might be good enough and for others stronger
>> randomness is needed.
>>
>> Of course, you can trivially emulate Perl's behavior using the new
>> Hashtbl implementation + the Random module.
[...]
> Nevertheless, Ocaml is now widely used in environments where
> a certain minimum of security is demanded, and I think Ocaml should
> provide this minimum at least, and use it for things like an
> automatically chosen seed for hash tables.

I share Gerd's opinion that OCaml should provide a "reasonable
default", even if this default my not be enough for applications that
need a strong security.

Another "solution" would be to flag this API as a potential security
issue in the documentation and/or provide a compiler warning to emit a
warning if Hashtbl is used without proper initialization.

Best regards,
david


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Caml-list] Hashtbl and security
  2012-01-02  0:21 ` Shawn Wagner
@ 2012-01-02 14:52   ` Gerd Stolpmann
  0 siblings, 0 replies; 34+ messages in thread
From: Gerd Stolpmann @ 2012-01-02 14:52 UTC (permalink / raw)
  To: Shawn Wagner; +Cc: caml-list

Am Sonntag, den 01.01.2012, 16:21 -0800 schrieb Shawn Wagner:
> On Fri, 30 Dec 2011 17:44:06 +0100
> Gerd Stolpmann <info@gerd-stolpmann.de> wrote:
> > 
> > What are possible fixes?
> > 
> > 1) Avoid hash tables in contexts where security is relevant. The
> > alternative is Set (actually a balanced binary tree), which does not
> > show this problem.
> > 
> > 2) Use cryptographically secure hash functions.
> > 
> > 3) Use "randomized" hash tables. The trick here is that there is not a
> > single hash function h anymore, but a family h(1)...h(n). When the
> > hash table is created, one of the functions is picked randomly. This
> > makes it impossible to craft an attack request, because you cannot
> > predict the function.
> > 
> 
> There's also an option 4 that's barely been mentioned in any
> discussion of this issue I've seen: Use a hash table implementation that
> handles collisions in another way than having each bucket be a linked
> list. Double hashing and cuckoo hashing come to mind, where an attacker
> would have to find keys that map to the same value for not one, but two
> or more different hash functions. 

When I overlook this correctly, this is just a trick to use more bits.
So when you double-hash with two functions that deliver 30 bits each
you'll effectively have the same security as 60 bits - provided these
functions are really independent of each other.

The choice of the functions is crucial, IMHO. I have no idea how you
would choose them so that when you crack one of these, you don't know
anything about the other. Of course, you could use a secure hash
function like SHA-1 and take distinct ranges of bits, but I think this
is more an implementation of 2), and not what you really mean.

Is there any cryptanalysis of this approach?

Gerd
-- 
------------------------------------------------------------
Gerd Stolpmann, Darmstadt, Germany    gerd@gerd-stolpmann.de
Creator of GODI and camlcity.org.
Contact details:        http://www.camlcity.org/contact.html
Company homepage:       http://www.gerd-stolpmann.de
------------------------------------------------------------


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Caml-list] Hashtbl and security
  2012-01-02  1:43             ` oliver
@ 2012-01-04 17:56               ` Damien Doligez
  2012-01-04 21:52                 ` oliver
  0 siblings, 1 reply; 34+ messages in thread
From: Damien Doligez @ 2012-01-04 17:56 UTC (permalink / raw)
  To: caml users

On 2012-01-02, at 02:43, oliver wrote:

> If the type is an abstract type, which comes from something like
> Hashtbl.Randomseed
> and has type t, not type int, this problem would vanish.

You have to be careful.  If we make hash table randomization mandatory,
the Frama-C people will hate us, as will all the people who want
reproducible results from their programs (for purposes of testing and
benchmarking, for example).

So, even if randomized is the default, there must be a way to get a
plain hash table that does the same thing every time.

-- Damien


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Caml-list] Hashtbl and security
  2012-01-04 17:56               ` Damien Doligez
@ 2012-01-04 21:52                 ` oliver
  0 siblings, 0 replies; 34+ messages in thread
From: oliver @ 2012-01-04 21:52 UTC (permalink / raw)
  To: Damien Doligez; +Cc: caml users

On Wed, Jan 04, 2012 at 06:56:11PM +0100, Damien Doligez wrote:
> On 2012-01-02, at 02:43, oliver wrote:
> 
> > If the type is an abstract type, which comes from something like
> > Hashtbl.Randomseed
> > and has type t, not type int, this problem would vanish.
> 
> You have to be careful.  If we make hash table randomization mandatory,
> the Frama-C people will hate us, as will all the people who want
> reproducible results from their programs (for purposes of testing and
> benchmarking, for example).
[...]

I did not meant it must be mandatory.
But provide a way, that makes it easy to use randomization
and not-so-easy to use the always-same values e.g. for testing puroposes.

If it needs extra effort to make simple seed values, people would prefer the randomized ones,
if not they want to write some extra code (maybe applying a functor).


> 
> So, even if randomized is the default, there must be a way to get a
> plain hash table that does the same thing every time.

Yes, of course.

But maybe it should not be encouraged, and the programmer-in-a-hurry would
use ready-to-use random initializations, which are provided by the Hashtbl-module
and the one who needs it non-randomized would need to write his own addition then.

Then the lazy programmer goes safe and the unsafe way needs extra effort.

Nevertheless I think optional int value as a first fix would also be ok.


And maybe some of you remember the Debian random-device bug (some years ago),
where the random-device under certain circumstances ran out of entropy....

So in any case it needs to be possible to change the random generator.

But pseudo-random is always a compromise.
Who really needs true random should of course use special hardware that
creates wide bandwith noise and uses an ADC to sample the signal.

Ciao,
   Oliver

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Caml-list] Hashtbl and security
  2011-12-30 16:44 [Caml-list] Hashtbl and security Gerd Stolpmann
                   ` (4 preceding siblings ...)
  2012-01-02  0:21 ` Shawn Wagner
@ 2012-01-30 10:51 ` Goswin von Brederlow
  2012-01-31 14:16   ` Gerd Stolpmann
  5 siblings, 1 reply; 34+ messages in thread
From: Goswin von Brederlow @ 2012-01-30 10:51 UTC (permalink / raw)
  To: Gerd Stolpmann; +Cc: caml-list

Gerd Stolpmann <info@gerd-stolpmann.de> writes:

> Hi,
>
> there was recently a security alert for web services that use hash
> tables to store web form parameters sent via POST (so that millions of
> such parameters can be sent in a single request). It is possible to keep
> the web service busy for hours with such a DoS (denial of service)
> attack. The type of attack boils down to a problem in most hash table
> implementations, namely that the hash functions are invertible, and it
> is possible for a malicious user to construct lots of keys that all map
> to the same bucket of the hash table, creating a mass collision.
>
> The text of the alert: 
> http://www.nruns.com/_downloads/advisory28122011.pdf
>
> I'd like to discuss this issue, because it is not restricted to the
> processing of web requests, but may also occur for all other data coming
> from untrusted sources. The web is only the most exposed area where this
> issue exists.
>
> So how is Ocaml affected? The hash functions used in recent Ocaml
> releases are also insecure in the above mentioned sense (currently
> MurmurHash3, and even a simpler hash function in previous releases). A
> quick survey of the Internet revealed at least one site that tries to
> break it. Probably a good cryptographer could do it in minutes. 
>
> A pure Hashtbl.add of the constructed keys does not yet lead to the
> performance degradation, but a Hashtbl.replace, and of course
> Hashtbl.find after the table is built up will. So it depends very much
> of the details of the programs whether they are affected or not.
>
> I've just checked that Ocamlnet uses only Hashtbl.add to collect POST
> parameters, so it is not directly vulnerable. But if the crafted request
> is actually served by a handler, the handler would get a degraded table,
> and could show in turn bad performance (again leading to DoS).
>
> What are possible fixes?
>
> 1) Avoid hash tables in contexts where security is relevant. The
> alternative is Set (actually a balanced binary tree), which does not
> show this problem.

Unfortunately O(log n) > O(1) and that can be a deciding factor in the
overall runtime. Even for small n your code then runs 2,3,4,... times
slower.

> 2) Use cryptographically secure hash functions.
>
> 3) Use "randomized" hash tables. The trick here is that there is not a
> single hash function h anymore, but a family h(1)...h(n). When the hash
> table is created, one of the functions is picked randomly. This makes it
> impossible to craft an attack request, because you cannot predict the
> function.
>
> I don't think 1) is viable - hash tables are too wide spread, and are
> loved by programmers because of their good performance. 2) would be good
> in extremely critical contexts - although it is then again questionable,
> because it is likely to have worse performance than 1).
>
> So, the question is how to do 3). I see two problems here:
>
> a) how to define the family of hash functions. Is it e.g. sufficient to
> introduce an initialization vector for the Murmurhash algorithm, and
> fill it randomly? How to get a random number that is good enough?
>
> b) the Hashtbl in the standard library does not allow it to set the hash
> function dynamically. Maybe one can get this effect by using first-class
> modules (haven't checked).
>
> Anyway, I think these problems are difficult enough to deserve some
> discussion here. At least I cannot solve them immediately, and this
> problem may exist for lots of Ocaml applications.
>
> Gerd

You are forgetting a variable in this. To create a DOS in the hashtable
you need to hit the same bucket again and again. And bucket = hash % size.

You focused on the hash but lets talk about the size for a moment.

The size is rather limited and large hashtabels simply won't increate
size anymore and allways have lots of collisions. So even if someone
isn't actively trying to create DOS the performace still tanks as you
add more items.

And this isn't only a problem in ocaml. The glib hashtable also has a
maximum size in the 32bit range for example.


So for ocaml the first thing that needs to be fixed is to allow larger
size arrays of buckets.

MfG
        Goswin

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Caml-list] Hashtbl and security
  2012-01-01 17:29     ` Xavier Leroy
  2012-01-01 21:04       ` Gerd Stolpmann
@ 2012-01-30 10:54       ` Goswin von Brederlow
  1 sibling, 0 replies; 34+ messages in thread
From: Goswin von Brederlow @ 2012-01-30 10:54 UTC (permalink / raw)
  To: Xavier Leroy; +Cc: caml-list

Xavier Leroy <Xavier.Leroy@inria.fr> writes:

> On 01/01/2012 01:52 PM, Richard W.M. Jones wrote:
>> On Fri, Dec 30, 2011 at 06:06:26PM +0100, Xavier Leroy wrote:
>>> Indeed.  The optional "seed" parameter to Hashtbl.create does exactly
>>> this in the new implementation of Hashtbl (the one based on Murmur3).
>> 
>> It may be worth noting that Perl solved this problem (back in 2003) by
>> unconditionally using a seed which is a global set to a random number
>> during interpreter initialization.  
>
> That's how my initial reimplementation of Hashtbl worked, using the
> Random module to produce seeds, but I was told (correctly) that in
> security-sensitive applications it's better to leave the generation of
> random numbers under control of the programmer.  For some applications
> Random.self_init might be good enough and for others stronger
> randomness is needed.
>
> Of course, you can trivially emulate Perl's behavior using the new
> Hashtbl implementation + the Random module.
>
> - Xavier Leroy

It is also crucial if you are doing performance tests or debugging. You
want the same behaviour on every run for that.

MfG
        Goswin

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Caml-list] Hashtbl and security
  2012-01-30 10:51 ` Goswin von Brederlow
@ 2012-01-31 14:16   ` Gerd Stolpmann
  2012-02-08  9:41     ` Goswin von Brederlow
  0 siblings, 1 reply; 34+ messages in thread
From: Gerd Stolpmann @ 2012-01-31 14:16 UTC (permalink / raw)
  To: Goswin von Brederlow; +Cc: caml-list


> Gerd Stolpmann <info@gerd-stolpmann.de> writes:
>
>> Hi,
>>
>> there was recently a security alert for web services that use hash
>> tables to store web form parameters sent via POST (so that millions of
>> such parameters can be sent in a single request). It is possible to keep
>> the web service busy for hours with such a DoS (denial of service)
>> attack. The type of attack boils down to a problem in most hash table
>> implementations, namely that the hash functions are invertible, and it
>> is possible for a malicious user to construct lots of keys that all map
>> to the same bucket of the hash table, creating a mass collision.
>>
>> The text of the alert:
>> http://www.nruns.com/_downloads/advisory28122011.pdf
>>
>> I'd like to discuss this issue, because it is not restricted to the
>> processing of web requests, but may also occur for all other data coming
>> from untrusted sources. The web is only the most exposed area where this
>> issue exists.
>>
>> So how is Ocaml affected? The hash functions used in recent Ocaml
>> releases are also insecure in the above mentioned sense (currently
>> MurmurHash3, and even a simpler hash function in previous releases). A
>> quick survey of the Internet revealed at least one site that tries to
>> break it. Probably a good cryptographer could do it in minutes.
>>
>> A pure Hashtbl.add of the constructed keys does not yet lead to the
>> performance degradation, but a Hashtbl.replace, and of course
>> Hashtbl.find after the table is built up will. So it depends very much
>> of the details of the programs whether they are affected or not.
>>
>> I've just checked that Ocamlnet uses only Hashtbl.add to collect POST
>> parameters, so it is not directly vulnerable. But if the crafted request
>> is actually served by a handler, the handler would get a degraded table,
>> and could show in turn bad performance (again leading to DoS).
>>
>> What are possible fixes?
>>
>> 1) Avoid hash tables in contexts where security is relevant. The
>> alternative is Set (actually a balanced binary tree), which does not
>> show this problem.
>
> Unfortunately O(log n) > O(1) and that can be a deciding factor in the
> overall runtime. Even for small n your code then runs 2,3,4,... times
> slower.
>
>> 2) Use cryptographically secure hash functions.
>>
>> 3) Use "randomized" hash tables. The trick here is that there is not a
>> single hash function h anymore, but a family h(1)...h(n). When the hash
>> table is created, one of the functions is picked randomly. This makes it
>> impossible to craft an attack request, because you cannot predict the
>> function.
>>
>> I don't think 1) is viable - hash tables are too wide spread, and are
>> loved by programmers because of their good performance. 2) would be good
>> in extremely critical contexts - although it is then again questionable,
>> because it is likely to have worse performance than 1).
>>
>> So, the question is how to do 3). I see two problems here:
>>
>> a) how to define the family of hash functions. Is it e.g. sufficient to
>> introduce an initialization vector for the Murmurhash algorithm, and
>> fill it randomly? How to get a random number that is good enough?
>>
>> b) the Hashtbl in the standard library does not allow it to set the hash
>> function dynamically. Maybe one can get this effect by using first-class
>> modules (haven't checked).
>>
>> Anyway, I think these problems are difficult enough to deserve some
>> discussion here. At least I cannot solve them immediately, and this
>> problem may exist for lots of Ocaml applications.
>>
>> Gerd
>
> You are forgetting a variable in this. To create a DOS in the hashtable
> you need to hit the same bucket again and again. And bucket = hash % size.
>
> You focused on the hash but lets talk about the size for a moment.
>
> The size is rather limited and large hashtabels simply won't increate
> size anymore and allways have lots of collisions. So even if someone
> isn't actively trying to create DOS the performace still tanks as you
> add more items.

I cannot completely follow here. If you add more elements, the bucket
array is resized. The ocaml Hashtbl does this when the number of elements
exceeds half of the array size. The argument of Hashtbl.create is only the
initial size of the bucket array.

The upper bound is Sys.max_array_length, of course. Granted, on 32 bit
platforms it is low (4M). But nobody of sound mind would run ocaml
programs on 32 bit for production purposes. (If you do, consider to
switch. You can use more memory and get a little performance boost, not
only for ocaml programs.)

Gerd

> And this isn't only a problem in ocaml. The glib hashtable also has a
> maximum size in the 32bit range for example.
>
>
> So for ocaml the first thing that needs to be fixed is to allow larger
> size arrays of buckets.
>
> MfG
>         Goswin
>
> --
> Caml-list mailing list.  Subscription management and archives:
> https://sympa-roc.inria.fr/wws/info/caml-list
> Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
> Bug reports: http://caml.inria.fr/bin/caml-bugs
>
>
>


-- 
Gerd Stolpmann, Darmstadt, Germany    gerd@gerd-stolpmann.de
Creator of GODI and camlcity.org.
Contact details:        http://www.camlcity.org/contact.html
Company homepage:       http://www.gerd-stolpmann.de
*** Searching for new projects! Need consulting for system
*** programming in Ocaml? Gerd Stolpmann can help you.



^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Caml-list] Hashtbl and security
  2012-01-31 14:16   ` Gerd Stolpmann
@ 2012-02-08  9:41     ` Goswin von Brederlow
  2012-02-08 10:43       ` Philippe Wang
                         ` (2 more replies)
  0 siblings, 3 replies; 34+ messages in thread
From: Goswin von Brederlow @ 2012-02-08  9:41 UTC (permalink / raw)
  To: Gerd Stolpmann; +Cc: Goswin von Brederlow, caml-list

"Gerd Stolpmann" <info@gerd-stolpmann.de> writes:

>> Gerd Stolpmann <info@gerd-stolpmann.de> writes:
>> You are forgetting a variable in this. To create a DOS in the hashtable
>> you need to hit the same bucket again and again. And bucket = hash % size.
>>
>> You focused on the hash but lets talk about the size for a moment.
>>
>> The size is rather limited and large hashtabels simply won't increate
>> size anymore and allways have lots of collisions. So even if someone
>> isn't actively trying to create DOS the performace still tanks as you
>> add more items.
>
> I cannot completely follow here. If you add more elements, the bucket
> array is resized. The ocaml Hashtbl does this when the number of elements
> exceeds half of the array size. The argument of Hashtbl.create is only the
> initial size of the bucket array.
>
> The upper bound is Sys.max_array_length, of course. Granted, on 32 bit
> platforms it is low (4M). But nobody of sound mind would run ocaml
> programs on 32 bit for production purposes. (If you do, consider to
> switch. You can use more memory and get a little performance boost, not
> only for ocaml programs.)
>
> Gerd

In practice I see a serious performance degradation with large
hashtables. So much so that I had to use an array of hashtables or
hashtable of hashtables and split the values across them.

Maybe the size of the hashtable indeed isn't the problem but the hash
function. Here is something odd:

# for i = 1 to 63 do Printf.printf "%d %d\n" (1 lsl i) (Hashtbl.hash (1 lsl i)) done;;
2 2
4 4
8 8
16 16
32 32
64 64
128 128
256 256
512 512
1024 1024
2048 2048
4096 4096
8192 8192
16384 16384
32768 32768
65536 65536
131072 131072
262144 262144
524288 524288
1048576 1048576
2097152 2097152
4194304 4194304
8388608 8388608
16777216 16777216
33554432 33554432
67108864 67108864
134217728 134217728
268435456 268435456
536870912 536870912
1073741824 0
2147483648 0
4294967296 0
8589934592 0
17179869184 0
34359738368 0
68719476736 0
137438953472 0
274877906944 0
549755813888 0
1099511627776 0
2199023255552 0
4398046511104 0
8796093022208 0
17592186044416 0
35184372088832 0
70368744177664 0
140737488355328 0
281474976710656 0
562949953421312 0
1125899906842624 0
2251799813685248 0
4503599627370496 0
9007199254740992 0
18014398509481984 0
36028797018963968 0
72057594037927936 0
144115188075855872 0
288230376151711744 0
576460752303423488 0
1152921504606846976 0
2305843009213693952 0
-4611686018427387904 0
0 0
- : unit = ()

Is every value over a billion hashed to 0?

MfG
        Goswin

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Caml-list] Hashtbl and security
  2012-02-08  9:41     ` Goswin von Brederlow
@ 2012-02-08 10:43       ` Philippe Wang
  2012-02-08 10:46       ` AUGER Cédric
  2012-02-08 11:12       ` Gerd Stolpmann
  2 siblings, 0 replies; 34+ messages in thread
From: Philippe Wang @ 2012-02-08 10:43 UTC (permalink / raw)
  To: Goswin von Brederlow; +Cc: Gerd Stolpmann, caml-list

On Wed, Feb 8, 2012 at 10:41 AM, Goswin von Brederlow <goswin-v-b@web.de> wrote:
> [...]
> Maybe the size of the hashtable indeed isn't the problem but the hash
> function. Here is something odd:
>
> # for i = 1 to 63 do Printf.printf "%d %d\n" (1 lsl i) (Hashtbl.hash (1 lsl i)) done;;
> 2 2
> 4 4
> 8 8
> 16 16
> 32 32
> 64 64
> 128 128
> 256 256
> 512 512
> 1024 1024
> 2048 2048
> 4096 4096
> 8192 8192
> 16384 16384
> 32768 32768
> 65536 65536
> 131072 131072
> 262144 262144
> 524288 524288
> 1048576 1048576
> 2097152 2097152
> 4194304 4194304
> 8388608 8388608
> 16777216 16777216
> 33554432 33554432
> 67108864 67108864
> 134217728 134217728
> 268435456 268435456
> 536870912 536870912
> 1073741824 0
> 2147483648 0
> 4294967296 0
> 8589934592 0
> 17179869184 0
> 34359738368 0
> 68719476736 0
> 137438953472 0
> 274877906944 0
> 549755813888 0
> 1099511627776 0
> 2199023255552 0
> 4398046511104 0
> 8796093022208 0
> 17592186044416 0
> 35184372088832 0
> 70368744177664 0
> 140737488355328 0
> 281474976710656 0
> 562949953421312 0
> 1125899906842624 0
> 2251799813685248 0
> 4503599627370496 0
> 9007199254740992 0
> 18014398509481984 0
> 36028797018963968 0
> 72057594037927936 0
> 144115188075855872 0
> 288230376151711744 0
> 576460752303423488 0
> 1152921504606846976 0
> 2305843009213693952 0
> -4611686018427387904 0
> 0 0
> - : unit = ()
>
> Is every value over a billion hashed to 0?
>
> MfG
>        Goswin
>


Hi,

you're making it look worse than it really is:

$ ocaml
        Objective Caml version 3.12.0

# Hashtbl.hash 1000000001;;
- : int = 1000000001
# Hashtbl.hash 1073741824;;
- : int = 0
# Hashtbl.hash 1073741825;;
- : int = 1
# Hashtbl.hash 1073741826;;
- : int = 2
# Hashtbl.hash 1073741829;;
- : int = 5
# Hashtbl.hash 1073741830;;
- : int = 6
# Hashtbl.hash 9007199254740994;;
- : int = 2
# Hashtbl.hash 9007199254740990;;
- : int = 1073741822

Cheers,

-- 
Philippe Wang
   mail@philippewang.info


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Caml-list] Hashtbl and security
  2012-02-08  9:41     ` Goswin von Brederlow
  2012-02-08 10:43       ` Philippe Wang
@ 2012-02-08 10:46       ` AUGER Cédric
  2012-02-09 13:22         ` Goswin von Brederlow
  2012-02-08 11:12       ` Gerd Stolpmann
  2 siblings, 1 reply; 34+ messages in thread
From: AUGER Cédric @ 2012-02-08 10:46 UTC (permalink / raw)
  To: Goswin von Brederlow; +Cc: Gerd Stolpmann, caml-list

Le Wed, 08 Feb 2012 10:41:03 +0100,
Goswin von Brederlow <goswin-v-b@web.de> a écrit :

> "Gerd Stolpmann" <info@gerd-stolpmann.de> writes:
> 
> >> Gerd Stolpmann <info@gerd-stolpmann.de> writes:
> >> You are forgetting a variable in this. To create a DOS in the
> >> hashtable you need to hit the same bucket again and again. And
> >> bucket = hash % size.
> >>
> >> You focused on the hash but lets talk about the size for a moment.
> >>
> >> The size is rather limited and large hashtabels simply won't
> >> increate size anymore and allways have lots of collisions. So even
> >> if someone isn't actively trying to create DOS the performace
> >> still tanks as you add more items.
> >
> > I cannot completely follow here. If you add more elements, the
> > bucket array is resized. The ocaml Hashtbl does this when the
> > number of elements exceeds half of the array size. The argument of
> > Hashtbl.create is only the initial size of the bucket array.
> >
> > The upper bound is Sys.max_array_length, of course. Granted, on 32
> > bit platforms it is low (4M). But nobody of sound mind would run
> > ocaml programs on 32 bit for production purposes. (If you do,
> > consider to switch. You can use more memory and get a little
> > performance boost, not only for ocaml programs.)
> >
> > Gerd
> 
> In practice I see a serious performance degradation with large
> hashtables. So much so that I had to use an array of hashtables or
> hashtable of hashtables and split the values across them.
> 
> Maybe the size of the hashtable indeed isn't the problem but the hash
> function. Here is something odd:
> 
> # for i = 1 to 63 do Printf.printf "%d %d\n" (1 lsl i) (Hashtbl.hash
> (1 lsl i)) done;; 2 2
> 4 4
> 8 8
> 16 16
> 32 32
> 64 64
> 128 128
> 256 256
> 512 512
> 1024 1024
> 2048 2048
> 4096 4096
> 8192 8192
> 16384 16384
> 32768 32768
> 65536 65536
> 131072 131072
> 262144 262144
> 524288 524288
> 1048576 1048576
> 2097152 2097152
> 4194304 4194304
> 8388608 8388608
> 16777216 16777216
> 33554432 33554432
> 67108864 67108864
> 134217728 134217728
> 268435456 268435456
> 536870912 536870912
> 1073741824 0
> 2147483648 0
> 4294967296 0
> 8589934592 0
> 17179869184 0
> 34359738368 0
> 68719476736 0
> 137438953472 0
> 274877906944 0
> 549755813888 0
> 1099511627776 0
> 2199023255552 0
> 4398046511104 0
> 8796093022208 0
> 17592186044416 0
> 35184372088832 0
> 70368744177664 0
> 140737488355328 0
> 281474976710656 0
> 562949953421312 0
> 1125899906842624 0
> 2251799813685248 0
> 4503599627370496 0
> 9007199254740992 0
> 18014398509481984 0
> 36028797018963968 0
> 72057594037927936 0
> 144115188075855872 0
> 288230376151711744 0
> 576460752303423488 0
> 1152921504606846976 0
> 2305843009213693952 0
> -4611686018427387904 0
> 0 0
> - : unit = ()
> 
> Is every value over a billion hashed to 0?
> 
> MfG
>         Goswin
> 

Go to:
http://caml.inria.fr/cgi-bin/viewvc.cgi/ocaml/trunk/byterun/hash.c?view=markup
line 285

if you want another hasfunction, you can download the sources, modify
it and recompile.

More seriously, hashtables are here to store values, so you don't want
want to have a hash value bigger than (2^30)-1.

In fact even such a value is too big for that purpouse IMHO, since on a
4Gio RAM computer, it will take half it to store the table, assuming
each entry only takes 4 octets, that is it would be ok only for storing
just the structure of the table (each entry has to be a pointer),
so that in practice, either your table is ridiculously big to store
too few values either it is relevant to have such a size and in this
case your computer will spend all its time to swap since you won't
have enough RAM.

 Clearly you cannot a bijection from 63 bits
integers to 31 bits integers; so there are a lot of collisions. I do
not have a 64 arch, but I guesse that if you hash "2^48+168" you will
get "168". I guess such a function is ways to be ideal, since when
playing with bitvectors having two integers which differs by only one
bit is very common, but at least it is a fast function, and has a good
repartition (you have 2^32 collisions exactly per bucket).


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Caml-list] Hashtbl and security
  2012-02-08  9:41     ` Goswin von Brederlow
  2012-02-08 10:43       ` Philippe Wang
  2012-02-08 10:46       ` AUGER Cédric
@ 2012-02-08 11:12       ` Gerd Stolpmann
  2012-02-09 13:11         ` Goswin von Brederlow
  2 siblings, 1 reply; 34+ messages in thread
From: Gerd Stolpmann @ 2012-02-08 11:12 UTC (permalink / raw)
  To: Goswin von Brederlow; +Cc: Gerd Stolpmann, Goswin von Brederlow, caml-list

Well, looks like that the upper 32 bits aren't considered at all for
computing the hash value. This is for sure surprising - I would have
expected that these bits somehow go in to the final value (e.g. that they
are xor-ed with the lower half).

My guess is that this was done to ensure that an int-to-something
hashtable can be marshalled from 64 to 32 bit systems. Not sure for what
this is important, but it looks like a nice symmetry.

In trunk the method for ensuring this seems to have changed (see function
caml_hash_mix_intnat in
http://caml.inria.fr/cgi-bin/viewvc.cgi/ocaml/trunk/byterun/hash.c?revision=12072&view=markup).

Gerd

> "Gerd Stolpmann" <info@gerd-stolpmann.de> writes:
>
>>> Gerd Stolpmann <info@gerd-stolpmann.de> writes:
>>> You are forgetting a variable in this. To create a DOS in the hashtable
>>> you need to hit the same bucket again and again. And bucket = hash %
>>> size.
>>>
>>> You focused on the hash but lets talk about the size for a moment.
>>>
>>> The size is rather limited and large hashtabels simply won't increate
>>> size anymore and allways have lots of collisions. So even if someone
>>> isn't actively trying to create DOS the performace still tanks as you
>>> add more items.
>>
>> I cannot completely follow here. If you add more elements, the bucket
>> array is resized. The ocaml Hashtbl does this when the number of
>> elements
>> exceeds half of the array size. The argument of Hashtbl.create is only
>> the
>> initial size of the bucket array.
>>
>> The upper bound is Sys.max_array_length, of course. Granted, on 32 bit
>> platforms it is low (4M). But nobody of sound mind would run ocaml
>> programs on 32 bit for production purposes. (If you do, consider to
>> switch. You can use more memory and get a little performance boost, not
>> only for ocaml programs.)
>>
>> Gerd
>
> In practice I see a serious performance degradation with large
> hashtables. So much so that I had to use an array of hashtables or
> hashtable of hashtables and split the values across them.
>
> Maybe the size of the hashtable indeed isn't the problem but the hash
> function. Here is something odd:
>
> # for i = 1 to 63 do Printf.printf "%d %d\n" (1 lsl i) (Hashtbl.hash (1
> lsl i)) done;;
> 2 2
> 4 4
> 8 8
> 16 16
> 32 32
> 64 64
> 128 128
> 256 256
> 512 512
> 1024 1024
> 2048 2048
> 4096 4096
> 8192 8192
> 16384 16384
> 32768 32768
> 65536 65536
> 131072 131072
> 262144 262144
> 524288 524288
> 1048576 1048576
> 2097152 2097152
> 4194304 4194304
> 8388608 8388608
> 16777216 16777216
> 33554432 33554432
> 67108864 67108864
> 134217728 134217728
> 268435456 268435456
> 536870912 536870912
> 1073741824 0
> 2147483648 0
> 4294967296 0
> 8589934592 0
> 17179869184 0
> 34359738368 0
> 68719476736 0
> 137438953472 0
> 274877906944 0
> 549755813888 0
> 1099511627776 0
> 2199023255552 0
> 4398046511104 0
> 8796093022208 0
> 17592186044416 0
> 35184372088832 0
> 70368744177664 0
> 140737488355328 0
> 281474976710656 0
> 562949953421312 0
> 1125899906842624 0
> 2251799813685248 0
> 4503599627370496 0
> 9007199254740992 0
> 18014398509481984 0
> 36028797018963968 0
> 72057594037927936 0
> 144115188075855872 0
> 288230376151711744 0
> 576460752303423488 0
> 1152921504606846976 0
> 2305843009213693952 0
> -4611686018427387904 0
> 0 0
> - : unit = ()
>
> Is every value over a billion hashed to 0?
>
> MfG
>         Goswin
>
> --
> Caml-list mailing list.  Subscription management and archives:
> https://sympa-roc.inria.fr/wws/info/caml-list
> Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
> Bug reports: http://caml.inria.fr/bin/caml-bugs
>
>
>


-- 
Gerd Stolpmann, Darmstadt, Germany    gerd@gerd-stolpmann.de
Creator of GODI and camlcity.org.
Contact details:        http://www.camlcity.org/contact.html
Company homepage:       http://www.gerd-stolpmann.de
*** Searching for new projects! Need consulting for system
*** programming in Ocaml? Gerd Stolpmann can help you.



^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Caml-list] Hashtbl and security
  2012-02-08 11:12       ` Gerd Stolpmann
@ 2012-02-09 13:11         ` Goswin von Brederlow
  0 siblings, 0 replies; 34+ messages in thread
From: Goswin von Brederlow @ 2012-02-09 13:11 UTC (permalink / raw)
  To: Gerd Stolpmann; +Cc: Goswin von Brederlow, caml-list

"Gerd Stolpmann" <info@gerd-stolpmann.de> writes:

> Well, looks like that the upper 32 bits aren't considered at all for
> computing the hash value. This is for sure surprising - I would have
> expected that these bits somehow go in to the final value (e.g. that they
> are xor-ed with the lower half).
>
> My guess is that this was done to ensure that an int-to-something
> hashtable can be marshalled from 64 to 32 bit systems. Not sure for what
> this is important, but it looks like a nice symmetry.

Marshaling hashtables of ints might be an argument. BUT:

# Hashtbl.hash (1 lsr 40);;
- : int = 0
# Hashtbl.hash (Int64.of_int (1 lsr 40));;
- : int = 0
# Hashtbl.hash (Int64.of_int (1 lsr 40 + 1));;
- : int = 1

> In trunk the method for ensuring this seems to have changed (see function
> caml_hash_mix_intnat in
> http://caml.inria.fr/cgi-bin/viewvc.cgi/ocaml/trunk/byterun/hash.c?revision=12072&view=markup).
>
> Gerd

Lets hope that fixes this.

MfG
        Goswin

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Caml-list] Hashtbl and security
  2012-02-08 10:46       ` AUGER Cédric
@ 2012-02-09 13:22         ` Goswin von Brederlow
  2012-02-09 14:48           ` Gerd Stolpmann
  0 siblings, 1 reply; 34+ messages in thread
From: Goswin von Brederlow @ 2012-02-09 13:22 UTC (permalink / raw)
  To: AUGER Cdric; +Cc: Goswin von Brederlow, Gerd Stolpmann, caml-list

AUGER Cédric <Cedric.Auger@lri.fr> writes:

> More seriously, hashtables are here to store values, so you don't want
> want to have a hash value bigger than (2^30)-1.
>
> In fact even such a value is too big for that purpouse IMHO, since on a
> 4Gio RAM computer, it will take half it to store the table, assuming
> each entry only takes 4 octets, that is it would be ok only for storing
> just the structure of the table (each entry has to be a pointer),
> so that in practice, either your table is ridiculously big to store
> too few values either it is relevant to have such a size and in this
> case your computer will spend all its time to swap since you won't
> have enough RAM.

But I have 128GB ram. :) That allows for a hashtable of size 2^33
entries pointing at simple values (e.g. ints) without swapping. And ram
sizes get bigger and bigger.

[Doesn't a Hashtbl of ints store the ints directly? That would allow
2^34 entries.]

>  Clearly you cannot a bijection from 63 bits
> integers to 31 bits integers; so there are a lot of collisions. I do
> not have a 64 arch, but I guesse that if you hash "2^48+168" you will
> get "168". I guess such a function is ways to be ideal, since when
> playing with bitvectors having two integers which differs by only one
> bit is very common, but at least it is a fast function, and has a good
> repartition (you have 2^32 collisions exactly per bucket).

Ignoring the upper bits makes for a verry verry poor hash function. Also
means it is 100% trivial to create a DOS by tuneing your input values to
only differ in the upper bits.

MfG
        Goswin

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Caml-list] Hashtbl and security
  2012-02-09 13:22         ` Goswin von Brederlow
@ 2012-02-09 14:48           ` Gerd Stolpmann
  0 siblings, 0 replies; 34+ messages in thread
From: Gerd Stolpmann @ 2012-02-09 14:48 UTC (permalink / raw)
  To: Goswin von Brederlow; +Cc: AUGER Cdric, caml-list


> AUGER Cédric <Cedric.Auger@lri.fr> writes:
>
>> More seriously, hashtables are here to store values, so you don't want
>> want to have a hash value bigger than (2^30)-1.
>>
>> In fact even such a value is too big for that purpouse IMHO, since on a
>> 4Gio RAM computer, it will take half it to store the table, assuming
>> each entry only takes 4 octets, that is it would be ok only for storing
>> just the structure of the table (each entry has to be a pointer),
>> so that in practice, either your table is ridiculously big to store
>> too few values either it is relevant to have such a size and in this
>> case your computer will spend all its time to swap since you won't
>> have enough RAM.
>
> But I have 128GB ram. :) That allows for a hashtable of size 2^33
> entries pointing at simple values (e.g. ints) without swapping. And ram
> sizes get bigger and bigger.

Fully agreed. The limitation to 30 bits is already unpractical for such
large computers. Maybe we need a second hash function

val Hashtbl.wide_hash : 'a -> int

without this constraint (on 64 bit systems).

Btw, you can already define your own hash functions, just use the
functorized interface. And you are not limited to 30 bits. It's a bit
inconvenient only.

Gerd



>
> [Doesn't a Hashtbl of ints store the ints directly? That would allow
> 2^34 entries.]
>
>>  Clearly you cannot a bijection from 63 bits
>> integers to 31 bits integers; so there are a lot of collisions. I do
>> not have a 64 arch, but I guesse that if you hash "2^48+168" you will
>> get "168". I guess such a function is ways to be ideal, since when
>> playing with bitvectors having two integers which differs by only one
>> bit is very common, but at least it is a fast function, and has a good
>> repartition (you have 2^32 collisions exactly per bucket).
>
> Ignoring the upper bits makes for a verry verry poor hash function. Also
> means it is 100% trivial to create a DOS by tuneing your input values to
> only differ in the upper bits.
>
> MfG
>         Goswin
>
> --
> Caml-list mailing list.  Subscription management and archives:
> https://sympa-roc.inria.fr/wws/info/caml-list
> Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
> Bug reports: http://caml.inria.fr/bin/caml-bugs
>
>
>


-- 
Gerd Stolpmann, Darmstadt, Germany    gerd@gerd-stolpmann.de
Creator of GODI and camlcity.org.
Contact details:        http://www.camlcity.org/contact.html
Company homepage:       http://www.gerd-stolpmann.de
*** Searching for new projects! Need consulting for system
*** programming in Ocaml? Gerd Stolpmann can help you.



^ permalink raw reply	[flat|nested] 34+ messages in thread

end of thread, other threads:[~2012-02-09 14:49 UTC | newest]

Thread overview: 34+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-12-30 16:44 [Caml-list] Hashtbl and security Gerd Stolpmann
2011-12-30 16:48 ` Yaron Minsky
2011-12-30 19:01   ` David Allsopp
2011-12-30 20:52     ` Yaron Minsky
2011-12-30 21:54       ` Gerd Stolpmann
2011-12-30 17:06 ` Xavier Leroy
2011-12-30 21:16   ` Gerd Stolpmann
2011-12-31  0:57   ` oliver
2011-12-31  0:59     ` oliver
2012-01-01 12:52   ` Richard W.M. Jones
2012-01-01 17:29     ` Xavier Leroy
2012-01-01 21:04       ` Gerd Stolpmann
2012-01-01 23:24         ` oliver
2012-01-01 23:58           ` Gerd Stolpmann
2012-01-02  1:43             ` oliver
2012-01-04 17:56               ` Damien Doligez
2012-01-04 21:52                 ` oliver
2012-01-02  9:34         ` David MENTRE
2012-01-30 10:54       ` Goswin von Brederlow
2011-12-30 17:40 ` rixed
2011-12-30 17:52   ` Edgar Friendly
2011-12-31  1:02   ` oliver
2011-12-31  0:33 ` oliver
2012-01-02  0:21 ` Shawn Wagner
2012-01-02 14:52   ` Gerd Stolpmann
2012-01-30 10:51 ` Goswin von Brederlow
2012-01-31 14:16   ` Gerd Stolpmann
2012-02-08  9:41     ` Goswin von Brederlow
2012-02-08 10:43       ` Philippe Wang
2012-02-08 10:46       ` AUGER Cédric
2012-02-09 13:22         ` Goswin von Brederlow
2012-02-09 14:48           ` Gerd Stolpmann
2012-02-08 11:12       ` Gerd Stolpmann
2012-02-09 13:11         ` Goswin von Brederlow

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).