caml-list - the Caml user's mailing list
* [Caml-list] Hardening [Perl's] hash function further
@ 2013-11-18 20:44 Richard W.M. Jones
  2013-11-19  0:08 ` Gerd Stolpmann
  0 siblings, 1 reply; 18+ messages in thread
From: Richard W.M. Jones @ 2013-11-18 20:44 UTC (permalink / raw)
  To: caml-list


The Perl community have found further attacks against their
(already considered hardened) hash function.  This article
explains the problems quite well:

http://blog.booking.com/hardening-perls-hash-function.html

Rich.

-- 
Richard Jones
Red Hat

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Caml-list] Hardening [Perl's] hash function further
  2013-11-18 20:44 [Caml-list] Hardening [Perl's] hash function further Richard W.M. Jones
@ 2013-11-19  0:08 ` Gerd Stolpmann
  2013-11-19  7:53   ` David MENTRE
  2013-11-25 13:38   ` Goswin von Brederlow
  0 siblings, 2 replies; 18+ messages in thread
From: Gerd Stolpmann @ 2013-11-19  0:08 UTC (permalink / raw)
  To: Richard W.M. Jones; +Cc: caml-list


On Monday, 2013-11-18, at 20:44 +0000, Richard W.M. Jones wrote:
> The Perl community have found further attacks against their
> (already considered hardened) hash function.  This article
> explains the problems quite well:
> 
> http://blog.booking.com/hardening-perls-hash-function.html

Interesting read. I'm wondering which conclusions one should draw from
it. This article is more or less discussing crypto attacks on an insecure
hash function, and the new versions of the function are only "better"
but in no way safe (which you only get with crypto hashes or
dictionaries that don't allow collisions). This sounds to me like a
race between attackers and hash implementors.

Gerd
-- 
------------------------------------------------------------
Gerd Stolpmann, Darmstadt, Germany    gerd@gerd-stolpmann.de
My OCaml site:          http://www.camlcity.org
Contact details:        http://www.camlcity.org/contact.html
Company homepage:       http://www.gerd-stolpmann.de
------------------------------------------------------------



* Re: [Caml-list] Hardening [Perl's] hash function further
  2013-11-19  0:08 ` Gerd Stolpmann
@ 2013-11-19  7:53   ` David MENTRE
  2013-11-19  8:50     ` Richard W.M. Jones
                       ` (2 more replies)
  2013-11-25 13:38   ` Goswin von Brederlow
  1 sibling, 3 replies; 18+ messages in thread
From: David MENTRE @ 2013-11-19  7:53 UTC (permalink / raw)
  To: Gerd Stolpmann; +Cc: Richard W.M. Jones, caml users

Hello,

2013/11/19 Gerd Stolpmann <info@gerd-stolpmann.de>:
> This article is more or less discussing cryptoattacks on an insecure
> hash function, and the new versions of the functions are only "better"
> but in no way safe (which you only get with crypto hashes or
> dictionaries that don't allow collisions).

Which in turn raises the question: Who has the responsibility to
implement such hashes that can be exposed to the Internet: OCaml
standard library or dedicated frameworks like OCamlNet or Ocsigen? In
other words, shouldn't we keep simple hashes in the OCaml standard
library (for people making regular programs) and let people making web
applications chose the proper crypto hash in a specialized library?

Best regards,
david


* Re: [Caml-list] Hardening [Perl's] hash function further
  2013-11-19  7:53   ` David MENTRE
@ 2013-11-19  8:50     ` Richard W.M. Jones
  2013-11-19  9:14     ` Gabriel Scherer
  2013-11-19 22:15     ` Nicolas Braud-Santoni
  2 siblings, 0 replies; 18+ messages in thread
From: Richard W.M. Jones @ 2013-11-19  8:50 UTC (permalink / raw)
  To: David MENTRE; +Cc: Gerd Stolpmann, caml users

On Tue, Nov 19, 2013 at 08:53:23AM +0100, David MENTRE wrote:
> Hello,
> 
> 2013/11/19 Gerd Stolpmann <info@gerd-stolpmann.de>:
> > This article is more or less discussing cryptoattacks on an insecure
> > hash function, and the new versions of the functions are only "better"
> > but in no way safe (which you only get with crypto hashes or
> > dictionaries that don't allow collisions).
> 
> Which in turn raises the question: Who has the responsibility to
> implement such hashes that can be exposed to the Internet: OCaml
> standard library or dedicated frameworks like OCamlNet or Ocsigen? In
> other words, shouldn't we keep simple hashes in the OCaml standard
> library (for people making regular programs) and let people making web
> applications chose the proper crypto hash in a specialized library?

In Perl, the hash is a fundamental structure.  It has the same
position of importance as the linked list has in functional
languages[1].  Perl doesn't have structs (it uses hashes), it doesn't
have associative arrays (hashes instead), nor objects (hashes again).

In other words it'd be very difficult to write a non-trivial Perl
application that didn't depend on the hash implementation of the Perl
interpreter.

OCaml is different in that Map/Hashtbl are not fundamental to writing
programs.  You can write respectable programs without needing to use
those modules at all.

So I think it's probably the job of the web frameworks to keep an eye
on this and perhaps implement cryptographically strong hash tables (as
well as taking many other steps to mitigate attacks).

Rich.

[1] The one time I wrote an OCaml program which spectacularly blew up
in production, it was because I was using an assoc list which managed
to expand to 10000's of entries.

-- 
Richard Jones
Red Hat


* Re: [Caml-list] Hardening [Perl's] hash function further
  2013-11-19  7:53   ` David MENTRE
  2013-11-19  8:50     ` Richard W.M. Jones
@ 2013-11-19  9:14     ` Gabriel Scherer
  2013-11-19 11:19       ` Dario Teixeira
  2013-11-19 22:15     ` Nicolas Braud-Santoni
  2 siblings, 1 reply; 18+ messages in thread
From: Gabriel Scherer @ 2013-11-19  9:14 UTC (permalink / raw)
  To: David MENTRE; +Cc: Gerd Stolpmann, Richard W.M. Jones, caml users

I've thought about this during the last debate on randomized hashing
(which, as implemented, I think is now known not to be secure), and I
think a reasonable goal would be to use balanced trees for the buckets
to get a good worst case, instead of fighting at the hash-function
level. As was said at the time, this is what Core uses, but I'm not
sure what the bookkeeping overhead is compared to the current
hashtables. I hope it's possible to fight the constants and get
something in the same performance ballpark.
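The tree-bucket idea can be sketched with the stdlib alone. Below is a
hypothetical, minimal version (illustrative names, no resizing, and not
Core's actual code), keyed on strings, where each bucket is a stdlib
Map — itself a balanced binary tree:

```ocaml
(* Hypothetical sketch: a string-keyed hash table whose buckets are
   balanced trees (stdlib Map, an AVL tree) instead of association
   lists, so a flood of keys forced into one bucket degrades lookups
   to O(log n) rather than O(n). Resizing is omitted for brevity. *)
module StrMap = Map.Make (String)

type 'a t = { buckets : 'a StrMap.t array }

let create n = { buckets = Array.make (max 1 n) StrMap.empty }

let index t key = Hashtbl.hash key mod Array.length t.buckets

let add t key v =
  let i = index t key in
  t.buckets.(i) <- StrMap.add key v t.buckets.(i)

let find_opt t key = StrMap.find_opt key t.buckets.(index t key)
```

The per-binding overhead of the tree nodes (versus cons cells) is the
"bookkeeping" cost in question; only measurement can settle it.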

On Tue, Nov 19, 2013 at 8:53 AM, David MENTRE <dmentre@linux-france.org> wrote:
> Hello,
>
> 2013/11/19 Gerd Stolpmann <info@gerd-stolpmann.de>:
>> This article is more or less discussing cryptoattacks on an insecure
>> hash function, and the new versions of the functions are only "better"
>> but in no way safe (which you only get with crypto hashes or
>> dictionaries that don't allow collisions).
>
> Which in turn raises the question: Who has the responsibility to
> implement such hashes that can be exposed to the Internet: OCaml
> standard library or dedicated frameworks like OCamlNet or Ocsigen? In
> other words, shouldn't we keep simple hashes in the OCaml standard
> library (for people making regular programs) and let people making web
> applications chose the proper crypto hash in a specialized library?
>
> Best regards,
> david
>
> --
> Caml-list mailing list.  Subscription management and archives:
> https://sympa.inria.fr/sympa/arc/caml-list
> Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
> Bug reports: http://caml.inria.fr/bin/caml-bugs


* Re: [Caml-list] Hardening [Perl's] hash function further
  2013-11-19  9:14     ` Gabriel Scherer
@ 2013-11-19 11:19       ` Dario Teixeira
  2013-11-19 12:55         ` rixed
                           ` (2 more replies)
  0 siblings, 3 replies; 18+ messages in thread
From: Dario Teixeira @ 2013-11-19 11:19 UTC (permalink / raw)
  To: Gabriel Scherer, David MENTRE
  Cc: Gerd Stolpmann, Richard W.M. Jones, caml users

Hi,

> I've thought about this during the last debate on randomized hashing
> (which I think is now known to be ineffective, as implemented, to be
> secure), and I think a reasonable goal would be to use a balanced tree
> of buckets to a good worst-case, instead of fighting at the hash
> function level. As said at the time, this is what Core use, but I'm
> not sure what the bookkeeping overhead is against current hashtables.
> I hope it's possible to fight the constants to get something in the
> same performance ballpark.

Just to make sure we are all on the same page, allow me to summarise
the main points under discussion.  I think we all agree on the following:

 - Any hashtable implementation that uses a non-cryptographic hash function
   and relies on an association list of buckets is vulnerable to attacks
   that force the worst-case O(n) behaviour.  Though you can buy some time
   by tweaking the hash function, you are still stuck in an arms race with
   attackers.

 - Solution approach #1: switch to a cryptographic hash function.  This
   approach is simple and would solve the problem once and for all.
   Unfortunately, the performance would be terrible (how much though?).

 - Solution approach #2: keep the non-cryptographic hash function but use
   a balanced tree for the buckets instead of an association list.  This
   should mitigate the worst-case scenario -- it would go from O(n) to
   O(log n) -- and performance is most likely better than the first approach.

 - Since this problem presents itself on limited domains, it seems unfair
   to ask *all* applications to sacrifice performance to protect themselves
   from non-existent attackers.  Therefore, if the adopted solution does
   indeed have a serious performance penalty, then it should be opt-in.

Did I miss something important?  Anyway, it seems we are lacking actual
metrics to quantify the performance impact of the various approaches.
To complicate matters further, on some architectures but not on others
you may now or in the future have silicon that speeds up the calculation
of cryptographic hash functions...
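As a cheap first metric, the stdlib itself already exposes the relevant
quantity: `Hashtbl.stats` (available since OCaml 4.00) reports the
longest bucket chain, which is exactly what a collision attack inflates.
A small sketch:

```ocaml
(* Sketch: Hashtbl.stats reports max_bucket_length, the quantity a
   collision attack drives toward n. With benign keys and the default
   hash it stays small, because the table resizes to keep load bounded. *)
let () =
  let tbl = Hashtbl.create 16 in
  for i = 1 to 10_000 do
    Hashtbl.add tbl (string_of_int i) i
  done;
  let s = Hashtbl.stats tbl in
  Printf.printf "bindings=%d buckets=%d max_chain=%d\n"
    s.Hashtbl.num_bindings s.Hashtbl.num_buckets s.Hashtbl.max_bucket_length
```

On an attacked table, `max_chain` would instead approach the total
number of bindings.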

Best regards,
Dario Teixeira


* Re: [Caml-list] Hardening [Perl's] hash function further
  2013-11-19 11:19       ` Dario Teixeira
@ 2013-11-19 12:55         ` rixed
  2013-11-19 22:18           ` Nicolas Braud-Santoni
  2013-11-19 22:31         ` Nicolas Braud-Santoni
  2013-11-20 18:56         ` Florian Weimer
  2 siblings, 1 reply; 18+ messages in thread
From: rixed @ 2013-11-19 12:55 UTC (permalink / raw)
  To: caml users; +Cc: Dario Teixeira

-[ Tue, Nov 19, 2013 at 03:19:13AM -0800, Dario Teixeira ]----
> To complicate matters further, on some architectures but not on others
> you may now or in the future have silicon that speeds up the calculation
> of cryptographic hash functions...

To complicate matters even further, many users concerned about security
will refuse to rely on a hardware blackbox implementation anyway. ;-)



* Re: [Caml-list] Hardening [Perl's] hash function further
  2013-11-19  7:53   ` David MENTRE
  2013-11-19  8:50     ` Richard W.M. Jones
  2013-11-19  9:14     ` Gabriel Scherer
@ 2013-11-19 22:15     ` Nicolas Braud-Santoni
  2 siblings, 0 replies; 18+ messages in thread
From: Nicolas Braud-Santoni @ 2013-11-19 22:15 UTC (permalink / raw)
  To: caml-list


On 19/11/2013 08:53, David MENTRE wrote:
> Hello,
>
> 2013/11/19 Gerd Stolpmann <info@gerd-stolpmann.de>:
>> This article is more or less discussing cryptoattacks on an insecure
>> hash function, and the new versions of the functions are only "better"
>> but in no way safe (which you only get with crypto hashes or
>> dictionaries that don't allow collisions).
> Which in turn raises the question: Who has the responsibility to
> implement such hashes that can be exposed to the Internet: OCaml
> standard library or dedicated frameworks like OCamlNet or Ocsigen? In
> other words, shouldn't we keep simple hashes in the OCaml standard
> library (for people making regular programs) and let people making web
> applications chose the proper crypto hash in a specialized library?
>
> Best regards,
> david

Hello,

I believe that, before discussing changes to Hashtbl, we need to
know the “parameters” of our performance/security tradeoff:
1. We need actual performance data to know what is currently costly (in
seeded_hash).
A big part of the code in byterun/hash.c is actually concerned with
traversal of the data structure being hashed;
is there any information about how much time is actually spent hashing
the data, versus traversal and memory-access time?

2. What security guarantees do we want Hashtbl.seeded_hash to provide?
On the security side, as Gabriel Scherer mentioned, “randomized
hashing” failed because little was known about the security of the hash
functions that were used (such as CityHash or MurmurHash).
a. There are several classical notions that may apply:
- Universal Hash Function: *theoretically* suitable for use in hash
tables, even when data is chosen by an adversary (as long as the random
parameter is kept secret);
- Universal One-Way Hash Function (pronounced “woof”): stronger
guarantees wrt. the hardness of finding collisions (as long as the
random parameter is kept secret).

However, this can still be insecure, as Hashtbl leaks information
about the random parameter:
for instance, Hashtbl.iter (currently) iterates over the table in the
order of the hashes of the keys (mod 2^s);
naively displaying a table of data by iterating over the hash table
therefore leaks some information about this parameter.

Moreover, there is, as far as I know, no universal hash function that
is both fast and gives good uniformity in practice (never mind secure).
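For reference, the classical multiply-shift construction (Dietzfelbinger
et al.) is about the simplest scheme in the universal-hashing family; a
sketch for OCaml's 63-bit native ints, where `a` stands in for the
secret random odd parameter (all names here are illustrative):

```ocaml
(* Multiply-shift universal hashing (sketch): with w-bit words,
   h_a(x) = (a * x mod 2^w) >> (w - m) yields an m-bit bucket index.
   'a' is a secret, odd, random parameter; OCaml's native int gives
   w = 63 on 64-bit platforms. This is "universal" in the theoretical
   sense only — it is not a PRF, and the parameter can leak through
   iteration order as discussed above. *)
let universal_hash ~a ~m x = (a * x) lsr (63 - m)

let () =
  let a = 0x2545F4914F6CDD1D in   (* stand-in for a random odd seed *)
  Printf.printf "bucket of 42: %d (of 1024)\n" (universal_hash ~a ~m:10 42)
```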

b. Recent research has yielded cryptographic hash functions, like
SipHash[1], which:
- are believed to be strong[2];
- have speed comparable with “common” non-crypto hash functions,
notably on short inputs (but we need to evaluate the speed of seeded_hash);
- have good statistical properties.

Regarding the suggestion of libraries implementing hashes suitable for
use with untrusted data[3], I see 3 problems:
1. It is difficult for a library author to guess whether their library
will be used in a context where a “more secure” Hashtbl
implementation is required (and functorizing everything is problematic);
that's part of why having a “reasonably secure” default seems
important (to me, at least).
2. Implementing a polymorphic hash function is rather non-trivial.
3. Most users do not use secure primitives, especially if the default
option doesn't document its security trade-offs.

Regards,
Nicolas

[1] https://131002.net/siphash/
[2] “strong” means here that it is claimed to be a PRF (i.e. not
effectively distinguishable from a random oracle)
[3] It isn't necessarily “over the Internet”; the hashes themselves
need not be exposed for the security to matter.




* Re: [Caml-list] Hardening [Perl's] hash function further
  2013-11-19 12:55         ` rixed
@ 2013-11-19 22:18           ` Nicolas Braud-Santoni
  2013-11-19 22:39             ` Eric Cooper
  0 siblings, 1 reply; 18+ messages in thread
From: Nicolas Braud-Santoni @ 2013-11-19 22:18 UTC (permalink / raw)
  To: caml-list


On 19/11/2013 13:55, rixed@happyleptic.org wrote:
> -[ Tue, Nov 19, 2013 at 03:19:13AM -0800, Dario Teixeira ]----
>> To complicate matters further, on some architectures but not on others
>> you may now or in the future have silicon that speeds up the calculation
>> of cryptographic hash functions...
> To complicate matters even further, many users concerned about security
> will refuse to rely on a hardware blackbox implementation anyway. ;-)
I am not aware of instruction sets dedicated to speeding up hash
functions (unlike encryption with AES, for instance).
However, vectorization can definitely help.




* Re: [Caml-list] Hardening [Perl's] hash function further
  2013-11-19 11:19       ` Dario Teixeira
  2013-11-19 12:55         ` rixed
@ 2013-11-19 22:31         ` Nicolas Braud-Santoni
  2013-11-20 18:56         ` Florian Weimer
  2 siblings, 0 replies; 18+ messages in thread
From: Nicolas Braud-Santoni @ 2013-11-19 22:31 UTC (permalink / raw)
  To: caml-list


On 19/11/2013 12:19, Dario Teixeira wrote:
> Just to make sure we are all on the same page, allow me to summarise
> the main points under discussion.  I think we all agree on the following:
>
>  - Any hashtable implementation that uses a non-cryptographic hash function
>    and relies on an association list of buckets is vulnerable to attacks
>    that force the worst-case O(n) behaviour.  Though you can buy some time
>    by tweaking the hash function, you are still stuck in an arms race with
>    attackers.
Depends on what you mean by “non-cryptographic”.
But not using a PRF (a strong hash function) seems impractical.

>  - Solution approach #1: switch to a cryptographic hash function.  This
>    approach is simple and would solve the problem once and for all.
>    Unfortunately, the performance would be terrible (how much though?).
As I stated in my previous mail, some recent hash functions have good
speed (without relying on fancy superscalar instructions).
Moreover, it is unclear how time is spent inside `caml_hash`.

However, changing the hash function isn't so simple:
- Hashtbl's documentation[1] specifies that “in non-randomized mode, the
order in which the bindings are enumerated is reproducible between [...]
minor versions of OCaml”.
This means the change cannot be made (for the non-randomized mode) until
OCaml 5.
- Implementing a hash function in a form suitable for `byterun/hash.c`
requires:
  - caring about endianness (though the hash function doesn't seem to be
specified as architecture-independent, it currently is[2]);
  - being able to feed the data to the hash function in small “chunks”
while traversing the OCaml value.

[1] http://caml.inria.fr/pub/docs/manual-ocaml/libref/Hashtbl.html
[2] Except for 64-bit values that cannot be represented in 32-bit, of course
>  - Solution approach #2: keep the non-cryptographic hash function but use
>    a balanced tree for the buckets instead of an association list. [...] performance is most likely better than the first approach.
Again, I'm wary of “hand-waving” statements about performance in this case.

>  - Since this problem presents itself on limited domains, [...] if the adopted solution does
>    indeed have a serious performance penalty, then it should be opt-in.
If the performance penalty is prohibitive, this seems clear, yes.


Best,
Nicolas




* Re: [Caml-list] Hardening [Perl's] hash function further
  2013-11-19 22:18           ` Nicolas Braud-Santoni
@ 2013-11-19 22:39             ` Eric Cooper
  2013-11-19 22:55               ` Nicolas Braud-Santoni
  0 siblings, 1 reply; 18+ messages in thread
From: Eric Cooper @ 2013-11-19 22:39 UTC (permalink / raw)
  To: caml-list

On Tue, Nov 19, 2013 at 11:18:43PM +0100, Nicolas Braud-Santoni wrote:
> I am not aware of instruction sets dedicated to speeding up hash
> functions (unlike encryption with AES, for instance).

But many systems-on-chips have crypto engines that can be used for
this purpose. See
    http://www.marvell.com/application-processors/armada-500/
to cite just one example.

-- 
Eric Cooper             e c c @ c m u . e d u


* Re: [Caml-list] Hardening [Perl's] hash function further
  2013-11-19 22:39             ` Eric Cooper
@ 2013-11-19 22:55               ` Nicolas Braud-Santoni
  2013-11-25 13:46                 ` Goswin von Brederlow
  0 siblings, 1 reply; 18+ messages in thread
From: Nicolas Braud-Santoni @ 2013-11-19 22:55 UTC (permalink / raw)
  To: caml-list


On 19/11/2013 23:39, Eric Cooper wrote:
> But many systems-on-chips have crypto engines that can be used for
> this purpose. See
> http://www.marvell.com/application-processors/armada-500/ to cite just
> one example. 
Using SoC-specific hardware accelerators is probably way out-of-scope
here :-)




* Re: [Caml-list] Hardening [Perl's] hash function further
  2013-11-19 11:19       ` Dario Teixeira
  2013-11-19 12:55         ` rixed
  2013-11-19 22:31         ` Nicolas Braud-Santoni
@ 2013-11-20 18:56         ` Florian Weimer
  2013-11-20 21:47           ` Gerd Stolpmann
  2 siblings, 1 reply; 18+ messages in thread
From: Florian Weimer @ 2013-11-20 18:56 UTC (permalink / raw)
  To: Dario Teixeira
  Cc: Gabriel Scherer, David MENTRE, Gerd Stolpmann,
	Richard W.M. Jones, caml users

* Dario Teixeira:

>  - Any hashtable implementation that uses a non-cryptographic hash function
>    and relies on an association list of buckets is vulnerable to attacks
>    that force the worst-case O(n) behaviour.  Though you can buy some time
>    by tweaking the hash function, you are still stuck in an arms race with
>    attackers.

You just need a keyed hash with a secret key, not a general-purpose
cryptographic hash function.  So far, Jenkins' lookup3 function still
stands.


* Re: [Caml-list] Hardening [Perl's] hash function further
  2013-11-20 18:56         ` Florian Weimer
@ 2013-11-20 21:47           ` Gerd Stolpmann
  2013-11-25 13:51             ` Goswin von Brederlow
  0 siblings, 1 reply; 18+ messages in thread
From: Gerd Stolpmann @ 2013-11-20 21:47 UTC (permalink / raw)
  To: Florian Weimer
  Cc: Dario Teixeira, Gabriel Scherer, David MENTRE, Gerd Stolpmann,
	Richard W.M. Jones, caml users


On Wednesday, 2013-11-20, at 19:56 +0100, Florian Weimer wrote:
> * Dario Teixeira:
> 
> >  - Any hashtable implementation that uses a non-cryptographic hash function
> >    and relies on an association list of buckets is vulnerable to attacks
> >    that force the worst-case O(n) behaviour.  Though you can buy some time
> >    by tweaking the hash function, you are still stuck in an arms race with
> >    attackers.
> 
> You just need a keyed hash with a secret key, not a general-purpose
> cryptographic hash function.  So far, Jenkins' lookup3 function still
> stands.

The problem is this: you think you add n bits of randomness with the
secret key. But because the hash function isn't cryptographic, there
will be symmetries, and the n bits are actually not worth n bits. And
because there is no cryptanalysis, we don't know how safe we are.

You don't need a giant compute centre to find pairs (secret, keys)
where the keys collide. I'd guess a normal GPU can compute more than
1E11 hashes of that complexity per second. That way you can watch out
for keys that increase the likelihood of a collision. And that's only
the brute-force method.

Also, you can easily test 100 or more different series of keys in a
single HTTP request. If you find an attack method where the original n
bits of randomness are, say, only worth 20 bits, you can shoot down the
server process within a couple of minutes, and within an hour the whole
machine. Of course, this is only speculation, but the point is that we
cannot be sure that this isn't possible.

Generally, I think it is better to change the hash table algorithm in
situations where data from untrusted sources is processed. That means
using balanced trees for the buckets. This consumes more RAM, but is
provably safe. (Or, at minimum, limit the length of the buckets.)

Gerd
-- 
------------------------------------------------------------
Gerd Stolpmann, Darmstadt, Germany    gerd@gerd-stolpmann.de
My OCaml site:          http://www.camlcity.org
Contact details:        http://www.camlcity.org/contact.html
Company homepage:       http://www.gerd-stolpmann.de
------------------------------------------------------------



* Re: [Caml-list] Hardening [Perl's] hash function further
  2013-11-19  0:08 ` Gerd Stolpmann
  2013-11-19  7:53   ` David MENTRE
@ 2013-11-25 13:38   ` Goswin von Brederlow
  1 sibling, 0 replies; 18+ messages in thread
From: Goswin von Brederlow @ 2013-11-25 13:38 UTC (permalink / raw)
  To: Gerd Stolpmann; +Cc: Richard W.M. Jones, caml-list

On Tue, Nov 19, 2013 at 01:08:40AM +0100, Gerd Stolpmann wrote:
> On Monday, 2013-11-18, at 20:44 +0000, Richard W.M. Jones wrote:
> > The Perl community have found further attacks against their
> > (already considered hardened) hash function.  This article
> > explains the problems quite well:
> > 
> > http://blog.booking.com/hardening-perls-hash-function.html
> 
> Interesting read. I'm wondering which conclusions one should draw from
> it. This article is more or less discussing cryptoattacks on an insecure
> hash function, and the new versions of the functions are only "better"
> but in no way safe (which you only get with crypto hashes or
> dictionaries that don't allow collisions). This sounds for me like a
> race between attackers and hash implementors.
> 
> Gerd

I notice that the problem isn't exploiting hash collisions but
collisions in the bucket being used in the hashtable. So no matter how
good the hash is, the table still breaks down.

1) The hashtable isn't randomized by default
   - the bucket being used is predictable, so you can construct long chains

Why not just always use the randomized hash? Why only use it when you
think you have detected an abuse? That is just silly.

2) On long chains the hashtable size is doubled
   - each bucket is split in two and, assuming a good hash, each will
     get half the entries

You shouldn't simply double the size. The old and new sizes should not
have a common divisor, especially not 2. Primes are best, though not
fastest; using 2^n-1 is tempting, since (x mod 2^n-1) is fast to
compute without needing a long division.

If the old and new sizes have no common divisor, then the entries of
the overly long chain are distributed over *all* the buckets. So
instead of getting 2 buckets with n/2 entries each, you get buckets
with n/size entries.
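The coprime-sizes point can be checked in a few lines: keys crafted to
collide modulo the old size 8 stay clustered after doubling to 16, but
spread across every bucket modulo the coprime size 17. (A toy
calculation, not the stdlib's resizing code.)

```ocaml
(* Distinct bucket indices occupied by the keys at a given table size. *)
let spread keys size =
  List.sort_uniq compare (List.map (fun k -> k mod size) keys)

let () =
  (* 100 keys that all land in bucket 0 of a size-8 table *)
  let keys = List.init 100 (fun i -> i * 8) in
  (* mod 16: only buckets 0 and 8; mod 17 (coprime to 8): all 17 buckets *)
  Printf.printf "mod 16: %d distinct buckets\n" (List.length (spread keys 16));
  Printf.printf "mod 17: %d distinct buckets\n" (List.length (spread keys 17))
```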

3) Hashtables aren't re-randomized on resize

When you resize a hashtable you have to sort every key into a new
bucket, so why not change the randomization of the hash function at
that time? If the randomizer is applied after the hash is computed,
this can be done quickly without having to rehash every key; it takes
extra memory to store the non-randomized hash value of each key,
though. If you don't want to spend that memory you can rehash every
key, which also allows using a randomized hash function instead of
randomizing the result of the hash.

4) Hashtable sizes are predictable

You can also randomize the size. Nobody says you have to (nearly)
double the size in a predictable way. You can have a long list of good
sizes (e.g. prime numbers) and randomly pick one that is between 1.8
and 2.2 times (or whatever growth factor you want) the old size. The
randomness makes the size of the table unpredictable, making it that
much harder to construct a key collision.

Note: if a table is resized due to an overlong chain without being
reasonably full, I would resize it only a tiny bit instead of roughly
doubling it. That gets rid of the collisions without doubling the
memory used.

MfG
	Goswin

PS: is anyone working on fixing the stdlib hashtable?


* Re: [Caml-list] Hardening [Perl's] hash function further
  2013-11-19 22:55               ` Nicolas Braud-Santoni
@ 2013-11-25 13:46                 ` Goswin von Brederlow
  0 siblings, 0 replies; 18+ messages in thread
From: Goswin von Brederlow @ 2013-11-25 13:46 UTC (permalink / raw)
  To: caml-list

On Tue, Nov 19, 2013 at 11:55:39PM +0100, Nicolas Braud-Santoni wrote:
> On 19/11/2013 23:39, Eric Cooper wrote:
> > But many systems-on-chips have crypto engines that can be used for
> > this purpose. See
> > http://www.marvell.com/application-processors/armada-500/ to cite just
> > one example. 
> Using SoC-specific hardware accelerators is probably way out-of-scope
> here :-)

I also think that hashtable keys are usually too small by several
orders of magnitude to benefit from it. The cost of setting up the
hardware crypto engine from userspace will be way higher than hashing,
e.g., 64 bytes yourself. I would expect that anything below page size,
at least, won't be faster in hardware, but that is just a guess.

MfG
	Goswin


* Re: [Caml-list] Hardening [Perl's] hash function further
  2013-11-20 21:47           ` Gerd Stolpmann
@ 2013-11-25 13:51             ` Goswin von Brederlow
  2013-11-25 14:43               ` Yaron Minsky
  0 siblings, 1 reply; 18+ messages in thread
From: Goswin von Brederlow @ 2013-11-25 13:51 UTC (permalink / raw)
  To: caml-list

On Wed, Nov 20, 2013 at 10:47:10PM +0100, Gerd Stolpmann wrote:
> Generally, I think it is better to change the hash table algorithm in
> situations where data from untrusted sources is processed. That means
> using balanced trees for the buckets. Consumes more RAM, but is provably
> safe. (Or, at minimum, limit the length of the buckets.)
> 
> Gerd

If you truly have hash collisions then you can't limit the length of
the buckets: there is no way to make 2 keys with identical hashes not
land in the same bucket.

Or did you mean use a list up to N items and then switch to a tree?

MfG
	Goswin


* Re: [Caml-list] Hardening [Perl's] hash function further
  2013-11-25 13:51             ` Goswin von Brederlow
@ 2013-11-25 14:43               ` Yaron Minsky
  0 siblings, 0 replies; 18+ messages in thread
From: Yaron Minsky @ 2013-11-25 14:43 UTC (permalink / raw)
  To: Goswin von Brederlow; +Cc: caml-list

For what it's worth, Core's Hashtbl module uses AVL trees for the
buckets, so the behavior on large numbers of collisions degrades to
logarithmic rather than linear.

y

On Mon, Nov 25, 2013 at 8:51 AM, Goswin von Brederlow <goswin-v-b@web.de> wrote:
> On Wed, Nov 20, 2013 at 10:47:10PM +0100, Gerd Stolpmann wrote:
>> Generally, I think it is better to change the hash table algorithm in
>> situations where data from untrusted sources is processed. That means
>> using balanced trees for the buckets. Consumes more RAM, but is provably
>> safe. (Or, at minimum, limit the length of the buckets.)
>>
>> Gerd
>
> If you truely have hash collisions then you can't limit the length of
> the buckets. There is no way to make 2 keys with identical hash not
> land in the same bucket.
>
> Or did you mean use a list up to N items and then switch to a tree?
>
> MfG
>         Goswin
>
> --
> Caml-list mailing list.  Subscription management and archives:
> https://sympa.inria.fr/sympa/arc/caml-list
> Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
> Bug reports: http://caml.inria.fr/bin/caml-bugs

