* Re: [Caml-list] Hardening [Perl's] hash function further
2013-11-19 7:53 ` David MENTRE
@ 2013-11-19 8:50 ` Richard W.M. Jones
2013-11-19 9:14 ` Gabriel Scherer
2013-11-19 22:15 ` Nicolas Braud-Santoni
2 siblings, 0 replies; 18+ messages in thread
From: Richard W.M. Jones @ 2013-11-19 8:50 UTC (permalink / raw)
To: David MENTRE; +Cc: Gerd Stolpmann, caml users
On Tue, Nov 19, 2013 at 08:53:23AM +0100, David MENTRE wrote:
> Hello,
>
> 2013/11/19 Gerd Stolpmann <info@gerd-stolpmann.de>:
> > This article is more or less discussing cryptoattacks on an insecure
> > hash function, and the new versions of the functions are only "better"
> > but in no way safe (which you only get with crypto hashes or
> > dictionaries that don't allow collisions).
>
> Which in turn raises the question: Who has the responsibility to
> implement such hashes that can be exposed to the Internet: OCaml
> standard library or dedicated frameworks like OCamlNet or Ocsigen? In
> other words, shouldn't we keep simple hashes in the OCaml standard
> library (for people making regular programs) and let people making web
> applications choose the proper crypto hash in a specialized library?
In Perl, the hash is a fundamental structure. It has the same
position of importance as the linked list has in functional
languages[1]. Perl doesn't have structs (it uses hashes), it doesn't
have associative arrays (hashes instead), nor objects (hashes again).
In other words, it'd be very difficult to write a non-trivial Perl
application that didn't depend on the hash implementation of the Perl
interpreter.
OCaml is different in that Map/Hashtbl are not fundamental to writing
programs. You can write respectable programs without needing to use
those modules at all.
So I think it's probably the job of the web frameworks to keep an eye
on this and perhaps implement cryptographically strong hash tables (as
well as taking many other steps to mitigate attacks).
Rich.
[1] The one time I wrote an OCaml program which spectacularly blew up
in production, it was because I was using an assoc list which managed
to grow to tens of thousands of entries.
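For concreteness, a small sketch of that failure mode (illustrative data;
`List.assoc` is the stdlib call an assoc list relies on):

```ocaml
(* List.assoc scans the list front to back, so each lookup in a list of
   n entries costs O(n); a Hashtbl over the same data stays at expected
   O(1). With tens of thousands of entries this difference is what
   blows up in production. *)
let n = 10_000

let pairs = List.init n (fun i -> (i, string_of_int i))

let table =
  let t = Hashtbl.create n in
  List.iter (fun (k, v) -> Hashtbl.add t k v) pairs;
  t

let slow_lookup k = List.assoc k pairs    (* linear scan *)
let fast_lookup k = Hashtbl.find table k  (* hash, then one bucket *)
```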
--
Richard Jones
Red Hat
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [Caml-list] Hardening [Perl's] hash function further
2013-11-19 7:53 ` David MENTRE
2013-11-19 8:50 ` Richard W.M. Jones
@ 2013-11-19 9:14 ` Gabriel Scherer
2013-11-19 11:19 ` Dario Teixeira
2013-11-19 22:15 ` Nicolas Braud-Santoni
2 siblings, 1 reply; 18+ messages in thread
From: Gabriel Scherer @ 2013-11-19 9:14 UTC (permalink / raw)
To: David MENTRE; +Cc: Gerd Stolpmann, Richard W.M. Jones, caml users
I've thought about this during the last debate on randomized hashing
(which, as implemented, I think is now known to be ineffective as a
security measure), and I think a reasonable goal would be to use a
balanced tree of buckets to get a good worst case, instead of fighting
at the hash function level. As was said at the time, this is what Core
uses, but I'm not sure what the bookkeeping overhead is compared to the
current hashtables. I hope it's possible to fight the constants to get
something in the same performance ballpark.
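A minimal sketch of the tree-bucket idea (integer keys only; `Map` is the
stdlib's balanced tree, and all names here are illustrative rather than
Core's actual code):

```ocaml
(* Each bucket is a balanced map (an AVL tree in the stdlib) rather than
   an association list: k colliding keys cost O(log k) per lookup
   instead of O(k). No resizing logic; sketch only. *)
module IntMap = Map.Make (Int)

type 'v t = { buckets : 'v IntMap.t array }

let create n = { buckets = Array.make n IntMap.empty }

let index t key = Hashtbl.hash key mod Array.length t.buckets

let add t key v =
  let i = index t key in
  t.buckets.(i) <- IntMap.add key v t.buckets.(i)

let find t key = IntMap.find key t.buckets.(index t key)
```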
On Tue, Nov 19, 2013 at 8:53 AM, David MENTRE <dmentre@linux-france.org> wrote:
> Hello,
>
> 2013/11/19 Gerd Stolpmann <info@gerd-stolpmann.de>:
>> This article is more or less discussing cryptoattacks on an insecure
>> hash function, and the new versions of the functions are only "better"
>> but in no way safe (which you only get with crypto hashes or
>> dictionaries that don't allow collisions).
>
> Which in turn raises the question: Who has the responsibility to
> implement such hashes that can be exposed to the Internet: OCaml
> standard library or dedicated frameworks like OCamlNet or Ocsigen? In
> other words, shouldn't we keep simple hashes in the OCaml standard
> library (for people making regular programs) and let people making web
> applications choose the proper crypto hash in a specialized library?
>
> Best regards,
> david
>
> --
> Caml-list mailing list. Subscription management and archives:
> https://sympa.inria.fr/sympa/arc/caml-list
> Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
> Bug reports: http://caml.inria.fr/bin/caml-bugs
* Re: [Caml-list] Hardening [Perl's] hash function further
2013-11-19 9:14 ` Gabriel Scherer
@ 2013-11-19 11:19 ` Dario Teixeira
2013-11-19 12:55 ` rixed
` (2 more replies)
0 siblings, 3 replies; 18+ messages in thread
From: Dario Teixeira @ 2013-11-19 11:19 UTC (permalink / raw)
To: Gabriel Scherer, David MENTRE
Cc: Gerd Stolpmann, Richard W.M. Jones, caml users
Hi,
> I've thought about this during the last debate on randomized hashing
> (which, as implemented, I think is now known to be ineffective as a
> security measure), and I think a reasonable goal would be to use a
> balanced tree of buckets to get a good worst case, instead of fighting
> at the hash function level. As was said at the time, this is what Core
> uses, but I'm not sure what the bookkeeping overhead is compared to
> the current hashtables. I hope it's possible to fight the constants to
> get something in the same performance ballpark.
Just to make sure we are all on the same page, allow me to summarise
the main points under discussion. I think we all agree on the following:
- Any hashtable implementation that uses a non-cryptographic hash function
and association lists for its buckets is vulnerable to attacks
that force the worst-case O(n) behaviour. Though you can buy some time
by tweaking the hash function, you are still stuck in an arms race with
attackers.
- Solution approach #1: switch to a cryptographic hash function. This
approach is simple and would solve the problem once and for all.
Unfortunately, the performance would be terrible (how much though?).
- Solution approach #2: keep the non-cryptographic hash function but use
a balanced tree for the buckets instead of an association list. This
should mitigate the worst-case scenario -- it would go from O(n) to
O(log n) -- and performance is most likely better than the first approach.
- Since this problem presents itself only in limited domains, it seems unfair
to ask *all* applications to sacrifice performance to protect themselves
from non-existent attackers. Therefore, if the adopted solution does
indeed have a serious performance penalty, then it should be opt-in.
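To make the first bullet concrete, the degenerate behaviour can be
reproduced without knowing how to find real collisions, by handing
`Hashtbl.Make` a deliberately constant hash function (everything below is
illustrative):

```ocaml
(* With a constant hash every key lands in the same bucket, so the table
   degenerates into one long association list: each find scans all
   entries, which is exactly the O(n) worst case an attacker forces
   with crafted colliding keys. *)
module Degenerate = Hashtbl.Make (struct
  type t = string
  let equal = String.equal
  let hash _ = 0
end)

let t = Degenerate.create 16

let () =
  for i = 0 to 9_999 do
    Degenerate.add t (string_of_int i) i
  done

(* This single lookup now walks a 10_000-entry bucket. *)
let v = Degenerate.find t "0"
```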
Did I miss something important? Anyway, it seems we are lacking actual
metrics to quantify the performance impact of the various approaches.
To complicate matters further, on some architectures but not on others
you may now or in the future have silicon that speeds up the calculation
of cryptographic hash functions...
Best regards,
Dario Teixeira
* Re: [Caml-list] Hardening [Perl's] hash function further
2013-11-19 11:19 ` Dario Teixeira
@ 2013-11-19 12:55 ` rixed
2013-11-19 22:18 ` Nicolas Braud-Santoni
2013-11-19 22:31 ` Nicolas Braud-Santoni
2013-11-20 18:56 ` Florian Weimer
2 siblings, 1 reply; 18+ messages in thread
From: rixed @ 2013-11-19 12:55 UTC (permalink / raw)
To: caml users; +Cc: Dario Teixeira
-[ Tue, Nov 19, 2013 at 03:19:13AM -0800, Dario Teixeira ]----
> To complicate matters further, on some architectures but not on others
> you may now or in the future have silicon that speeds up the calculation
> of cryptographic hash functions...
To complicate matters even further, many users concerned about security
will refuse to rely on a hardware blackbox implementation anyway. ;-)
* Re: [Caml-list] Hardening [Perl's] hash function further
2013-11-19 12:55 ` rixed
@ 2013-11-19 22:18 ` Nicolas Braud-Santoni
2013-11-19 22:39 ` Eric Cooper
0 siblings, 1 reply; 18+ messages in thread
From: Nicolas Braud-Santoni @ 2013-11-19 22:18 UTC (permalink / raw)
To: caml-list
On 19/11/2013 13:55, rixed@happyleptic.org wrote:
> -[ Tue, Nov 19, 2013 at 03:19:13AM -0800, Dario Teixeira ]----
>> To complicate matters further, on some architectures but not on others
>> you may now or in the future have silicon that speeds up the calculation
>> of cryptographic hash functions...
> To complicate matters even further, many users concerned about security
> will refuse to rely on a hardware blackbox implementation anyway. ;-)
I am not aware of instruction sets dedicated to speeding up hash
functions (unlike encryption with AES, for instance).
However, vectorization can definitely help.
* Re: [Caml-list] Hardening [Perl's] hash function further
2013-11-19 11:19 ` Dario Teixeira
2013-11-19 12:55 ` rixed
@ 2013-11-19 22:31 ` Nicolas Braud-Santoni
2013-11-20 18:56 ` Florian Weimer
2 siblings, 0 replies; 18+ messages in thread
From: Nicolas Braud-Santoni @ 2013-11-19 22:31 UTC (permalink / raw)
To: caml-list
On 19/11/2013 12:19, Dario Teixeira wrote:
> Just to make sure we are all on the same page, allow me to summarise
> the main points under discussion. I think we all agree on the following:
>
> - Any hashtable implementation that uses a non-cryptographic hash function
> and association lists for its buckets is vulnerable to attacks
> that force the worst-case O(n) behaviour. Though you can buy some time
> by tweaking the hash function, you are still stuck in an arms race with
> attackers.
That depends on what you mean by “non-cryptographic”.
But not using a PRF (a strong hash function) seems impractical.
> - Solution approach #1: switch to a cryptographic hash function. This
> approach is simple and would solve the problem once and for all.
> Unfortunately, the performance would be terrible (how much though?).
As I stated in my previous mail, some recent hash functions have good
speed (without relying on fancy superscalar instructions).
Moreover, it is unclear how time is spent inside `caml_hash`.
However, changing the hash function isn't so simple, as:
- Hashtbl's documentation[1] specifies that “in non-randomized mode, the
order in which the bindings are enumerated is reproducible between [...]
minor versions of OCaml”.
This means the change cannot be made (for the non-randomized mode) until
OCaml 5.
- Implementing a hash function in a form suitable for `byterun/hash.c`
requires:
- caring about endianness? (though the hash function doesn't seem to be
specified to be architecture-independent, it is currently the case[2]);
- being able to feed the data to the hash function in small “chunks”
while traversing the OCaml value.
[1] http://caml.inria.fr/pub/docs/manual-ocaml/libref/Hashtbl.html
[2] Except for 64-bit values that cannot be represented in 32-bit, of course
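That last constraint is worth spelling out: the candidate function must
expose an incremental interface, since the runtime pushes words into the
hash state while it traverses the value. A hypothetical shape of such an
interface (names invented for illustration; the real code lives in C in
`byterun/hash.c`):

```ocaml
(* An init/update/final interface rather than a one-shot
   string -> digest function; each field or flat block encountered
   during traversal is mixed into the running state. *)
module type INCREMENTAL_HASH = sig
  type state
  val init : seed:int -> state
  val mix_int : state -> int -> state        (* one word of the value *)
  val mix_string : state -> string -> state  (* a flat block, e.g. a string *)
  val final : state -> int                   (* fold down to the bucket hash *)
end
```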
> - Solution approach #2: keep the non-cryptographic hash function but use
> a balanced tree for the buckets instead of an association list. [...] performance is most likely better than the first approach.
Again, I'm wary of “hand waving” statements about performance in this case.
> - Since this problem presents itself on limited domains, [...] if the adopted solution does
> indeed have a serious performance penalty, then it should be opt-in.
If the performance penalty is prohibitive, this seems clear, yes.
Best,
Nicolas
* Re: [Caml-list] Hardening [Perl's] hash function further
2013-11-19 11:19 ` Dario Teixeira
2013-11-19 12:55 ` rixed
2013-11-19 22:31 ` Nicolas Braud-Santoni
@ 2013-11-20 18:56 ` Florian Weimer
2013-11-20 21:47 ` Gerd Stolpmann
2 siblings, 1 reply; 18+ messages in thread
From: Florian Weimer @ 2013-11-20 18:56 UTC (permalink / raw)
To: Dario Teixeira
Cc: Gabriel Scherer, David MENTRE, Gerd Stolpmann,
Richard W.M. Jones, caml users
* Dario Teixeira:
> - Any hashtable implementation that uses a non-cryptographic hash function
> and association lists for its buckets is vulnerable to attacks
> that force the worst-case O(n) behaviour. Though you can buy some time
> by tweaking the hash function, you are still stuck in an arms race with
> attackers.
You just need a keyed hash with a secret key, not a general-purpose
cryptographic hash function. So far, Jenkins' lookup3 function still
stands.
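For what it's worth, the stdlib already provides the "keyed" plumbing,
via `Hashtbl.MakeSeeded` and `Hashtbl.seeded_hash`; whether the
underlying mixing function is strong enough for the seed to stay secret
is exactly the open question in this thread. A sketch:

```ocaml
(* MakeSeeded threads a per-table seed through the hash function, and
   create ~random:true draws a fresh random seed for each table. Note:
   the field is named [seeded_hash] in recent OCaml; older (4.x)
   releases call it [hash]. *)
module SH = Hashtbl.MakeSeeded (struct
  type t = string
  let equal = String.equal
  let seeded_hash seed s = Hashtbl.seeded_hash seed s
end)

let t = SH.create ~random:true 16
let () = SH.add t "key" 1
```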
* Re: [Caml-list] Hardening [Perl's] hash function further
2013-11-20 18:56 ` Florian Weimer
@ 2013-11-20 21:47 ` Gerd Stolpmann
2013-11-25 13:51 ` Goswin von Brederlow
0 siblings, 1 reply; 18+ messages in thread
From: Gerd Stolpmann @ 2013-11-20 21:47 UTC (permalink / raw)
To: Florian Weimer
Cc: Dario Teixeira, Gabriel Scherer, David MENTRE, Gerd Stolpmann,
Richard W.M. Jones, caml users
Am Mittwoch, den 20.11.2013, 19:56 +0100 schrieb Florian Weimer:
> * Dario Teixeira:
>
> > - Any hashtable implementation that uses a non-cryptographic hash function
> > and association lists for its buckets is vulnerable to attacks
> > that force the worst-case O(n) behaviour. Though you can buy some time
> > by tweaking the hash function, you are still stuck in an arms race with
> > attackers.
>
> You just need a keyed hash with a secret key, not a general-purpose
> cryptographic hash function. So far, Jenkins' lookup3 function still
> stands.
The problem is this: you think you add n bits of randomness with the
secret key. But because the hash function isn't cryptographic, there
will be symmetries, and those n bits are effectively worth less than n
bits. And because there is no cryptanalysis, we don't know how safe we are.
You don't need a giant compute centre for finding (secret, key) pairs
where the keys collide. I'd guess a normal GPU can compute more than
1E11 hashes of that complexity per second. That way you can watch for
keys that increase the likelihood of a collision. And that's only
the brute-force method.
Also, you can easily test 100 or more different series of keys in a
single HTTP request. If you find an attack method where the original n
bits of randomness are, say, only worth 20 bits, you can shoot down the
server process within a couple of minutes, and within an hour the whole
computer. Of course, this is only speculation, but the point is that we
cannot be sure that this isn't possible.
Generally, I think it is better to change the hash table algorithm in
situations where data from untrusted sources is processed. That means
using balanced trees for the buckets. Consumes more RAM, but is provably
safe. (Or, at minimum, limit the length of the buckets.)
Gerd
--
------------------------------------------------------------
Gerd Stolpmann, Darmstadt, Germany gerd@gerd-stolpmann.de
My OCaml site: http://www.camlcity.org
Contact details: http://www.camlcity.org/contact.html
Company homepage: http://www.gerd-stolpmann.de
------------------------------------------------------------
* Re: [Caml-list] Hardening [Perl's] hash function further
2013-11-20 21:47 ` Gerd Stolpmann
@ 2013-11-25 13:51 ` Goswin von Brederlow
2013-11-25 14:43 ` Yaron Minsky
0 siblings, 1 reply; 18+ messages in thread
From: Goswin von Brederlow @ 2013-11-25 13:51 UTC (permalink / raw)
To: caml-list
On Wed, Nov 20, 2013 at 10:47:10PM +0100, Gerd Stolpmann wrote:
> Generally, I think it is better to change the hash table algorithm in
> situations where data from untrusted sources is processed. That means
> using balanced trees for the buckets. Consumes more RAM, but is provably
> safe. (Or, at minimum, limit the length of the buckets.)
>
> Gerd
If you truly have hash collisions then you can't limit the length of
the buckets. There is no way to make two keys with identical hashes not
land in the same bucket.
Or did you mean use a list up to N items and then switch to a tree?
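That hybrid reading can be sketched directly (illustrative code, not
OCaml's actual Hashtbl; `Map` stands in for any balanced tree):

```ocaml
(* Buckets start as cheap association lists and are promoted to a
   balanced map once they exceed a threshold: the common case keeps the
   list's low constant factor, while pathological collision chains
   degrade to O(log n) instead of O(n). *)
module M = Map.Make (String)

type 'v bucket =
  | Short of (string * 'v) list  (* up to [threshold] entries *)
  | Tree of 'v M.t               (* promoted on overflow *)

let threshold = 8

let bucket_add key v = function
  | Short l when List.length l < threshold -> Short ((key, v) :: l)
  | Short l ->
      Tree (List.fold_left (fun m (k, x) -> M.add k x m)
              (M.add key v M.empty) l)
  | Tree m -> Tree (M.add key v m)

let bucket_find key = function
  | Short l -> List.assoc key l
  | Tree m -> M.find key m
```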
MfG
Goswin
* Re: [Caml-list] Hardening [Perl's] hash function further
2013-11-25 13:51 ` Goswin von Brederlow
@ 2013-11-25 14:43 ` Yaron Minsky
0 siblings, 0 replies; 18+ messages in thread
From: Yaron Minsky @ 2013-11-25 14:43 UTC (permalink / raw)
To: Goswin von Brederlow; +Cc: caml-list
For what it's worth, Core's Hashtbl module uses AVL trees for the
buckets, so the behavior on large numbers of collisions degrades to
logarithmic rather than linear.
y
On Mon, Nov 25, 2013 at 8:51 AM, Goswin von Brederlow <goswin-v-b@web.de> wrote:
> On Wed, Nov 20, 2013 at 10:47:10PM +0100, Gerd Stolpmann wrote:
>> Generally, I think it is better to change the hash table algorithm in
>> situations where data from untrusted sources is processed. That means
>> using balanced trees for the buckets. Consumes more RAM, but is provably
>> safe. (Or, at minimum, limit the length of the buckets.)
>>
>> Gerd
>
> If you truely have hash collisions then you can't limit the length of
> the buckets. There is no way to make 2 keys with identical hash not
> land in the same bucket.
>
> Or did you mean use a list up to N items and then switch to a tree?
>
> MfG
> Goswin
>
> --
> Caml-list mailing list. Subscription management and archives:
> https://sympa.inria.fr/sympa/arc/caml-list
> Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
> Bug reports: http://caml.inria.fr/bin/caml-bugs
* Re: [Caml-list] Hardening [Perl's] hash function further
2013-11-19 7:53 ` David MENTRE
2013-11-19 8:50 ` Richard W.M. Jones
2013-11-19 9:14 ` Gabriel Scherer
@ 2013-11-19 22:15 ` Nicolas Braud-Santoni
2 siblings, 0 replies; 18+ messages in thread
From: Nicolas Braud-Santoni @ 2013-11-19 22:15 UTC (permalink / raw)
To: caml-list
On 19/11/2013 08:53, David MENTRE wrote:
> Hello,
>
> 2013/11/19 Gerd Stolpmann <info@gerd-stolpmann.de>:
>> This article is more or less discussing cryptoattacks on an insecure
>> hash function, and the new versions of the functions are only "better"
>> but in no way safe (which you only get with crypto hashes or
>> dictionaries that don't allow collisions).
> Which in turn raises the question: Who has the responsibility to
> implement such hashes that can be exposed to the Internet: OCaml
> standard library or dedicated frameworks like OCamlNet or Ocsigen? In
> other words, shouldn't we keep simple hashes in the OCaml standard
> library (for people making regular programs) and let people making web
> applications choose the proper crypto hash in a specialized library?
>
> Best regards,
> david
Hello,
I believe that, before discussing changes to Hashtbl, we need to
know the “parameters” of our performance/security tradeoff:
1. We need actual performance data to know what is currently costly (in
seeded_hash).
A big part of the code in byterun/hash.c is actually concerned with
traversal of the data structure being hashed;
is there any information about how much time is actually spent hashing
the data, versus traversal and memory-access times?
2. What security guarantees do we want Hashtbl.seeded_hash to provide?
On the security side, as Gabriel Scherer mentioned, “randomized
hashing” failed because little was known about the security of the hash
functions that were used (such as CityHash or MurmurHash).
a. There are several classical notions that may apply:
- Universal Hash Function: *theoretically* suitable for use in hash
tables, even when data is chosen by an adversary (as long as the random
parameter is kept secret);
- Universal One-Way Hash Function (pronounced “woof”): stronger
guarantees wrt. the hardness of finding collisions (as long as the random
parameter is kept secret).
However, this still can be said to be insecure, as Hashtbl leaks
information about the random parameter:
for instance, Hashtbl.iter (currently) iterates the table in the order
of the hashes of the keys (mod 2^s);
naively displaying a table of data by iterating over the hash table
leaks some information about this parameter.
Moreover, there is (as far as I know) no UH function that is both fast
and gives good uniformity in practice (never mind secure).
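The leak described above is easy to make concrete (a sketch, not an
attack): iteration order follows bucket indices, which are a function of
the seed, so any output that preserves iteration order reveals something
about it.

```ocaml
(* Iteration visits buckets in order, and a key's bucket is its
   (seeded) hash mod the table size. With ~random:false the order
   below is reproducible; two tables created with ~random:true will
   generally disagree, and that disagreement is seed-dependent
   information an iterating printer would leak. *)
let order_of_keys ~random keys =
  let t = Hashtbl.create ~random 16 in
  List.iter (fun k -> Hashtbl.add t k ()) keys;
  List.rev (Hashtbl.fold (fun k () acc -> k :: acc) t [])
```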
b. Recent research has yielded some cryptographic hash functions, like
SipHash[1], which:
- are believed to be strong[2];
- have speed comparable with “common” non-crypto hash functions,
notably on short inputs (but we need to evaluate the speed of seeded_hash);
- have good statistical properties.
Regarding the suggestion of libraries implementing hashes suitable for
use with untrusted data[3], I see 3 problems:
1. It is difficult for a library author to guess whether his library
will be used in a context where a “more secure” Hashtbl
implementation is required (and functorizing everything is problematic);
that's part of why having a “reasonably secure” default seems
important (to me, at least).
2. Implementing a polymorphic hash function is rather non-trivial.
3. Most users do not use secure primitives, especially if the default
option doesn't document its security trade-off.
Regards,
Nicolas
[1] https://131002.net/siphash/
[2] “strong” means here that it is claimed to be a PRF (i.e. not
effectively distinguishable from a random oracle)
[3] It isn't necessarily “over the Internet”, and the hashes themselves
need not be exposed for the security to matter.