On 19/11/2013 08:53, David MENTRE wrote: > Hello, > > 2013/11/19 Gerd Stolpmann : >> This article is more or less discussing cryptoattacks on an insecure >> hash function, and the new versions of the functions are only "better" >> but in no way safe (which you only get with crypto hashes or >> dictionaries that don't allow collisions). > Which in turn raises the question: Who has the responsibility to > implement such hashes that can be exposed to the Internet: OCaml > standard library or dedicated frameworks like OCamlNet or Ocsigen? In > other words, shouldn't we keep simple hashes in the OCaml standard > library (for people making regular programs) and let people making web > applications chose the proper crypto hash in a specialized library? > > Best regards, > david Hello, I believe that, before discussing changes to Hashtbl, we need to know the “€œparameters” of our performance/security tradeoff: 1. We need actual performance data to know what is currently costly (in seeded_hash). A big part of the code in byterun/hash.c is actually concerned with traversal of the data-structure being hashed; is there any information about how much time is actually spent hashing the data, versus traversal & memory access times ? 2. What security guarantees do we want Hashtbl.seeded_hash to provide? On the security side, as Gabriel Scherer mentionned, “€œrandomized hashing” failed because little was known about the security of the hash functions that were used (such as CityHash or MurmurHash). a. There are several classical notions that may apply: - Universal Hash Function: *theoretically* suitable for use in hash tables, even when data is chosen by an adversary (as long as the random parameter is kept secret); - Universal One-Way Hash Function (pronounced “woof”): stronger guarantees wrt. the hardness of collision (as long as the random parameter is kept secret). However, this still can be said to be insecure, as Hashtbl leaks information about the random parameter: for instance, Hashtbl.iter (currently) iterates the table in the order of the hashes of the keys (mod 2^s); naively displaying a table of data by iterating over the hash table leaks some information about this parameter. Moreover, there is no (as far as I know) UH function that is both fast, gives good uniformity in practice (nevermind secure). b. recent research has yielded some cryptographic hash functions, like SipHash[1], which are: - believed to be strong[2]; - have speed comparable with “common” non-crypto hash functions, notably on short inputs (but we need to evaluate the speed of seeded_hash); - have good statistical properties. Regarding the suggestion of libraries implementing hashes suitable for use with untrusted data[3], I see 3 problems: 1. It is difficult for a library author to guess whether his library will be used in a context where a “more secure” Hashtbl implementation is required (and functorizing everything is problematic); that's part of why having a “reasonable secure” default seems important (to me, at least). 2. Implementing a polymorphic hash function is rather non-trivial. 3. Most users do not use secure primitives, especially if the default option doesn't document its security trade-off. Regards, Nicolas [1] https://131002.net/siphash/ [2] “œstrong” means here that it is claimed to be a PRF (e.g. not effectively distinguishable from a random oracle) [3] It isn't necessarily “œover the Internet”, and the hashes themselves need not be exposed for the security to matter.