On 19/11/2013 12:19, Dario Teixeira wrote: > Just to make sure we are all on the same page, allow me to summarise > the main points under discussion. I think we all agree on the following: > > - Any hashtable implementation that uses a non-cryptographic hash function > and relies on an association list of buckets is vulnerable to attacks > that force the worst-case O(n) behaviour. Though you can buy some time > by tweaking the hash function, you are still stuck in an arms race with > attackers. Depends on what you mean by “non-cryptographic”. But not using a PRF (a strong hash function) seems to be unpractical. > - Solution approach #1: switch to a cryptographic hash function. This > approach is simple and would solve the problem once and for all. > Unfortunately, the performance would be terrible (how much though?). As I stated in my previous mail, some recent hash function have good speed (without relying on fancy superscalar instructions). Moreover, it is unclear how time is spent inside `caml_hash`. However, changing the hash function isn't so simple, as : - Hashtbl's documentation[1] specifies that “in non-randomized mode, the order in which the bindings are enumerated is reproducible between [...] minor versions of OCaml”. This means the change cannot be made (for the non-randomized mode) until OCaml 5. - Implementing a hash function in a form suitable for `byterun/hash.c` requires - to care about endiannes ? (though the hash function doesn't seem to be specified to be architecture-independent, it is currently the case[2]) - to be able to feed the data to the hash function in small “chunks” while traversing the OCaml value. [1] http://caml.inria.fr/pub/docs/manual-ocaml/libref/Hashtbl.html [2] Except for 64-bit values that cannot be represented in 32-bit, of course > - Solution approach #2: keep the non-cryptographic hash function but use > a balanced tree for the buckets instead of an association list. [...] performance is most likely better than the first approach. Again, I'm wary of “hand waving” statements about performance in this case. > - Since this problem presents itself on limited domains, [...] if the adopted solution does > indeed have a serious performance penalty, then it should be opt-in. If the performance penalty is prohibitive, this seems clear, yes. Best, Nicolas