On 19/11/2013 12:19, Dario Teixeira wrote:
> Just to make sure we are all on the same page, allow me to summarise
> the main points under discussion.  I think we all agree on the following:
>
>  - Any hashtable implementation that uses a non-cryptographic hash function
>    and relies on an association list of buckets is vulnerable to attacks
>    that force the worst-case O(n) behaviour.  Though you can buy some time
>    by tweaking the hash function, you are still stuck in an arms race with
>    attackers.
Depends on what you mean by “non-cryptographic”.
But not using a PRF (a strong hash function) seems to be unpractical.

>  - Solution approach #1: switch to a cryptographic hash function.  This
>    approach is simple and would solve the problem once and for all.
>    Unfortunately, the performance would be terrible (how much though?).
As I stated in my previous mail, some recent hash function have good
speed (without relying on fancy superscalar instructions).
Moreover, it is unclear how time is spent inside `caml_hash`.

However, changing the hash function isn't so simple, as :
- Hashtbl's documentation[1] specifies that “in non-randomized mode, the
order in which the bindings are enumerated is reproducible between [...]
minor versions of OCaml”.
This means the change cannot be made (for the non-randomized mode) until
OCaml 5.
- Implementing a hash function in a form suitable for `byterun/hash.c`
requires
- to care about endiannes ? (though the hash function doesn't seem to be
specified to be architecture-independent, it is currently the case[2])
- to be able to feed the data to the hash function in small “chunks”
while traversing the OCaml value.

[1] http://caml.inria.fr/pub/docs/manual-ocaml/libref/Hashtbl.html
[2] Except for 64-bit values that cannot be represented in 32-bit, of course
>  - Solution approach #2: keep the non-cryptographic hash function but use
>    a balanced tree for the buckets instead of an association list. [...] performance is most likely better than the first approach.
Again, I'm wary of “hand waving” statements about performance in this case.

>  - Since this problem presents itself on limited domains, [...] if the adopted solution does
>    indeed have a serious performance penalty, then it should be opt-in.
If the performance penalty is prohibitive, this seems clear, yes.


Best,
Nicolas