caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed
From: Nicolas Braud-Santoni <nicolas@braud-santoni.eu>
To: caml-list@inria.fr
Subject: Re: [Caml-list] Hardening [Perl's] hash function further
Date: Tue, 19 Nov 2013 23:15:58 +0100	[thread overview]
Message-ID: <528BE31E.70308@braud-santoni.eu> (raw)
In-Reply-To: <CAC3Lx=ZrsHau9NicrwTjz-Lt7R5ysdrh4S3KVMOc1vZAAhJpvA@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 3687 bytes --]

On 19/11/2013 08:53, David MENTRE wrote:
> Hello,
>
> 2013/11/19 Gerd Stolpmann <info@gerd-stolpmann.de>:
>> This article is more or less discussing cryptoattacks on an insecure
>> hash function, and the new versions of the functions are only "better"
>> but in no way safe (which you only get with crypto hashes or
>> dictionaries that don't allow collisions).
> Which in turn raises the question: Who has the responsibility to
> implement such hashes that can be exposed to the Internet: OCaml
> standard library or dedicated frameworks like OCamlNet or Ocsigen? In
> other words, shouldn't we keep simple hashes in the OCaml standard
> library (for people making regular programs) and let people making web
> applications chose the proper crypto hash in a specialized library?
>
> Best regards,
> david

Hello,

I believe that, before discussing changes to Hashtbl, we need to
know the “€œparameters” of our performance/security tradeoff:
1. We need actual performance data to know what is currently costly (in
seeded_hash).
A big part of the code in byterun/hash.c is actually concerned with
traversal of the data-structure being hashed;
is there any information about how much time is actually spent hashing
the data, versus traversal & memory access times ?

2. What security guarantees do we want Hashtbl.seeded_hash to provide?
On the security side, as Gabriel Scherer mentionned, “€œrandomized
hashing” failed because little was known about the security of the hash
functions that were used (such as CityHash or MurmurHash).
a. There are several classical notions that may apply:
- Universal Hash Function: *theoretically* suitable for use in hash
tables, even when data is chosen by an adversary (as long as the random
parameter is kept secret);
- Universal One-Way Hash Function (pronounced “woof”): stronger
guarantees wrt. the hardness of collision (as long as the random
parameter is kept secret).

However, this still can be said to be insecure, as Hashtbl leaks
information about the random parameter:
for instance, Hashtbl.iter (currently) iterates the table in the order
of the hashes of the keys (mod 2^s);
naively displaying a table of data by iterating over the hash table
leaks some information about this parameter.

Moreover, there is no (as far as I know) UH function that is both fast,
gives good uniformity in practice (nevermind secure).

b. recent research has yielded some cryptographic hash functions, like
SipHash[1], which are:
- believed to be strong[2];
- have speed comparable with “common” non-crypto hash functions,
notably on short inputs (but we need to evaluate the speed of seeded_hash);
- have good statistical properties.

Regarding the suggestion of libraries implementing hashes suitable for
use with untrusted data[3], I see 3 problems:
1. It is difficult for a library author to guess whether his library
will be used in a context where a “more secure” Hashtbl
implementation is required (and functorizing everything is problematic);
that's part of why having a “reasonable secure” default seems
important (to me, at least).
2. Implementing a polymorphic hash function is rather non-trivial.
3. Most users do not use secure primitives, especially if the default
option doesn't document its security trade-off.

Regards,
Nicolas

[1] https://131002.net/siphash/
[2] “œstrong” means here that it is claimed to be a PRF (e.g. not
effectively distinguishable from a random oracle)
[3] It isn't necessarily “œover the Internet”, and the hashes themselves
need not be exposed for the security to matter.


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

  parent reply	other threads:[~2013-11-19 22:16 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-11-18 20:44 Richard W.M. Jones
2013-11-19  0:08 ` Gerd Stolpmann
2013-11-19  7:53   ` David MENTRE
2013-11-19  8:50     ` Richard W.M. Jones
2013-11-19  9:14     ` Gabriel Scherer
2013-11-19 11:19       ` Dario Teixeira
2013-11-19 12:55         ` rixed
2013-11-19 22:18           ` Nicolas Braud-Santoni
2013-11-19 22:39             ` Eric Cooper
2013-11-19 22:55               ` Nicolas Braud-Santoni
2013-11-25 13:46                 ` Goswin von Brederlow
2013-11-19 22:31         ` Nicolas Braud-Santoni
2013-11-20 18:56         ` Florian Weimer
2013-11-20 21:47           ` Gerd Stolpmann
2013-11-25 13:51             ` Goswin von Brederlow
2013-11-25 14:43               ` Yaron Minsky
2013-11-19 22:15     ` Nicolas Braud-Santoni [this message]
2013-11-25 13:38   ` Goswin von Brederlow

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=528BE31E.70308@braud-santoni.eu \
    --to=nicolas@braud-santoni.eu \
    --cc=caml-list@inria.fr \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).