9fans - fans of the OS Plan 9 from Bell Labs
 help / color / mirror / Atom feed
From: Nathaniel W Filardo <nwf@cs.jhu.edu>
To: Fans of the OS Plan 9 from Bell Labs <9fans@9fans.net>
Subject: Re: [9fans] In case anyone worries about block hash collision in	venti
Date: Mon,  8 Feb 2010 18:37:50 -0500	[thread overview]
Message-ID: <20100208233750.GR15480@gradx.cs.jhu.edu> (raw)
In-Reply-To: <4B6F39E7.6050605@maht0x0r.net>

[-- Attachment #1: Type: text/plain, Size: 2660 bytes --]

On Sun, Feb 07, 2010 at 10:08:39PM +0000, matt wrote:
> not only has someone got to find a collision during a tiny timeframe,  
> they also have to fit it in 8k

MD5 collisions can, apparently, be constructed in 24 hours on a laptop these
days.  Yes, it's a constraint, but.

Venti supports larger blocks than 8K, and even if it didn't, the exhibited
MD5 collisions to date are short strings and would fit.  Working with larger
data feels like it would be unlikely to be advantageous to the collision
search -- 8K is much more than 20 bytes, and so there should be plenty of
pidgeons to cram into holes.

In any case, I fear that the point has been lost in details.  Something
about forests and trees.  So:

The point was not to say that you shouldn't be using venti; it's a lovely
idea and the code does its job somewhat well, bugs aside.  The point was to
object to the kind of analysis which postulates that you need to store 2^98
blocks with a 256 bit hash (or 2^50 for 160-bit hashes) before collisions
start be meaningful.

If you're storing iid uniformly-selected-at-random blocks of iid uniform
noise then yes, that's true, but you're not and besides the instantiation of
the random oracle model you're using (SHA-1) isn't perfect.  As such, we
should expect to see collisions after many fewer blocks stored -- strictly,
the iid uniform-at-random selection criterion yields an upper bound.

One way to demonstrate this upper-bound nature is to exhibit work on hash
collisions and the reduction in estimated security margins.  To take an
extreme case, the Fletcher checksum or your favorite CRC polynomial can be
extended to be perfectly valid 160 bit hash, but the lack of cryptographic
security makes it laughably easy to collide blocks.  Surely you'd be dubious
of a Venti where I replaced SHA-1 with such an extended Fletcher?  (It'd be
faster, too!)  Why?  Selection of blocks iid uniformly-at-random should mean
that it will take exactly as many blocks as before, before you see a
problem...  SHA-1 is not laughably easy to break by any stretch of the
imagination, but it also isn't the "impossibly hard" you'd get in the random
oracle model either.

(While nitpicking about the analysis, I feel compelled to add that hardware
uncorrectable bitflips are still reported as erasures, whereas venti
collisions are reported as success and only caught if somebody's doing
checksumming at larger granularity.  So at least in the one case you know
you're in trouble.  Collisions in venti act as if the disk corrupted so many
bits that we move into the correctable region for a different codeword.)

--nwf;

[-- Attachment #2: Type: application/pgp-signature, Size: 204 bytes --]

  reply	other threads:[~2010-02-08 23:37 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-02-06 18:47 maht
2010-02-06 23:42 ` Tim Newsham
2010-02-07  4:47   ` erik quanstrom
2010-02-07 16:54     ` Tim Newsham
2010-02-07 17:44       ` erik quanstrom
2010-02-07 19:12         ` Don Bailey
2010-02-07 19:24         ` Nathaniel W Filardo
2010-02-07 22:08           ` matt
2010-02-08 23:37             ` Nathaniel W Filardo [this message]
2010-02-09 13:13               ` hiro
2010-02-09 13:50                 ` ron minnich
2010-02-09 14:54                   ` erik quanstrom
2010-02-07 20:03         ` Tim Newsham
2010-02-08 21:58           ` Georg Lehner
2010-02-07 20:21   ` [9fans] In case anyone worries about block hash collision in Lyndon Nerenberg (VE6BBM/VE7TFX)
2010-02-07 20:31     ` erik quanstrom
2010-02-07 20:57       ` Lyndon Nerenberg (VE6BBM/VE7TFX)
2010-02-07 23:25         ` Akshat Kumar
2010-02-08  0:37           ` Russ Cox

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20100208233750.GR15480@gradx.cs.jhu.edu \
    --to=nwf@cs.jhu.edu \
    --cc=9fans@9fans.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).