From mboxrd@z Thu Jan 1 00:00:00 1970 Date: Mon, 8 Feb 2010 18:37:50 -0500 From: Nathaniel W Filardo To: Fans of the OS Plan 9 from Bell Labs <9fans@9fans.net> Message-ID: <20100208233750.GR15480@gradx.cs.jhu.edu> References: <4B6DB95F.4090907@maht0x0r.net> <78b9710340a6345eac9f8690d306e1bb@brasstown.quanstro.net> <3dd5c634eddc6496085190a0e6de46a4@ladd.quanstro.net> <20100207192400.GN15480@gradx.cs.jhu.edu> <4B6F39E7.6050605@maht0x0r.net> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="QRtLtq+kfJNLc57H" Content-Disposition: inline In-Reply-To: <4B6F39E7.6050605@maht0x0r.net> User-Agent: Mutt/1.5.18 (2008-05-17) Subject: Re: [9fans] In case anyone worries about block hash collision in venti Topicbox-Message-UUID: d098b418-ead5-11e9-9d60-3106f5b1d025 --QRtLtq+kfJNLc57H Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Sun, Feb 07, 2010 at 10:08:39PM +0000, matt wrote: > not only has someone got to find a collision during a tiny timeframe, =20 > they also have to fit it in 8k MD5 collisions can, apparently, be constructed in 24 hours on a laptop these days. Yes, it's a constraint, but. Venti supports larger blocks than 8K, and even if it didn't, the exhibited MD5 collisions to date are short strings and would fit. Working with larger data feels like it would be unlikely to be advantageous to the collision search -- 8K is much more than 20 bytes, and so there should be plenty of pidgeons to cram into holes. In any case, I fear that the point has been lost in details. Something about forests and trees. So: The point was not to say that you shouldn't be using venti; it's a lovely idea and the code does its job somewhat well, bugs aside. The point was to object to the kind of analysis which postulates that you need to store 2^98 blocks with a 256 bit hash (or 2^50 for 160-bit hashes) before collisions start be meaningful. If you're storing iid uniformly-selected-at-random blocks of iid uniform noise then yes, that's true, but you're not and besides the instantiation of the random oracle model you're using (SHA-1) isn't perfect. As such, we should expect to see collisions after many fewer blocks stored -- strictly, the iid uniform-at-random selection criterion yields an upper bound. One way to demonstrate this upper-bound nature is to exhibit work on hash collisions and the reduction in estimated security margins. To take an extreme case, the Fletcher checksum or your favorite CRC polynomial can be extended to be perfectly valid 160 bit hash, but the lack of cryptographic security makes it laughably easy to collide blocks. Surely you'd be dubious of a Venti where I replaced SHA-1 with such an extended Fletcher? (It'd be faster, too!) Why? Selection of blocks iid uniformly-at-random should mean that it will take exactly as many blocks as before, before you see a problem... SHA-1 is not laughably easy to break by any stretch of the imagination, but it also isn't the "impossibly hard" you'd get in the random oracle model either. (While nitpicking about the analysis, I feel compelled to add that hardware uncorrectable bitflips are still reported as erasures, whereas venti collisions are reported as success and only caught if somebody's doing checksumming at larger granularity. So at least in the one case you know you're in trouble. Collisions in venti act as if the disk corrupted so many bits that we move into the correctable region for a different codeword.) --nwf; --QRtLtq+kfJNLc57H Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (GNU/Linux) iEYEARECAAYFAktwoE4ACgkQTeQabvr9Tc8j8gCfXTx31xD6PYsGHVzIHaVQJ8py hfQAn27WK5H4GXtgeaAiluzkfiGHQUKC =poxs -----END PGP SIGNATURE----- --QRtLtq+kfJNLc57H--