From mboxrd@z Thu Jan 1 00:00:00 1970 MIME-Version: 1.0 In-Reply-To: <20100208233750.GR15480@gradx.cs.jhu.edu> References: <4B6DB95F.4090907@maht0x0r.net> <78b9710340a6345eac9f8690d306e1bb@brasstown.quanstro.net> <3dd5c634eddc6496085190a0e6de46a4@ladd.quanstro.net> <20100207192400.GN15480@gradx.cs.jhu.edu> <4B6F39E7.6050605@maht0x0r.net> <20100208233750.GR15480@gradx.cs.jhu.edu> Date: Tue, 9 Feb 2010 14:13:05 +0100 Message-ID: From: hiro <23hiro@googlemail.com> To: Fans of the OS Plan 9 from Bell Labs <9fans@9fans.net> Content-Type: text/plain; charset=UTF-8 Subject: Re: [9fans] In case anyone worries about block hash collision in venti Topicbox-Message-UUID: d0fe63bc-ead5-11e9-9d60-3106f5b1d025 >I feel compelled to add that hardware >uncorrectable bitflips are still reported as erasures, whereas venti >collisions are reported as success and only caught if somebody's doing >checksumming at larger granularity. Uncorrectable bitflips can also be unreported. If you are really paranoid and don't want any collisions in the next 10 years: don't let strangers in your venti. On 2/9/10, Nathaniel W Filardo wrote: > On Sun, Feb 07, 2010 at 10:08:39PM +0000, matt wrote: >> not only has someone got to find a collision during a tiny timeframe, >> they also have to fit it in 8k > > MD5 collisions can, apparently, be constructed in 24 hours on a laptop these > days. Yes, it's a constraint, but. > > Venti supports larger blocks than 8K, and even if it didn't, the exhibited > MD5 collisions to date are short strings and would fit. Working with larger > data feels like it would be unlikely to be advantageous to the collision > search -- 8K is much more than 20 bytes, and so there should be plenty of > pidgeons to cram into holes. > > In any case, I fear that the point has been lost in details. Something > about forests and trees. So: > > The point was not to say that you shouldn't be using venti; it's a lovely > idea and the code does its job somewhat well, bugs aside. The point was to > object to the kind of analysis which postulates that you need to store 2^98 > blocks with a 256 bit hash (or 2^50 for 160-bit hashes) before collisions > start be meaningful. > > If you're storing iid uniformly-selected-at-random blocks of iid uniform > noise then yes, that's true, but you're not and besides the instantiation of > the random oracle model you're using (SHA-1) isn't perfect. As such, we > should expect to see collisions after many fewer blocks stored -- strictly, > the iid uniform-at-random selection criterion yields an upper bound. > > One way to demonstrate this upper-bound nature is to exhibit work on hash > collisions and the reduction in estimated security margins. To take an > extreme case, the Fletcher checksum or your favorite CRC polynomial can be > extended to be perfectly valid 160 bit hash, but the lack of cryptographic > security makes it laughably easy to collide blocks. Surely you'd be dubious > of a Venti where I replaced SHA-1 with such an extended Fletcher? (It'd be > faster, too!) Why? Selection of blocks iid uniformly-at-random should mean > that it will take exactly as many blocks as before, before you see a > problem... SHA-1 is not laughably easy to break by any stretch of the > imagination, but it also isn't the "impossibly hard" you'd get in the random > oracle model either. > > (While nitpicking about the analysis, I feel compelled to add that hardware > uncorrectable bitflips are still reported as erasures, whereas venti > collisions are reported as success and only caught if somebody's doing > checksumming at larger granularity. So at least in the one case you know > you're in trouble. Collisions in venti act as if the disk corrupted so many > bits that we move into the correctable region for a different codeword.) > > --nwf; >