From mboxrd@z Thu Jan 1 00:00:00 1970
To: Fans of the OS Plan 9 from Bell Labs <9fans@9fans.net>
In-reply-to: Your message of "Mon, 27 Feb 2017 19:02:29 GMT."
References: <8D987F97-4760-4243-A9E7-F2F3BA9C63E3@bitblocks.com> <20170226194618.E3E5F124AEA5@mail.bitblocks.com> <7F0C4A3F-0EC1-437B-BE8C-9BC97BE651E9@westryn.net>
Date: Mon, 27 Feb 2017 12:14:08 -0800
From: Bakul Shah
Message-Id: <20170227201408.13B5C124AEA5@mail.bitblocks.com>
Subject: Re: [9fans] SHA-1 collision and venti
Topicbox-Message-UUID: b5f85862-ead9-11e9-9d60-3106f5b1d025

On Mon, 27 Feb 2017 19:02:29 GMT Charles Forsyth wrote:
> On 27 February 2017 at 18:30, Charles Forsyth
> wrote:
>
> > that's a separate argument that venti would never work for you, regardless
> > of the hash algorithm used.
>
> since venti returns the resulting score from each write, and it knows
> whether there's been a collision,
> it appears it could return a modified score (having ensured that is now
> unique, "and the next judge said that's a very shaggy dog")

Consider what can happen when you want to consolidate two venti
archives into another one. Each source venti has a different file
with the same hash. When you discover in the destination venti that
they collide, it is too late to return a modified score -- you have
to find and fix all pointer blocks that refer to this block as well.

In theory the chance of a random collision with SHA1 may be 1 in 2^80
but we have existing files that collide (unlike the hypothetical
argument of someone wanting to store 10^21 byte size files -- but if
they can produce it, we can store it!).

Your argument is that since venti is readonly, existing data in it is
not vulnerable, but not everyone stores their archives on a readonly
medium. Another argument would be that venti is almost always used
privately and is unlikely to be accessible to the bad guys. Yet
another argument is that hardly anyone uses venti so why even bother.
These are behavior patterns that are true today but why limit venti's
usefulness? Just as we move archived data we care about to more
modern media (as we no longer have easy access to floppies, 9track
tapes, 1/4" streamer tape etc.), and update our crypto keys, since
they too have a limited shelf-life, we can replace the use of SHA1.
This is a fixable problem.

[It is much much worse for git given the amount of software that
relies on it. I think it is a matter of time before someone comes up
with a collision between two different types of git objects (such as
a blob and a tree) but we'll let Linus worry about it :-)]

The solution is to convert from SHA1 to blake2b or something strong
and be prepared to move the data again in 10-20 years.
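
To make the consolidation problem and the proposed fix concrete,
here is a rough Python sketch. It is a toy model, not venti's code or
on-disk format: a store is a plain dict from score to block, a block
is either ("data", bytes) or ("ptr", list of child scores), and all
function names are made up for illustration. Since I don't have two
colliding SHA1 inputs handy, weak_score() truncates SHA1 to 3 bytes
so that a collision can be found by brute force; that truncation is
purely a stand-in assumption.

```python
import hashlib

def weak_score(data: bytes) -> bytes:
    # Stand-in for a broken hash: SHA1 truncated to 3 bytes (demo assumption).
    return hashlib.sha1(data).digest()[:3]

def strong_score(data: bytes) -> bytes:
    # Replacement hash: blake2b with a 32-byte digest.
    return hashlib.blake2b(data, digest_size=32).digest()

def put(store, score_fn, kind, payload):
    """Store a block. kind is 'data' (bytes) or 'ptr' (list of child scores)."""
    raw = payload if kind == "data" else b"".join(payload)
    score = score_fn(raw)
    store[score] = (kind, payload)
    return score

def merge(dst, src):
    """Copy src into dst; return the scores that map to different contents."""
    collisions = []
    for score, block in src.items():
        if score in dst and dst[score] != block:
            collisions.append(score)
        else:
            dst[score] = block
    return collisions

def migrate(store, root, new_score_fn, new_store):
    """Re-score the tree under root bottom-up, rewriting pointer blocks."""
    kind, payload = store[root]
    if kind == "ptr":
        # Children first: their new scores become the new pointer contents.
        payload = [migrate(store, c, new_score_fn, new_store) for c in payload]
    return put(new_store, new_score_fn, kind, payload)

def find_collision(score_fn):
    """Brute-force two different inputs with the same score."""
    seen = {}
    i = 0
    while True:
        data = b"block-%d" % i
        s = score_fn(data)
        if s in seen:
            return seen[s], data
        seen[s] = data
        i += 1

# Two source archives, each holding a different file with the same weak score.
a_data, b_data = find_collision(weak_score)
venti_a, venti_b = {}, {}
leaf_a = put(venti_a, weak_score, "data", a_data)
root_a = put(venti_a, weak_score, "ptr", [leaf_a])
leaf_b = put(venti_b, weak_score, "data", b_data)
root_b = put(venti_b, weak_score, "ptr", [leaf_b])

# Consolidating under the old score: the collision shows up only now, and
# both roots are identical pointer blocks that can no longer tell the files apart.
stale = {}
merge(stale, venti_a)
collisions = merge(stale, venti_b)
print("collisions under weak score:", len(collisions))

# Re-score each source with the strong hash first, then consolidate.
new_a, new_b = {}, {}
new_root_a = migrate(venti_a, root_a, strong_score, new_a)
new_root_b = migrate(venti_b, root_b, strong_score, new_b)
merged = {}
remaining = merge(merged, new_a) + merge(merged, new_b)
print("collisions after migration:", len(remaining))
```

Note that the re-scoring has to happen on each source archive, where
the two files are still distinguishable, and it has to walk the tree
bottom-up because every pointer block embeds the scores of its
children. A real conversion would also want to memoize shared
subtrees rather than re-walk them as this sketch does.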