From mboxrd@z Thu Jan 1 00:00:00 1970
To: weigelt@metux.de, 9fans@cse.psu.edu
Subject: Re: [9fans] thoughs about venti+fossil
From: "Russ Cox"
Date: Thu, 6 Mar 2008 11:58:28 -0500
In-Reply-To: <20080306123941.GE18329@nibiru.local>
MIME-Version: 1.0
Content-Type: text/plain; charset="US-ASCII"
Content-Transfer-Encoding: 7bit
Message-Id: <20080306165916.7B88C1E8C22@holo.morphisms.net>
Cc:
Topicbox-Message-UUID: 713f21a2-ead3-11e9-9d60-3106f5b1d025

> (we couldn't use hashing for traffic reductions, safely).

yes you can. you can use hashes to build a hash table
with a collision policy. there is some company (whose
name escapes me; maybe someone else will remember)
that makes exactly this product, so that once network A
has sent a particular chunk of data to network B once,
future transmissions are replaced transparently with a
shorter name. kind of like lempel-ziv on steroids.
apparently it makes cross-country ms exchange servers
and file servers much more bearable.

> it would be an interesting feature. Of course the fs on top then
> MUST refresh from time to time, but this can be done while the
> system is idle (good for situations with high load peaks and enough
> idle time on the other hand).

sorry, but this is just a fantastically terrible idea.
you're taking a reliable system and making it unreliable.
if you were really concerned, it would be better to
implement a garbage collector that you could hand
a root set. even that would worry me (a simple bug
would wipe out your entire archive), but it wouldn't
be as bad as relying on timeouts.

> For this I need to be *sure* that there will be
> *no* collisions, even if the system runs for a long time and
> grows really big (maybe several PB on thousands of nodes).
>
> Another interesting question: can the risk of collisions be
> reduced by combining several different hash functions in
> parallel?

sure. use sha-256 and your probability of collision
goes down even further. but *you* (probably) still won't
be *sure*.

russ
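
a rough sketch of the "hash table with a collision policy" idea from the first reply, for illustration only. it is not how the unnamed product works. the names (ChunkSender, ChunkReceiver, the message tuples, the hash_fn parameter) are all made up here. the assumption is that the sender keeps a copy of every chunk it has sent, so it can compare bytes before substituting a short name; if the digest matches but the bytes differ, it just sends the chunk in full.

```python
import hashlib


def sha1_digest(chunk):
    """Default chunk name: the SHA-1 digest of the chunk bytes."""
    return hashlib.sha1(chunk).digest()


class ChunkSender:
    """Hypothetical sender side: remembers every chunk already transmitted."""

    def __init__(self, hash_fn=sha1_digest):
        # hash_fn is injectable only so a deliberately weak hash can be
        # used to exercise the collision path in a test.
        self.hash_fn = hash_fn
        self.sent = {}  # digest -> chunk bytes first sent under that digest

    def encode(self, chunk):
        digest = self.hash_fn(chunk)
        known = self.sent.get(digest)
        if known is None:
            # First time this digest is seen: send the data and remember it.
            self.sent[digest] = chunk
            return ("data", digest, chunk)
        if known == chunk:
            # Same digest and same bytes: safe to send only the short name.
            return ("ref", digest, None)
        # Collision policy: same digest, different bytes. Send the chunk
        # literally and leave the table untouched.
        return ("literal", None, chunk)


class ChunkReceiver:
    """Hypothetical receiver side: mirrors the sender's table."""

    def __init__(self):
        self.seen = {}  # digest -> chunk bytes

    def decode(self, message):
        kind, digest, chunk = message
        if kind == "data":
            self.seen[digest] = chunk
            return chunk
        if kind == "ref":
            return self.seen[digest]
        # "literal": a colliding chunk, passed through without caching.
        return chunk
```

the price of the byte comparison is that the sender stores the chunks themselves and not just their digests. a collision then costs bandwidth, not correctness.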