From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <140e7ec30803100319l31e1088fm9f6b4e89f92047f5@mail.gmail.com>
Date: Mon, 10 Mar 2008 19:19:35 +0900
From: sqweek
To: "Fans of the OS Plan 9 from Bell Labs" <9fans@cse.psu.edu>
Subject: Re: [9fans] thoughs about venti+fossil
In-Reply-To: <68a46edfa8e40c2fc74da101e3dbe24b@terzarima.net>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
References: <20080306064002.GD18329@nibiru.local> <68a46edfa8e40c2fc74da101e3dbe24b@terzarima.net>
Cc:
Topicbox-Message-UUID: 74378eda-ead3-11e9-9d60-3106f5b1d025

On Fri, Mar 7, 2008 at 12:09 AM, Charles Forsyth wrote:
> > But for HA applications, we still need some additional redundancy
> > or at least some error diagnostics at application level. Well,
> > we'll most likely needs this anyways, eg. to detect human fault
> > or code bugs.
>
> i hadn't realised the code i'd quoted only dealt with blocks in memory
> (i didn't look hard enough once i'd found it), but russ then pointed out
> that another option will do something like the check i'd intended.
>
> given that, you have at least a check and a diagnostic that the
> unlikely event ocurred. it isn't the case i'd worry about first. after all, the applications
> pull the stuff into memory across interfaces that might have at most a parity
> check, after transmission using protocols that use a fairly simple 16-bit
> check sum, a compromise between speed of calculation and effectiveness.
> one might sometimes add an end-to-end check, or digesting ... perhaps using SHA1!

The difference between this and venti (aside from the factor of 2^60
or whatever it was) is that network/memory/disk errors are either
transient or manageable.
Silent network error? Going to be difficult to notice, but once you
do, a retransmit will fix it (or if things are really bad, a
replacement network card).
RAM problems? If transient, it is fixed at the next reboot; otherwise
replace the module.
Silent disk corruption? Rewrite the data or replace the disk.

Venti hash collision? Um... well, it doesn't matter how many times we
try to rewrite the block in question, it is always going to collide.
Replacing venti seems less than satisfactory - what else provides the
same functionality? Our best option is to replace the hash and hope we
don't get a different collision. But this leaves us with a whole bunch
of data addressed by the old hashing scheme, which we presumably have
to write new code to convert[1]. New code means new bugs, and I'd be
lying if I claimed the prospect of writing such a utility to run on
several years of a venti archive didn't scare me.

[1] Unless you could do this with vac and co... my venti-fu is weak.
I'm setting my file server up soon, I promise!

But if I normalise my worries based on the likelihood of the problem
occurring, then the real thing leaving a bad taste in my mouth is that
eventually something happens to force maintenance:
1) you get a hash collision
2) something displaces venti
3) venti changes
OTOH, eventually you're going to run out of disk space, so venti is
unlikely to be the weak link here either.

Well, I came up with one perhaps more interesting question while
thinking about what happens with different block sizes (in particular
blocks of one byte and blocks of the same size as the hash)...
As I understand it, venti uses the hash of the data to determine
where on disk to store the block. So, what happens when the hash
resolves to an address which is off the end of the disk?
-sqweek
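
To illustrate the hash collision point above (retrying a write never
helps, while changing the hash helps but invalidates the old
addresses), here is a minimal sketch of a toy content-addressed store.
It is not venti's code: ContentStore, CollisionError, find_collision
and count_failed_retries are hypothetical names, the "weak hash" is
just SHA-1 truncated to one byte so that a collision is easy to find,
and raising an error on collision is this sketch's own choice, not a
claim about what venti does.

```python
import hashlib


class CollisionError(Exception):
    """Raised when two different blocks map to the same score."""


class ContentStore:
    """Toy content-addressed block store (illustrative, not venti)."""

    def __init__(self, score_bytes=20):
        # score_bytes < 20 simulates a weaker hash by truncating SHA-1.
        self.score_bytes = score_bytes
        self.blocks = {}

    def score(self, data):
        return hashlib.sha1(data).digest()[:self.score_bytes]

    def write(self, data):
        s = self.score(data)
        old = self.blocks.get(s)
        if old is not None and old != data:
            raise CollisionError(s.hex())
        self.blocks[s] = data
        return s

    def read(self, s):
        return self.blocks[s]


def find_collision(store, limit=1000):
    """Return two distinct blocks with the same score, or None."""
    seen = {}
    for i in range(limit):
        data = str(i).encode()
        s = store.score(data)
        if s in seen:
            return seen[s], data
        seen[s] = data
    return None


def count_failed_retries(store, data, attempts):
    """Try to write the same block repeatedly; count the failures."""
    failures = 0
    for _ in range(attempts):
        try:
            store.write(data)
        except CollisionError:
            failures += 1
    return failures


weak = ContentStore(score_bytes=1)
a, b = find_collision(weak)
weak.write(a)
# The score depends only on the data, so every retry collides again.
failed = count_failed_retries(weak, b, attempts=3)

# "Replacing the hash": a wider score separates this particular pair,
# but every score handed out by the weak store is now meaningless and
# the old data would have to be re-addressed.
wide = ContentStore(score_bytes=20)
old_score = weak.score(a)
new_score = wide.write(a)
wide.write(b)
```

Here failed equals attempts, since nothing about a retry changes the
block's score, and old_score is not a valid key in the wide store,
which is the conversion problem described in the mail.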