From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <4627E197.90804@conducive.org>
Date: Fri, 20 Apr 2007 05:39:35 +0800
From: W B Hacker
User-Agent: Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O; en-US; rv:1.8.0.8) Gecko/20061030 SeaMonkey/1.0.6
MIME-Version: 1.0
To: Fans of the OS Plan 9 from Bell Labs <9fans@cse.psu.edu>
Subject: Re: [9fans] Recovering a venti from disk failure
References:
In-Reply-To:
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Topicbox-Message-UUID: 4cacb83c-ead2-11e9-9d60-3106f5b1d025

erik quanstrom wrote:
>> Various studies seem to indicate failure rates are highly
>> correlated with drive model, vintage and manufacturer.
>> Assuming a RAID is built from similar disks, when one fails
>> the others are more likely to fail.
>
> while it is true that some disks vintages are better than others, when
> one drive fails, the probability of the other drives failing has not
> changed. this is the same as if you flip a coin ten times and get ten
> heads, the probability of flipping the same coin and getting heads, is
> still 1/2.
>
>>> i think this corelation gives people the false impression that they do
>>> fail en masse, but that's really wrong. the latent errors probablly
>>> happened months ago.
>> Yes but if there are many latent errors and/or the error rate
>> is going up it is time to replace it.
>
> maybe. the goggle paper you cited didn't find a strong correlation
> between smart errors (including block relocation) and failure.
>
>> This is a good idea. We did this in 1983, back when disks
>> were simpler beasts. No RAID then of course.
>
> even a better idea back then. disks didn't have 1/4 million
> lines of firmware relocating blocks and doing other things to^w
> i mean for you.
>
> - erik
>
>

And - lest we forget - a RAID array actually has a higher statistical
chance of failure, and a *lower* MTBF than a single drive. Simple math.
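
To put rough numbers on the "simple math", here is a minimal sketch. It assumes the drives fail independently and at a constant rate (an exponential model), which is a simplification and not something the thread establishes. The function names and the figures used (a 500,000-hour drive MTBF, 8 drives) are hypothetical, chosen only for illustration.

```python
import math


def array_mtbf_hours(drive_mtbf_hours: float, n_drives: int) -> float:
    """Mean time until the FIRST drive in the array fails.

    Assumes independent drives with a constant failure rate, so the
    failure rates simply add: n drives fail n times as often as one.
    """
    return drive_mtbf_hours / n_drives


def prob_any_failure(drive_mtbf_hours: float, n_drives: int, hours: float) -> float:
    """Probability that at least one drive fails within `hours`
    (same independence and constant-rate assumptions as above)."""
    return 1.0 - math.exp(-n_drives * hours / drive_mtbf_hours)


if __name__ == "__main__":
    drive_mtbf = 500_000.0  # hypothetical per-drive MTBF, in hours
    year = 8760.0           # hours in one (non-leap) year

    for n in (1, 8):
        print(
            f"{n} drive(s): MTBF to first failure = "
            f"{array_mtbf_hours(drive_mtbf, n):,.0f} h, "
            f"P(any failure in a year) = "
            f"{prob_any_failure(drive_mtbf, n, year):.3f}"
        )
```

Under these assumptions, going from 1 drive to 8 cuts the mean time to the first drive failure from 500,000 hours to 62,500 hours. This counts drive failures only; it says nothing about whether the array's redundancy lets the data survive them.
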
What we gain is a reduced risk of *unrecoverable* damage, not fewer
failures, per se.

Bill