To: Fans of the OS Plan 9 from Bell Labs <9fans@9fans.net>
In-reply-to: Your message of "Mon, 21 Sep 2009 14:02:40 EDT."
	<650f1c31a83a452580882cbad2dfbba7@quanstro.net>
References: <650f1c31a83a452580882cbad2dfbba7@quanstro.net>
From: Bakul Shah
Date: Mon, 21 Sep 2009 12:10:05 -0700
Message-Id: <20090921191005.DD14F5B55@mail.bitblocks.com>
Subject: Re: [9fans] Petabytes on a budget: JBODs + Linux + JFS
Topicbox-Message-UUID: 73c71c7a-ead5-11e9-9d60-3106f5b1d025

On Mon, 21 Sep 2009 14:02:40 EDT erik quanstrom wrote:
> > > i would think this is acceptable. at these low levels, something
> > > else is going to get you -- like drives failing unindependently.
> > > say because of power problems.
> >
> > 8% rate for an array rebuild may or may not be acceptable
> > depending on your application.
>
> i think the lesson here is don't by cheep drives; if you
> have enterprise drives at 1e-15 error rate, the fail rate
> will be 0.8%. of course if you don't have a raid, the fail
> rate is 100%.
>
> if that's not acceptable, then use raid 6.

Hopefully RAID 6 or ZFS's raidz2 works well enough with cheap
drives!

> > > so there are 4 ways to fail. 3 double fail have a probability of
> > > 3*(2^9 bits * 1e-14 1/ bit)^2
> >
> > Why 2^9 bits? A sector is 2^9 bytes or 2^12 bits.
>
> cut-and-paste error. sorry that was 2^19 bits, e.g. 64k*8 bits/byte.
> the calculation is still correct, since it was done on that basis.

Ok.

> > If per sector recovery is done, you have
> > 3E-22*(64K/512) = 3.84E-20
>
> i'd be interested to know if anyone does this. it's not
> as easy as it would first appear. do you know of any
> hardware or software that does sector-level recovery?

No idea -- I haven't really looked in this area in ages. In the
case of two bad stripes, it would make sense to me to reread the
stripe one sector at a time, since the chance of the exact same
sector being bad on two disks is much lower (about 2^14 times
smaller for 64k stripes?).
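The sector-at-a-time reread described above can be sketched roughly as follows. This is a minimal illustration, not code from any real driver or RAID implementation: it assumes a single-parity (XOR, RAID 5 style) stripe and a hypothetical read interface that returns None for each sector the drive could not recover. The names recover_stripe and xor_sectors are made up for the example. Each sector row is repaired on its own, so two disks with bad sectors in the same stripe are only fatal when the bad sectors line up.

```python
from typing import List, Optional

# Hypothetical per-sector read result: None means the drive reported
# this sector as unrecoverable.
Sector = Optional[bytes]


def xor_sectors(sectors: List[bytes]) -> bytes:
    """Byte-wise XOR of equal-length sectors."""
    out = bytearray(len(sectors[0]))
    for s in sectors:
        for i, b in enumerate(s):
            out[i] ^= b
    return bytes(out)


def recover_stripe(disks: List[List[Sector]]) -> List[List[bytes]]:
    """Repair a single-parity (XOR) stripe one sector index at a time.

    disks[d][k] is sector k of the stripe unit on disk d; the parity
    disk is treated like any other member. Since parity is the XOR of
    the data, the XOR of all members of a sector row is zero, so one
    missing member equals the XOR of the remaining ones.
    """
    nsectors = len(disks[0])
    repaired = [list(d) for d in disks]
    for k in range(nsectors):
        row = [d[k] for d in disks]
        bad = [i for i, s in enumerate(row) if s is None]
        if len(bad) > 1:
            # Same sector lost on two members: single parity cannot help.
            raise IOError(f"sector {k}: {len(bad)} members unreadable")
        if bad:
            good = [s for s in row if s is not None]
            repaired[bad[0]][k] = xor_sectors(good)
    return repaired
```

With toy 4-byte sectors, a stripe in which disk 0 lost sector 1 and disk 1 lost sector 3 is fully repaired this way, whereas a whole-stripe reconstruction would have given up on two bad stripe units.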
I don't know if disk drives return an error bit array along with
the data of a multisector read (the nth bit set if the nth sector
could not be recovered). If not, that would be a worthwhile
addition.

> i don't have enough data to know how likely it is to
> have exactly 1 bad sector. any references?

Not sure what you are asking. Reed-Solomon codes are block codes
applied to a whole sector, so the per-sector error rate is
UER*512*8, where UER == uncorrectable error rate (per bit). [Early
IDE disks had 4 bytes of ECC per sector. Now that bits are packed
so tightly, the S/N ratio is far worse and ECC is at least 40
bytes, to keep UER at 1E-14 or whatever the target is.]
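A back-of-the-envelope check of these figures, as a rough sketch. It assumes independent bit errors at UER = 1e-14 per bit and uses the linear approximation P(bad) = UER * bits, which only holds while that product is far below 1. The variable names are illustrative.

```python
UER = 1e-14                   # uncorrectable errors per bit read (figure used in the thread)
SECTOR_BITS = 512 * 8         # 2^12 bits per sector
STRIPE_BITS = 64 * 1024 * 8   # 2^19 bits per 64k stripe unit


def p_bad(bits: int, uer: float = UER) -> float:
    """Approximate probability that `bits` bits contain an uncorrectable
    error, using the linear approximation uer * bits."""
    return uer * bits


p_sector = p_bad(SECTOR_BITS)   # per-sector error rate, UER*512*8
p_stripe = p_bad(STRIPE_BITS)   # per-64k-stripe-unit error rate

# Two disks bad in the same 64k stripe unit vs. in one given sector position.
both_stripe = p_stripe ** 2
both_sector = p_sector ** 2
ratio = both_stripe / both_sector   # (2^19 / 2^12)^2 = 2^14

# Union bound over all sector positions in the stripe unit.
sectors_per_stripe = STRIPE_BITS // SECTOR_BITS   # 128
any_sector = sectors_per_stripe * both_sector

print(f"per-sector error rate:      {p_sector:.3e}")
print(f"both disks, same stripe:    {both_stripe:.3e}")
print(f"both disks, given sector:   {both_sector:.3e}")
print(f"both disks, any sector:     {any_sector:.3e}")
```

Under this model the 2^14 factor applies to one given sector position. Summed over the 128 sectors of a 64k stripe unit, the gain from per-sector recovery is 2^7, which is the same 64K/512 factor that appears in the quoted 3.84E-20 estimate.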