To: Fans of the OS Plan 9 from Bell Labs <9fans@9fans.net>
In-reply-to: Your message of "Mon, 21 Sep 2009 14:02:40 EDT."
	<650f1c31a83a452580882cbad2dfbba7@quanstro.net>
References: <650f1c31a83a452580882cbad2dfbba7@quanstro.net>
From: Bakul Shah <bakul+plan9@bitblocks.com>
Date: Mon, 21 Sep 2009 12:10:05 -0700
Message-Id: <20090921191005.DD14F5B55@mail.bitblocks.com>
Subject: Re: [9fans] Petabytes on a budget: JBODs + Linux + JFS

On Mon, 21 Sep 2009 14:02:40 EDT erik quanstrom <quanstro@quanstro.net>  wrote:
> > > i would think this is acceptable.  at these low levels, something
> > > else is going to get you -- like drives failing unindependently.
> > > say because of power problems.
> >
> > An 8% failure rate for an array rebuild may or may not be acceptable
> > depending on your application.
>
> i think the lesson here is don't buy cheap drives; if you
> have enterprise drives at 1e-15 error rate, the fail rate
> will be 0.8%.  of course if you don't have a raid, the fail
> rate is 100%.
>
> if that's not acceptable, then use raid 6.

Hopefully RAID 6 or ZFS's raidz2 works well enough with cheap
drives!
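The 8% and 0.8% figures scale roughly linearly with the UER while they stay small. A rough check, assuming independent bit errors and, purely as a hypothetical, a rebuild that reads 1 TB (8e12 bits), since the thread does not give the array size:

```python
import math

def rebuild_failure_prob(bits_read, uer):
    """P(at least one unrecoverable error) = 1 - (1 - uer)**bits_read."""
    # log1p/expm1 keep precision when uer is tiny.
    return -math.expm1(bits_read * math.log1p(-uer))

BITS_READ = 8e12  # hypothetical: 1 TB read during the rebuild

for uer in (1e-14, 1e-15):  # cheap vs. enterprise drive spec
    print(f"UER {uer:g}: {rebuild_failure_prob(BITS_READ, uer):.2%}")
```

Under these assumptions it prints about 7.69% and 0.80%: a tenfold better UER buys about a tenfold lower rebuild failure rate.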

> > > so there are 4 ways to fail.  3 double fail have a probability of
> > > 3*(2^9 bits * 1e-14 1/ bit)^2
> >
> > Why 2^9 bits? A sector is 2^9 bytes or 2^12 bits.
>
>
> cut-and-paste error.  sorry, that was 2^19 bits, i.e. 64k*8 bits/byte.
> the calculation is still correct, since it was done on that basis.

Ok.

> > If per sector recovery is done, you have
> > 	3E-22*(64K/512) = 3.84E-20
>
> i'd be interested to know if anyone does this.  it's not
> as easy as it would first appear.  do you know of any
> hardware or software that does sector-level recovery?

No idea -- I haven't really looked in this area in ages.  In
case of two stripes being bad it would make sense to me to
reread a stripe one sector at a time, since the chance of the
exact same sector being bad on two disks is much lower (about
64K/512 = 2^7 = 128 times smaller for 64k stripes).  I don't
know if disk drives return an error bit array along with the
data of a multisector read (nth bit set if the nth sector could
not be recovered).  If not, that would be a worthwhile addition.
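Assuming a drive did return such a per-sector error bitmap (a hypothetical interface; no existing ATA or SCSI command is implied), the array code could decide sector by sector what is still recoverable. A minimal sketch, holding each bitmap as a Python int; `lost_sectors` and `parity_disks` are illustrative names:

```python
SECTORS_PER_STRIPE = 64 * 1024 // 512   # 128 sectors in a 64K stripe unit

def lost_sectors(bad_maps, parity_disks=1):
    """Return the sector indices that cannot be rebuilt.

    bad_maps holds one hypothetical error bitmap per disk in the stripe:
    bit n set means sector n of that disk's stripe unit was unreadable.
    With `parity_disks` redundant disks (1 for RAID 5, 2 for RAID 6),
    sector n is lost only if more than `parity_disks` disks report it bad.
    """
    lost = []
    for n in range(SECTORS_PER_STRIPE):
        bad = sum((m >> n) & 1 for m in bad_maps)
        if bad > parity_disks:
            lost.append(n)
    return lost

# Two disks, each with one bad sector at a different offset:
# a whole-stripe rebuild would give up, but every sector is recoverable.
print(lost_sectors([1 << 3, 1 << 77, 0]))   # []
# The same sector bad on both disks is genuinely lost under single parity.
print(lost_sectors([1 << 3, 1 << 3, 0]))    # [3]
```

The point of the per-sector view is that two bad stripe units only doom the stripe when their bad sectors overlap.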

> i don't have enough data to know how likely it is to
> have exactly 1 bad sector.  any references?

Not sure what you are asking.  Reed-Solomon codes are block
codes applied to a whole sector, so the per-sector error rate
is about UER*512*8, where UER == uncorrectable error rate (per
bit read).  [Early IDE disks had a 4-byte ECC per sector.  Now
that bits are packed so tightly, the S/N ratio is far worse and
the ECC is at least 40 bytes, to keep UER at 1E-14 or whatever
the target is.]
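A sketch of this arithmetic, assuming independent bit errors at rate UER (real drives tend to fail in bursts, so treat it as a rough model). It compares stripe-level recovery, which fails when the 64K stripe unit is bad on both disks, with sector-level recovery, which fails only when the same sector is bad on both:

```python
import math

SECTOR_BYTES = 512
SECTOR_BITS = SECTOR_BYTES * 8              # 2^12 bits
STRIPE_BYTES = 64 * 1024                    # 64K stripe unit
SECTORS = STRIPE_BYTES // SECTOR_BYTES      # 128 sectors per stripe unit

def at_least_one(p, n):
    """1 - (1 - p)**n, computed without losing precision for tiny p."""
    return -math.expm1(n * math.log1p(-p))

def p_sector(uer):
    # One bad bit anywhere in the sector makes the whole sector unreadable.
    return at_least_one(uer, SECTOR_BITS)

def p_stripe(uer):
    # A stripe unit is bad if any of its sectors is bad.
    return at_least_one(p_sector(uer), SECTORS)

def p_two_stripes_bad(uer):
    # Stripe-level recovery fails if the stripe unit is bad on both disks.
    return p_stripe(uer) ** 2

def p_same_sector_bad(uer):
    # Sector-level recovery fails only if some sector index is bad on both disks.
    ps = p_sector(uer)
    return at_least_one(ps * ps, SECTORS)

uer = 1e-14
print(f"per-sector error rate: {p_sector(uer):.3e}")
print(f"stripe-level / sector-level: {p_two_stripes_bad(uer) / p_same_sector_bad(uer):.1f}")
```

The per-sector rate comes out at UER*4096, about 4.1e-11, and the ratio between the two failure probabilities is close to the number of sectors per stripe, 64K/512 = 128.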