9fans - fans of the OS Plan 9 from Bell Labs
* [9fans] ata drive capabilities
@ 2007-12-25 21:40 Christian Kellermann
  2007-12-25 21:48 ` Pietro Gagliardi
  2007-12-25 23:59 ` erik quanstrom
  0 siblings, 2 replies; 19+ messages in thread
From: Christian Kellermann @ 2007-12-25 21:40 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs


Hi 9fans,

can someone on this list tell me how to interpret the config part of 
cpu% cat /dev/sdC0/ctl
inquiry WDC WD1600JB-00REA0                     
config 427A capabilities 2F00 dma 00550020 dmactl 00550020 rwm 16 rwmctl 0 lba48always off

I am trying to figure out whether the disk signals the implementation
of the SMART feature set.

Kind regards,

Christian

-- 
You may use my gpg key for replies:
pub  1024D/47F79788 2005/02/02 Christian Kellermann (C-Keen)


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [9fans] ata drive capabilities
  2007-12-25 21:40 [9fans] ata drive capabilities Christian Kellermann
@ 2007-12-25 21:48 ` Pietro Gagliardi
  2007-12-25 23:59 ` erik quanstrom
  1 sibling, 0 replies; 19+ messages in thread
From: Pietro Gagliardi @ 2007-12-25 21:48 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

You can tell this from your BIOS setup.

On Dec 25, 2007, at 4:40 PM, Christian Kellermann wrote:

> Hi 9fans,
>
> can someone on this list tell me how to interpret the config part of
> cpu% cat /dev/sdC0/ctl
> inquiry WDC WD1600JB-00REA0
> config 427A capabilities 2F00 dma 00550020 dmactl 00550020 rwm 16  
> rwmctl 0 lba48always off
>
> I am trying to figure out whether the disk signals the implementation
> of the SMART feature set.
>
> Kind regards,
>
> Christian
>
> -- 
> You may use my gpg key for replies:
> pub  1024D/47F79788 2005/02/02 Christian Kellermann (C-Keen)



* Re: [9fans] ata drive capabilities
  2007-12-25 21:40 [9fans] ata drive capabilities Christian Kellermann
  2007-12-25 21:48 ` Pietro Gagliardi
@ 2007-12-25 23:59 ` erik quanstrom
  2007-12-26  6:31   ` ron minnich
  1 sibling, 1 reply; 19+ messages in thread
From: erik quanstrom @ 2007-12-25 23:59 UTC (permalink / raw)
  To: 9fans

> Hi 9fans,
> 
> can someone on this list tell me how to interpret the config part of 
> cpu% cat /dev/sdC0/ctl
> inquiry WDC WD1600JB-00REA0                     
> config 427A capabilities 2F00 dma 00550020 dmactl 00550020 rwm 16 rwmctl 0 lba48always off
> 
> I am trying to figure out whether the disk signals the implementation
> of the SMART feature set.
> 
> Kind regards,

in the return of identify (packet) device, if bits 15:14 of word 83 are
01 (marking the word as valid), then smart support is indicated by word 82
bit 0.  otherwise smart isn't supported.

word 49 is the capabilities word and its important bits are:

	bit 8	dma support
	bit 9	lba support
	bits 10-11	iordy configuration
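
applied to the ctl line quoted above ("capabilities 2F00"), the bit
tests can be sketched in shell.  the bit positions are taken from the
ata identify layout as listed here, so treat this as illustrative
rather than as output of any plan 9 tool:

```shell
# decode word 49, the "capabilities 2F00" field from /dev/sdC0/ctl;
# bit positions follow the list above (assumed ata identify layout)
cap=$((0x2F00))

[ $(( (cap >> 8) & 1 )) -ne 0 ] && echo "dma supported"
[ $(( (cap >> 9) & 1 )) -ne 0 ] && echo "lba supported"
[ $(( (cap >> 11) & 1 )) -ne 0 ] && echo "iordy supported"
```

for this drive all three tests fire, so the WD1600JB advertises dma,
lba, and iordy.  the same shift-and-mask applies to words 82/83 if you
have the raw identify data.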

the intel/amd sata driver supports smart commands via
	echo smartenable>/dev/sdXX/ctl		# turn the drive's smart on.
	echo smart>/dev/sdXX/ctl			# report smart status.
this isn't implemented in the sdata driver, but i think a similar
strategy could be employed.  note: smart commands are not dma
commands.  also, smart support doesn't imply much about which
commands are supported or much about the return values.
the status report, which says whether the drive is likely to fail,
seems the most useful.

bios isn't always helpful in this regard.  some bios don't report
smart status.  some bios do a smart check on power on and won't
boot with a drive that smart considers suspect.  (we have a drive
in the lab that smart declares will fail any minute now.  it's been
this way for 2 years.)  this can be a big problem if you have a
machine with raid that won't boot due to a drive failure.
(why have a raid if one failure means an unbootable machine?)

- erik



* Re: [9fans] ata drive capabilities
  2007-12-25 23:59 ` erik quanstrom
@ 2007-12-26  6:31   ` ron minnich
  2007-12-26 13:10     ` erik quanstrom
  0 siblings, 1 reply; 19+ messages in thread
From: ron minnich @ 2007-12-26  6:31 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On Dec 25, 2007 6:59 PM, erik quanstrom <quanstro@quanstro.net> wrote:
>(we have a drive
> in the lab that smart declares will fail any minute now.  it's been
> this way for 2 years.)

From everything I've seen, SMART has zero correlation with real
hardware issues -- confirmed by a discussion with someone at a big
search company. SMART is dumb.



> this can be a big problem if you have a
> machine with raid that won't boot due to a drive failure.
> (why have a raid if one failure means an unbootable machine?)

it makes great ad copy.

ron



* Re: [9fans] ata drive capabilities
  2007-12-26  6:31   ` ron minnich
@ 2007-12-26 13:10     ` erik quanstrom
  2007-12-26 19:52       ` Christian Kellermann
  0 siblings, 1 reply; 19+ messages in thread
From: erik quanstrom @ 2007-12-26 13:10 UTC (permalink / raw)
  To: 9fans

On Wed Dec 26 01:33:14 EST 2007, rminnich@gmail.com wrote:
> On Dec 25, 2007 6:59 PM, erik quanstrom <quanstro@quanstro.net> wrote:
> >(we have a drive
> > in the lab that smart declares will fail any minute now.  it's been
> > this way for 2 years.)
> 
> From everything I've seen, SMART has zero correlation with real
> hardware issues -- confirmed by a discussion with someone at a big
> search company. SMART is dumb.

the google paper shows a 40% afr (annualized failure rate) for the
first 6 months after certain smart errors appear.  (unfortunately they
don't give numbers for a simple smart status.)

from my understanding of how google does things, losing a drive just
means they need to replace it.  so it's cheaper to let drives fail.
on the other hand, we have our main filesystem raided on an aoe
appliance.  suppose that one of those raids has two disks showing
a smart status of "will fail".  in this case i want to know the elevated
risk, and i will allocate a spare drive to replace at least one of the
drives.

i guess this is the long way of saying: it all depends on how painful
losing your data might be.  if it's painful enough, even a poor tool
like smart is better than nothing.

- erik



* Re: [9fans] ata drive capabilities
  2007-12-26 13:10     ` erik quanstrom
@ 2007-12-26 19:52       ` Christian Kellermann
  2007-12-26 20:13         ` andrey mirtchovski
                           ` (2 more replies)
  0 siblings, 3 replies; 19+ messages in thread
From: Christian Kellermann @ 2007-12-26 19:52 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs


Thanks for your replies!

The reason I asked is that I am thinking about a couple of methods
to detect failures of my two mirrored disks (mirrored by fs(3))
automatically.  How do you check whether your disks are still ok?
I know I could invest in a real raid controller and rely on that,
but I still like the idea of being independent of yet another piece
of hardware whose internal formatting of the disk is hidden from me.

Kind regards,

Christian

-- 
You may use my gpg key for replies:
pub  1024D/47F79788 2005/02/02 Christian Kellermann (C-Keen)



* Re: [9fans] ata drive capabilities
  2007-12-26 19:52       ` Christian Kellermann
@ 2007-12-26 20:13         ` andrey mirtchovski
  2007-12-27 18:12           ` Christian Kellermann
  2007-12-26 23:58         ` Robert William Fuller
  2007-12-27  2:34         ` erik quanstrom
  2 siblings, 1 reply; 19+ messages in thread
From: andrey mirtchovski @ 2007-12-26 20:13 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

> How do you check if your disks are still ok?

i used to run cmp on the two mirrored partitions to verify weekly
that devfs hadn't missed anything.  i would imagine if one of the disks
went south, cmp would fail.
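
that weekly check can be wrapped in a small script.  the function
below is a sketch in posix shell (plan 9's rc would read a little
differently), and the device paths in the usage comment are just
examples:

```shell
# check_mirror: cmp each named partition under two disk directories
# and report whether the mirrored copies still match
check_mirror() {
	d0=$1; d1=$2; shift 2
	for part in "$@"; do
		if cmp -s "$d0/$part" "$d1/$part"; then
			echo "$part: ok"
		else
			echo "$part: differs"
		fi
	done
}

# e.g., run weekly from cron (example device paths):
# check_mirror /dev/sdD0 /dev/sdD1 arenas isect fscfg
```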



* Re: [9fans] ata drive capabilities
  2007-12-26 19:52       ` Christian Kellermann
  2007-12-26 20:13         ` andrey mirtchovski
@ 2007-12-26 23:58         ` Robert William Fuller
  2007-12-27  2:34         ` erik quanstrom
  2 siblings, 0 replies; 19+ messages in thread
From: Robert William Fuller @ 2007-12-26 23:58 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

Christian Kellermann wrote:
> Thanks for your replies!
> 
> The reason I asked is that I am thinking about a couple of methods
> to detect failures of my two mirrored disks (by fs) automatically.
> How do you check if your disks are still ok? I know I could invest

The newer high-capacity drives all have high "raw read error rates,"
but those errors are generally all corrected, as indicated by an
equivalent value of "hardware ECC corrected."  So this does not seem
to really correspond to anything.

Frankly, the one SMART variable I've seen that seems to always
correspond to impending disk failure is the "reallocated sector
count."  Once you see that incrementing, it's time to decommission
that disk for anything other than scratch storage.  Such drives are
good for intermediate files of non-linear video editing until they die.



* Re: [9fans] ata drive capabilities
  2007-12-26 19:52       ` Christian Kellermann
  2007-12-26 20:13         ` andrey mirtchovski
  2007-12-26 23:58         ` Robert William Fuller
@ 2007-12-27  2:34         ` erik quanstrom
  2 siblings, 0 replies; 19+ messages in thread
From: erik quanstrom @ 2007-12-27  2:34 UTC (permalink / raw)
  To: 9fans

> Thanks for your replies!
> 
> The reason I asked is that I am thinking about a couple of methods
> to detect failures of my two mirrored disks (by fs) automatically.
> How do you check if your disks are still ok? I know I could invest
> in a real raid controller and rely on that but I still like the
> idea of being independent of yet another piece of hardware which
> internal formatting of the disk is hidden from me..
> 
> Kind regards,
> 
> Christian

i'm not sure you need that.  fs(3) already logs i/o errors, which should
be as good an indication of trouble as smart.  i/o errors have the benefit
of not being drive dependent.  the recovery will need to be done by
hand anyway, as fs doesn't have the concept of device state.  (there are
some subtle difficulties, too.  sometimes a drive reads an lba correctly
but writes to it fail.)

for something more automatic, devices would need state and an 
online recovery mechanism.

- erik



* Re: [9fans] ata drive capabilities
  2007-12-26 20:13         ` andrey mirtchovski
@ 2007-12-27 18:12           ` Christian Kellermann
  0 siblings, 0 replies; 19+ messages in thread
From: Christian Kellermann @ 2007-12-27 18:12 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs


* andrey mirtchovski <mirtchovski@gmail.com> [071226 21:17]:
> > How do you check if your disks are still ok?
> 
> i used to run cmp on the two mirrorred partitions to verify weekly
> that devfs hadn't missed anything. i would imagine if one of the disks
> went south cmp would fail.

Ah, thanks!

I have fs(3) configured as follows:

cpu% cat /dev/fs/ctl
mirror arenas /dev/sdD0/arenas /dev/sdD1/arenas
mirror isect /dev/sdD0/isect /dev/sdD1/isect
mirror fscfg /dev/sdD0/fscfg /dev/sdD1/fscfg

And my disks have been formatted and partitioned beforehand like this:

cpu% lc sdD0
arenas	ctl		data		fscfg		isect		plan9	raw
cpu% lc sdD1
arenas	ctl		data		fscfg		isect		plan9	raw

Still, if I now run cmp against the arena partitions on the running
system it says they differ, and ls says the same:

cpu% lc -ld sdD*
--rw-r----- S 0 bootes bootes 285779527168 Dec  5 03:37 arenas	--rw-r----- S 0 bootes bootes 285777530368 Dec  5 03:37 arenas
--rw-r----- S 0 bootes bootes            0 Dec  5 03:37 ctl			--rw-r----- S 0 bootes bootes            0 Dec  5 03:37 ctl
--rw-r----- S 0 bootes bootes 300069052416 Dec  5 03:37 data	--rw-r----- S 0 bootes bootes 300069052416 Dec  5 03:37 data
--rw-r----- S 0 bootes bootes          512 Dec  5 03:37 fscfg		--rw-r----- S 0 bootes bootes          512 Dec  5 03:37 fscfg
--rw-r----- S 0 bootes bootes  14288976384 Dec  5 03:37 isect	--rw-r----- S 0 bootes bootes  14288876544 Dec  5 03:37 isect
--rw-r----- S 0 bootes bootes 300068504064 Dec  5 03:37 plan9	--rw-r----- S 0 bootes bootes 300066407424 Dec  5 03:37 plan9
-lrw------- S 0 bootes bootes            0 Dec  5 03:37 raw			-lrw------- S 0 bootes bootes            0 Dec  5 03:37 raw

I cannot see the /dev/fs directory from my server terminal, so I
wonder if fs is even active.  But if it is not, I wonder how my venti
tells me it is using /dev/fs/arenas for storage.

I guess that fs(3) is working properly, otherwise it would not show
the correct configuration on a simple bind.  But then why do the
disks differ, and would it suffice to dd the contents of sdD0 to
sdD1 using the live cd?

Kind regards,

Christian



-- 
You may use my gpg key for replies:
pub  1024D/47F79788 2005/02/02 Christian Kellermann (C-Keen)



* Re: [9fans] ata drive capabilities
  2007-12-27  9:01 Joshua Wood
@ 2007-12-27 15:15 ` Brantley Coile
  0 siblings, 0 replies; 19+ messages in thread
From: Brantley Coile @ 2007-12-27 15:15 UTC (permalink / raw)
  To: 9fans

> Oh right -- I'd forgotten about the wireless link out to another  
> building that is described in the above. Good enough against the  
> small meteorite. :-)

This design is a bit selfish on my part.  I live only 2 miles from
the office. :)

 Brantley



* Re: [9fans] ata drive capabilities
@ 2007-12-27  9:01 Joshua Wood
  2007-12-27 15:15 ` Brantley Coile
  0 siblings, 1 reply; 19+ messages in thread
From: Joshua Wood @ 2007-12-27  9:01 UTC (permalink / raw)
  To: 9fans

> here's what we do useing ken's fs fs, not venti.
>         http://cm.bell-labs.com/iwp9/papers/23.disklessfs.pdf
>

Oh right -- I'd forgotten about the wireless link out to another  
building that is described in the above. Good enough against the  
small meteorite. :-)

Viewed from a high enough level, I think I'm using venti and a backup  
fileserver in a broadly similar way, albeit less `live' and fully  
integrated.

Thanks, Erik.

--
Josh





* Re: [9fans] ata drive capabilities
  2007-12-27  6:22 Joshua Wood
@ 2007-12-27  7:28 ` erik quanstrom
  0 siblings, 0 replies; 19+ messages in thread
From: erik quanstrom @ 2007-12-27  7:28 UTC (permalink / raw)
  To: 9fans

> 
> Or, put another way, you're not asserting you have no backup beyond  
> that fileserver raid, are you? Because if so, I want to learn how I  
> can skip that step, too.
> 

here's what we do using ken's fs, not venti.

	http://cm.bell-labs.com/iwp9/papers/23.disklessfs.pdf

- erik



* Re: [9fans] ata drive capabilities
@ 2007-12-27  6:22 Joshua Wood
  2007-12-27  7:28 ` erik quanstrom
  0 siblings, 1 reply; 19+ messages in thread
From: Joshua Wood @ 2007-12-27  6:22 UTC (permalink / raw)
  To: 9fans


> i don't know.  if you lean that direction, then the only thing raid  
> gives
> you is reduced downtime.

On reflection, it seems I do lean that direction, and generally use  
raid mainly to dodge downtime. Our plan 9 systems (and `other' alike)  
mostly have redundant disks (when they must have them at all) -- but  
they have regular offsite backup also. I wonder if I'm being wasteful.

> i think of raid as reliable storage.  backups are for saving one's  
> bacon in
> the face of other disasters.  you know, sysadmin mistakes,  
> misconfiguration,
> code gone wild, building burns down,
>
meteorite! ;)

> (and if my experience with backups is any indiciation, it's best  
> not to
> rely on them.)
>
Probably another discussion, but I try to deal with this by testing  
the offsite backups (rdarena output) of the plan9 fileserver against  
a similar system that's designated the second-string fileserver.  I  
haven't had to do it in production in a while (raid narrowed the  
reasons I'd need to), so maybe I'm missing something and it would be  
less successful than in the testing.

> but this thinking is probablly specific to how i use raid.  i  
> imagine the
> exact answer on what raid gives you should be worked out based on
> the application.
>
I probably veer toward mere semantics, but I'd still define your use  
of raid to be uptime-protection. The list of exceptions you place  
under ``backups are for...'' is the same list, essentially, that  
motivates the offsite backups I mention -- the usual holes in the  
raid prophylactic. I see how Plan 9 facilities (esp. dump) ameliorate  
some of them: admin mistakes, for example. But it doesn't fireproof  
the system.

Or, put another way, you're not asserting you have no backup beyond  
that fileserver raid, are you? Because if so, I want to learn how I  
can skip that step, too.

--
Josh






* Re: [9fans] ata drive capabilities
  2007-12-26 13:18 ` roger peppe
@ 2007-12-26 18:15   ` erik quanstrom
  0 siblings, 0 replies; 19+ messages in thread
From: erik quanstrom @ 2007-12-26 18:15 UTC (permalink / raw)
  To: 9fans

> what a pity! it would have been so great to have had
> an objective assessment of reliability by manufacturer.
> 
> i've found it really quite hard to find useful data to
> indicate how reliable a drive might be.

would it really help?  any numbers generated would be indicators
for models no longer being sold.  do we know that past manufacturing
performance ensures future performance?

i think that's why they call different drive batches "vintages".

- erik



* Re: [9fans] ata drive capabilities
  2007-12-26 16:22 Joshua Wood
@ 2007-12-26 18:14 ` erik quanstrom
  0 siblings, 0 replies; 19+ messages in thread
From: erik quanstrom @ 2007-12-26 18:14 UTC (permalink / raw)
  To: 9fans

> > from my understanding of how google do things, loosing a drive just
> > means they need to replace it.  so it's cheeper to let drives fail.
> > on the other hand, we have our main filesystem raided on an aoe
> > appliance.  suppose that one of those raids has two disks showing
> > a smart status of "will fail".  in this case i want to know the  
> > elevated
> > risk and i will allocate a spare drive to replace at least one of the
> > drives.
> >
> > i guess this is the long way of saying, it all depends on how painful
> > loosing your data might be.  if it's painful enough, even a poor tool
> > like smart is better than nothing.
> >
> I agree (plus I was just wrong about SMART at first), though I do  
> think your example above is about preventing downtime, not so much  
> data loss (Even without smart entirely, and all the disks come up  
> corrupt, we're all backed up within some acceptable window, right?)

i don't know.  if you lean that direction, then the only thing raid gives
you is reduced downtime.

i think of raid as reliable storage.  backups are for saving one's bacon in
the face of other disasters.  you know, sysadmin mistakes, misconfiguration,
code gone wild, building burns down — disaster recovery.

(and if my experience with backups is any indication, it's best not to
rely on them.)

but this thinking is probably specific to how i use raid.  i imagine the
exact answer on what raid gives you should be worked out based on
the application.  for linux-type filesystems, e.g., raid won't save your
accidentally deleted files.

- erik



* Re: [9fans] ata drive capabilities
@ 2007-12-26 16:22 Joshua Wood
  2007-12-26 18:14 ` erik quanstrom
  0 siblings, 1 reply; 19+ messages in thread
From: Joshua Wood @ 2007-12-26 16:22 UTC (permalink / raw)
  To: 9fans

> the google paper shows a 40% afr for the first 6 months after some
> smart errors appear.  (unfortunately they don't do numbers for
> a simple smart status.)

Yes, and I rather mischaracterized the google paper's comments on  
SMART. A reread (I first read them a few months ago) shows the above.  
Further, the CMU paper even references the google study on the SMART  
subject:

``They find that [ ... ] the value of several SMART counters  
correlate highly with failures.''

So SMART appears a little less dumb.  I'd say it meets the
better-than-nothing criterion.

> from my understanding of how google do things, loosing a drive just
> means they need to replace it.  so it's cheeper to let drives fail.
> on the other hand, we have our main filesystem raided on an aoe
> appliance.  suppose that one of those raids has two disks showing
> a smart status of "will fail".  in this case i want to know the  
> elevated
> risk and i will allocate a spare drive to replace at least one of the
> drives.
>
> i guess this is the long way of saying, it all depends on how painful
> loosing your data might be.  if it's painful enough, even a poor tool
> like smart is better than nothing.
>
I agree (plus I was just wrong about SMART at first), though I do  
think your example above is about preventing downtime, not so much  
data loss.  (Even if, without smart entirely, all the disks come up  
corrupt, we're all backed up within some acceptable window, right?)


> what a pity! it would have been so great to have had
> an objective assessment of reliability by manufacturer.
>
Since the CMU thing found no difference between disk *types*, I  
wonder if it might be that there's little difference between  
manufacturers either -- instead the difference is in manufacturing,  
i.e., `vintage' & the like.

> i've found it really quite hard to find useful data to
> indicate how reliable a drive might be.
>

I think Fig. 2, Sec. 4.2 of the CMU paper relates to that; the  
`infant mortality' of manufactured mechanical parts isn't captured in  
MTTF -- but IDEMA is apparently going to solve this by replacing the  
single MTTF number that I don't quite understand with 4 different  
MTTF numbers, one for each `phase' of a disk's life.

--
Josh





* Re: [9fans] ata drive capabilities
  2007-12-26  7:44 Joshua Wood
@ 2007-12-26 13:18 ` roger peppe
  2007-12-26 18:15   ` erik quanstrom
  0 siblings, 1 reply; 19+ messages in thread
From: roger peppe @ 2007-12-26 13:18 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On Dec 26, 2007 7:44 AM, Joshua Wood <josh@utopian.net> wrote:
> If it's everyone's favorite ``big search company'' in question, they
> have an [only moderately depressing] paper:
>        http://209.85.163.132/papers/disk_failures.pdf

where it says:
: Most age-related results are impacted by
: drive vintages. However, in this paper, we do not show a
: breakdown of drives per manufacturer, model, or vintage
: due to the proprietary nature of these data.

what a pity! it would have been so great to have had
an objective assessment of reliability by manufacturer.

i've found it really quite hard to find useful data to
indicate how reliable a drive might be.



* Re: [9fans] ata drive capabilities
@ 2007-12-26  7:44 Joshua Wood
  2007-12-26 13:18 ` roger peppe
  0 siblings, 1 reply; 19+ messages in thread
From: Joshua Wood @ 2007-12-26  7:44 UTC (permalink / raw)
  To: 9fans

> From everything I've seen, SMART has zero correlation with real
> hardware issues -- confirmed by a discussion with someone at a big
> search company. SMART is dumb.

If it's everyone's favorite ``big search company'' in question, they  
have an [only moderately depressing] paper:
	http://209.85.163.132/papers/disk_failures.pdf

Turns out from their big sample that, nope, SMART isn't good at  
predicting failure; nor are temperature or activity levels. Instead  
it seems like almost entirely a manufacturing crapshoot.

SMART looks no smarter in CMU's study of the same topic, which nixes  
age as a good failure predictor, too:
	http://www.usenix.org/event/fast07/tech/schroeder/schroeder_html/index.html

--
Josh



end of thread, other threads:[~2007-12-27 18:12 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-12-25 21:40 [9fans] ata drive capabilities Christian Kellermann
2007-12-25 21:48 ` Pietro Gagliardi
2007-12-25 23:59 ` erik quanstrom
2007-12-26  6:31   ` ron minnich
2007-12-26 13:10     ` erik quanstrom
2007-12-26 19:52       ` Christian Kellermann
2007-12-26 20:13         ` andrey mirtchovski
2007-12-27 18:12           ` Christian Kellermann
2007-12-26 23:58         ` Robert William Fuller
2007-12-27  2:34         ` erik quanstrom
2007-12-26  7:44 Joshua Wood
2007-12-26 13:18 ` roger peppe
2007-12-26 18:15   ` erik quanstrom
2007-12-26 16:22 Joshua Wood
2007-12-26 18:14 ` erik quanstrom
2007-12-27  6:22 Joshua Wood
2007-12-27  7:28 ` erik quanstrom
2007-12-27  9:01 Joshua Wood
2007-12-27 15:15 ` Brantley Coile
