[9fans] disk error info

9fans - fans of the OS Plan 9 from Bell Labs

* [9fans] disk error info
@ 2006-01-19 20:37 Steve Simon
  2006-01-20  8:50 ` Nigel Roles
From: Steve Simon @ 2006-01-19 20:37 UTC (permalink / raw)
  To: 9fans

Ok,

After recovering from my second disk failure in a month (both disks
were part of the same batch, so I suppose it makes sense), I began to
wonder: does anyone have any ideas about how to probe a SCSI disk for
the number of entries in its spare table?

I would like to get a bit of warning of impending failure
(if I can) next time.

-Steve


* Re: [9fans] disk error info
  2006-01-19 20:37 [9fans] disk error info Steve Simon
@ 2006-01-20  8:50 ` Nigel Roles
From: Nigel Roles @ 2006-01-20  8:50 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

Steve Simon wrote:

>Ok,
>
>After recovering from my second disk failure in a month (both disks
>were part of the same batch, so I suppose it makes sense), I began to
>wonder: does anyone have any ideas about how to probe a SCSI disk for
>the number of entries in its spare table?
>
>I would like to get a bit of warning of impending failure
>(if I can) next time.
>
>-Steve
>
You can read the grown defect list with the standard READ DEFECT DATA
command. The entry formats are drive specific, but you can still find
out how many spares have been used, and the rate at which the number of
grown defects increases is what matters. You have to decide how many is
too many, of course.
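
As a rough sketch of the above (not anything Nigel posted), the Python
below builds a READ DEFECT DATA (10) command block that asks for the
grown list only, then works out the entry count from the 4-byte header
the drive returns. The function names, the choice of physical sector
format and the growth_per_day helper are illustrative assumptions.
Sending the command to the drive through a SCSI pass-through interface
is left out.

```python
import struct

READ_DEFECT_DATA_10 = 0x37  # SCSI opcode for READ DEFECT DATA (10)

# Bytes per defect descriptor for the common defect list formats.
ENTRY_SIZE = {
    0b000: 4,  # short block format
    0b011: 8,  # long block format
    0b100: 8,  # bytes-from-index format
    0b101: 8,  # physical sector format
}


def build_cdb(fmt=0b101, alloc_len=4, plist=False, glist=True):
    """Build the 10-byte CDB. alloc_len=4 asks for the header only."""
    flags = (int(plist) << 4) | (int(glist) << 3) | (fmt & 0x07)
    # Byte 0 opcode, byte 2 flags, bytes 7-8 allocation length, byte 9 control.
    return struct.pack(">BBB4xHB", READ_DEFECT_DATA_10, 0, flags, alloc_len, 0)


def parse_header(data):
    """Decode the 4-byte header that precedes the defect descriptors."""
    if len(data) < 4:
        raise ValueError("defect list header is 4 bytes")
    flags = data[1]
    fmt = flags & 0x07
    length = struct.unpack(">H", data[2:4])[0]  # list length in bytes
    size = ENTRY_SIZE.get(fmt)  # None for vendor-specific formats
    return {
        "plist": bool(flags & 0x10),
        "glist": bool(flags & 0x08),
        "format": fmt,
        "length": length,
        "count": length // size if size else None,
    }


def growth_per_day(samples):
    """samples: (unix_time, defect_count) pairs, oldest first."""
    (t0, c0), (t1, c1) = samples[0], samples[-1]
    if t1 <= t0:
        return 0.0
    return (c1 - c0) * 86400 / (t1 - t0)
```

The drive may answer in a different format from the one requested, so
the count is taken from the format bits in the returned header rather
than from the request. The 16-bit length field also caps what the
10-byte command can report; a very long list would need the 12-byte
variant.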

You should also ensure that automatic read reallocation (ARRE) and
automatic write reallocation (AWRE) are enabled in the read-write error
recovery mode page, so that the drive repairs itself. The spares are
tied to a cylinder or cylinder group to limit latency. As a result, a
large defect in one cylinder (or group) can cause the drive to run out
of replacements there, even though spares remain in other cylinders or
groups.
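
For illustration, the two reallocation settings live in byte 2 of the
read-write error recovery mode page (page code 01h): AWRE is bit 7 and
ARRE is bit 6. The sketch below only decodes and patches a page image
that has already been fetched with MODE SENSE. The function names are
made up for this example, and the MODE SENSE / MODE SELECT plumbing,
including the parameter header and block descriptors, is assumed to be
handled elsewhere.

```python
AWRE = 0x80  # automatic write reallocation enabled (byte 2, bit 7)
ARRE = 0x40  # automatic read reallocation enabled (byte 2, bit 6)


def reallocation_flags(page):
    """Report AWRE/ARRE from a read-write error recovery mode page."""
    if len(page) < 3:
        raise ValueError("mode page too short")
    if page[0] & 0x3F != 0x01:
        raise ValueError("not the read-write error recovery page")
    return {"awre": bool(page[2] & AWRE), "arre": bool(page[2] & ARRE)}


def enable_reallocation(page):
    """Return a copy of the page with both bits set, for MODE SELECT."""
    out = bytearray(page)
    out[0] &= 0x3F           # PS bit is reserved in MODE SELECT data
    out[2] |= AWRE | ARRE
    return bytes(out)
```

Whether a drive lets you change these bits can be checked by asking
MODE SENSE for the changeable values of the same page first.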

An alternative is SMART monitoring. This started on IDE drives and only
exists as an ATA standard, but I think there is a de facto way of asking
some SCSI drives the same questions. You can schedule short and long
self-tests and monitor the results. This tells you much more about bit
error rates, seek failures, mileage, that kind of thing. The self-tests
run in the background, so (in principle) they can safely be invoked on a
live system. I do this on my Linux file server, but then it isn't
really very busy in the small hours.

Check out smartmontools for Linux.
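
A minimal sketch of driving smartctl (the smartmontools command-line
tool) from a script, assuming it is installed and run with enough
privilege. The helper names and the /dev/sda device path are
placeholders for this example; the -d, -t and -l options are smartctl's
own.

```python
import subprocess


def smartctl_args(device, *options, dev_type=None):
    """Build a smartctl argument list; dev_type maps to -d (e.g. "scsi")."""
    args = ["smartctl"]
    if dev_type:
        args += ["-d", dev_type]
    args += list(options)
    args.append(device)
    return args


def start_self_test(device, kind="short", dev_type=None):
    """Arguments that start a background short or long self-test."""
    if kind not in ("short", "long"):
        raise ValueError("kind must be 'short' or 'long'")
    return smartctl_args(device, "-t", kind, dev_type=dev_type)


def self_test_log(device, dev_type=None):
    """Arguments that print the results of earlier self-tests."""
    return smartctl_args(device, "-l", "selftest", dev_type=dev_type)


def run(args):
    """Run smartctl and return its text output; the exit status is ignored."""
    result = subprocess.run(args, capture_output=True, text=True, check=False)
    return result.stdout
```

Calling run(start_self_test("/dev/sda", "long")) from a nightly cron
job and run(self_test_log("/dev/sda")) the next morning would match the
small-hours schedule described above.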

