* [9fans] disk error info
From: Steve Simon @ 2006-01-19 20:37 UTC (permalink / raw)
To: 9fans
Ok,
After recovering from my second disk failure in a month -
both part of the same batch, so I suppose it makes sense -
I began to wonder: does anyone have any ideas about how to probe a SCSI
disk for the number of entries in its spare table?
I would like to get a bit of warning of impending failure
(if I can) next time.
-Steve
* Re: [9fans] disk error info
From: Nigel Roles @ 2006-01-20 8:50 UTC (permalink / raw)
To: Fans of the OS Plan 9 from Bell Labs
Steve Simon wrote:
>Ok,
>
>After recovering from my second disk failure in a month -
>both part of the same batch, so I suppose it makes sense -
>
>I began to wonder: does anyone have any ideas about how to probe a SCSI
>disk for the number of entries in its spare table?
>
>I would like to get a bit of warning of impending failure
>(if I can) next time.
>
>-Steve
You can read the grown defect table; it's a standard SCSI command (READ
DEFECT DATA). The list formats are drive specific, but you can find out
how many spares have been used, and the rate of increase of grown
defects is significant. You have to decide how many is too many, of course.
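
As a rough sketch of the counting step, the Python below builds a READ DEFECT DATA (10) command block asking for the grown list, counts the descriptors announced in the 4-byte header of the reply, and turns two samples into a defects-per-day figure. It does not talk to a drive: sending the CDB needs whatever SCSI pass-through the host OS offers, which is left out. The function names, the descriptor-size table, and the default format are illustrative assumptions; check the drive's manual for the formats it actually supports.

```python
import struct

# Assumed descriptor sizes (bytes) per defect list format code:
# 000b short block, 011b long block, 100b bytes from index, 101b physical sector.
DESCRIPTOR_SIZE = {0b000: 4, 0b011: 8, 0b100: 8, 0b101: 8}


def read_defect_cdb(alloc_len, fmt=0b101, glist=True, plist=False):
    """Build a 10-byte READ DEFECT DATA (10) CDB (opcode 0x37)."""
    flags = (0x10 if plist else 0) | (0x08 if glist else 0) | (fmt & 0x07)
    # opcode, reserved, flags, 4 reserved bytes, allocation length, control
    return struct.pack(">BBB4xHB", 0x37, 0, flags, alloc_len, 0)


def grown_defect_count(response):
    """Count grown defects from the 4-byte header of the returned data."""
    _, flags, list_len = struct.unpack(">BBH", response[:4])
    if not flags & 0x08:
        raise ValueError("drive did not return the grown list")
    # list_len is the defect list length in bytes, header excluded
    return list_len // DESCRIPTOR_SIZE[flags & 0x07]


def defects_per_day(samples):
    """samples: (unix_time, count) pairs, oldest first."""
    (t0, c0), (t1, c1) = samples[0], samples[-1]
    days = (t1 - t0) / 86400.0
    return (c1 - c0) / days if days > 0 else 0.0
```

Only the header is needed for a count, so an allocation length of 4 is enough; logging the count from a nightly job gives the samples for the rate.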
You should also ensure that automatic replacement on read and automatic
replacement on write (the ARRE and AWRE bits) are enabled in the
appropriate mode page so that the drive does repair itself. The spares
are associated with a cylinder or cylinder group to limit latency. As a
result, a big error in one cylinder (or group) can cause the drive to run
out of replacements there, even though there are spares in other
cylinders or groups.
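
A minimal sketch of the mode page check, assuming the page in question is the Read-Write Error Recovery page (page code 0x01) and that `page` holds that page alone, with the mode parameter header and any block descriptors already stripped from the MODE SENSE data. The function names are made up for illustration; AWRE and ARRE are taken to be bits 7 and 6 of byte 2.

```python
AWRE = 0x80  # automatic write reallocation enabled
ARRE = 0x40  # automatic read reallocation enabled


def reallocation_flags(page):
    """Return (awre, arre) from a Read-Write Error Recovery mode page."""
    if len(page) < 3 or (page[0] & 0x3F) != 0x01:
        raise ValueError("not a Read-Write Error Recovery mode page")
    return bool(page[2] & AWRE), bool(page[2] & ARRE)


def with_reallocation_enabled(page):
    """Return a copy of the page with both bits set, ready for MODE SELECT."""
    out = bytearray(page)
    out[0] &= 0x3F        # the PS bit is reserved in MODE SELECT, so clear it
    out[2] |= AWRE | ARRE
    return bytes(out)
```

Sending the modified page back with MODE SELECT is again left to the host's SCSI pass-through.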
An alternative is SMART monitoring. This started on IDE drives and only
exists as an ATA standard, but I think there is a de facto way of asking
some SCSI drives the same questions. You can schedule short and long
self-tests and monitor the results. This gives you much more information about
bit error rates, seek failures, mileage, that kind of stuff. The
self-tests run in the background so (in principle) can safely be invoked
on a live system. I do this on my Linux file server, but then it isn't
really very busy in the small hours.
Check out smartmontools for Linux.
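
For the scheduling side, something along these lines could be run from cron. It is only a sketch: it assumes smartmontools is installed with `smartctl` on the path, the device name is a placeholder, and the helper names are invented here. The options used are smartctl's `-t` (start a self-test), `-l selftest` (show the self-test log) and `-H` (overall health).

```python
import subprocess

# Map a few actions onto smartctl options.
ACTIONS = {
    "short": ["-t", "short"],       # start a short self-test
    "long": ["-t", "long"],         # start a long self-test
    "results": ["-l", "selftest"],  # print the self-test log
    "health": ["-H"],               # print the overall health assessment
}


def smartctl_argv(action, device):
    """Build the argument list for one smartctl invocation."""
    return ["smartctl"] + ACTIONS[action] + [device]


def run_smartctl(action, device):
    """Run smartctl (usually needs root) and return its text output."""
    result = subprocess.run(smartctl_argv(action, device),
                            capture_output=True, text=True)
    return result.stdout
```

For example, `run_smartctl("short", "/dev/sda")` in the small hours and `run_smartctl("results", "/dev/sda")` the next morning, where `/dev/sda` stands in for the real device.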