9fans - fans of the OS Plan 9 from Bell Labs
* [9fans] atagenioretry: nondma
From: arisawa @ 2013-05-19  4:32 UTC
  To: 9fans

Hello,

yesterday I had the error messages below:

automatic dump Fri May 17 05:00:07 2013
automatic dump Sat May 18 05:00:08 2013
command 30
data f7aea8a0 limit f7aec0a0 dlen 16384 status 0 error 0
lba 163565804 -> 163565804, count 32 -> 32 (32)
atagenioretry: nondma w:163565804:32 @163565804:32
wrenwrite: error on w"/dev/sdD0/fsworm"(2315949): i/o error 040804 163565804
mirrwrite: error at w"/dev/sdD0/fsworm" block 2315949
atagenioretry: nondma w:163565836:32 @163565836:32
wrenwrite: error on w"/dev/sdD0/fsworm"(2315950): i/o error 040804 163565836
mirrwrite: error at w"/dev/sdD0/fsworm" block 2315950
atagenioretry: nondma w:163565900:32 @163565900:32
wrenwrite: error on w"/dev/sdD0/fsworm"(2315952): i/o error 040804 163565900
mirrwrite: error at w"/dev/sdD0/fsworm" block 2315952
atagenioretry: nondma w:163565932:32 @163565932:32
...
...

it seems that the wrenwrite/mirrwrite errors from cwfs are induced by "atagenioretry: nondma",
which comes from the kernel (/sys/src/9/pc/sdide.c).
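
The numbers in the log are consistent with 16384-byte cwfs blocks laid out as 32 consecutive 512-byte sectors: each block number maps to the reported lba once a fixed partition offset is added. A minimal sketch of that arithmetic follows. The 512-byte sector size is an assumption, the partition start of 89455436 sectors is inferred from the log rather than stated anywhere in the thread, and the function names are hypothetical.

```c
enum {
	SECTOR_SIZE = 512,	/* assumed sector size */
	BLOCK_SIZE = 16384	/* dlen from the dump above */
};

/* first lba of a cwfs block, given the partition's starting sector */
unsigned long long
block_to_lba(unsigned long long part_start, unsigned long long block)
{
	return part_start + block * (BLOCK_SIZE / SECTOR_SIZE);
}

/* partition start implied by one (block, lba) pair taken from the log */
unsigned long long
implied_part_start(unsigned long long block, unsigned long long lba)
{
	return lba - block * (BLOCK_SIZE / SECTOR_SIZE);
}
```

Feeding in block 2315949 and lba 163565804 gives the offset 89455436, and the same offset reproduces the lba values reported for blocks 2315950 and 2315952.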

have you ever seen messages like this?

I suspected my sata drive (/dev/sdD0) was broken, but it seems the drive is alive.

Kenji Arisawa





* Re: [9fans] atagenioretry: nondma
From: erik quanstrom @ 2013-05-19 14:13 UTC
  To: 9fans

On Sun May 19 00:34:14 EDT 2013, arisawa@ar.aichi-u.ac.jp wrote:
> Hello,
>
> yesterday I had the error messages below:
>
> automatic dump Fri May 17 05:00:07 2013
> automatic dump Sat May 18 05:00:08 2013
> command 30
> data f7aea8a0 limit f7aec0a0 dlen 16384 status 0 error 0
> lba 163565804 -> 163565804, count 32 -> 32 (32)
> atagenioretry: nondma w:163565804:32 @163565804:32
> wrenwrite: error on w"/dev/sdD0/fsworm"(2315949): i/o error 040804 163565804
> mirrwrite: error at w"/dev/sdD0/fsworm" block 2315949
> atagenioretry: nondma w:163565836:32 @163565836:32
> wrenwrite: error on w"/dev/sdD0/fsworm"(2315950): i/o error 040804 163565836
> mirrwrite: error at w"/dev/sdD0/fsworm" block 2315950
> atagenioretry: nondma w:163565900:32 @163565900:32
> wrenwrite: error on w"/dev/sdD0/fsworm"(2315952): i/o error 040804 163565900
> mirrwrite: error at w"/dev/sdD0/fsworm" block 2315952
> atagenioretry: nondma w:163565932:32 @163565932:32

iirc, there are three big issues i've seen with the ide driver:
1.  incorrect pio timings,
2.  missed interrupts,
3.  assumption that dma doesn't work.

the 9atom driver addresses all three of these issues.
but you may find it easier to chip away at the issue by
correcting the timings and adding the vid/did, or to
use the ahci driver, which tends to be more reliable
on newer hardware.  here's the ide source for comparison:

	/n/atom/plan9/sys/src/9/pc/sdide.c

what does your fs configuration string look like?

- erik




* Re: [9fans] atagenioretry: nondma
From: cinap_lenrek @ 2013-05-19 15:55 UTC
  To: 9fans

9front uses an (older) variant of the 9atom ide driver.

from arisawa's output, it seems that the write command times out
in atagenio():

		iowait(drive, 30*1000, 0);
		if(!ctlr->done){
			/*
			 * What should the above timeout be? In
			 * standby and sleep modes it could take as
			 * long as 30 seconds for a drive to respond.
			 * Very hard to get out of this cleanly.
			 */
			atadumpstate(drive, r, lba, count);
			ataabort(drive, 1);
			qunlock(ctlr);
			return atagenioretry(drive, r, lba, count);
		}

atagenioretry() will just cause the command to be retried in pio mode
when it previously was failing in dma mode. i think this is what erik
means by "assumption that dma doesn't work". here dma is already off,
so we fail the request.
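
The fallback described here can be modelled in a few lines. This is a simplified sketch of the decision only, not the code in sdide.c. The type, the return codes and the function name are stand-ins chosen for illustration.

```c
enum { REQ_RETRY, REQ_FAIL };	/* hypothetical result codes */

typedef struct Drive Drive;
struct Drive {
	int dmactl;	/* nonzero while dma is enabled */
};

/*
 * the first timeout with dma enabled turns dma off and asks
 * for a retry (which then runs in pio mode); a timeout with
 * dma already off fails the request.
 */
int
retry_after_timeout(Drive *d)
{
	if(d->dmactl != 0){
		d->dmactl = 0;
		return REQ_RETRY;
	}
	return REQ_FAIL;
}
```

In this model a drive that starts with dma off fails on its first timeout, which is the "nondma" path.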

i doubt this is missed interrupts. iowait() calls the interrupt handler
itself before giving up. you can even check, as the driver keeps statistics
about missed interrupts in the ctl file.

the drive seems to not complete the command within the timeout.

what i would try is to check with dd whether reading the offending
sectors produces i/o errors as well.
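
A rough equivalent of that dd check, written as a small POSIX C function: it reads a run of sectors one at a time and counts the ones that fail. It is only a sketch. It assumes 512-byte sectors, the function name is hypothetical, and which file to pass (for example the whole-disk data file, if the lba values in the log are whole-disk addresses) is an assumption about the setup.

```c
#define _XOPEN_SOURCE 700
#define _FILE_OFFSET_BITS 64
#include <fcntl.h>
#include <unistd.h>

enum { SECSIZE = 512 };	/* assumed sector size */

/*
 * read count sectors starting at lba, one sector at a time.
 * returns the number of sectors that could not be read in full,
 * or -1 if the file cannot be opened.
 */
long
count_bad_sectors(const char *path, unsigned long long lba, unsigned long count)
{
	char buf[SECSIZE];
	unsigned long i;
	long bad;
	int fd;

	fd = open(path, O_RDONLY);
	if(fd < 0)
		return -1;
	bad = 0;
	for(i = 0; i < count; i++)
		if(pread(fd, buf, sizeof buf, (off_t)((lba + i) * SECSIZE)) != (ssize_t)sizeof buf)
			bad++;
	close(fd);
	return bad;
}
```

Calling it with lba 163565804 and count 32 would cover the first failing request in the log. A result of 0 means every sector in that run was readable.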

were these sectors ever written before? i dd /dev/zero over the whole
drive before initializing filesystems on it. was something similar done
here as well?

could this be a problem with the drive going into standby mode and then
the next command taking too long to complete because the drive is slow to
spin up?

--
cinap




* Re: [9fans] atagenioretry: nondma
From: erik quanstrom @ 2013-05-19 17:47 UTC
  To: 9fans

> i doubt this is missed interrupts. iowait() calls the interrupt handler
> itself before giving up. you can even check, as the driver keeps statistics
> about missed interrupts in the ctl file.

it can be.  previously, on e.g. intel devices, we did not properly
handle dma interrupts.  this led to what appeared to be missed
interrupts.  by checking for the completion condition after timeout,
this case can be eliminated from consideration.  i/o is slow, but
i've done what i can to make it work as well as possible, since it's
always possible that we have new hardware with different bugs that
present the same way.
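
The check-after-timeout idea can be sketched as a small model. Nothing here is taken from either driver: the struct, its fields and the function names are hypothetical, and the polling loop stands in for a real sleep with a deadline.

```c
typedef struct Ctlr Ctlr;
struct Ctlr {
	int done;	/* set by the interrupt handler */
	int hwcomplete;	/* what the status register would report */
	int missed;	/* completions found only by the final check */
};

/* stand-in for the interrupt handler: latch completion if the hardware says so */
static void
service(Ctlr *c)
{
	if(c->hwcomplete)
		c->done = 1;
}

/*
 * poll for done up to ticks times; on timeout run the handler
 * by hand once before giving up.  returns 1 if the command
 * completed, 0 if it really timed out.
 */
int
wait_done(Ctlr *c, int ticks)
{
	int i;

	for(i = 0; i < ticks; i++)
		if(c->done)
			return 1;
	service(c);
	if(c->done){
		c->missed++;
		return 1;
	}
	return 0;
}
```

A command that finished without its interrupt being seen is then slow (it waits out the whole timeout) but not failed, and the missed counter records how often that happened.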

> what i would try is to check with dd whether reading the offending
> sectors produces i/o errors as well.
>
> were these sectors ever written before? i dd /dev/zero over the whole
> drive before initializing filesystems on it. was something similar done
> here as well?

i didn't see which model drive this is, but this is doubtful.  properly functioning
modern drives do not fail writes unless they have exhausted the
reallocation pool.  being previously written should not make any difference.

9atom has smart(8), and "smart -tvp" should tell you if any drives failed.
drives with no reallocations left will exhibit smart failure.

> could this be a problem with the drive going into standby mode and then
> the next command taking too long to complete because the drive is slow to
> spin up?

unfortunately, the ide driver doesn't have full support for PUIS.
it's not clear to me what all sata-emulating-ide firmware does in
these cases.

the (9atom) ahci driver should directly handle ALPM, PUIS and
other power-saving bits added in ahci 1.3, so i prefer to run ahci
whenever possible.

- erik




* Re: [9fans] atagenioretry: nondma
From: erik quanstrom @ 2013-05-19 17:52 UTC
  To: 9fans

> the (9atom) ahci driver should directly handle ALPM, PUIS and
> other power-saving bits added in ahci 1.3, so i prefer to run ahci
> whenever possible.

that is, so one doesn't have to worry about power-savings techniques
messing things up.

- erik




* Re: [9fans] atagenioretry: nondma
From: arisawa @ 2013-05-20 22:10 UTC
  To: Fans of the OS Plan 9 from Bell Labs

Hello,

thanks for your attention to my issue.

it seems more info is needed.

(a) the motherboard
GA-H61M-USB3-B3 (v2.0)
(b) kernel
9front
(c) fs configuration
cwfs
pseudo RAID1 configuration.
filsys main c(/dev/sdC0/fscache){(/dev/sdC0/fsworm)(/dev/sdD0/fsworm)}
these hdds are 2.5" sata 160GB and 320GB respectively.
(d) standby?
not in standby mode.
(e) dma?
dma was off
dmactl 00000000	# from both /dev/sdC0/ctl and /dev/sdD0/ctl


when I found the error message, I suspected the disk was dead,
so I checked whether the disk (/dev/sdD0) was readable.
for example, I tried to read /dev/sdD0/nvram, but was unable to read it.
I powered off the machine, removed the disk, and attached it to another machine.
I tried reading the disk again, and the result was OK.

more information:

term% 8.inspect s		# 8.inspect is my tool that shows superblocks on /dev/sdC0/fsworm
         2          5
         5      69908
     69908      85793
     85793     104695
	...
	...
   2315825    2315948
   2315948    2316035		# reported error is block 2315949, 2315950 ...
   2316035    2316133
   2316133    2316366

term% ls /n/dump/2013
/n/dump/2013/0121
/n/dump/2013/0127
/n/dump/2013/0128
...
...
/n/dump/2013/0517
/n/dump/2013/0518			# the dump that reported the error
/n/dump/2013/0519
/n/dump/2013/0520
term% 


the following result is from fsworm blocks of the disk that was in trouble.
2315947		readable, OK, last dump of /2013/0517
2315948		readable but garbage; that should be the superblock of /2013/0518
2315949		readable but garbage
2315950		readable but garbage
...
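
Assuming each line of the 8.inspect output is a pair (superblock address, address of the next superblock), the dump that owns a given block can be found by a range lookup. A minimal sketch, with hypothetical type and function names:

```c
#include <stddef.h>

typedef struct Dump Dump;
struct Dump {
	unsigned long sb;	/* superblock address of this dump */
	unsigned long next;	/* superblock address of the next dump */
};

/*
 * return the index of the dump whose blocks [sb, next) contain
 * block, or -1 if no entry covers it.
 */
int
find_dump(const Dump *d, size_t n, unsigned long block)
{
	size_t i;

	for(i = 0; i < n; i++)
		if(d[i].sb <= block && block < d[i].next)
			return (int)i;
	return -1;
}
```

With the last four pairs listed above, blocks 2315949, 2315950 and 2315952 all fall in the entry that starts at 2315948, while 2315947 belongs to the entry before it.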

Kenji Arisawa





* Re: [9fans] atagenioretry: nondma
From: erik quanstrom @ 2013-05-21  1:40 UTC
  To: 9fans

> filsys main c(/dev/sdC0/fscache){(/dev/sdC0/fsworm)(/dev/sdD0/fsworm)}
> these hdds are 2.5" sata 160GB and 320GB respectively.

i would recommend switching to ahci mode in bios, if that is
possible and upgrading your driver.  if you can't switch from ide
emulation, entering your vid and did (from pci|grep disk) in the ide
driver would be helpful, as dma mode is surprisingly more
forgiving.

- erik




* Re: [9fans] atagenioretry: nondma
From: arisawa @ 2013-05-22 23:18 UTC
  To: Fans of the OS Plan 9 from Bell Labs

thank you erik,

I have switched to ahci mode.

Kenji Arisawa

On 2013/05/21, at 10:40, erik quanstrom <quanstro@quanstro.net> wrote:

> i would recommend switching to ahci mode in bios, if that is
> possible and upgrading your driver.  if you can't switch from ide
> emulation, entering your vid and did (from pci|grep disk) in the ide
> driver would be helpful, as dma mode is surprisingly more
> forgiving.
>
> - erik



