[9fans] i/o error reading large sata disk

9fans - fans of the OS Plan 9 from Bell Labs
* [9fans] i/o error reading large sata disk
@ 2008-04-06  2:50 sqweek
  2008-04-06  3:58 ` erik quanstrom
  0 siblings, 1 reply; 5+ messages in thread
From: sqweek @ 2008-04-06  2:50 UTC (permalink / raw)
  To: 9fans

 I went to set up a file server the other day - installed w/
fossil+venti and the install went fine, but after booting up it wasn't
long before venti started spamming i/o errors. Now I'm using second
hand disks, so I figured the one I installed to might be stuffed and
booted to the livecd to verify. I partitioned the disk in half and dd'd
from one partition to the other, and sure enough it got an i/o error on
record 16777212 (w/ 8192 blocksize).
 16777212 is awful close to 2^24, but I can't make any sense of why
that would be a limit. If you take the 8192 into account you're around
byte 2^37 which makes even less sense, so I shrug it off as a bad disk
and test the second disk... which gets a read error in the exact same
spot.
 So, now I'm suspicious ;)
 I can reproduce it quickly with:

term% cat /dev/sdE0/ctl
inquiry ST3300831AS
config 0C5A capabilities 2F00 dma 00550020 dmactl 00000000 rwm 16
rwmctl 0 lba48always off
geometry 586072368 512 16383 16 63
part data 0 586072368
part plan9 63 293025600
part plan9.1 293041665 586067265
term% dd -if /dev/sdE0/plan9 -of /dev/sdE0/plan9.1 -bs 8192 -iseek 16777211
read: i/o error
1+0 records in
1+0 records out

 Using blocks of 512 bytes I can narrow it down to the 268435393rd
block on the plan9 partition, which is the block that starts at byte
2^37 (in terms of absolute disk position). I can access blocks before
and after it fine, it's just this one... sector? Same story with the
second disk.
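
 A quick way to check that arithmetic is the sketch below. It is only an
illustration: record_to_sector is a made-up helper, and it assumes dd
numbers records from zero (as -iseek does) and that the read starts at
the beginning of the plan9 partition, sector 63 in the ctl output. Under
those assumptions record 16777212 at 8192 bytes, and partition block
268435392 at 512 bytes, both begin at absolute sector 2^28 - 1, the
largest value that fits in 28 bits; that sector ends at byte 2^37.

```c
#include <stdint.h>

/* Values taken from the ctl output and dd command quoted above. */
enum {
	SectorSize = 512,	/* from the "geometry" line */
	PartStart  = 63,	/* first sector of the plan9 partition */
};

/*
 * Absolute disk sector at which a dd record begins, for a read that
 * starts at the beginning of the partition.  record is zero-based,
 * as with dd's -iseek.  (Hypothetical helper, not part of any driver.)
 */
uint64_t
record_to_sector(uint64_t record, uint64_t blocksize)
{
	return PartStart + record * (blocksize / SectorSize);
}
```

 For example, record_to_sector(16777212, 8192) gives 268435455, which is
0xFFFFFFF.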

 The SATA controller details from pci -v:
3.5.0:    disk 01.80.00 1095/3114 10 0:0000bc01 16 1:0000b401 16
2:0000b001 16 3:0000ac01 16 4:0000a801 16 5:feafec00 1024
        Silicon Image, Inc. Sil 3114 SATALink/SATARaid Controller

 Anyone think they know what's going on?
-sqweek


* Re: [9fans] i/o error reading large sata disk
  2008-04-06  2:50 [9fans] i/o error reading large sata disk sqweek
@ 2008-04-06  3:58 ` erik quanstrom
  2008-04-06 16:19   ` sqweek
  2008-04-22 13:56   ` erik quanstrom
  0 siblings, 2 replies; 5+ messages in thread
From: erik quanstrom @ 2008-04-06  3:58 UTC (permalink / raw)
  To: 9fans

> term% cat /dev/sdE0/ctl
> inquiry ST3300831AS
> config 0C5A capabilities 2F00 dma 00550020 dmactl 00000000 rwm 16
> rwmctl 0 lba48always off
> geometry 586072368 512 16383 16 63
> part data 0 586072368
> part plan9 63 293025600
> part plan9.1 293041665 586067265
> term% dd -if /dev/sdE0/plan9 -of /dev/sdE0/plan9.1 -bs 8192 -iseek 16777211
> read: i/o error

i think i see the problem.  we're off by one bit.

in your case, i calculate h = 0xf.  but since the head shares bits
with the device, there just isn't enough room for a head > 7.  i think you
can fix this problem by

(a) setting lba48always on
	; echo lba48always on>/dev/sd??/ctl
if this doesn't work, then i'm wrong.

(b) (the proper fix). apply this change to sdata

/n/sources/plan9//sys/src/9/pc/sdata.c:1344,1350 - sdata.c:1344,1350
  };

  static int
- atageniostart(Drive* drive, vlong lba)
+ atageniostart(Drive* drive, uvlong lba)
  {
  	Ctlr *ctlr;
  	uchar cmd;
/n/sources/plan9//sys/src/9/pc/sdata.c:1351,1357 - sdata.c:1351,1357
  	int as, c, cmdport, ctlport, h, len, s, use48;

  	use48 = 0;
- 	if((drive->flags&Lba48always) || (lba>>28) || drive->count > 256){
+ 	if((drive->flags&Lba48always) || (lba>>27) || drive->count > 256){
  		if(!(drive->flags & Lba48))
  			return -1;
  		use48 = 1;
/n/sources/plan9//sys/src/9/pc/sdata.c:1359,1365 - sdata.c:1359,1365
  	}
  	else if(drive->dev & Lba){
  		c = (lba>>8) & 0xFFFF;
- 		h = (lba>>24) & 0x0F;
+ 		h = (lba>>24) & 7;	/* tautology */
  		s = lba & 0xFF;
  	}
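
to see where h = 0xf comes from, the field split on the '-' side of the
diff can be reproduced outside the kernel.  this is only a sketch:
Lba28 and split_lba28 are made-up names, the masks are copied from the
unpatched lines above, and nothing here touches the hardware.  for
lba = 0xFFFFFFF all three fields come out all ones (s = 0xff,
c = 0xffff, h = 0xf).

```c
#include <stdint.h>

/* Fields of a 28-bit LBA, split the way the unpatched code does. */
typedef struct Lba28 Lba28;
struct Lba28 {
	unsigned s;	/* bits 0-7 */
	unsigned c;	/* bits 8-23 */
	unsigned h;	/* bits 24-27 */
};

/* Hypothetical helper: same shifts and masks as the '-' lines above. */
Lba28
split_lba28(uint64_t lba)
{
	Lba28 f;

	f.s = lba & 0xFF;
	f.c = (lba>>8) & 0xFFFF;
	f.h = (lba>>24) & 0x0F;
	return f;
}
```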

there's also a problem with disks > 2TB.  but that's not your problem.

- erik




* Re: [9fans] i/o error reading large sata disk
  2008-04-06  3:58 ` erik quanstrom
@ 2008-04-06 16:19   ` sqweek
       [not found]     ` <b0541197023336a814de597a7abcaa66@quanstro.net>
  2008-04-22 13:56   ` erik quanstrom
  1 sibling, 1 reply; 5+ messages in thread
From: sqweek @ 2008-04-06 16:19 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On Sun, Apr 6, 2008 at 11:58 AM, erik quanstrom <quanstro@quanstro.net> wrote:
>  (a) setting lba48always on
>         ; echo lba48always on>/dev/sd??/ctl
>  if this doesn't work, then i'm wrong.

 Thanks erik, that does the trick. Didn't get around to trying the
patch yet, I'll be in touch.
-sqweek



* Re: [9fans] i/o error reading large sata disk
       [not found]     ` <b0541197023336a814de597a7abcaa66@quanstro.net>
@ 2008-04-14  4:51       ` sqweek
  0 siblings, 0 replies; 5+ messages in thread
From: sqweek @ 2008-04-14  4:51 UTC (permalink / raw)
  To: erik quanstrom; +Cc: 9fans

 Yes, just got to it last night. Sorry about the delay, got
interrupted last week with a notice from my real estate and had to
reassess my finances.
 Anyway, it seems to work - using the patched kernel I can
successfully read the block I was having trouble with without turning
lba48always on.
 I am seeing freezes... first boot it froze during termrc, second was
fine, third boot it froze as I tried to run kbmap... well, it didn't
freeze - stats kept on running happily, letting me know about the
>1000 syscalls per second (which I'm seeing consistently every boot)
and faces kept on updating the time. I could move the mouse, but not
interact with rio at all. ^T^Tp still worked, but I forget the rest of
those debugging shortcuts and it was past my bedtime so I just left it
running, but it was in the same state when I woke up this morning.
Rebooted with ^T^Tr and it came up fine.
 I'm not sure if this is related to the patch or just some other
incompatibility... still, it's progress compared to my last two
install attempts, in which venti choked on the second boot with a
whole lot of I/O errors.
-sqweek

PS. Any idea why your reply showed up here in an anonymous attachment
instead of the email body, erik?

On Sat, Apr 12, 2008 at 7:40 PM, erik quanstrom <quanstro@quanstro.net> wrote:
> get a chance to test this yet?
>
> - erik
>
> ---------- Forwarded message ----------
> From: sqweek <sqweek@gmail.com>
> To: "Fans of the OS Plan 9 from Bell Labs" <9fans@9fans.net>
> Date: Mon, 7 Apr 2008 00:19:00 +0800
> Subject: Re: [9fans] i/o error reading large sata disk
> On Sun, Apr 6, 2008 at 11:58 AM, erik quanstrom <quanstro@quanstro.net> wrote:
>  >  (a) setting lba48always on
>  >         ; echo lba48always on>/dev/sd??/ctl
>  >  if this doesn't work, then i'm wrong.
>
>   Thanks erik, that does the trick. Didn't get around to trying the
>  patch yet, I'll be in touch.
>  -sqweek
>
>



* Re: [9fans] i/o error reading large sata disk
  2008-04-06  3:58 ` erik quanstrom
  2008-04-06 16:19   ` sqweek
@ 2008-04-22 13:56   ` erik quanstrom
  1 sibling, 0 replies; 5+ messages in thread
From: erik quanstrom @ 2008-04-22 13:56 UTC (permalink / raw)
  To: 9fans

> > read: i/o error
>
> i think i see the problem.  we're off by one bit.
>
[...]
> /n/sources/plan9//sys/src/9/pc/sdata.c:1344,1350 - sdata.c:1344,1350
>   };
>
>   static int
> - atageniostart(Drive* drive, vlong lba)
> + atageniostart(Drive* drive, uvlong lba)
>   {
>   	Ctlr *ctlr;
>   	uchar cmd;
> /n/sources/plan9//sys/src/9/pc/sdata.c:1351,1357 - sdata.c:1351,1357
>   	int as, c, cmdport, ctlport, h, len, s, use48;
>
>   	use48 = 0;
> - 	if((drive->flags&Lba48always) || (lba>>28) || drive->count > 256){
> + 	if((drive->flags&Lba48always) || (lba>>27) || drive->count > 256){
>   		if(!(drive->flags & Lba48))
>   			return -1;
>   		use48 = 1;

while this does fix the problem, it's sloppy.  the problem is actually
that ata reports device sizes as the last sector number+1.  it also does
not follow the tradition used for sector counts, where 0 sector count =
all-ones+1 = 256.  this is because removable media drives with no media
(eg cdroms) give size = 0.  therefore, under any ata addressing scheme,
the all-ones sector is not accessible.
credit to sam hopkins for pointing this out.
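
the two conventions can be put side by side in a small sketch.  the
names count_from_reg and last_lba_from_size are made up for
illustration, and the 28-bit width is an assumption taken from the
28-bit addressing case under discussion.

```c
#include <stdint.h>

/* 8-bit sector count register: 0 wraps around to mean 256 sectors. */
unsigned
count_from_reg(uint8_t reg)
{
	return reg == 0 ? 256 : reg;
}

/*
 * Size field: 0 really means 0 (e.g. no media), so there is no
 * wraparound.  The highest usable sector is therefore size - 1.
 * The caller must check size != 0 first.
 */
uint32_t
last_lba_from_size(uint32_t size)
{
	return size - 1;
}
```

with a 28-bit size field the largest size is 0xFFFFFFF, so
last_lba_from_size(0xFFFFFFF) is 0xFFFFFFE and sector 0xFFFFFFF is out
of reach.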

/n/sources/plan9//sys/src/9/pc/sdata.c:1344,1350 - sdata.c:1344,1350
  };

+ enum{
+ 	Last28	= (1<<28) - 1 - 1,
+ };
+
  static int
- atageniostart(Drive* drive, vlong lba)
+ atageniostart(Drive* drive, uvlong lba)
  {
  	Ctlr *ctlr;
  	uchar cmd;
/n/sources/plan9//sys/src/9/pc/sdata.c:1351,1357 - sdata.c:1355,1361
  	int as, c, cmdport, ctlport, h, len, s, use48;

  	use48 = 0;
- 	if((drive->flags&Lba48always) || (lba>>28) || drive->count > 256){
+ 	if((drive->flags&Lba48always) || lba > Last28 || drive->count > 256){
  		if(!(drive->flags & Lba48))
  			return -1;
  		use48 = 1;
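
the effect of the changed condition can be seen by evaluating both
versions at the boundary.  this is a sketch, not the driver: use48_old
and use48_new are made-up names, the always and count parameters stand
in for the Lba48always flag and drive->count, and the check that the
drive supports lba48 at all is left out.

```c
#include <stdint.h>

enum {
	Last28 = (1<<28) - 1 - 1,	/* as in the diff: 0xFFFFFFE */
};

/* Original test: only addresses with a bit at or above bit 28 use lba48. */
int
use48_old(uint64_t lba, int always, int count)
{
	return always || (lba>>28) != 0 || count > 256;
}

/* Revised test: the all-ones 28-bit address goes through lba48 as well. */
int
use48_new(uint64_t lba, int always, int count)
{
	return always || lba > Last28 || count > 256;
}
```

the two agree everywhere except at lba = 0xFFFFFFF, where use48_old
gives 0 and use48_new gives 1.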


- erik


