9fans - fans of the OS Plan 9 from Bell Labs
 help / color / mirror / Atom feed
* [9fans] Venti problem...?
@ 2010-04-30  8:47 Pavel Klinkovsky
  0 siblings, 0 replies; 3+ messages in thread
From: Pavel Klinkovsky @ 2010-04-30  8:47 UTC (permalink / raw)
  To: 9fans

Hi all,

I installed Erik's 9atom.iso (official distribution fails to perform
the bios boot):
- fossil+venti
- disk structure (proposed by installation process)
100 MB 9fat
512 B nvram
38057 MB fossil
190288 MB arenas
9514 MB isect
512 MB swap

After the reboot I can see:
2010/0427 20:53:40 err 4: read /dev/sdC0/isect offset 0xe70c8000 count
65536 buf 20ae000 returned 0:
/boot/venti: part /dev/sdC0/isect addr 0xe6c68000: icachewritesect
readpart: read /dev/sdC0/isect offset 0xe70c8000 count 65536 buf
20ae000 returned 0:

If I install fossil-only version, everything is ok.

Any idea?

Thanks in advance.

Pavel



^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [9fans] Venti problem...?
@ 2010-05-07 15:41 erik quanstrom
  0 siblings, 0 replies; 3+ messages in thread
From: erik quanstrom @ 2010-05-07 15:41 UTC (permalink / raw)
  To: 9fans

> > After the reboot I can see:
> > 2010/0427 20:53:40 err 4: read /dev/sdC0/isect offset 0xe70c8000 count
> > 65536 buf 20ae000 returned 0:
> > /boot/venti: part /dev/sdC0/isect addr 0xe6c68000: icachewritesect
> > readpart: read /dev/sdC0/isect offset 0xe70c8000 count 65536 buf
> > 20ae000 returned 0:
>
> i don't know why this took me so long to recognize, but after this
> was posted i realized we're not getting proper error reporting from sdata
> the partition read should never return 0.  [...]

this has been sorted out.  -1 is now returned (Eio).  the underlying
problem was a bad spot on the disk.

many thanks to pavel for the persistance to help get this tracked down.

add'l feature for 9atom: full intel 82575/82576/82580 1/2/4x gbe support,
including pxe.

- erik



^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [9fans] Venti problem...?
       [not found] <55a4c682-f418-4fa0-b3c1-bb37f661773e@y36g2000yqm.googlegroups.co>
@ 2010-05-02 19:37 ` erik quanstrom
  0 siblings, 0 replies; 3+ messages in thread
From: erik quanstrom @ 2010-05-02 19:37 UTC (permalink / raw)
  To: 9fans

> After the reboot I can see:
> 2010/0427 20:53:40 err 4: read /dev/sdC0/isect offset 0xe70c8000 count
> 65536 buf 20ae000 returned 0:
> /boot/venti: part /dev/sdC0/isect addr 0xe6c68000: icachewritesect
> readpart: read /dev/sdC0/isect offset 0xe70c8000 count 65536 buf
> 20ae000 returned 0:

i don't know why this took me so long to recognize, but after this
was posted i realized we're not getting proper error reporting from sdata
the partition read should never return 0.  the error is missing.  i've spent
a few days looking at the problem, and i believe this is due to the convoluted
way SDreq->status interacts with sdsetsense and the return value of
SDifc->rio and scsibio.

(it's not clear from this error message what the problem really is.)

i took a few minutes up to detail the fix, which i'm currently testing.

- erik

---
The Problem Of the Statuses

Devsd(3) is built around two interfaces: raw commands (SDifc->rio),
and regular reads and writes (SDifc->bio). [http://iwp9.org/papers/atazz.pdf,
http://www.quanstro.net/plan9/sdtalk].  While it is possible to
directly implement a bio command that just does reads and writes,
drivers like sdata(3) use the raw interface in ../$arch/sdscsi.c.
Raw commands are executed in 3 phases.
1.	the CDB/FIS is written to /dev/sdXX/raw
2.	the data is written or read from /dev/sdXX/raw
	(a 0 byte write is used if no data)
3.	the status/FIS is read back

During step 2, devsd sends the CDB or FIS to the device-specific
driver via SDifc->rio and the operation is performed.  The tracking
structure fora raw request is called a SDreq.  If this operation is
unsuccessful, there are 3 ways the error can be returned:
1. 	by calling error(),
2.	by returning a status which is not SDok,
3.	by setting SDreq->status.

In the first case, the operation is aborted an an error is immediately
returned.  The third phase isn't executed.  If there is any sense data,
it can't be retrieved.  The second case is the same as the first, except
the return value rather than error() is used.

The third case is different.  It does not abort the 3-phase raw
command.  And the sense data may be retrieved via the normal methods.
Therefore, it would seem to make sense to use the 3d option unless the
drive is so sideways that continuing makes no sense.

The function used to set the sense data for the 3d return option
is sdsetsense.  The arguments are
	int sdsetsense(SDreq *r, int status, int key, int asc, int asq)
To avoid aborting raw transactions, sdsetsense returns SDok and
sets r->status to the given status if status = SDcheck (CHECK
CONDITION).

Well, groovy.  But what's the problem?  The problem is that
the bio function, scsibio used for normal (reads and writes to
#S/sdXX/data) i/o to many scsi devices only checks the return status, not
r->status.  Unfortunately, sdsetsense sets that to SDok.

The solution currently taken was to modify scsibio to check that
r->status is also okay.  This should fix a few cases in the ata,
ahci and marvell 50xx drivers where a read or write could return
0 instead of an error.

I think it also make sense to modify sdsetsense to always return
SDok in the future.



^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2010-05-07 15:41 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-04-30  8:47 [9fans] Venti problem...? Pavel Klinkovsky
     [not found] <55a4c682-f418-4fa0-b3c1-bb37f661773e@y36g2000yqm.googlegroups.co>
2010-05-02 19:37 ` erik quanstrom
2010-05-07 15:41 erik quanstrom

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).