From mboxrd@z Thu Jan 1 00:00:00 1970 From: erik quanstrom Date: Sun, 2 May 2010 15:37:20 -0400 To: 9fans@9fans.net Message-ID: <32e2d6d469b3b291c16b92b2ce7bdbb1@kw.quanstro.net> In-Reply-To: <55a4c682-f418-4fa0-b3c1-bb37f661773e@y36g2000yqm.googlegroups.co> References: <55a4c682-f418-4fa0-b3c1-bb37f661773e@y36g2000yqm.googlegroups.co> MIME-Version: 1.0 Content-Type: text/plain; charset="US-ASCII" Content-Transfer-Encoding: 7bit Subject: Re: [9fans] Venti problem...? Topicbox-Message-UUID: 172fc20e-ead6-11e9-9d60-3106f5b1d025 > After the reboot I can see: > 2010/0427 20:53:40 err 4: read /dev/sdC0/isect offset 0xe70c8000 count > 65536 buf 20ae000 returned 0: > /boot/venti: part /dev/sdC0/isect addr 0xe6c68000: icachewritesect > readpart: read /dev/sdC0/isect offset 0xe70c8000 count 65536 buf > 20ae000 returned 0: i don't know why this took me so long to recognize, but after this was posted i realized we're not getting proper error reporting from sdata the partition read should never return 0. the error is missing. i've spent a few days looking at the problem, and i believe this is due to the convoluted way SDreq->status interacts with sdsetsense and the return value of SDifc->rio and scsibio. (it's not clear from this error message what the problem really is.) i took a few minutes up to detail the fix, which i'm currently testing. - erik --- The Problem Of the Statuses Devsd(3) is built around two interfaces: raw commands (SDifc->rio), and regular reads and writes (SDifc->bio). [http://iwp9.org/papers/atazz.pdf, http://www.quanstro.net/plan9/sdtalk]. While it is possible to directly implement a bio command that just does reads and writes, drivers like sdata(3) use the raw interface in ../$arch/sdscsi.c. Raw commands are executed in 3 phases. 1. the CDB/FIS is written to /dev/sdXX/raw 2. the data is written or read from /dev/sdXX/raw (a 0 byte write is used if no data) 3. the status/FIS is read back During step 2, devsd sends the CDB or FIS to the device-specific driver via SDifc->rio and the operation is performed. The tracking structure fora raw request is called a SDreq. If this operation is unsuccessful, there are 3 ways the error can be returned: 1. by calling error(), 2. by returning a status which is not SDok, 3. by setting SDreq->status. In the first case, the operation is aborted an an error is immediately returned. The third phase isn't executed. If there is any sense data, it can't be retrieved. The second case is the same as the first, except the return value rather than error() is used. The third case is different. It does not abort the 3-phase raw command. And the sense data may be retrieved via the normal methods. Therefore, it would seem to make sense to use the 3d option unless the drive is so sideways that continuing makes no sense. The function used to set the sense data for the 3d return option is sdsetsense. The arguments are int sdsetsense(SDreq *r, int status, int key, int asc, int asq) To avoid aborting raw transactions, sdsetsense returns SDok and sets r->status to the given status if status = SDcheck (CHECK CONDITION). Well, groovy. But what's the problem? The problem is that the bio function, scsibio used for normal (reads and writes to #S/sdXX/data) i/o to many scsi devices only checks the return status, not r->status. Unfortunately, sdsetsense sets that to SDok. The solution currently taken was to modify scsibio to check that r->status is also okay. This should fix a few cases in the ata, ahci and marvell 50xx drivers where a read or write could return 0 instead of an error. I think it also make sense to modify sdsetsense to always return SDok in the future.