From mboxrd@z Thu Jan 1 00:00:00 1970
To: 9fans@cse.psu.edu
Subject: Re: [9fans] venti wrarena i/o errors
From: "Russ Cox"
Date: Tue, 4 Dec 2007 19:29:51 -0500
In-Reply-To: <8151ba5cb4868bc8265e2bd66552dad6@tombob.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="US-ASCII"
Content-Transfer-Encoding: 7bit
Message-Id: <20071205002931.A4C571E8C1C@holo.morphisms.net>
Topicbox-Message-UUID: 11b43e98-ead3-11e9-9d60-3106f5b1d025

> lock 0xf0c77cf8 loop key 0xdeaddead pc 0xf01c846f held by pc 0xf01c846f proc 307
> 295: venti pc f01da773 dbgpc 203db Pread (Running) ut 1 st 537 bss 4342000 qpc f01be14f nl 0 nd 0 lpc f01c57a1 pri 3
> 307: venti pc f01cded7 dbgpc 203db Pread (Ready) ut 332 st 1137 bss 4342000 qpc f013ea9a nl 2 nd 0 lpc f01c1026 pri 0
> lock 0xf0c77cf8 loop key 0xdeaddead pc 0xf01c846f held by pc 0xf01c846f proc 307
> 297: venti pc f01cda6c dbgpc 203db Pread (Running) ut 79 st 553 bss 4342000 qpc f01c8d59 nl 0 nd 0 lpc f01c57a1 pri 3
> 307: venti pc f01cded7 dbgpc 203db Pread (Ready) ut 332 st 1137 bss 4342000 qpc f013ea9a nl 2 nd 0 lpc f01c108e pri 0

Can you run:

	% acid /path/to/your/kernel
	acid: src(0xf01c846f)

and let us know what that prints?

> 2007/1204 22:24:20 err 4: write /dev/sdC0/isect offset 0x293ae000 count 65536 buf 337e000 returned -1: i/o error
> venti/venti: part /dev/sdC0/isect addr 0x2922e000: icachewritesect writepart: write /dev/sdC0/isect offset 0x293ae000 count 65536 buf 337e000 returned -1: i/o error
> 2007/1204 22:24:21 err 4: read /dev/sdC0/isect offset 0x29a2e000 count 65536 buf 31fe000 returned -1: i/o error

It looks very much like your disk has bad sectors
or something like that.  Try running:

	dd -bs 65536 < /dev/sdC0/isect >/dev/null
	dd -bs 65536 < /dev/zero >/dev/sdC0/isect
	dd -bs 65536 < /dev/sdC0/isect >/dev/null

If the first dd fails, that would at least exonerate venti.
Either way, the second dd might get rid of any bad sectors
by overwriting them and letting the disk remap to some of
its reserve sectors (but it's probably time to replace the
disk anyway).  You could also try running the dd immediately
after venti fails, in case it's something like the disk getting
too hot.

I'd feel more confident your disk was bad if I understood
the lock loop above.  If the lock loop is something "impossible"
then it could be that the disk controller is just screwing
with memory.

It's also possible that your disk cables can't handle the
dma speeds that Plan 9 is trying or that they are otherwise
just not good enough.  The SP1613N looks like a laptop-sized
disk, though, so maybe there isn't even a cable!

Russ
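
A plain dd read normally stops at the first i/o error, so it shows that the partition has a problem but not how many blocks are affected. To list every failing 64K block with its offset, something like the following could be used. It is only a sketch in Python, assuming a Python interpreter is available on the machine and that seeking to the end of the partition file reports its size. The names scan_blocks and BLOCK_SIZE are made up for illustration and are not part of venti or Plan 9.

```python
import os
import sys

BLOCK_SIZE = 65536  # same size as the dd commands and the failing venti i/o


def scan_blocks(path, block_size=BLOCK_SIZE):
    """Read path one block at a time, continuing past errors.

    Returns (blocks_tried, bad), where bad is a list of
    (offset, error_message) pairs for reads that failed.
    """
    bad = []
    blocks = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        # Assumption: seeking to the end reports the partition size.
        size = os.lseek(fd, 0, os.SEEK_END)
        offset = 0
        while offset < size:
            try:
                # Seek explicitly each time so a failed read
                # does not leave the file position in doubt.
                os.lseek(fd, offset, os.SEEK_SET)
                os.read(fd, block_size)
            except OSError as e:
                bad.append((offset, e.strerror))
            blocks += 1
            offset += block_size
    finally:
        os.close(fd)
    return blocks, bad


if __name__ == "__main__" and len(sys.argv) == 2:
    blocks, bad = scan_blocks(sys.argv[1])
    for offset, msg in bad:
        # Hex offsets, to compare against the venti log lines.
        print("read error at offset %s: %s" % (hex(offset), msg))
    print("%d blocks tried, %d failed" % (blocks, len(bad)))
```

Saved under a hypothetical name such as scan.py, it would be run as `python scan.py /dev/sdC0/isect`. If the offsets it reports line up with the ones in the venti log, that points at the disk rather than at venti.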