From mboxrd@z Thu Jan 1 00:00:00 1970
To: 9fans@cse.psu.edu
From: anothy@cosym.net
MIME-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 8bit
Message-Id: <20020402203608.1377C19988@mail.cse.psu.edu>
Subject: [9fans] file server trouble
Date: Tue, 2 Apr 2002 15:36:00 -0500
Topicbox-Message-UUID: 7223c4dc-eaca-11e9-9e20-41e7f4b1d025

my file server's not the least bit happy. details follow. please consider this a distress call and plea for help!

the first indication of trouble is the message:

WORM SUPER BLOCK READ FAILED

printed upon bootup. this sounds very bad to me. the file server does, however, boot, accept connections, and serve files as expected. for a little while, anyway.

things go bad when i try to actually _do_ things. every time i boot my terminal, i get a message of the form:

ark: il: allocating il!10.0.1.105!22788
fworm: read 151856
stack trace of 18
0x8012325b 0x801230c5 0x80105e57 0x80105e57 0x8011ad3a 0x8012586a 0x8011f19d 0x80116eee
0x80106160 0x80105e57 0x80105e57 0x80105e57 0x801132cb 0x8011f49c 0x80126de0 0x80106160
0x80105e57 0x80117ad6 0x80106160 0x80106160 0x80106160 0x801099b9 0x80105e57 0x80105e57
0x80129986 0x801314f0 0x80130c82 0x801321fc 0x80117aa1 0x80117aa1 0x8011772f 0x801073ff
0x80110008 0x8010129d 0x801011f0 0x80101194 0x80123dee 0x801241a5 0x80123cf0 0x8012a7c7
0x80126944 0x80117785 0x80105f0e 0x8012a728 0x80105e57 0x80105e57 0x80129e65 0x80105e57
0x80105eeb 0x80105f0e 0x80105e57
1416 stack used out of 4000
panic: newqid: super block
cpu 0 exiting

everything from the "stack trace of" line on varies, so i've included another below:

ark: il: allocating il!10.0.1.105!25780
fworm: read 151856
stack trace of 13
0x8012325b 0x801230c5 0x80105e57 0x80105e57 0x8011ad3a 0x8012586a 0x8011f19d 0x80116eee
0x80106160 0x80105e57 0x80105e57 0x80105e57 0x801132cb 0x8011f49c 0x80126de0 0x80106160
0x80105e57 0x80117ad6 0x80106160 0x80106160 0x80106160 0x801099b9 0x80105e57 0x80105e57
0x80129986 0x801314f0 0x80130c82 0x801321fc
0x80117aa1 0x80100396 0x80117aa1 0x8011772f 0x80100396 0x801073ff 0x80123b54 0x801175a8
0x8010129d 0x801011f0 0x80101194 0x8010d2f8 0x8010968f 0x80123dee 0x801241a5 0x80123cf0
0x80132414 0x80117785 0x8010d01d 0x8012425f 0x80105e57 0x80129e65 0x80105eeb 0x80105f0e
0x80105e57
1416 stack used out of 4000
panic: newqid: super block
cpu 0 exiting

the terminal gets about half way through booting before the fs dies.

my cpu server boots fine, and runs (at least for a while). shortly after booting, however, i get a bunch of messages on the fs console:

il: allocating il!10.0.1.100!16428
fworm: read 151856
bufalloc: super block
fworm: read 151856
bufalloc: super block
[...last two messages repeated 14 more times...]

the fs and cpu server continue operating as before.

wanting to figure out what's going on, i did this:

ark: check
fworm: read 151856
FLAGS=10246 TRAP=e ECODE=0 CS=10 PC=801376e9
AX 00000000 BX 00000073 CX ffffffff DX 80080688
SI 809b6c64 DI 00000000 BP 8013fdd8
DS 0008 ES 0008 FS 0008 GS 0008
CR0 80010011 CR2 00000000
ur 800805f4
FLAGS=246 TRAP=1c ECODE=0 CS=10 PC=8012426e
AX 00000046 BX 8014fc50 CX 00000000 DX 8014fc50
SI 8014fc50 DI 80144ae2 BP 8014fc78
DS 0008 ES 0008 FS 0008 GS 0008
CR0 80010011 CR2 00000000
lastur 80150ba0
stack trace of 23
0x8012325b 0x8010d519 0x8011795e 0x80117a19 0x80105e57 0x80105e57 0x8010d737 0x8010e210
0x8010b393 0x801270b5 0x8010c8b4 0x80112dbd 0x8011f19d 0x80105e57 0x80106160 0x80105e57
0x8010129d 0x80105f21 0x8010d01d 0x801132cb 0x8011f49c 0x80126de0 0x80106160 0x80105e57
0x80117a85 0x80123b54 0x80123bde 0x801011f0 0x80123f7a 0x80123c8e 0x801376e9 0x801002a4
0x8010973d 0x801098cb 0x8010987f 0x8011772f 0x801073ff 0x80123b54 0x801175a8 0x8010129d
0x801011f0 0x80101194 0x80123dee 0x80123cf0 0x80117ad6 0x80117ad6 0x801099b9 0x8011772f
0x801073ff 0x80123b54 0x801175a8 0x8010129d 0x801011f0 0x80101194 0x80123dee 0x80123cf0
0x80123cf0 0x80117aa1 0x8011772f 0x801073ff 0x80123b54 0x801175a8 0x8010129d 0x801011f0
0x80101194 0x80123dee
0x80123cf0
1616 stack used out of 4000
panic: page fault
cpu 0 exiting

whoops. forcing a dump results in a similar print.

the disk is a nice relatively new (few months) Seagate SCSI thingy, so i'd be somewhat surprised if it was a hardware issue. the systems didn't lose power uncleanly.

could i recover back to the last dump? other things to try? i'm in the dark. any help much appreciated.

ア