From mboxrd@z Thu Jan 1 00:00:00 1970
From: Dan Cross
Message-Id: <200104271414.KAA26728@augusta.math.psu.edu>
To: 9fans@cse.psu.edu
Subject: Re: [9fans] Oh....Hell. File server problems.
In-Reply-To: <20010427070646.E8BC2199C1@mail.cse.psu.edu>
Cc:
Date: Fri, 27 Apr 2001 10:14:53 -0400
Topicbox-Message-UUID: 9444d8cc-eac9-11e9-9e20-41e7f4b1d025

In article <20010427070646.E8BC2199C1@mail.cse.psu.edu> you write:
>>>I seem to have done a bad thing; my file server thinks that it's dump
>>>disk (pseudo-worm) is full, even though it's really not (uhh, don't ask).
>>>Now, every time I try and boot the file server, it panics. I don't care
>
>don't ask? knowing what the configuration was and what went wrong might
>allow recovery. depending on what you did it's possible the data is still
>there.

Well, it's embarrassing. :-) The FS is using Eric Dorman's patches for IDE
disks, and the pseudo-worm lives on a 10GB IDE disk. The cache lives on a
9GB SCSI disk. The config is as straightforward as can be: the entire IDE
disk is devoted to the worm (no partitions, no nothing), and the entire
SCSI disk to cache.

The problem is that there was a very small bug in the IDE FS code wherein
size calculations for disks larger than ~4GB would overflow, leaving the
file server believing that it had significantly less space available than
it really did. A patch was sent out to 9fans for it a few months ago
(sorry, I don't remember who wrote the patch!), but I never applied it.
Hence, my FS thought that the dump disk was somewhere on the order of 2GB
instead of 10GB. Whoops. (See? I said it was embarrassing.... :-)

Anyway, I got Eric's patches again, plus the patch to the patch, built
another file server kernel (from my stand-alone laptop), and tried
rebooting the file server with that. This time, the file server panicked
on boot because it could not find its superblock.
When I switched the kernels back and rebooted, it came up, but a few files
were giving me ``phase error--cannot happen'' diagnostics when I tried to
cat or otherwise read them. I was going around trying to remove all of
these so that I could take a snapshot of the file system when the thing
crashed for the last time and refused to come up again. It occurred to me
that I should have just tried to tar the latest dump, which seemed to be
unaffected. I have no reason to believe that the data itself has been
affected; it seems to be more of a metadata issue. :-(

>have you tried the recover command in config mode, or doesn't it get even
>that far?

I have tried the recover command, and the machine does come up into config
mode, but as soon as I type ``end'' to make the recover happen, the machine
panics with ``panic: worm rbounds xxxx'', where xxxx is the size the FS
thinks the worm is, which is greater than the size it thinks the worm *can*
be.

It's interesting, and perhaps a little scary, to see how the file server
behaves when the worm fills up. It returns a diagnostic to the user (``file
system full'') and keeps working okay for a few seconds after that, but
then freezes; even a ``halt'' on the console is ineffective. Yikes!

	- Dan C.