9fans - fans of the OS Plan 9 from Bell Labs
* Re: [9fans] Oh....Hell.  File server problems.
@ 2001-04-27  7:03 forsyth
  2001-04-27 14:14 ` Dan Cross
  0 siblings, 1 reply; 12+ messages in thread
From: forsyth @ 2001-04-27  7:03 UTC (permalink / raw)
  To: 9fans

>>I seem to have done a bad thing; my file server thinks that its dump
>>disk (pseudo-worm) is full, even though it's really not (uhh, don't ask).
>>Now, every time I try and boot the file server, it panics.  I don't care

don't ask?  knowing what the configuration was and what went wrong might
allow recovery.  depending on what you did it's possible the data is still there.
have you tried the recover command in config mode, or doesn't it get even that far?




* Re: [9fans] Oh....Hell.  File server problems.
  2001-04-27  7:03 [9fans] Oh....Hell. File server problems forsyth
@ 2001-04-27 14:14 ` Dan Cross
  0 siblings, 0 replies; 12+ messages in thread
From: Dan Cross @ 2001-04-27 14:14 UTC (permalink / raw)
  To: 9fans

In article <20010427070646.E8BC2199C1@mail.cse.psu.edu> you write:
>>>I seem to have done a bad thing; my file server thinks that its dump
>>>disk (pseudo-worm) is full, even though it's really not (uhh, don't ask).
>>>Now, every time I try and boot the file server, it panics.  I don't care
>
>don't ask?  knowing what the configuration was and what went wrong might
>allow recovery.  depending on what you did it's possible the data is still
>there.

Well, it's embarrassing.  :-)  The FS is using Eric Dorman's patches for
IDE disks, and the pseudo-worm lives on a 10GB IDE disk.  Cache lives on
a 9GB SCSI disk.  The config is as straightforward as can be; the entire
IDE disk is devoted to the pseudo-worm (no partitions, no nothing), and
the entire SCSI disk to cache.

The problem is that there was a very small bug in the IDE FS code wherein
size calculations for disks > ~4GB would overflow, leaving the file server
to believe that it had significantly less space available than it really
did.  A patch was sent out to 9fans for it a few months ago (sorry, I
don't remember who wrote the patch!), but I never applied it.  Hence, my
FS thought that the dump disk was somewhere on the order of ~2GB instead
of 10.  Whoops.  (See?  I said it was embarrassing....  :-)

Anyway, I got Eric's patches again, and the patch to the patch, built
another file server kernel (from my stand-alone laptop) and tried
rebooting the file server with that.  This time, the file server
panicked on boot after not being able to find its superblock.  When I
switched the kernels back and rebooted, it came up, but a few files
were giving me ``phase error--cannot happen'' diagnostics when I tried
to cat or otherwise read them.  I was going around trying to remove all
these so I could get a snapshot of the filesystem when the thing
crashed the last time, refusing to come up after that.  It occurred to
me that I should have just tried to tar the latest dump, which seemed
to be unaffected.

I have no reason to believe that the data itself has been affected;
it seems to be more a metadata issue.  :-(

>have you tried the recover command in config mode, or doesn't it get even
>that far?

I have tried the recover command, and the machine indeed comes up into
config mode, but as soon as I try to ``end'' to make the recover happen,
the machine panics with ``panic: worm rbounds xxxx'', where xxxx is the
size of what the FS thinks the worm is, which is greater than it thinks
that it *can* be.

It's interesting, and perhaps a little scary, to notice how the file server
deals with the worm when it gets full.  I've noticed that it will return
a diagnostic to the user (``file system full'') and continue working okay
for a few seconds after that, but then freeze; even a ``halt'' on the
console is ineffective.  Yikes!

	- Dan C.




* Re: [9fans] Oh....Hell.  File server problems.
@ 2001-05-04 16:18 forsyth
  0 siblings, 0 replies; 12+ messages in thread
From: forsyth @ 2001-05-04 16:18 UTC (permalink / raw)
  To: 9fans

[-- Attachment #1: Type: text/plain, Size: 530 bytes --]

>>nothing would fix it except power-cycling the machine.  Charles Forsyth
>>mentioned that this is a known bug, which has been fixed in later
>>versions of the file server code, and is related to not releasing a
>>lock somewhere before returning from a function, or something similar.

i think what i meant was ``i thought/assumed that had been fixed!''
in the last few years, not fixed recently.    i remember finding the cause in the 2nd
edition but perhaps i didn't send the fix along.  i'll need to check
the dump.



* Re: [9fans] Oh....Hell.  File server problems.
  2001-05-02 15:20         ` Dan Cross
@ 2001-05-04  1:48           ` Eric Dorman
  0 siblings, 0 replies; 12+ messages in thread
From: Eric Dorman @ 2001-05-04  1:48 UTC (permalink / raw)
  To: 9fans

Dan Cross wrote:
> Aww dang.  :-)  Well, then perhaps it is a good idea for eg, Nemo
> and myself or whomever to try and integrate new fileserver changes
> into the IDE-patched kernel.
>         - Dan C.

That's probably a good idea...

--eld



* Re: [9fans] Oh....Hell.  File server problems.
  2001-05-02  1:45       ` Eric Dorman
@ 2001-05-02 15:20         ` Dan Cross
  2001-05-04  1:48           ` Eric Dorman
  0 siblings, 1 reply; 12+ messages in thread
From: Dan Cross @ 2001-05-02 15:20 UTC (permalink / raw)
  To: 9fans

In article <3AEF66A2.D0CA097D@san.rr.com> you write:
>I have to admit I haven't been keeping much track of things lately
>since I've been struggling with some other unrelated stuff.  Ugh.

Aww dang.  :-)  Well, then perhaps it is a good idea for eg, Nemo
and myself or whomever to try and integrate new fileserver changes
into the IDE-patched kernel.

	- Dan C.




* Re: [9fans] Oh....Hell.  File server problems.
  2001-04-30 21:05     ` Dan Cross
@ 2001-05-02  1:45       ` Eric Dorman
  2001-05-02 15:20         ` Dan Cross
  0 siblings, 1 reply; 12+ messages in thread
From: Eric Dorman @ 2001-05-02  1:45 UTC (permalink / raw)
  To: 9fans

Dan Cross wrote:
> Yes, I agree....  Perhaps Eric is already tracking changes to the
> mainstream fileserver code?
>         - Dan C.

I have to admit I haven't been keeping much track of things lately
since I've been struggling with some other unrelated stuff.  Ugh.

--eric



* Re: [9fans] Oh....Hell.  File server problems.
  2001-04-30 19:12   ` Francisco J Ballesteros
@ 2001-04-30 21:05     ` Dan Cross
  2001-05-02  1:45       ` Eric Dorman
  0 siblings, 1 reply; 12+ messages in thread
From: Dan Cross @ 2001-04-30 21:05 UTC (permalink / raw)
  To: 9fans

In article <3AEDB90B.533FE57@gsyc.escet.urjc.es> you write:
>Perhaps we who are using an IDE fs should extract the ide stuff
>out of the original patch and make a patch for the current
>version of the 3rd edition kernel. Would it be worth it?
>I'd say so because it looks like the new fs code won't be
>released soon. If nobody does it before, I may do that when
>I get the time and drop a line to the list.

Yes, I agree....  Perhaps Eric is already tracking changes to the
mainstream fileserver code?

	- Dan C.




* Re: [9fans] Oh....Hell.  File server problems.
  2001-04-30 18:40 ` Dan Cross
@ 2001-04-30 19:12   ` Francisco J Ballesteros
  2001-04-30 21:05     ` Dan Cross
  0 siblings, 1 reply; 12+ messages in thread
From: Francisco J Ballesteros @ 2001-04-30 19:12 UTC (permalink / raw)
  To: 9fans

Perhaps we who are using an IDE fs should extract the ide stuff
out of the original patch and make a patch for the current
version of the 3rd edition kernel. Would it be worth it?
I'd say so because it looks like the new fs code won't be
released soon. If nobody does it before, I may do that when
I get the time and drop a line to the list.


Dan Cross wrote:
>
> Presumably, it could be fixed by diff'ing the current file server code
> against Eric's patched file server, and incorporating changes that have
> been made to the mainline
>
>         - Dan C.



* Re: [9fans] Oh....Hell.  File server problems.
  2001-04-30  7:39 nemo
@ 2001-04-30 18:40 ` Dan Cross
  2001-04-30 19:12   ` Francisco J Ballesteros
  0 siblings, 1 reply; 12+ messages in thread
From: Dan Cross @ 2001-04-30 18:40 UTC (permalink / raw)
  To: 9fans

In article <20010430073525.4977B1998A@mail.cse.psu.edu> you write:
>	the `patch' was just replacing something like
>
>	x = x * 512 / blocksize
>to be
>	x = x / blocksize * 512
>
>if I remember well.
>I'm the one happily using the ide fs and (if I'm not
>mistaken) even a >2GB disk would be a problem w/o replacing
>this thing. I don't remember exactly where I changed it (must
>be in the 9fans archives). If
>anyone needs it, I'll try to find out.
>
>I think it was in the initialization code,
>but I don't remember exactly.

Thanks Nemo; I did find where you had sent the patch earlier, in
February, and I applied it to my file server and rebuilt it (adding
another 10GB IDE disk for the pseudo-worm in the process).

The patch made the change that you mention, but in the atasize()
function in devata.c.  I also changed devream() in sub.c to add a
``case Devide:'' to the switch statement on d->type.  This allowed me
to put my ``other'' filesystem on part of the IDE disk.

As far as the behavior of the file server when the worm fills up...  It
doesn't react very nicely; mine ``froze'' and appeared to be dead,
nothing would fix it except power-cycling the machine.  Charles Forsyth
mentioned that this is a known bug, which has been fixed in later
versions of the file server code, and is related to not releasing a
lock somewhere before returning from a function, or something similar.

Presumably, it could be fixed by diff'ing the current file server code
against Eric's patched file server, and incorporating changes that have
been made to the mainline.

	- Dan C.




* Re: [9fans] Oh....Hell.  File server problems.
@ 2001-04-30  7:43 nemo
  0 siblings, 0 replies; 12+ messages in thread
From: nemo @ 2001-04-30  7:43 UTC (permalink / raw)
  To: 9fans

: It's interesting, and perhaps a little scary, to notice how the file server
: deals with the worm when it gets full.  I've noticed that it will return
: a diagnostic to the user (``file system full'') and continue working okay
: for a few seconds after that, but then freeze; even a ``halt'' on the
: console is ineffective.  Yikes!

In any case, is this supposed to be the fs behaviour when it gets full?
I thought it would at least allow you to somehow copy the (ide) worm into a
bigger one.

Let us know how it goes. I have an ide pseudo worm too and would like to know
what will happen when it gets full...




* Re: [9fans] Oh....Hell.  File server problems.
@ 2001-04-30  7:39 nemo
  2001-04-30 18:40 ` Dan Cross
  0 siblings, 1 reply; 12+ messages in thread
From: nemo @ 2001-04-30  7:39 UTC (permalink / raw)
  To: 9fans

[-- Attachment #1: Type: text/plain, Size: 462 bytes --]

Hi,

	the `patch' was just replacing something like

	x = x * 512 / blocksize
to be
	x = x / blocksize * 512

if I remember well.
I'm the one happily using the ide fs and (if I'm not
mistaken) even a >2GB disk would be a problem w/o replacing
this thing. I don't remember exactly where I changed it (must
be in the 9fans archives). If
anyone needs it, I'll try to find out.

I think it was in the initialization code,
but I don't remember exactly.



* [9fans] Oh....Hell.  File server problems.
@ 2001-04-27  3:09 Dan Cross
  0 siblings, 0 replies; 12+ messages in thread
From: Dan Cross @ 2001-04-27  3:09 UTC (permalink / raw)
  To: 9fans

(10 bucks if you spot the literature reference from the title; what a mystery.)

I seem to have done a bad thing; my file server thinks that its dump
disk (pseudo-worm) is full, even though it's really not (uhh, don't ask).
Now, every time I try and boot the file server, it panics.  I don't care
about that so much, since I know how to fix the file server and I've
resigned myself to the fact that I'll have to rebuild it anyway, but
I'm curious as to how I can get my %&@$ data off of the old dump disk.
Just the last dump would be fine.

Normally, I would just use tar or mkfs, but I can't get the file server
to keep its noggin together long enough to do even that.  :-(  Has anyone
encountered such problems before?  Any advice would be most welcome,
otherwise, I'll just blitz the thing and bear the pain (normally I keep
my data fairly well synchronized between my laptop and the FS at work,
and I'm pretty much the only one who uses it anyway....).

Thanks for any help!

	- Dan C.




Thread overview: 12+ messages
2001-04-27  7:03 [9fans] Oh....Hell. File server problems forsyth
2001-04-27 14:14 ` Dan Cross
  -- strict thread matches above, loose matches on Subject: below --
2001-05-04 16:18 forsyth
2001-04-30  7:43 nemo
2001-04-30  7:39 nemo
2001-04-30 18:40 ` Dan Cross
2001-04-30 19:12   ` Francisco J Ballesteros
2001-04-30 21:05     ` Dan Cross
2001-05-02  1:45       ` Eric Dorman
2001-05-02 15:20         ` Dan Cross
2001-05-04  1:48           ` Eric Dorman
2001-04-27  3:09 Dan Cross
