* [9fans] cwfs(4) failing: phase error after recover or suicide after normal startup
From: Anthony Sorace @ 2007-09-18 22:28 UTC (permalink / raw)
To: Fans of the OS Plan 9 from Bell Labs
Having played around with cwfs for a week or so now, I'm trying to use
it to migrate my old kenfs. The relevant config line is 'filsys main
cw4f[w<0-3>]'; w4 no longer works. I've gotten w0-3 hooked up to my
cpu server and have created a devmap mapping w4 to a 30GB disk file
(the original w4 disk was slightly larger than that).
Starting up cwfs with -f (and other appropriate invocations) seems to
work fine; I give it the config, it reports the correct mapping, and I
end the conversation with 'recover main' and 'end'. The recover seems
to work fine - it reports the block numbers of a few hundred dumps -
but the process ends with this (I have CHAT(cp) always return true):
next dump at Wed Sep 19 05:00:00 2007
c_session 0
c_attach 0
fid = 1
uid = adm
arg = main
fworm: read 1400715
error: phase error -- directory entry not allocated
panic: FID1 attach to root
halted at Tue Sep 18 14:16:07 2007.
That phase error is Ealloc in 9p1.c^f_attach.
If I omit the -f from cwfs's invocation after doing the recover, I get
this, instead:
next dump at Wed Sep 19 05:00:00 2007
c_session 0
c_attach 0
fid = 1
uid = adm
arg = main
cwfs:10685: suicide: sys: trap: divide error pc=0x000131cd
PC there points to /sys/src/cmd/cwfs/cw.c:562 - cwio(); I've got acid
traces in, but I don't see anything obviously wrong (no
divide-by-zero, h->msize is positive, &c).
In this case, only the first cwfs proc has suicided; the rest are
running along, getting 9p requests (although not managing to actually
do anything with them).
I'm going to do more tracing after dinner or tomorrow, but I'm
reasonably stumped at this point. Any pointers or other help is
greatly appreciated. Particularly intriguing is why the behavior
differs, when the first case successfully completes the recover and
moves on. Attached is a summary of an acid debugging run on the
suicided process from the second form, should anyone want to take a
look.
[-- Attachment #2: cwfs.acid.txt --]
[-- Type: text/plain, Size: 2366 bytes --]
: sophia; acid 10685
/proc/10685/text:386 plan 9 executable
/sys/lib/acid/port
/sys/lib/acid/386
acid: lstk()
cwio(dev=0x16f408,addr=0x0,buf=0x8250cc0,opcode=0x8)+0x98 /sys/src/cmd/cwfs/cw.c:562
cw=0x85fb188
cb=0xbfd750
h=0x80a8cc0
a1=0x25dd9
bn=0xbfd750
a2=0x34000001
max=0x0
newmax=0x1
p=0xbfd750
b=0x20267
c=0x375bc
state=0x1
p1=0x25dd9
p2=0xbfd750
cwread(dev=0x16f408,b=0x0,c=0x8250cc0)+0x28 /sys/src/cmd/cwfs/cw.c:496
devread(c=0x8250cc0,b=0x0,d=0x16f408)+0x1b6 /sys/src/cmd/cwfs/sub.c:988
e=0x29743
getbuf(d=0x16f408,addr=0x0,flag=0x1)+0x1de /sys/src/cmd/cwfs/iobuf.c:109
hp=0xb5a258
p=0xbffbc0
f_attach(cp=0x85fad28,in=0xdfffebcc,ou=0xdfffeb30)+0x29b /sys/src/cmd/cwfs/9p1.c:241
p=0x0
f=0x1762e0
fs=0x4d3b0
raddr=0x0
d=0x202af
fcall9p1(in=0xdfffebcc,ou=0xdfffeb30,cp=0x85fad28)+0x95 /sys/src/cmd/cwfs/console.c:21
t=0x56
con_attach(fid=0x1,uid=0x4058f,arg=0x1734e8)+0x84 /sys/src/cmd/cwfs/console.c:48
in=0x1ec56
ou=0x1ea57
cmd_cfs(argc=0x1,argv=0xdfffeca8)+0x70 /sys/src/cmd/cwfs/con.c:618
name=0x40571
fs=0x4d3b0
cmd_exec(arg=0x401a0)+0xdc /sys/src/cmd/cwfs/con.c:118
line=0x736663
argv=0xdfffecd4
argc=0x1
i=0x1
consserve()+0x3d /sys/src/cmd/cwfs/con.c:20
i=0x29d0
main(argv=0xdfffef88,argc=0x1)+0x2b0 /sys/src/cmd/cwfs/main.c:331
nets=0x1
_argc=0x6d
_args=0x3fd64
ann=0x0
i=0xf
_main+0x31 /sys/src/libc/386/main9.s:16
acid: print(pcfile(0x000131cd))
/sys/src/cmd/cwfs/cw.c
acid: print(pcline(0x000131cd))
562
acid: include("/sys/src/cmd/cwfs/arkive/cw.acid")
acid: cwio:dev
type 0x08
init 0xf4
link 0x00000000
dlink 0x08250cc0
private 0x00000008
size 582328845860864
_10_ {
_2_ wren {
ctrl 1504264
targ 0
lun 136645824
mapped 11903580
file 0x00029845
fd 12581824
sddir 0x00029743
sddata 0x0001841f
}
_3_ cat {
first 0x0016f408
last 0x00000000
ndev 136645824
}
_4_ cw {
c 0x0016f408
w 0x00000000
ro 0x08250cc0
}
_5_ j {
j 0x0016f408
m 0x00000000
}
_6_ ro {
parent 0x0016f408
}
_7_ fw {
fw 0x0016f408
}
_8_ part {
d 0x0016f408
base 0
size 136645824
}
_9_ swab {
d 0x0016f408
}
}
acid: cwio:h
maddr 134909120
msize 12572496
caddr 12572496
csize 155097
fsize 12572496
wsize 77718
wmax 1504264
sbaddr 0
cwraddr 136645824
roraddr 8
toytime 0
time 135584
acid: cwio:addr
0xdfffea60
acid: cwio:bn
0xdfffea38
acid: rc("cat /dev/text > /mnt/term/Users/anthony/Desktop/cwfs.acid.txt")
* Re: [9fans] cwfs(4) failing: phase error after recover or suicide after normal startup
From: erik quanstrom @ 2007-09-18 23:13 UTC (permalink / raw)
To: 9fans
bn = addr % h->msize;
msize must be zero.
- erik
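For readers following along: on the 386, integer division and remainder both trap when the divisor is zero, which is what "sys: trap: divide error" reports. Below is a minimal sketch of the failing expression with a guard in front of it. The Cache struct here is a hypothetical stand-in with a single field, and bucket() is an illustrative name, not a function in cwfs.

```c
/*
 * Hypothetical stand-in for the cache header; only msize
 * matters for this illustration.
 */
typedef struct Cache Cache;
struct Cache {
	long long msize;	/* divisor in the bucket calculation */
};

/*
 * Same arithmetic as "bn = addr % h->msize", but refusing to
 * divide when the header was never filled in.
 * Returns -1 instead of trapping.
 */
long long
bucket(Cache *h, long long addr)
{
	if(h->msize <= 0)
		return -1;
	return addr % h->msize;
}
```

A guard like this would only turn the suicide into an error return; a zero msize would still mean the cache header was not read correctly.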
* Re: [9fans] cwfs(4) failing: phase error after recover or suicide after normal startup
From: Anthony Sorace @ 2007-09-20 23:13 UTC (permalink / raw)
To: Fans of the OS Plan 9 from Bell Labs
that was the first thing i checked. in acid, print(cwio:h) showed
seemingly useful non-0 numbers. but apparently i wasn't paying very
close attention: h in cwio is a Cache*, not a Cache, so i needed
print(*cwio:h). yup, msize is zero.
geoff's been providing suggestions and thinks that the recover didn't
actually succeed, so i'm focusing on that case for now. more info as
it presents.
* Re: [9fans] cwfs(4) failing: phase error after recover or suicide
From: erik quanstrom @ 2007-09-21 0:03 UTC (permalink / raw)
To: 9fans
> that was the first thing i checked. in acid, print(cwio:h) showed
> seemingly useful non-0 numbers. but apparently i wasn't paying very
> close attention: h in cwio is a Cache*, not a Cache, so i needed
> print(*cwio:h). yup, msize is zero.
>
> geoff's been providing suggestions and thinks that the recover didn't
> actually succeed, so i'm focusing on that case for now. more info as
> it presents.
that's what i guessed your problem was. (since msize just had to be zero.)
i would guess that your new fworm is not exactly the same (calculated)
size as your old worm. i think you can fix this by simply dropping the "f"
from your device string. this will inhibit the maintenance of the bitmap
at the end of the fake worm.
anyway, what is the compelling reason to move to cwfs? i have been
spending a lot of time with it in the last 6 weeks. i've made some substantial
performance improvements. and i've added aoe.
- erik
* Re: [9fans] cwfs(4) failing: phase error after recover or suicide
From: Anthony Sorace @ 2007-09-21 1:53 UTC (permalink / raw)
To: Fans of the OS Plan 9 from Bell Labs
On 9/20/07, erik quanstrom <quanstro@quanstro.net> wrote:
// i would guess that your new fworm is not exactly the same (calculated)
// size as your old worm.
the fworm, not the cache? hrm, interesting. it's exactly the same
disks, but i suppose that could be it. i'll take a look at that and
how the bitmap is maintained. i'd expect problems there to show up in
the explicit recover phase (which cwfs's prints say has completed),
but it's worth a check. dropping the "f" is non-destructive in the
face of recover?
i've been looking at auth issues for some of the evening, since it's
complaining about things related to attach. maybe that's a red
herring. i'll take a look at the bitmap tomorrow.
// anyway, what is the compelling reason to move to cwfs?
it's prompted by something in my fs hardware going funny. i suspect
it's just the terminator i have to use on the somewhat odd setup in
that box, but it led to the whole "gee, i'd really like fewer PCs to
maintain" line of thought. the kenfs is also quite old now, and the
size reflects that; i'm considering just moving everything on it over
to venti and putting the box in storage. not to mention a desire to
reduce my power consumption and noise production.
i still think the stand-alone fs has its place, but i don't think my
garage is it.
* Re: [9fans] cwfs(4) failing: phase error after recover or suicide
From: erik quanstrom @ 2007-09-21 2:00 UTC (permalink / raw)
To: 9fans
> the fworm, not the cache? hrm, interesting. it's exactly the same
> disks, but i suppose that could be it. i'll take a look at that and
> how the bitmap is maintained. i'd expect problems there to show up in
> the explicit recover phase (which cwfs's prints say has completed),
> but it's worth a check. dropping the "f" is non-destructive in the
> face of recover?
yes. recover doesn't touch the w part of the device. it just checks the
block after the last block in each dump to see if it's a superblock. if it
is, it loops. if it is not, then you're at the end and the cache is cleared.
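The walk described above can be sketched roughly as follows. This is only an illustration of the loop, not the cwfs code: the Block struct, the tag values, and the `next` field (the address just past the end of a dump, as recorded in that dump's superblock) are all assumptions made for the example, and the worm is modelled as an in-memory array.

```c
enum { Tnone = 0, Tsuper = 1 };	/* hypothetical block tags */

typedef struct Block Block;
struct Block {
	int tag;		/* Tsuper if this block is a superblock */
	long long next;		/* assumed: address just past this dump */
};

/*
 * Follow the chain of dumps starting at start. Each superblock
 * says where its dump ends; the block found there is either the
 * next superblock (keep looping) or something else (end of the
 * written worm). Returns that end address and counts the dumps
 * seen, or returns -1 if the chain goes backwards.
 */
long long
walkdumps(Block *worm, long long nblocks, long long start, int *ndumps)
{
	long long a, next;

	*ndumps = 0;
	a = start;
	while(a >= 0 && a < nblocks && worm[a].tag == Tsuper){
		(*ndumps)++;
		next = worm[a].next;
		if(next <= a)	/* corrupt chain: refuse to loop forever */
			return -1;
		a = next;
	}
	return a;
}
```

Nothing in the walk writes to the worm, which is consistent with the point that recover leaves the w part of the device alone.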
> maintain" line of thought. the kenfs is also quite old now, and the
> size reflects that; i'm considering just moving everything on it over
> to venti and putting the box in storage. not to mention a desire to
> reduce my power consumption and noise production.
kenfs does run on new hardware. i'm currently running it on an
intel 5000-series processor and a brand new mb at coraid. it also
does great with my valinux pIII at home.
> i still think the stand-alone fs has its place, but i don't think my
> garage is it.
electricity: $5/month.
noise: too much.
not doing maintenance to the fs: priceless. ☺
- erik
* Re: [9fans] cwfs(4) failing: phase error after recover or suicide
From: Anthony Sorace @ 2007-09-21 3:10 UTC (permalink / raw)
To: Fans of the OS Plan 9 from Bell Labs
perfect. removing the f from the config did the trick exactly. i've
got my fs back. i'd still like to understand more why the bitmap is
wrong on the same disks, but that's for another day now. very much
thanks.
i agree having a file server "just run" is worth quite a bit; the
problem is the hardware in mine no longer fits that description, and
i'm temporarily budget constrained.
again, much thanks.
a
* Re: [9fans] cwfs(4) failing: phase error after recover or suicide
From: erik quanstrom @ 2007-09-21 3:39 UTC (permalink / raw)
To: 9fans
> perfect. removing the f from the config did the trick exactly. i've
> got my fs back. i'd still like to understand more why the bitmap is
> wrong on the same disks, but that's for another day now. very much
> thanks.
cool..
the problem is that the calculation of the device size is subject
to rounding error, and if it's one sector off, your bitmap won't
be where it's supposed to be.
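To make the off-by-one concrete, here is a small sketch under an assumed layout: one bit per block, with the bitmap occupying the last few blocks of the device. The block size, the formula, and the function names are illustrative assumptions, not taken from the cwfs source.

```c
enum {
	Blocksize = 4096,		/* assumed block size in bytes */
	Bitsperblock = Blocksize * 8,	/* blocks covered by one bitmap block */
};

/* Bitmap blocks needed to cover a device of nblocks blocks. */
long long
bitmapblocks(long long nblocks)
{
	return nblocks / Bitsperblock + 1;
}

/*
 * First block of the bitmap, assuming it sits at the very end
 * of the device. Because the position is derived from the
 * device size, any disagreement about the size moves it.
 */
long long
bitmapstart(long long nblocks)
{
	return nblocks - bitmapblocks(nblocks);
}
```

With these numbers a device computed as 1000000 blocks puts the bitmap at block 999969, while 999999 blocks puts it at 999968, so a reader using the second size would look one block away from where the first size wrote it.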
> i agree having a file server "just run" is worth quite a bit; the
> problem is the hardware in mine no longer fits that description, and
> i'm temporarily budget constrained.
you can get a valinux box on ebay for $100.
- erik