* [9fans] interesting deadlock
@ 2010-09-17 3:02 erik quanstrom
2010-09-17 21:11 ` Russ Cox
0 siblings, 1 reply; 5+ messages in thread
From: erik quanstrom @ 2010-09-17 3:02 UTC (permalink / raw)
To: 9fans
i have these processes all deadlocked. 8.out
is serving /n/mntpt.
xxx 11921346 0:00 0:00 436K Create 8.out
xxx 11921785 0:00 0:00 24K Open cat /n/mntpt/sos
xxx 11921786 0:00 0:00 24K Unmount unmount /n/mntpt
xxx 11921787 0:00 0:00 44K Pwrite echo x y
minooka# acid -l/tmp/acid -k 11921346 /386/9pccpu
/386/9pccpu:386 plan 9 boot image
/sys/lib/acid/port
/sys/lib/acid/386
acid: stk()
sched()+0x140 /sys/src/9/port/proc.c:156
rlock(q=0xf6400490)+0x9d /sys/src/9/port/qlock.c:113
findmount(qid=0x407325,type=0x7,dev=0x252c,mp=0xf6b27998,cp=0xf6b279c8)+0x25 /sys/src/9/port/chan.c:845
domount(cp=0xf6b279c8,mp=0xf6b27998,path=0xf6b279c0)+0x42 /sys/src/9/port/chan.c:883
namec(aname=0xf6cdb870,amode=0x5,omode=0x11,perm=0x1b6)+0x5a4 /sys/src/9/port/chan.c:1457
syscreate(arg=0xf050cf48)+0x62 /sys/src/9/port/sysfile.c:1126
syscall(ureg=0xf6b27a54)+0x1c6 /sys/src/9/pc/trap.c:714
_syscallintr()+0x18 /sys/src/9/pc/plan9l.s:45
0xf6b27a54 ?file?:0
acid:
echo kill > /proc/11921346/ctl
minooka# acid -l/tmp/acid -k 11921785 /386/9pccpu
/386/9pccpu:386 plan 9 boot image
/sys/lib/acid/port
/sys/lib/acid/386
acid: stk()
sleep(r=0xf742276c,arg=0xf7422540,f=0xf011af60)+0x1ab /sys/src/9/port/proc.c:785
mountio(m=0xf6ccbc10,r=0xf7422540)+0x292 /sys/src/9/port/devmnt.c:808
mountrpc(r=0xf7422540,m=0xf6ccbc10)+0x2e /sys/src/9/port/devmnt.c:745
mntwalk(nc=0xf6a374d0,nname=0x1,c=0xf6c6d140,name=0xf6694898)+0x1b5 /sys/src/9/port/devmnt.c:426
ewalk(c=0xf6c6d140,nc=0x0,name=0xf6694898,nname=0x1)+0x79 /sys/src/9/port/chan.c:937
walk(cp=0xf723c2a8,nnames=0x3,nerror=0xf723c298,names=0xf6694890,nomount=0x0)+0x5d8 /sys/src/9/port/chan.c:1017
namec(aname=0xf741b680,amode=0x3,omode=0x0,perm=0x0)+0x325 /sys/src/9/port/chan.c:1420
sysopen(arg=0xf04eea08)+0x5e /sys/src/9/port/sysfile.c:273
syscall(ureg=0xf723c334)+0x1c6 /sys/src/9/pc/trap.c:714
_syscallintr()+0x18 /sys/src/9/pc/plan9l.s:45
0xf723c334 ?file?:0
acid:
echo kill > /proc/11921785/ctl
minooka# acid -l/tmp/acid -k 11921786 /386/9pccpu
/386/9pccpu:386 plan 9 boot image
/sys/lib/acid/port
/sys/lib/acid/386
acid: stk()
sched()+0x140 /sys/src/9/port/proc.c:156
wlock(q=0xf66921c0)+0xad /sys/src/9/port/qlock.c:168
cunmount(mnt=0xf756a140,mounted=0x0)+0xd4 /sys/src/9/port/chan.c:781
sysunmount(arg=0xf0543ecc)+0x11d /sys/src/9/port/sysfile.c:1111
syscall(ureg=0xf7421934)+0x1c6 /sys/src/9/pc/trap.c:714
_syscallintr()+0x18 /sys/src/9/pc/plan9l.s:45
0xf7421934 ?file?:0
- erik
* Re: [9fans] interesting deadlock
From: Russ Cox @ 2010-09-17 21:11 UTC (permalink / raw)
To: Fans of the OS Plan 9 from Bell Labs
On Thu, Sep 16, 2010 at 11:02 PM, erik quanstrom <quanstro@quanstro.net> wrote:
> i have these processes all deadlocked. 8.out
> is serving /n/mntpt.
>
> xxx 11921346 0:00 0:00 436K Create 8.out
> xxx 11921785 0:00 0:00 24K Open cat /n/mntpt/sos
> xxx 11921786 0:00 0:00 24K Unmount unmount /n/mntpt
> xxx 11921787 0:00 0:00 44K Pwrite echo x y
tell us why it's interesting. it looks like 8.out can see
itself so you've got a bad loop.
* Re: [9fans] interesting deadlock
From: erik quanstrom @ 2010-09-17 22:12 UTC (permalink / raw)
To: 9fans
On Thu, Sep 16, 2010 at 11:02 PM, erik quanstrom <quanstro@quanstro.net> wrote:
> i have these processes all deadlocked. 8.out
> is serving /n/mntpt.
>
> xxx 11921346 0:00 0:00 436K Create 8.out
> xxx 11921785 0:00 0:00 24K Open cat /n/mntpt/sos
> xxx 11921786 0:00 0:00 24K Unmount unmount /n/mntpt
> xxx 11921787 0:00 0:00 44K Pwrite echo x y
okay, it's not. sorry for the confusion. the reason
for looking at that case was that it seemed at the
time related to a crash here on the indirection
of mh->mount, which is nil. the code in question
was a fileserver which was crashing, getting killed
while concurrent io was being done to the fs.
/sys/src/9/port/chan.c:1009,1019
	if((wq = ewalk(c, nil, names+nhave, ntry)) == nil){
		/* try a union mount, if any */
		if(mh && !nomount){
			/*
			 * mh->mount->to == c, so start at mh->mount->next
			 */
			rlock(&mh->lock);
			for(f = mh->mount->next; f; f = f->next)
				if((wq = ewalk(f->to, nil, names+nhave, ntry)) != nil)
					break;
i don't know why it can't be nil, since the code here
doesn't have a lock. i think this might be the solution,
but i haven't done a careful lock audit to be sure
/usr/quanstro/src/ysk/port/chan.c:1009,1021
	if((wq = ewalk(c, nil, names+nhave, ntry)) == nil){
		/* try a union mount, if any */
		if(mh && !nomount){
			/*
			 * mh->mount->to == c, so start at mh->mount->next
			 */
>>			f = nil;
			rlock(&mh->lock);
>>			if(mh->mount)
			for(f = mh->mount->next; f; f = f->next)
				if((wq = ewalk(f->to, nil, names+nhave, ntry)) != nil)
					break;
			runlock(&mh->lock);
- erik
* Re: [9fans] interesting deadlock
From: Russ Cox @ 2010-09-17 23:21 UTC (permalink / raw)
To: Fans of the OS Plan 9 from Bell Labs
> of mh->mount, which is nil. the code in question
> was a fileserver which was crashing, getting killed
> while concurrent io was being done to the fs.
i believe that. the code is assuming that because
it found mh in the mount table, mh->mount != nil.
that's only true until it releases the rlock, which it has.
if an unmount happens between the runlock and the
rlock, then mh->mount will be nil here. it means
that nothing is mounted there anymore. your fix
looks reasonable.
russ
* Re: [9fans] interesting deadlock
From: erik quanstrom @ 2010-09-17 23:57 UTC (permalink / raw)
To: 9fans
On Fri Sep 17 17:13:40 EDT 2010, rsc@swtch.com wrote:
> On Thu, Sep 16, 2010 at 11:02 PM, erik quanstrom <quanstro@quanstro.net> wrote:
> > i have these processes all deadlocked. 8.out
> > is serving /n/mntpt.
> >
> > xxx 11921346 0:00 0:00 436K Create 8.out
> > xxx 11921785 0:00 0:00 24K Open cat /n/mntpt/sos
> > xxx 11921786 0:00 0:00 24K Unmount unmount /n/mntpt
> > xxx 11921787 0:00 0:00 44K Pwrite echo x y
>
> tell us why it's interesting. it looks like 8.out can see
> itself so you've got a bad loop.
on second thought, i'm not sure i fully understand why
not forking the namespace should result in a deadlock.
8.out is opening /dev/sdC0/part OTRUNC
cat is opening /n/mntpt/sos
unmount is opening /n/mntpt
so 8.out isn't waiting for itself.
it seems that any one of the three could reasonably
finish, but they deadlock. but the problem is
            have                want
11921346                        rlock(&pg->ns)
11921785    rlock(&mh->lock);
11921786    wlock(&pg->ns)      wlock(&m->lock);
rlock(&mh->lock) [walk]
the locks for pg->ns mh->lock and m->lock aren't nested.
it would seem if they were, we would get a winner and all three
operations would complete (or fail). so, why can't the locks
be nested? and why does this have to deadlock? what am i missing?
- erik
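[editor's sketch] one way to read the have/want table is as a wait-for graph. the edges below are an interpretation, not something stated in the thread: 8.out waits on unmount (for pg->ns), unmount waits on cat (assuming m->lock and mh->lock are the same lock), and cat waits on 8.out (its walk is asleep in mountio, and 8.out serves /n/mntpt). waitcycle and NOBODY are hypothetical names; the function just follows the chain and reports whether it comes back to where it started.

```c
#include <assert.h>

enum { NOBODY = -1 };	/* marks a process that is not waiting on anyone */

/*
 * waitsfor[i] is the index of the process that process i is
 * blocked on, or NOBODY.  Follow the chain from start and
 * return 1 if it leads back to start, 0 otherwise.
 */
int
waitcycle(const int *waitsfor, int n, int start)
{
	int i, p;

	assert(start >= 0 && start < n);
	p = start;
	for(i = 0; i < n; i++){
		p = waitsfor[p];
		if(p == NOBODY)
			return 0;
		assert(p >= 0 && p < n);
		if(p == start)
			return 1;
	}
	return 0;
}
```

with index 0 for 8.out, 1 for cat and 2 for unmount, the assumed edges are {2, 0, 1}, and waitcycle reports a cycle from any of the three. drop any single edge (set it to NOBODY) and the cycle goes away, which matches the intuition that any one of the three finishing would free the others.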