9fans - fans of the OS Plan 9 from Bell Labs
* [9fans] interesting deadlock
From: erik quanstrom @ 2010-09-17  3:02 UTC
  To: 9fans

i have these processes all deadlocked.  8.out
is serving /n/mntpt.

xxx        11921346    0:00   0:00      436K Create   8.out
xxx        11921785    0:00   0:00       24K Open     cat /n/mntpt/sos
xxx        11921786    0:00   0:00       24K Unmount  unmount /n/mntpt
xxx        11921787    0:00   0:00       44K Pwrite   echo x y

minooka# acid -l/tmp/acid -k 11921346 /386/9pccpu
/386/9pccpu:386 plan 9 boot image
/sys/lib/acid/port
/sys/lib/acid/386
acid: stk()
sched()+0x140 /sys/src/9/port/proc.c:156
rlock(q=0xf6400490)+0x9d /sys/src/9/port/qlock.c:113
findmount(qid=0x407325,type=0x7,dev=0x252c,mp=0xf6b27998,cp=0xf6b279c8)+0x25 /sys/src/9/port/chan.c:845
domount(cp=0xf6b279c8,mp=0xf6b27998,path=0xf6b279c0)+0x42 /sys/src/9/port/chan.c:883
namec(aname=0xf6cdb870,amode=0x5,omode=0x11,perm=0x1b6)+0x5a4 /sys/src/9/port/chan.c:1457
syscreate(arg=0xf050cf48)+0x62 /sys/src/9/port/sysfile.c:1126
syscall(ureg=0xf6b27a54)+0x1c6 /sys/src/9/pc/trap.c:714
_syscallintr()+0x18 /sys/src/9/pc/plan9l.s:45
0xf6b27a54 ?file?:0
acid:
echo kill > /proc/11921346/ctl
minooka# acid -l/tmp/acid -k 11921785 /386/9pccpu
/386/9pccpu:386 plan 9 boot image
/sys/lib/acid/port
/sys/lib/acid/386
acid: stk()
sleep(r=0xf742276c,arg=0xf7422540,f=0xf011af60)+0x1ab /sys/src/9/port/proc.c:785
mountio(m=0xf6ccbc10,r=0xf7422540)+0x292 /sys/src/9/port/devmnt.c:808
mountrpc(r=0xf7422540,m=0xf6ccbc10)+0x2e /sys/src/9/port/devmnt.c:745
mntwalk(nc=0xf6a374d0,nname=0x1,c=0xf6c6d140,name=0xf6694898)+0x1b5 /sys/src/9/port/devmnt.c:426
ewalk(c=0xf6c6d140,nc=0x0,name=0xf6694898,nname=0x1)+0x79 /sys/src/9/port/chan.c:937
walk(cp=0xf723c2a8,nnames=0x3,nerror=0xf723c298,names=0xf6694890,nomount=0x0)+0x5d8 /sys/src/9/port/chan.c:1017
namec(aname=0xf741b680,amode=0x3,omode=0x0,perm=0x0)+0x325 /sys/src/9/port/chan.c:1420
sysopen(arg=0xf04eea08)+0x5e /sys/src/9/port/sysfile.c:273
syscall(ureg=0xf723c334)+0x1c6 /sys/src/9/pc/trap.c:714
_syscallintr()+0x18 /sys/src/9/pc/plan9l.s:45
0xf723c334 ?file?:0
acid:
echo kill > /proc/11921785/ctl
minooka# acid -l/tmp/acid -k 11921786 /386/9pccpu
/386/9pccpu:386 plan 9 boot image
/sys/lib/acid/port
/sys/lib/acid/386
acid: stk()
sched()+0x140 /sys/src/9/port/proc.c:156
wlock(q=0xf66921c0)+0xad /sys/src/9/port/qlock.c:168
cunmount(mnt=0xf756a140,mounted=0x0)+0xd4 /sys/src/9/port/chan.c:781
sysunmount(arg=0xf0543ecc)+0x11d /sys/src/9/port/sysfile.c:1111
syscall(ureg=0xf7421934)+0x1c6 /sys/src/9/pc/trap.c:714
_syscallintr()+0x18 /sys/src/9/pc/plan9l.s:45
0xf7421934 ?file?:0


- erik




* Re: [9fans] interesting deadlock
From: Russ Cox @ 2010-09-17 21:11 UTC
  To: Fans of the OS Plan 9 from Bell Labs

On Thu, Sep 16, 2010 at 11:02 PM, erik quanstrom <quanstro@quanstro.net> wrote:
> i have these processes all deadlocked.  8.out
> is serving /n/mntpt.
>
> xxx        11921346    0:00   0:00      436K Create   8.out
> xxx        11921785    0:00   0:00       24K Open     cat /n/mntpt/sos
> xxx        11921786    0:00   0:00       24K Unmount  unmount /n/mntpt
> xxx        11921787    0:00   0:00       44K Pwrite   echo x y

tell us why it's interesting.  it looks like 8.out can see
itself so you've got a bad loop.



* Re: [9fans] interesting deadlock
From: erik quanstrom @ 2010-09-17 22:12 UTC
  To: 9fans

On Thu, Sep 16, 2010 at 11:02 PM, erik quanstrom <quanstro@quanstro.net> wrote:
> i have these processes all deadlocked.  8.out
> is serving /n/mntpt.
>
> xxx        11921346    0:00   0:00      436K Create   8.out
> xxx        11921785    0:00   0:00       24K Open     cat /n/mntpt/sos
> xxx        11921786    0:00   0:00       24K Unmount  unmount /n/mntpt
> xxx        11921787    0:00   0:00       44K Pwrite   echo x y

okay, it's not.  sorry for the confusion.  the reason
for looking at that case was that it seemed at the
time related to a crash here on the indirection
of mh->mount, which is nil.  the code in question
was a fileserver which was crashing, getting killed
while concurrent io was being done to the fs.

/sys/src/9/port/chan.c:1009,1019
		if((wq = ewalk(c, nil, names+nhave, ntry)) == nil){
			/* try a union mount, if any */
			if(mh && !nomount){
				/*
				 * mh->mount->to == c, so start at mh->mount->next
				 */
				rlock(&mh->lock);
				for(f = mh->mount->next; f; f = f->next)
					if((wq = ewalk(f->to, nil, names+nhave, ntry)) != nil)
						break;

i don't know why it can't be nil, since the code here
doesn't hold a lock at that point.  i think this might be the
solution, but i haven't done a careful lock audit to be sure.

/usr/quanstro/src/ysk/port/chan.c:1009,1021
		if((wq = ewalk(c, nil, names+nhave, ntry)) == nil){
			/* try a union mount, if any */
			if(mh && !nomount){
				/*
				 * mh->mount->to == c, so start at mh->mount->next
				 */
	>>			f = nil;
				rlock(&mh->lock);
	>>			if(mh->mount)
				for(f = mh->mount->next; f; f = f->next)
					if((wq = ewalk(f->to, nil, names+nhave, ntry)) != nil)
						break;
				runlock(&mh->lock);

- erik




* Re: [9fans] interesting deadlock
From: Russ Cox @ 2010-09-17 23:21 UTC
  To: Fans of the OS Plan 9 from Bell Labs

> of mh->mount, which is nil.  the code in question
> was a fileserver which was crashing, getting killed
> while concurrent io was being done to the fs.

i believe that.  the code is assuming that because
it found mh in the mount table, mh->mount != nil.
that's only true until it releases the rlock, which it has.
if an unmount happens between the runlock and the
rlock, then mh->mount will be nil here.  it means
that nothing is mounted there anymore.  your fix
looks reasonable.
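
The pattern described above (look something up under a lock, drop the lock, retake it, and re-check before dereferencing) can be sketched in portable C. This is only an illustration, not the kernel code: RWlock here is a single-threaded counter standing in for the kernel's read/write lock, Mount.to is an int rather than a Chan*, and findalt is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for the kernel types: a counter, not a real lock. */
typedef struct RWlock { int readers; } RWlock;
typedef struct Mount Mount;
struct Mount { Mount *next; int to; };
typedef struct Mhead { RWlock lock; Mount *mount; } Mhead;

static void rlock(RWlock *l) { l->readers++; }
static void runlock(RWlock *l) { l->readers--; }

/*
 * Search the union alternatives after the head of the list for want.
 * Returns 1 if found, 0 otherwise (including when nothing is mounted).
 */
int
findalt(Mhead *mh, int want)
{
	Mount *f;
	int found;

	assert(mh != NULL);
	found = 0;
	rlock(&mh->lock);
	/* re-check: an unmount may have emptied the list while the lock was dropped */
	if(mh->mount != NULL)
		for(f = mh->mount->next; f != NULL; f = f->next)
			if(f->to == want){
				found = 1;
				break;
			}
	runlock(&mh->lock);
	return found;
}
```

With mh->mount set to NULL the function returns 0 instead of faulting, which corresponds to "nothing is mounted there anymore".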

russ



* Re: [9fans] interesting deadlock
From: erik quanstrom @ 2010-09-17 23:57 UTC
  To: 9fans

On Fri Sep 17 17:13:40 EDT 2010, rsc@swtch.com wrote:
> On Thu, Sep 16, 2010 at 11:02 PM, erik quanstrom <quanstro@quanstro.net> wrote:
> > i have these processes all deadlocked.  8.out
> > is serving /n/mntpt.
> >
> > xxx        11921346    0:00   0:00      436K Create   8.out
> > xxx        11921785    0:00   0:00       24K Open     cat /n/mntpt/sos
> > xxx        11921786    0:00   0:00       24K Unmount  unmount /n/mntpt
> > xxx        11921787    0:00   0:00       44K Pwrite   echo x y
>
> tell us why it's interesting.  it looks like 8.out can see
> itself so you've got a bad loop.

on second thought, i'm not sure i fully understand why
not forking the namespace should result in a deadlock.

8.out is opening /dev/sdC0/part OTRUNC
cat is opening /n/mntpt/sos
unmount is opening /n/mntpt

so 8.out isn't waiting for itself.
it seems that any one of the three could reasonably
finish, but they deadlock.  but the problem is

		have				want
11921346					rlock(&pg->ns)
11921785	rlock(&mh->lock);
11921786	wlock(&pg->ns)			wlock(&m->lock);
		rlock(&mh->lock) [walk]

the locks for pg->ns, mh->lock, and m->lock aren't nested.
it would seem if they were, we would get a winner and all three
operations would complete (or fail).  so, why can't the locks
be nested?  and why does this have to deadlock?  what am i missing?
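
One way to read the table together with the stack traces is as a wait-for graph: 8.out waits for pg->ns (write-held by unmount), unmount waits for the mount head lock (read-held by cat during its walk), and cat waits for a 9P reply from 8.out. That reading assumes m->lock in cunmount and mh->lock in walk are the same lock, which the thread does not state outright. A minimal sketch in portable C of checking such a graph for a cycle follows; incycle and the enum names are hypothetical, and each process is assumed to wait on at most one other.

```c
#include <assert.h>

enum { Fs, Cat, Unmount, Nproc };	/* 8.out, cat, unmount */
enum { Nobody = -1 };

/*
 * waitsfor[i] is the process that i is blocked on, or Nobody.
 * Returns 1 if following the chain from start leads back to start.
 */
int
incycle(const int *waitsfor, int n, int start)
{
	int i, p;

	assert(start >= 0 && start < n);
	p = start;
	for(i = 0; i < n; i++){
		p = waitsfor[p];
		if(p == Nobody)
			return 0;	/* chain ends: someone can still run */
		assert(p >= 0 && p < n);
		if(p == start)
			return 1;	/* back where we began: deadlock */
	}
	return 0;	/* stuck behind a cycle that start is not part of */
}
```

Filling in waitsfor[Fs] = Unmount, waitsfor[Unmount] = Cat, waitsfor[Cat] = Fs gives a cycle for every process. Setting any one entry to Nobody (for example, cat getting its reply) breaks it.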

- erik


