From mboxrd@z Thu Jan  1 00:00:00 1970
From: erik quanstrom <quanstro@quanstro.net>
Date: Mon, 15 Nov 2010 23:40:46 -0500
To: 9fans@9fans.net
Message-ID: <51e87437b774890c36956be747be653c@brasstown.quanstro.net>
In-Reply-To: <be1826728e1b062dc248ad9ccd1f17d5@proxima.alt.za>
References: <be1826728e1b062dc248ad9ccd1f17d5@proxima.alt.za>
MIME-Version: 1.0
Content-Type: text/plain; charset="US-ASCII"
Content-Transfer-Encoding: 7bit
Subject: Re: [9fans] That deadlock, again
Topicbox-Message-UUID: 8158784c-ead6-11e9-9d60-3106f5b1d025

On Mon Nov 15 23:23:12 EST 2010, lucio@proxima.alt.za wrote:
> Regarding the "deadlock" report that I occasionally see on my CPU
> server console, I won't bore anyone with PC addresses or anything like
> that, but I will recommend something I believe to be a possible
> trigger: the failure always seems to occur within "exportfs", which in
> this case is used exclusively to run stats(1) remotely from my
> workstation.  So the recommendation is that somebody like Erik, who is
> infinitely more clued up than I am in the kernel arcana should run one
> or more stats sessions into a cpu server (I happen to be running
> fossil, so maybe Erik won't see this) and see if he can also trigger
> this behaviour.  I'm hoping that it is not platform specific.
>
> Right now, I'm short of skills as well as a serial console :-(

i run stats all the time.  i've never seen a lock loop caused by stats.

exportfs gets blamed all the time for the sins of others.  possible
culprits are the tcp/ip stack, the kernel devices that stats accesses
and, of course, the channel code itself.

it would be a good idea for you to track down all the pcs involved
and send them along.  i can't think of another way of narrowing down
the list of potential suspects.  not all of our usual suspects have an
alibi.

i assume you've already applied this fix?  (it's not yet fixed on sources.)

/n/sources/plan9//sys/src/9/port/chan.c:1012,1018 - chan.c:1012,1020
  				/*
  				 * mh->mount->to == c, so start at mh->mount->next
  				 */
+ 				f = nil;
  				rlock(&mh->lock);
+ 				if(mh->mount)
  				for(f = mh->mount->next; f; f = f->next)
  					if((wq = ewalk(f->to, nil, names+nhave, ntry)) != nil)
  						break;
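
for anyone without the source handy, here is a standalone sketch of what
the guard in that diff buys.  it is plain c, not kernel code: Mount, Mhead
and find_after_first are names made up for illustration, an int stands in
for the channel, and the rlock on the mount head is left out.

```c
#include <assert.h>
#include <stddef.h>

/* hypothetical stand-ins for the kernel's mount list; not the real structs */
typedef struct Mount Mount;
struct Mount {
	Mount *next;	/* next entry in the union */
	int to;		/* stand-in for the channel the entry points at */
};

typedef struct Mhead Mhead;
struct Mhead {
	Mount *mount;	/* first entry, or NULL if the list has been emptied */
};

/*
 * search the entries after the first one for `want`.
 * returns the matching entry, or NULL if there is none.
 * the real code holds mh->lock for reading; that is omitted here.
 */
Mount*
find_after_first(Mhead *mh, int want)
{
	Mount *f;

	assert(mh != NULL);
	f = NULL;		/* defined result even if the loop never runs */
	if(mh->mount != NULL)	/* without this, mh->mount->next faults on an empty list */
		for(f = mh->mount->next; f != NULL; f = f->next)
			if(f->to == want)
				break;
	return f;
}
```

with an empty head (mount == NULL) the function returns NULL instead of
dereferencing a nil pointer, which is the case the two added lines cover.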

- erik