From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <4ee4b7c2dd207e953599fc618df2c456@gmx.de>
To: 9fans@9fans.net
Date: Wed, 17 Nov 2010 06:22:33 +0100
From: cinap_lenrek@gmx.de
In-Reply-To: <1e1e3d7c4781c86aa3a270cecdbaadbb@coraid.com>
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="upas-awrwvgyihzejfyyyaiwyyxpgib"
Subject: Re: [9fans] That deadlock, again
Topicbox-Message-UUID: 83be1f38-ead6-11e9-9d60-3106f5b1d025

This is a multi-part message in MIME format.
--upas-awrwvgyihzejfyyyaiwyyxpgib
Content-Disposition: inline
Content-Type: text/plain; charset="US-ASCII"
Content-Transfer-Encoding: 7bit

qpc is just the caller of the last successfully *acquired* qlock.

what we know is that the exportfs proc spins in the q->use taslock
called by qlock(), right?  this already seems weird...  q->use is held
just long enough to test q->locked and manipulate the queue.  also,
sched() will avoid switching to another proc while we are holding tas
locks.

i would like to know which qlock the kernel is trying to acquire on
behalf of exportfs that is also reachable from the etherread4 code.

one could move:

	up->qpc = getcallerpc(&q);

in qlock() to before the lock(&q->use); so we can see from where the
qlock that hangs the exportfs call gets called, or add another magic
debug pointer (qpctry) to the proc structure and print it in
dumpaproc().
--
cinap

--upas-awrwvgyihzejfyyyaiwyyxpgib
Content-Type: message/rfc822
Content-Disposition: inline

Return-Path: <9fans-bounces+cinap_lenrek=gmx.de@9fans.net>
Delivered-To: GMX delivery to cinap_lenrek@gmx.de
Received: (qmail invoked by alias); 17 Nov 2010 04:23:38 -0000
Received: from gouda.swtch.com (EHLO gouda.swtch.com) [67.207.142.3]
	by mx0.gmx.net (mx014) with SMTP; 17 Nov 2010 05:23:38 +0100
Received: from localhost ([127.0.0.1] helo=gouda.swtch.com)
	by gouda.swtch.com with esmtp (Exim 4.69)
	(envelope-from <9fans-bounces@9fans.net>)
	id 1PIZUA-00062Y-6Y; Wed, 17 Nov 2010 04:19:06 +0000
Received: from ns1.co-raid.com ([12.51.113.4] helo=coraid.com ident=none)
	by gouda.swtch.com with esmtp (Exim 4.69) (envelope-from )
	id 1PIZU7-00062S-2w
	for 9fans@9fans.net; Wed, 17 Nov 2010 04:19:03 +0000
From: erik quanstrom
Date: Tue, 16 Nov 2010 23:18:09 -0500
To: lucio@proxima.alt.za, 9fans@9fans.net
Message-ID: <1e1e3d7c4781c86aa3a270cecdbaadbb@coraid.com>
In-Reply-To: <10e606b8715d8e2c9fda5768466036ca@proxima.alt.za>
References: <10e606b8715d8e2c9fda5768466036ca@proxima.alt.za>
MIME-Version: 1.0
Content-Type: text/plain; charset="US-ASCII"
Content-Transfer-Encoding: 7bit
Subject: Re: [9fans] That deadlock, again
X-BeenThere: 9fans@9fans.net
X-Mailman-Version: 2.1.10
Precedence: list
Reply-To: Fans of the OS Plan 9 from Bell Labs <9fans@9fans.net>
List-Id: Fans of the OS Plan 9 from Bell Labs <9fans.9fans.net>
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Sender: 9fans-bounces@9fans.net
Errors-To: 9fans-bounces+cinap_lenrek=gmx.de@9fans.net
X-GMX-Antivirus: 0 (no virus found)
X-GMX-Antispam: 0 (Mail was not recognized as spam);
	Detail=5D7Q89H36p6i75npGen84eVAEFK/syJmiNoEBJhgjYKpglu1TZLLw7xMZnJMXwBFy+Sxe
	D/AUQGQOurK3ezVJqUBFH0uN5pjmWoMfpyHp50EZ60/Y6hM43eiKLTaE/W0dI7nIn8+pr4SzneyH
	Jeytg==V1;

> > acid: src(0xf0148c8a)
> > /sys/src/9/ip/tcp.c:2096
> > 2091		if(waserror()){
> > 2092			qunlock(s);
> > 2093			nexterror();
> > 2094		}
> > 2095		qlock(s);
>
>>2096		qunlock(tcp);
> > 2097
> > 2098		/* fix up window */
> > 2099		seg.wnd <<= tcb->rcv.scale;
> > 2100
> > 2101		/* every input packet in puts off the keep alive time out */
>
> The source actually says (to be pedantic):
>
> 	/* The rest of the input state machine is run with the control block
> 	 * locked and implements the state machine directly out of the RFC.
> 	 * Out-of-band data is ignored - it was always a bad idea.
> 	 */
> 	tcb = (Tcpctl*)s->ptcl;
> 	if(waserror()){
> 		qunlock(s);
> 		nexterror();
> 	}
> 	qlock(s);
> 	qunlock(tcp);
>
> Now, the qunlock(s) should not precede the qlock(s), this is the first
> case in this procedure:

it doesn't.  the waserror() block can't be executed before the code
following it runs and raises an error.  perhaps it could be written
more carefully as

> > 2095		qlock(s);
> > 2091		if(waserror()){
> > 2092			qunlock(s);
> > 2093			nexterror();
> > 2094		}
>>2096		qunlock(tcp);

but it really wouldn't make any difference.  i'm not completely
convinced that tcp's to blame.  and if it is, i think the problem is
probably tcp timers.

- erik

--upas-awrwvgyihzejfyyyaiwyyxpgib--