From mboxrd@z Thu Jan 1 00:00:00 1970
Mailing-List: contact zsh-workers-help@sunsite.dk; run by ezmlm
X-Seq: 23459
Date: Tue, 22 May 2007 18:29:25 +0100
From: Peter Stephenson
To: Zsh-Workers
Subject: Re: Subshell with multios causes hang
Message-ID: <20070522182925.4c43a67e@news01.csr.com>
In-Reply-To: <1179832903.3015.505.camel@aston.uk.cyberscience.com>
References: <1179832903.3015.505.camel@aston.uk.cyberscience.com>
Organization: CSR
X-Mailer: Claws Mail 2.9.1 (GTK+ 2.10.8; i386-redhat-linux-gnu)
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit

On Tue, 22 May 2007 12:21:43 +0100
John Buddery wrote:
> Hi, since upgrading from 4.2.5 to 4.2.6 I find that one of my
> functions which uses a multios redirect on a subshell list is
> hanging. I tried 4.3.4 as well with no luck.
>
> Essentially I run the equivalent of:
>
>   ( echo hello ) >| /tmp/out >| /tmp/out2
>
> and in an interactive shell (or any with job control) this hangs.
>...
> All of the following fixes solve this problem, but I don't know what
> else they break:
>..
> Setting thisjob = -1 in clearjobtab(), since there is no current
> job, and making addproc() ignore the addition of aux processes if
> thisjob == -1. This also seems wrong, as we are completely losing the
> pid information for the multios, so for example we can't kill it.
>
> Setting thisjob = 1 in clearjobtab (if it was >= 0), and setting
> jobtab[thisjob].stat = STAT_INUSE after clearing jobtab. This is what
> I ended up with, but is it a valid thing to do ?
>...

Thanks for the detailed analysis, which will have saved me hours.

There's clearly something of a design flaw here: we're using (an effect
of) job control when no job control is present.  However, the shell does
use the so-called job table for this purpose (managing processes even if
they're not strictly associated with a job), so we have to live with it.

In that spirit what I'd *like* to suggest is something close to what you
came up with: set thisjob to -1 in clearjobtab() (it's sure as heck
invalid), and then when we need a job table entry in closemn(), detect
that thisjob is -1 and initialise a new job.

Problem 1: this happens before execpline() runs in the subshell, which
grabs a different job table entry.  The one generated by closemn() is
forgotten.
We can fix this by setting a temporary job number saying "use me!
use me!".  This isn't very nice but doesn't involve redesigning the
shell from scratch.

Problem 2: this is where it gets really nasty to the extent that I'm
worried I must be missing something basic about multios.  We now do the
"echo" in the subshell, and on return to execpline() wait for the
auxiliary process handling the multios to exit.  But it's never going
to!  It's waiting for end-of-file on the data it's reading from the
subshell that's waiting for it.  Because we attached the multios process
after the fork, we have deadlock.

Wossgoingon?  How do multios ever work?  Is there some call to close the
shell fd's (giving the EOF the aux proc is waiting for) that hasn't
quite been handled at that point, but usually has?

Possible clue: last1 is 1 in this version of execpline(), indicating
we're about to leave the shell.  The auxprocs are the only reason we
can't.  So there must be some solution...

I'll carry on looking at this when I get a chance, but for now I'm
confused enough to go to the beer festival.

-- 
Peter Stephenson                                Software Engineer
CSR PLC, Churchill House, Cambridge Business Park, Cowley Road
Cambridge, CB4 0WZ, UK                          Tel: +44 (0)1223 692070
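
Illustration of Problem 2: the circular wait described in the message
can be reproduced outside zsh with a plain pipe.  The following is a
minimal Python sketch, not zsh's C code.  The function name aux_exits
and its parameters are made up for this example, the child merely
stands in for the multios auxiliary process, and the timeout exists
only so that the demonstration terminates instead of hanging.  It
assumes a POSIX system where os.fork() is available.

```python
import os
import time


def aux_exits(close_write_end_first, timeout=1.0):
    """Return True if the reader child exits within `timeout` seconds.

    Hypothetical model: the child plays the multios helper, the parent
    plays the subshell that writes "hello" and then waits for the helper.
    """
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: drop the inherited write end, then read until EOF.
        os.close(write_fd)
        while os.read(read_fd, 4096):
            pass  # a real helper would copy the data to each output file
        os._exit(0)

    # Parent: only writes, so it does not need the read end.
    os.close(read_fd)
    os.write(write_fd, b"hello\n")
    write_open = True
    if close_write_end_first:
        os.close(write_fd)  # last write end gone: the reader sees EOF
        write_open = False

    # Poll instead of blocking so the stuck case can be detected.
    deadline = time.monotonic() + timeout
    exited = False
    while time.monotonic() < deadline:
        done, _status = os.waitpid(pid, os.WNOHANG)
        if done == pid:
            exited = True
            break
        time.sleep(0.01)

    if not exited:
        # Break the simulated deadlock so the demo itself can finish.
        if write_open:
            os.close(write_fd)
        os.waitpid(pid, 0)
    return exited


if __name__ == "__main__":
    print("write end closed before waiting:", aux_exits(True))
    print("write end still open while waiting:", aux_exits(False))
```

In this model the reader can only finish once every copy of the write
end has been closed, so a process that waits for the reader while still
holding a write end waits forever.  That is the same shape as the
subshell waiting in execpline() for an auxiliary process that is itself
waiting for EOF; whether zsh normally avoids this by closing the
relevant fds earlier is exactly the open question in the message.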