X-Seq: 30057
From: Bart Schaefer
Message-id: <111224102333.ZM18731@torch.brasslantern.com>
Date: Sat, 24 Dec 2011 10:23:33 -0800
In-reply-to: <878vm2uw6i.fsf@ft.bewatermyfriend.org>
Comments: In reply to Frank Terbeck "Re: $pipestatus broken?" (Dec 24, 10:59am)
References: <87borgzkap.fsf@ft.bewatermyfriend.org>
    <877h24zj69.fsf@ft.bewatermyfriend.org>
    <111210065833.ZM6198@torch.brasslantern.com>
    <877h1nwojx.fsf@ft.bewatermyfriend.org>
    <111223133128.ZM17298@torch.brasslantern.com>
    <87hb0rueem.fsf@ft.bewatermyfriend.org>
    <111224013252.ZM22819@torch.brasslantern.com>
    <878vm2uw6i.fsf@ft.bewatermyfriend.org>
To: zsh-workers@zsh.org
Subject: Re: $pipestatus broken?

On Dec 24, 10:59am, Frank Terbeck wrote:
} Subject: Re: $pipestatus broken?
}
} Bart Schaefer wrote:
} >
} > In the loop case { echo foo | repeat 1; read -E } there is a job table
} > entry for the loop which is the group leader, but a new entry is
} > created for "read -E".
} > execpline() remembers the previous thisjob as
} > the local "pj" and restores thisjob = pj at line 1619, but by that
} > time it is too late -- waitjobs() has set thisjob = -1 for just long
} > enough for zhandler() to call update_job(), which fails to update the
} > pipestats because thisjob = -1 tells it there is no current job.
} >
} > The following seems to fix it, by telling waitjobs() what the previous
} > job number was so it can be reset immediately. There may still be a
} > race condition that requires fiddling with signal blocks to make sure
} > thisjob is correct at the time the zhandler() catches the signal, but
} > if so this should at least allow the block/unblock to be localized.
}
} Hm. I'm having a hard time following what's going on...

A bit more explanation, then; let's use your Test/A04 example:

    : | while read a; do :; done

In execpline(), zsh wants to keep the right side ("while ...") in the
current shell. So it creates a job entry jobtab[1] for the pipeline,
forks to run ":" on the left side which becomes jobtab[1]->procs, and
enters execwhile() for the right side. At this point thisjob = 1.

Now it needs to run "read a", so execpline temporarily creates a new
entry jobtab[2], saves pj = thisjob, sets thisjob = 2 and enters
execbuiltin() [whether it's a builtin isn't important to the bug]. When
the builtin completes, execpline restores thisjob = pj to make the loop
the current job again.

EXCEPT ... at various times including during execbuiltin(), the child
forked off to run ":" may exit and hit the parent with SIGCHLD. This
invokes zhandler() which reaps the process and calls update_job() to
change status in the job table, including $pipestatus.

update_job() compares the job that just exited (a process linked to
jobtab[1]) with the current foreground process (which thisjob says is
jobtab[2]) and concludes that a background job has exited. Therefore it
skips the update of $pipestatus and instead resets it as if there were
no pipe.
When the shell then gets around to waiting for jobtab[1] at the end of
the loop, it has lost the reaped left-hand-side and behaves as if there
is only one job in the pipeline.

What we have is a case where the shell is juggling two "current" jobs
(the loop itself, and the command executed from within the loop) and it
loses track of one of them at a crucial instant.

} With this change, the test I posted in workers-30047 changes a bit.
} Before, there were only lines that either looked like "1" or "0 0". Now
} I'm getting "0 1", too.

Yes, when I actually put your test into my patched sandbox and run "make
check" I also get all three results. Obviously waitjobs() is not the
only place where the SIGCHLD can sneak in, it's just the only one I was
able to catch with the debugger. So that patch is not sufficient (and
also probably not necessary if we resolve the race).

I don't know exactly where the "0 1" is coming from -- or rather, it
must be coming from this in update_job() but I'm not sure why:

    if ((jn->stat & STAT_CURSH) && i < MAX_PIPESTATS)
        pipestats[i++] = lastval;

In this case I'm *guessing* lastval = 1 because we're catching the
signal after thisjob = pj but before the actual wait for jobtab[1], so
lastval reflects that "read a" has failed.