Date: Wed, 12 Aug 2015 10:43:51 +0100
From: Peter Stephenson
To: zsh-workers@zsh.org
Subject: Re: 5.0.8 regression when waiting for suspended jobs
Message-id: <20150812104351.65a4cbea@pwslap01u.europe.root.pri>
In-reply-to: <150811165655.ZM31504@torch.brasslantern.com>
Organization: Samsung Cambridge Solution Centre
X-Seq: 36124

+Tr8Z9rn6R8T5dk/nHAKD2LRySm3k9/MHrX32aQA5cg1Xu/b4yuKVoRqPPyff7KE2+O9g1OL 2csjHJUTDvyoESlg2N759lBiSsyHtoeTm5fqXrMMK/wYmRm6bedrjXlKLMUZiYZazEXFiQAN NvvgVgIAAA== On Tue, 11 Aug 2015 16:56:55 -0700 Bart Schaefer wrote: > On Jul 31, 8:56am, Bart Schaefer wrote: > I still only suspect what changed to make 5.0.8 different from 5.0.7 in > this regard, but here's what's going on: > - "wait $!" - > } zsh-5.0.7 > } - "wait $!" blocks (looping on repeated wait3() nonzero) > } zsh-5.0.8 > } - "wait $!" loops but also printing status every time > > bin_fg() calls waitforpid() which discovers the job is stopped and goes > into a loop calling kill(pid, SIGCONT) to try to get the job to run > again. In the 5.0.8 case, each time this happens the job briefly wakes > up, gets stopped with SIGTTIN, thus causes another SIGCHLD to go to the > parent zsh, which then prints the "suspended" message and loops right > back to kill(pid, SIGCONT) again. > > All of this is exactly the same as in 5.0.7 except that because of the > SIGCONT change in workers/35032 we notice the stopped -> continued -> > stopped again status change and therefore print the new status even > though it's actually the same as the last time we printed the status, > because we skipped printing the "continued" status. Or so I surmise. So you might have thought the right thing to do was note it had been stopped immediately, possibly warn the user, and not try to continue it again without further user action? Is that easy? Can we pin down "immediately" well enough? Clearly there's a race in the real world where the programme could get SIGTTIN at any time, but in the general case (i.e. where a background process got SIGTTIN when the foreground was doing something irrelevant) you clearly *don't* want it to continue every time. In that case the difference between 5.0.7 and 5.0.8 becomes basically moot (it's different but in a sane fashion). Do we even understand what the loop with SIGCONT is doing for us? 
Under what circumstances would this help? Some (other sort of) race
where something else (what? Not zsh and not the process that's
suspended) takes a while to get going, so the SIGCONT only succeeds
after a few attempts?

> - "wait %1" -
>
> bin_fg() calls zwaitjob() which does NOT do kill(pid, SIGCONT), instead
> simply blocking forever waiting for a SIGCHLD that will never arrive.

Hmm... I can't think of a good reason from the user's point of view why
this should behave differently. It just seems confusing. It's certainly
not documented as a zsh feature, is it?

> - "wait" -
>
> bin_fg() goes into a loop calling zwaitjob() on every entry in the job
> table; i.e., identical to "wait %1" repeated for every job number.

In which case I think the same reaction arises.

pws
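For illustration, here is a minimal C sketch of the alternative raised above: notice that a job stopped again straight after SIGCONT and give up after a bounded number of attempts, instead of looping on kill(pid, SIGCONT) forever. This is not zsh's waitforpid() or zwaitjob(). The helper names wait_with_bounded_cont() and demo(), and the retry limit, are hypothetical. The demo child uses SIGSTOP as a stand-in for SIGTTIN so that it does not depend on a controlling terminal.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (not zsh's waitforpid()): wait for pid; each time it
 * is reported stopped, send SIGCONT, but at most max_conts times.
 * Returns 0 if the child exited or was killed, 1 if it is still stopped
 * after the retries (caller should warn the user and leave it alone),
 * -1 on error. */
static int wait_with_bounded_cont(pid_t pid, int max_conts)
{
    int status, conts = 0;

    for (;;) {
        /* WUNTRACED: also report children that have stopped. */
        if (waitpid(pid, &status, WUNTRACED) < 0)
            return -1;
        if (WIFEXITED(status) || WIFSIGNALED(status))
            return 0;                   /* job finished */
        if (!WIFSTOPPED(status))
            continue;                   /* defensive; not expected here */
        if (conts >= max_conts)
            return 1;                   /* stopped again: stop retrying */
        fprintf(stderr, "job %ld stopped by signal %d, continuing\n",
                (long)pid, WSTOPSIG(status));
        conts++;
        if (kill(pid, SIGCONT) < 0)
            return -1;
    }
}

/* Hypothetical demo: fork a child that stops itself 'stops' times
 * (stops < 0 means it stops again every time it is continued), then
 * exits. Returns the result of wait_with_bounded_cont(). */
static int demo(int stops, int max_conts)
{
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        for (int i = 0; stops < 0 || i < stops; i++)
            raise(SIGSTOP);             /* stand-in for SIGTTIN */
        _exit(0);
    }

    int r = wait_with_bounded_cont(pid, max_conts);
    if (r == 1) {                       /* still stopped: clean up the demo */
        kill(pid, SIGKILL);
        waitpid(pid, NULL, 0);
    }
    return r;
}
```

Setting max_conts to 0 reduces this to the behaviour suggested above: the first stop is reported and the job is left alone until the user does something. A child that stops once and then exits, run with max_conts of 3, is continued once and yields 0. A child that stops every time it is continued yields 1 once the limit is reached.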