From: Peter Stephenson <p.stephenson@samsung.com>
To: zsh-workers@zsh.org
Subject: Re: Bug related to stdin/always/jobcontrol
Date: Fri, 16 Sep 2016 13:02:37 +0100
Message-id: <20160916130237.04f1ef72@pwslap01u.europe.root.pri>
In-reply-to: <160915180855.ZM3928@torch.brasslantern.com>
Organization: Samsung Cambridge Solution Centre
List-Id: Zsh Workers List
X-Seq: 39359
References: <87r392jgd0.fsf@juno.home.vuxu.org>
 <20160905164207.4630643b@pwslap01u.europe.root.pri>
 <20160914183105.69862fa9@pwslap01u.europe.root.pri>
 <20160914223553.3173c8ca@ntlworld.com>
 <160914202450.ZM32479@torch.brasslantern.com>
 <20160915092721.3d669bc9@pwslap01u.europe.root.pri>
 <20160915113315.1655019c@pwslap01u.europe.root.pri>
 <20160915184040.5fc1b9d0@pwslap01u.europe.root.pri>
 <160915180855.ZM3928@torch.brasslantern.com>

On Thu, 15 Sep 2016 18:08:55 -0700
Bart Schaefer wrote:
> On Sep 15, 6:40pm, Peter Stephenson wrote:
> } Subject: Re: Bug related to stdin/always/jobcontrol
> }
> } The race is related to the fact that when the new SUPERJOB takes over
> } it's actually in the same process group as the SUBJOB it's taking over,
> } which is different from its own PID, list_pipe_pid [...]
> }
> } However, fixing that up doesn't help --- apparently attaching it to the
> } TTY after the SUBJOB is finished works OK, sending it SIGCONT seems to
> } be OK, but then it always gets SIGSTOP
>
> Is it definitely getting SIGSTOP and not SIGTSTP ?
>
> The operating system will never[*] automatically send a true STOP signal
> because it's un-catchable.  If a STOP is being received, it's probably
> being sent from the "for (; !nowait;)" loop in execpline(), either at
> line 1674 [ killpg(jobtab[list_pipe_job].gleader, SIGSTOP); ] or at
> line 1688 [ kill(getpid(), SIGSTOP); ].

I've found the double SIGSTOP --- it was getting *both* of the above.

In the case in question, the newly forked child zsh was picking up the
pgrp of the vim process, so when we sent SIGSTOP to that pgrp it got
stopped, too.  When it was restarted, it immediately stopped itself.

There's some code in the child to decide on the pgrp.  It looks like it
needs to set up its own pgrp if the old superjob has already gone, which
is the basis of the problems we are seeing here.
I'm not sure if this leaves some more races but, again, it looks like
it's an improvement.  The rest of the patch is cosmetic.

The bad news is there are other problems with this logic that have
crept in since 5.2 --- these appear to be unrelated, or at least not
made any worse by this change, so I'll start a separate thread.

pws

diff --git a/Src/exec.c b/Src/exec.c
index 21270c8..0755d55 100644
--- a/Src/exec.c
+++ b/Src/exec.c
@@ -1652,7 +1652,13 @@ execpline(Estate state, wordcode slcode, int how, int last1)
 		    int synch[2];
 		    struct timeval bgtime;
 
+		    /*
+		     * A pipeline with the shell handling the right
+		     * hand side was stopped.  We'll fork to allow
+		     * it to continue.
+		     */
 		    if (pipe(synch) < 0 || (pid = zfork(&bgtime)) == -1) {
+			/* Failure */
 			if (pid < 0) {
 			    close(synch[0]);
 			    close(synch[1]);
@@ -1666,6 +1672,18 @@ execpline(Estate state, wordcode slcode, int how, int last1)
 			thisjob = newjob;
 		    }
 		    else if (pid) {
+			/*
+			 * Parent: job control is here.  If the job
+			 * started for the RHS of the pipeline is still
+			 * around, then it's a SUBJOB and the job for
+			 * earlier parts of the pipeline is its SUPERJOB.
+			 * The newly forked shell isn't recorded as a
+			 * separate job here, just as list_pipe_pid.
+			 * If the superjob exits (it may already have
+			 * done so, see child branch below), we'll use
+			 * list_pipe_pid to form the basis of a
+			 * replacement job --- see SUBLEADER code above.
+			 */
 			char dummy;
 
 			lpforked =
@@ -1692,9 +1710,33 @@ execpline(Estate state, wordcode slcode, int how, int last1)
 			    break;
 		    }
 		    else {
+			Job pjn = jobtab + list_pipe_job;
 			close(synch[0]);
 			entersubsh(ESUB_ASYNC);
-			if (jobtab[list_pipe_job].procs) {
+			/*
+			 * At this point, we used to attach this process
+			 * to the process group of list_pipe_job (the
+			 * new superjob) any time that was still available.
+			 * That caused problems when the fork happened
+			 * early enough that the subjob is in that
+			 * process group, since then we will stop this
+			 * job when we signal the subjob, and we have
+			 * no way here to know that we shouldn't also
+			 * send STOP to ourselves, resulting in two stops.
+			 * So don't do that if the original
+			 * list_pipe_job has exited.
+			 *
+			 * The choice here needs to match the assumption
+			 * made when handling SUBLEADER above that the
+			 * process group is our own PID.  I'm not sure
+			 * if there's a potential race for that.  But
+			 * setting up a new process group if the
+			 * superjob is still functioning seems the wrong
+			 * thing to do.
+			 */
+			if (pjn->procs &&
+			    (pjn->stat & STAT_INUSE) &&
+			    !(pjn->stat & STAT_DONE)) {
 			    if (setpgrp(0L, mypgrp = jobtab[list_pipe_job].gleader)
 				== -1) {
 				setpgrp(0L, mypgrp = getpid());
diff --git a/Src/jobs.c b/Src/jobs.c
index 9284c71..d1b98ac 100644
--- a/Src/jobs.c
+++ b/Src/jobs.c
@@ -232,11 +232,12 @@ super_job(int sub)
 static int
 handle_sub(int job, int fg)
 {
+    /* job: superjob; sj: subjob. */
     Job jn = jobtab + job, sj = jobtab + jn->other;
 
     if ((sj->stat & STAT_DONE) || (!sj->procs && !sj->auxprocs)) {
	struct process *p;
-
+
	for (p = sj->procs; p; p = p->next) {
	    if (WIFSIGNALED(p->status)) {
		if (jn->gleader != mypgrp && jn->procs->next)
diff --git a/Src/signals.c b/Src/signals.c
index 2eefc07..30dde71 100644
--- a/Src/signals.c
+++ b/Src/signals.c
@@ -723,7 +723,7 @@ killjb(Job jn, int sig)
 {
     Process pn;
     int err = 0;
-
+
     if (jobbing) {
	if (jn->stat & STAT_SUPERJOB) {
	    if (sig == SIGCONT) {
@@ -731,11 +731,21 @@ killjb(Job jn, int sig)
		    if (killpg(pn->pid, sig) == -1)
			if (kill(pn->pid, sig) == -1 && errno != ESRCH)
			    err = -1;
-
+
+		/*
+		 * Note this does not kill the last process,
+		 * which is assumed to be the one controlling the
+		 * subjob, i.e. the forked zsh that was originally
+		 * list_pipe_pid...
+		 */
		for (pn = jn->procs; pn->next; pn = pn->next)
		    if (kill(pn->pid, sig) == -1 && errno != ESRCH)
			err = -1;
 
+		/*
+		 * ...we only continue that once the external processes
+		 * currently associated with the subjob are finished.
+		 */
		if (!jobtab[jn->other].procs && pn)
		    if (kill(pn->pid, sig) == -1 && errno != ESRCH)
			err = -1;
@@ -744,7 +754,7 @@ killjb(Job jn, int sig)
	    }
	    if (killpg(jobtab[jn->other].gleader, sig) == -1 && errno != ESRCH)
		err = -1;
-
+
	    if (killpg(jn->gleader, sig) == -1 && errno != ESRCH)
		err = -1;