Bug related to stdin/always/jobcontrol

zsh-workers
 help / color / mirror / code / Atom feed

* Bug related to stdin/always/jobcontrol
@ 2016-09-02 18:39 Christian Neukirchen
  2016-09-05 15:04 ` Christian Neukirchen
  2016-09-05 15:42 ` Peter Stephenson
  0 siblings, 2 replies; 20+ messages in thread
From: Christian Neukirchen @ 2016-09-02 18:39 UTC (permalink / raw)
  To: zsh-workers

Hi,

Stripped down test case for a mysterious loss of child:

zsh 5.2 (x86_64-unknown-linux-gnu)
zsh-5.2-0-gc86c20a
VIM - Vi IMproved 7.4 (2013 Aug 10, compiled Aug 29 2016 13:06:04)
Included patches: 1-2207

zsh -f
juno% v() { { vim - } always { true } }
juno% ls | v
^Z
zsh: running    v
juno% jobs -p
[1]    4421 running    v
juno% fg
fg: no current job
juno% fg %1
fg: %1: no such job
juno% kill %1
kill: kill %1 failed: no such process
juno% echo ${jobstates}
suspended::4421=running
juno% ^D
Vim: Caught deadly signal HUP
...

Everything works ok when
- always is not used
- stdin is not used

Thanks,
-- 
Christian Neukirchen  <chneukirchen@gmail.com>  http://chneukirchen.org


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Bug related to stdin/always/jobcontrol
  2016-09-02 18:39 Bug related to stdin/always/jobcontrol Christian Neukirchen
@ 2016-09-05 15:04 ` Christian Neukirchen
  2016-09-05 15:42 ` Peter Stephenson
  1 sibling, 0 replies; 20+ messages in thread
From: Christian Neukirchen @ 2016-09-05 15:04 UTC (permalink / raw)
  To: zsh-workers

Christian Neukirchen <chneukirchen@gmail.com> writes:

> Hi,
>
> Stripped down test case for a mysterious loss of child:
>
> zsh 5.2 (x86_64-unknown-linux-gnu)
> zsh-5.2-0-gc86c20a
> VIM - Vi IMproved 7.4 (2013 Aug 10, compiled Aug 29 2016 13:06:04)
> Included patches: 1-2207
>
> zsh -f
> juno% v() { { vim - } always { true } }
> juno% ls | v
> ^Z
> zsh: running    v
> juno% jobs -p
> [1]    4421 running    v
> juno% fg
> fg: no current job
> juno% fg %1
> fg: %1: no such job
> juno% kill %1
> kill: kill %1 failed: no such process
> juno% echo ${jobstates}
> suspended::4421=running
> juno% ^D
> Vim: Caught deadly signal HUP
> ...
>
> Everything works ok when
> - always is not used
> - stdin is not used
>
> Thanks,

For the record, this is also broken on 5.0.7 (Linux 3.10.42-1-lts),
but works on 4.3.17 (3.2.0-4-amd64).

-- 
Christian Neukirchen  <chneukirchen@gmail.com>  http://chneukirchen.org


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Bug related to stdin/always/jobcontrol
  2016-09-02 18:39 Bug related to stdin/always/jobcontrol Christian Neukirchen
  2016-09-05 15:04 ` Christian Neukirchen
@ 2016-09-05 15:42 ` Peter Stephenson
       [not found]   ` <CGME20160914173110eucas1p1f0567e319ae439b975b19f4e55676fed@eucas1p1.samsung.com>
  1 sibling, 1 reply; 20+ messages in thread
From: Peter Stephenson @ 2016-09-05 15:42 UTC (permalink / raw)
  Cc: zsh-workers

On Fri, 2 Sep 2016 20:39:39 +0200
Christian Neukirchen <chneukirchen@gmail.com> wrote:
> Stripped down test case for a mysterious loss of child:
> 
> zsh 5.2 (x86_64-unknown-linux-gnu)
> zsh-5.2-0-gc86c20a
> VIM - Vi IMproved 7.4 (2013 Aug 10, compiled Aug 29 2016 13:06:04)
> Included patches: 1-2207
> 
> zsh -f
> juno% v() { { vim - } always { true } }
> juno% ls | v
> ^Z
> zsh: running    v
> juno% fg
> fg: no current job

It looks like the job state is quite seriously challenged.

I think the relevance of the always and stdin is probably that there's
something for the job to fix up after it finishes, so the vim process
can't be fully detached from the subshell it's in.

When fg is run, the state of the job you can see has flags 0x2012:

#define STAT_STOPPED	(0x0002) /* all procs stopped or exited          */
#define STAT_LOCKED	(0x0010) /* shell is finished creating this job, */
                                 /*   may be deleted from job table      */
#define STAT_SUBLEADER  (0x2000) /* is super-job, but leader is sub-shell */

This doesn't have STAT_INUSE, so it's not clear "jobs" should be
reporting it at all.  Adding STAT_INUSE and fg'ing doesn't help; it does
attempt to bring the command to the foreground but then gets stuck and
stays stuck even if the vim command is killed from elsewhere.

pws      27402 25329  7 16:28 pts/1    00:00:01 ./zsh
pws      27423 27402  0 16:29 pts/1    00:00:00 vim -
pws      27425 27402  0 16:29 pts/1    00:00:00 ./zsh

That first ./zsh is the parent shell; I guess 27425 is the subshell
process.

job 3 in the job table appears to be valid and is in use but is marked
as STAT_NOPRINT.  It has status 0x173:

#define STAT_CHANGED	(0x0001) /* status changed and not reported      */
#define STAT_STOPPED	(0x0002) /* all procs stopped or exited          */
#define STAT_LOCKED	(0x0010) /* shell is finished creating this job, */
                                 /*   may be deleted from job table      */
#define STAT_NOPRINT	(0x0020) /* job was killed internally,           */
                                 /*   we don't want to show that         */
#define STAT_INUSE	(0x0040) /* this job entry is in use             */
#define STAT_SUBJOB	(0x0100) /* job is a subjob                      */

I'm guessing what should happen is something like foregrounding the
SUBLEADER job should also foreground the SUBJOB job.  But that doesn't
tell us whether a job that's not INUSE should show up, or why the INUSE
flag is missing in the first place.

pws

^ permalink raw reply	[flat|nested] 20+ messages in thread

[parent not found: <CGME20160914173110eucas1p1f0567e319ae439b975b19f4e55676fed@eucas1p1.samsung.com>]

* Re: Bug related to stdin/always/jobcontrol
       [not found]   ` <CGME20160914173110eucas1p1f0567e319ae439b975b19f4e55676fed@eucas1p1.samsung.com>
@ 2016-09-14 17:31     ` Peter Stephenson
  2016-09-14 21:35       ` Peter Stephenson
  0 siblings, 1 reply; 20+ messages in thread
From: Peter Stephenson @ 2016-09-14 17:31 UTC (permalink / raw)
  To: zsh-workers

On Mon, 5 Sep 2016 16:42:07 +0100
Peter Stephenson <p.stephenson@samsung.com> wrote:
> > juno% v() { { vim - } always { true } }
> > juno% ls | v
> > ^Z
> > zsh: running    v
> > juno% fg
> > fg: no current job
> 
> It looks like the job state is quite seriously challenged.

I think I might be getting close with the attached patch, but not quite
far enough so I know I'm not doing something fundamentally screwy.
With it, I get as far as continuing the process, but vim gets stopped
again because it can't write to the terminal.  Hence I'm suspecting
something's still amiss in process group handling.  I never understood
process group handling.  In case anyone does, as you'll see if you try
it out, you get the vim process and the fix-up subprocess zsh (as below)
in a process group which is the PID of an exited process.  So first
guess is we need to create a new process group, or something.

That exited process is what was originally SUPERJOB, which is the ls which
has exited (I think: I haven't trapped it in the act but this seems to
be the only thing that makes sense).  The newly forked zsh is the result
of the ^Z and so needs to be in charge of the vim process (and, in the
zsh model, subjob) --- again, I think, because that's basically why we
do the fork, so we can still bring the right hand side of the pipeline
which was originally in the current shell into the foreground.

> When fg is run, the state of the job you can see has flags 0x2012:
> 
> #define STAT_STOPPED	(0x0002) /* all procs stopped or exited          */
> #define STAT_LOCKED	(0x0010) /* shell is finished creating this job, */
>                                  /*   may be deleted from job table      */
> #define STAT_SUBLEADER  (0x2000) /* is super-job, but leader is sub-shell */
> 
> This doesn't have STAT_INUSE, so it's not clear "jobs" should be
> reporting it at all.  Adding STAT_INUSE and fg'ing doesn't help; it does
> attempt to bring the command to the foreground but then gets stuck and
> stays stuck even if the vim command is killed from elsewhere.
> 
> pws      27402 25329  7 16:28 pts/1    00:00:01 ./zsh
> pws      27423 27402  0 16:29 pts/1    00:00:00 vim -
> pws      27425 27402  0 16:29 pts/1    00:00:00 ./zsh
> 
> That first ./zsh is the parent shell; I guess 27425 is the subshell
> process.

This appears to be the case.

I've added the INUSE back in, because the job doesn't seem to be any use
to person nor best without it.

I've then made it look for a SUBJOB of which it can become SUPERJOB, and
fixed up the associations.  This also assumes I've worked out the
difference between being a SUBLEADER and a SUPERJOB.  Note: I'm not
sure if we need to search for an existing SUPERJOB first because I'm
not sure in what circumstances we can get to the point we have.

The final hunk is an obvious typo --- "other" is always a job (possibly
sounds better in French).

Now fg is trying to do something looks plausible, but with the effect
noted above.

pws

diff --git a/Src/exec.c b/Src/exec.c
index cfd633a..2638b01 100644
--- a/Src/exec.c
+++ b/Src/exec.c
@@ -1570,6 +1570,7 @@ execpline(Estate state, wordcode slcode, int how, int last1)

 	    if (nowait) {
 		if(!pline_level) {
+		    int jobsub;
 		    struct process *pn, *qn;

 		    curjob = newjob;
@@ -1593,7 +1594,23 @@ execpline(Estate state, wordcode slcode, int how, int last1)
 		    }

 		    jn->stat &= ~(STAT_DONE | STAT_NOPRINT);
-		    jn->stat |= STAT_STOPPED | STAT_CHANGED | STAT_LOCKED;
+		    jn->stat |= STAT_STOPPED | STAT_CHANGED | STAT_LOCKED |
+			STAT_INUSE;
+		    /*
+		     * Pick up any subjob that's still lying around
+		     * as it's now our responsibility.
+		     * If we find it we're a regular SUPERJOB
+		     * instead of a mere SUBLEADER.
+		     */
+		    for (jobsub = 1; jobsub <= maxjob; jobsub++) {
+			Job jnsub = jobtab + jobsub;
+			if (jnsub->stat & STAT_SUBJOB) {
+			    jn->other = jobsub;
+			    jn->stat |= STAT_SUPERJOB;
+			    jn->stat &= ~STAT_SUBLEADER;
+			    jnsub->other = newjob;
+			}
+		    }
 		    printjob(jn, !!isset(LONGLISTJOBS), 1);
 		}
 		else if (newjob != list_pipe_job)
@@ -1668,7 +1685,7 @@ execpline(Estate state, wordcode slcode, int how, int last1)
 			    jobtab[list_pipe_job].other = newjob;
 			    jobtab[list_pipe_job].stat |= STAT_SUPERJOB;
 			    jn->stat |= STAT_SUBJOB | STAT_NOPRINT;
-			    jn->other = pid;
+			    jn->other = list_pipe_job;
 			}
 			if ((list_pipe || last1) && hasprocs(list_pipe_job))
 			    killpg(jobtab[list_pipe_job].gleader, SIGSTOP);

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Bug related to stdin/always/jobcontrol
  2016-09-14 17:31     ` Peter Stephenson
@ 2016-09-14 21:35       ` Peter Stephenson
  2016-09-15  3:24         ` Bart Schaefer
  2016-09-22 11:59         ` Daniel Shahaf
  0 siblings, 2 replies; 20+ messages in thread
From: Peter Stephenson @ 2016-09-14 21:35 UTC (permalink / raw)
  To: zsh-workers

On Wed, 14 Sep 2016 18:31:05 +0100
Peter Stephenson <p.stephenson@samsung.com> wrote:
> I think I might be getting close with the attached patch... first
> guess is we need to create a new process group, or something.

The subjob needs to keep its process group because it's too late to
change it; I found out what was missing, it needs process group leader
assigning from that of list_pipe_job (obviously, right?)  Seems a bit
weird this never got noticed before, but it's hard to see how it can be
wrong since this is the way the pipeline always works, regardless of
whether the ls process or equivalent has exited.  I've tried a few bits
of job control with multiple different list-pipe-style constructs lying
around and they still seem to work.

Looks like the pgrp for the new superjob is irrelevant as it's kept out
of the way till the last minute, so no changes there.

I was shooting myself in the foot wth the change for "other".  It turns
out "other" means other job if this is the superjob, and other process
if this is the subjob (obviously, right?)  Je est veritablement un
autre.

I've made the claiming of the subjob safe by marking it as orphaned when
the original superjob, the ls process in the example, exits.  So if that
doesn't happen this doesn't kick in, so this shouldn't be stomping on
anything that already works.

I present the solution for verification.  If accepted, I will be
demanding a certifcate of some sort.

But there's still one thing I don't understand.  When
Scooby... er... sorry, force of habit... it's this chunk which is for
SIGCONT when we're dealing with a superjob in killjb():

                for (pn = jn->procs; pn->next; pn = pn->next)
                    if (kill(pn->pid, sig) == -1 && errno != ESRCH)
			err = -1;

The exit test is pn->next, so we skip the last process in the superjob,
which is a bit weird.  Certainly, changing that to the more obvious pn
causes the shell process acting as superjob in this case to be started
too early.  However, I don't know what happens in more normal cases.
I've left it alone.

pws

diff --git a/Src/exec.c b/Src/exec.c
index cfd633a..21270c8 100644
--- a/Src/exec.c
+++ b/Src/exec.c
@@ -1570,6 +1570,7 @@ execpline(Estate state, wordcode slcode, int how, int last1)
 
 	    if (nowait) {
 		if(!pline_level) {
+		    int jobsub;
 		    struct process *pn, *qn;
 
 		    curjob = newjob;
@@ -1582,6 +1583,20 @@ execpline(Estate state, wordcode slcode, int how, int last1)
 		    if (!jn->procs->next || lpforked == 2) {
 			jn->gleader = list_pipe_pid;
 			jn->stat |= STAT_SUBLEADER;
+			/*
+			 * Pick up any subjob that's still lying around
+			 * as it's now our responsibility.
+			 * If we find it we're a SUPERJOB.
+			 */
+			for (jobsub = 1; jobsub <= maxjob; jobsub++) {
+			    Job jnsub = jobtab + jobsub;
+			    if (jnsub->stat & STAT_SUBJOB_ORPHANED) {
+				jn->other = jobsub;
+				jn->stat |= STAT_SUPERJOB;
+				jnsub->stat &= ~STAT_SUBJOB_ORPHANED;
+				jnsub->other = list_pipe_pid;
+			    }
+			}
 		    }
 		    for (pn = jobtab[jn->other].procs; pn; pn = pn->next)
 			if (WIFSTOPPED(pn->status))
@@ -1593,7 +1608,8 @@ execpline(Estate state, wordcode slcode, int how, int last1)
 		    }
 
 		    jn->stat &= ~(STAT_DONE | STAT_NOPRINT);
-		    jn->stat |= STAT_STOPPED | STAT_CHANGED | STAT_LOCKED;
+		    jn->stat |= STAT_STOPPED | STAT_CHANGED | STAT_LOCKED |
+			STAT_INUSE;
 		    printjob(jn, !!isset(LONGLISTJOBS), 1);
 		}
 		else if (newjob != list_pipe_job)
@@ -1669,6 +1685,7 @@ execpline(Estate state, wordcode slcode, int how, int last1)
 			    jobtab[list_pipe_job].stat |= STAT_SUPERJOB;
 			    jn->stat |= STAT_SUBJOB | STAT_NOPRINT;
 			    jn->other = pid;
+			    jn->gleader = jobtab[list_pipe_job].gleader;
 			}
 			if ((list_pipe || last1) && hasprocs(list_pipe_job))
 			    killpg(jobtab[list_pipe_job].gleader, SIGSTOP);
diff --git a/Src/jobs.c b/Src/jobs.c
index 6bc3616..9284c71 100644
--- a/Src/jobs.c
+++ b/Src/jobs.c
@@ -128,7 +128,7 @@ makerunning(Job jn)
     Process pn;
 
     jn->stat &= ~STAT_STOPPED;
-    for (pn = jn->procs; pn; pn = pn->next)
+    for (pn = jn->procs; pn; pn = pn->next) {
 #if 0
 	if (WIFSTOPPED(pn->status) && 
 	    (!(jn->stat & STAT_SUPERJOB) || pn->next))
@@ -136,6 +136,7 @@ makerunning(Job jn)
 #endif
         if (WIFSTOPPED(pn->status))
 	    pn->status = SP_RUNNING;
+    }
 
     if (jn->stat & STAT_SUPERJOB)
 	makerunning(jobtab + jn->other);
@@ -236,7 +237,7 @@ handle_sub(int job, int fg)
     if ((sj->stat & STAT_DONE) || (!sj->procs && !sj->auxprocs)) {
 	struct process *p;
 		    
-	for (p = sj->procs; p; p = p->next)
+	for (p = sj->procs; p; p = p->next) {
 	    if (WIFSIGNALED(p->status)) {
 		if (jn->gleader != mypgrp && jn->procs->next)
 		    killpg(jn->gleader, WTERMSIG(p->status));
@@ -246,6 +247,7 @@ handle_sub(int job, int fg)
 		kill(sj->other, WTERMSIG(p->status));
 		break;
 	    }
+	}
 	if (!p) {
 	    int cp;
 
@@ -1316,6 +1318,11 @@ deletejob(Job jn, int disowning)
 	attachtty(mypgrp);
 	adjustwinsize(0);
     }
+    if (jn->stat & STAT_SUPERJOB) {
+	Job jno = jobtab + jn->other;
+	if (jno->stat & STAT_SUBJOB)
+	    jno->stat |= STAT_SUBJOB_ORPHANED;
+    }
 
     freejob(jn, 1);
 }
diff --git a/Src/zsh.h b/Src/zsh.h
index 2dc5e7e..bb8ce13 100644
--- a/Src/zsh.h
+++ b/Src/zsh.h
@@ -983,7 +983,8 @@ struct jobfile {
 
 struct job {
     pid_t gleader;		/* process group leader of this job  */
-    pid_t other;		/* subjob id or subshell pid         */
+    pid_t other;		/* subjob id (SUPERJOB)
+				 * or subshell pid (SUBJOB) */
     int stat;                   /* see STATs below                   */
     char *pwd;			/* current working dir of shell when *
 				 * this job was spawned              */
@@ -1015,6 +1016,8 @@ struct job {
 #define STAT_SUBLEADER  (0x2000) /* is super-job, but leader is sub-shell */
 
 #define STAT_BUILTIN    (0x4000) /* job at tail of pipeline is a builtin */
+#define STAT_SUBJOB_ORPHANED (0x8000)
+                                 /* STAT_SUBJOB with STAT_SUPERJOB exited */
 
 #define SP_RUNNING -1		/* fake status for jobs currently running */
 


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Bug related to stdin/always/jobcontrol
  2016-09-14 21:35       ` Peter Stephenson
@ 2016-09-15  3:24         ` Bart Schaefer
  2016-09-15  8:27           ` Peter Stephenson
  2016-09-22 11:59         ` Daniel Shahaf
  1 sibling, 1 reply; 20+ messages in thread
From: Bart Schaefer @ 2016-09-15  3:24 UTC (permalink / raw)
  To: zsh-workers

On Sep 14, 10:35pm, Peter Stephenson wrote:
} Subject: Re: Bug related to stdin/always/jobcontrol
}
} I present the solution for verification.  If accepted, I will be
} demanding a certifcate of some sort.

Thank you for digging through this.  I was expecting to need to dive
into it, and my schedule the next several days would not have been
amenable.

Your explanation makes sense and the code looks sane, I think you can
claim your certificate.

} SIGCONT when we're dealing with a superjob in killjb():
} 
}                 for (pn = jn->procs; pn->next; pn = pn->next)
}                     if (kill(pn->pid, sig) == -1 && errno != ESRCH)
} 			err = -1;
} 
} The exit test is pn->next, so we skip the last process in the superjob,
} which is a bit weird.

I think this has to do with fork-to-the-left on pipelines, i.e., in
the "more normal case" the last process in the superjob is either
the current shell or the process group leader.  The loop leaves pn
pointing to the last process, and then the following bit --

                if (!jobtab[jn->other].procs && pn)
                    if (kill(pn->pid, sig) == -1 && errno != ESRCH)
                        err = -1;

-- sends that process the signal, except not when there are "other"
procs, so I don't quite grok that part.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Bug related to stdin/always/jobcontrol
  2016-09-15  3:24         ` Bart Schaefer
@ 2016-09-15  8:27           ` Peter Stephenson
  2016-09-15 10:33             ` Peter Stephenson
  0 siblings, 1 reply; 20+ messages in thread
From: Peter Stephenson @ 2016-09-15  8:27 UTC (permalink / raw)
  To: zsh-workers

On Wed, 14 Sep 2016 20:24:50 -0700
Bart Schaefer <schaefer@brasslantern.com> wrote:
> I think this has to do with fork-to-the-left on pipelines, i.e., in
> the "more normal case" the last process in the superjob is either
> the current shell or the process group leader.  The loop leaves pn
> pointing to the last process, and then the following bit --
> 
>                 if (!jobtab[jn->other].procs && pn)
>                     if (kill(pn->pid, sig) == -1 && errno != ESRCH)
>                         err = -1;
> 
> -- sends that process the signal, except not when there are "other"
> procs, so I don't quite grok that part.

Ah, I think that *is* the sort of thing I'm looking at (as well as in
the normal superjob case).  If that's the right process for this (which
is the bit I didn't really undersand), i.e. what at least at one point
in its life cycle was list_pipe_pid when a subjob was associated with
it, then the presence of processes associated with the "other" job (the
subjob) means we should kick those off instead.  There are then efforts
to ensure this gets restarted when the subjob has exited (that's the bit
that sprang back to life when I realised "other" was a process when used
within the subjob structure, and it's this process it's pointing to).
So I think this does make sense, depsite being a bit hairy and
underdocumented.  I'll add some tentative notes (though they might be
closer to eight- or ninetative).

pws

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Bug related to stdin/always/jobcontrol
  2016-09-15  8:27           ` Peter Stephenson
@ 2016-09-15 10:33             ` Peter Stephenson
  2016-09-15 17:40               ` Peter Stephenson
  0 siblings, 1 reply; 20+ messages in thread
From: Peter Stephenson @ 2016-09-15 10:33 UTC (permalink / raw)
  To: zsh-workers

Still issues here --- I tried the last patch on another machine and it's
OK up to the point where you resume and then exit the job, at which
point vim exits cleanly but the new superjob (the forked zsh) is
continued but suspends itself again. Sending it SIGCONT causes it to
exit normally, so this looks like a race.

pws

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Bug related to stdin/always/jobcontrol
  2016-09-15 10:33             ` Peter Stephenson
@ 2016-09-15 17:40               ` Peter Stephenson
  2016-09-16  1:08                 ` Bart Schaefer
  0 siblings, 1 reply; 20+ messages in thread
From: Peter Stephenson @ 2016-09-15 17:40 UTC (permalink / raw)
  To: zsh-workers

On Thu, 15 Sep 2016 11:33:15 +0100
Peter Stephenson <p.stephenson@samsung.com> wrote:
> Still issues here --- I tried the last patch on another machine and it's
> OK up to the point where you resume and then exit the job, at which
> point vim exits cleanly but the new superjob (the forked zsh) is
> continued but suspends itself again. Sending it SIGCONT causes it to
> exit normally, so this looks like a race.

I'm a bit stuck on this.

The race is related to the fact that when the new SUPERJOB takes over
it's actually in the same process group as the SUBJOB it's taking over,
which is different from it's own PID, list_pipe_pid, that the code
currently assumes is the pgrp (and is indeed the pgrp in the cases where
everything works --- I've again verified the code on a different
machine here and get the impression faster machines work better).

However, fixing that up doesn't help --- apparently attaching it to the
TTY after the SUBJOB is finished works OK, sending it SIGCONT seems to
be OK, but then it always gets SIGSTOP --- and sending it SIGCONT from
the keyboard allows it to finish.  So I have no idea why it's being
stopped after the first SIGCONT.  (I tried putting it from the main
shell into a new process group of its own; that succeeds, but still
doesn't help.)

Stepping through the forked zsh that was stopped doesn't suggest
anything, either, as it just executes the remainder of the shell code in
the v function.  However, I should have another look at the point where
it gets the pukka SIGSTOP (the one that keeps it out of the way until
the SUBJOB has finisehd) to make sure I'm not missing something there.

I think the code is enough better that I may submit it and wonder what
to do with the remaining problem later, or let Bart have a go when he
gets some time.

pws

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Bug related to stdin/always/jobcontrol
  2016-09-15 17:40               ` Peter Stephenson
@ 2016-09-16  1:08                 ` Bart Schaefer
  2016-09-16 12:02                   ` Peter Stephenson
  0 siblings, 1 reply; 20+ messages in thread
From: Bart Schaefer @ 2016-09-16  1:08 UTC (permalink / raw)
  To: zsh-workers

On Sep 15,  6:40pm, Peter Stephenson wrote:
} Subject: Re: Bug related to stdin/always/jobcontrol
}
} The race is related to the fact that when the new SUPERJOB takes over
} it's actually in the same process group as the SUBJOB it's taking over,
} which is different from it's own PID, list_pipe_pid [...]
} 
} However, fixing that up doesn't help --- apparently attaching it to the
} TTY after the SUBJOB is finished works OK, sending it SIGCONT seems to
} be OK, but then it always gets SIGSTOP

Is it definitely getting SIGSTOP and not SIGTSTP ?

The operating system will never[*] automatically send a true STOP signal
because it's un-catchable.  If a STOP is being received, it's probably
being sent from the "for (; !nowait;)" loop in execpline(), either at
line 1674 [ killpg(jobtab[list_pipe_job].gleader, SIGSTOP); ] or at
line 1688 [ kill(getpid(), SIGSTOP); ].

[*] To my knowledge, anyway.

} I think the code is enough better that I may submit it and wonder what
} to do with the remaining problem later, or let Bart have a go when he
} gets some time.

That seems reasonable.  This isn't any worse than it ever was, so it
shouldn't hold up the release.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Bug related to stdin/always/jobcontrol
  2016-09-16  1:08                 ` Bart Schaefer
@ 2016-09-16 12:02                   ` Peter Stephenson
  0 siblings, 0 replies; 20+ messages in thread
From: Peter Stephenson @ 2016-09-16 12:02 UTC (permalink / raw)
  To: zsh-workers

On Thu, 15 Sep 2016 18:08:55 -0700
Bart Schaefer <schaefer@brasslantern.com> wrote:
> On Sep 15,  6:40pm, Peter Stephenson wrote:
> } Subject: Re: Bug related to stdin/always/jobcontrol
> }
> } The race is related to the fact that when the new SUPERJOB takes over
> } it's actually in the same process group as the SUBJOB it's taking over,
> } which is different from it's own PID, list_pipe_pid [...]
> } 
> } However, fixing that up doesn't help --- apparently attaching it to the
> } TTY after the SUBJOB is finished works OK, sending it SIGCONT seems to
> } be OK, but then it always gets SIGSTOP
> 
> Is it definitely getting SIGSTOP and not SIGTSTP ?
> 
> The operating system will never[*] automatically send a true STOP signal
> because it's un-catchable.  If a STOP is being received, it's probably
> being sent from the "for (; !nowait;)" loop in execpline(), either at
> line 1674 [ killpg(jobtab[list_pipe_job].gleader, SIGSTOP); ] or at
> line 1688 [ kill(getpid(), SIGSTOP); ].

I've found the double SIGSTOP --- it was getting *both* of the above.

In the case in question, the newly forked child zsh was picking up the
pgrp of the vim process, so when we sent SIGSTOP to that pgrp it got
stopped, too.  When it got restarted, it immediately stopped itself.

There's some code in the child to decide on the pgrp.  It looks like it
needs to have its own pgrp if the old superjob has already gone, the
basis of the problems we are seeing here.

I'm not sure if this leaves some more races but, again, it looks like
it's an improvement.

The rest of the patch is cosmetic.

The bad news is there are other problems with this logic that have crept
in since 5.2 --- this appears to be unrelated, or at least not made any
worse by this change, so I'll start a separate thread.

pws

diff --git a/Src/exec.c b/Src/exec.c
index 21270c8..0755d55 100644
--- a/Src/exec.c
+++ b/Src/exec.c
@@ -1652,7 +1652,13 @@ execpline(Estate state, wordcode slcode, int how, int last1)
 		    int synch[2];
 		    struct timeval bgtime;
 
+		    /*
+		     * A pipeline with the shell handling the right
+		     * hand side was stopped.  We'll fork to allow
+		     * it to continue.
+		     */
 		    if (pipe(synch) < 0 || (pid = zfork(&bgtime)) == -1) {
+			/* Failure */
 			if (pid < 0) {
 			    close(synch[0]);
 			    close(synch[1]);
@@ -1666,6 +1672,18 @@ execpline(Estate state, wordcode slcode, int how, int last1)
 			thisjob = newjob;
 		    }
 		    else if (pid) {
+			/*
+			 * Parent: job control is here.  If the job
+			 * started for the RHS of the pipeline is still
+			 * around, then its a SUBJOB and the job for
+			 * earlier parts of the pipeeline is its SUPERJOB.
+			 * The newly forked shell isn't recorded as a
+			 * separate job here, just as list_pipe_pid.
+			 * If the superjob exits (it may already have
+			 * done so, see child branch below), we'll use
+			 * list_pipe_pid to form the basis of a
+			 * replacement job --- see SUBLEADER code above.
+			 */
 			char dummy;
 
 			lpforked =
@@ -1692,9 +1710,33 @@ execpline(Estate state, wordcode slcode, int how, int last1)
 			break;
 		    }
 		    else {
+			Job pjn = jobtab + list_pipe_job;
 			close(synch[0]);
 			entersubsh(ESUB_ASYNC);
-			if (jobtab[list_pipe_job].procs) {
+			/*
+			 * At this point, we used to attach this process
+			 * to the process group of list_pipe_job (the
+			 * new superjob) any time that was still available.
+			 * That caused problems when the fork happened
+			 * early enough that the subjob is in that
+			 * process group, since then we will stop this
+			 * job when we signal the subjob, and we have
+			 * no way here to know that we shouldn't also
+			 * send STOP to itself, resulting in two stops.
+			 * So don't do that if the original
+			 * list_pipe_job has exited.
+			 *
+			 * The choice here needs to match the assumption
+			 * made when handling SUBLEADER above that the
+			 * process group is our own PID.  I'm not sure
+			 * if there's a potential race for that.  But
+			 * setting up a new process group if the
+			 * superjob is still functioning seems the wrong
+			 * thing to do.
+			 */
+			if (pjn->procs &&
+			    (pjn->stat & STAT_INUSE) &&
+			    !(pjn->stat & STAT_DONE)) {
 			    if (setpgrp(0L, mypgrp = jobtab[list_pipe_job].gleader)
 				== -1) {
 				setpgrp(0L, mypgrp = getpid());
diff --git a/Src/jobs.c b/Src/jobs.c
index 9284c71..d1b98ac 100644
--- a/Src/jobs.c
+++ b/Src/jobs.c
@@ -232,11 +232,12 @@ super_job(int sub)
 static int
 handle_sub(int job, int fg)
 {
+    /* job: superjob; sj: subjob. */
     Job jn = jobtab + job, sj = jobtab + jn->other;
 
     if ((sj->stat & STAT_DONE) || (!sj->procs && !sj->auxprocs)) {
 	struct process *p;
-		    
+
 	for (p = sj->procs; p; p = p->next) {
 	    if (WIFSIGNALED(p->status)) {
 		if (jn->gleader != mypgrp && jn->procs->next)
diff --git a/Src/signals.c b/Src/signals.c
index 2eefc07..30dde71 100644
--- a/Src/signals.c
+++ b/Src/signals.c
@@ -723,7 +723,7 @@ killjb(Job jn, int sig)
 {
     Process pn;
     int err = 0;
- 
+
     if (jobbing) {
         if (jn->stat & STAT_SUPERJOB) {
             if (sig == SIGCONT) {
@@ -731,11 +731,21 @@ killjb(Job jn, int sig)
                     if (killpg(pn->pid, sig) == -1)
 			if (kill(pn->pid, sig) == -1 && errno != ESRCH)
 			    err = -1;
- 
+
+		/*
+		 * Note this does not kill the last process,
+		 * which is assumed to be the one controlling the
+		 * subjob, i.e. the forked zsh that was originally
+		 * list_pipe_pid...
+		 */
                 for (pn = jn->procs; pn->next; pn = pn->next)
                     if (kill(pn->pid, sig) == -1 && errno != ESRCH)
 			err = -1;
 
+		/*
+		 * ...we only continue that once the external processes
+		 * currently associated with the subjob are finished.
+		 */
 		if (!jobtab[jn->other].procs && pn)
 		    if (kill(pn->pid, sig) == -1 && errno != ESRCH)
 			err = -1;
@@ -744,7 +754,7 @@ killjb(Job jn, int sig)
             }
             if (killpg(jobtab[jn->other].gleader, sig) == -1 && errno != ESRCH)
 		err = -1;
-		
+
 	    if (killpg(jn->gleader, sig) == -1 && errno != ESRCH)
 		err = -1;
 


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Bug related to stdin/always/jobcontrol
  2016-09-14 21:35       ` Peter Stephenson
  2016-09-15  3:24         ` Bart Schaefer
@ 2016-09-22 11:59         ` Daniel Shahaf
  2016-09-22 16:38           ` Bart Schaefer
  2016-09-25 15:16           ` Bug related to stdin/always/jobcontrol Peter Stephenson
  1 sibling, 2 replies; 20+ messages in thread
From: Daniel Shahaf @ 2016-09-22 11:59 UTC (permalink / raw)
  To: Peter Stephenson; +Cc: zsh-workers

Peter Stephenson wrote on Wed, Sep 14, 2016 at 22:35:53 +0100:
> On Wed, 14 Sep 2016 18:31:05 +0100
> Peter Stephenson <p.stephenson@samsung.com> wrote:
> > I think I might be getting close with the attached patch... first
> > guess is we need to create a new process group, or something.
> 
> The subjob needs to keep its process group because it's too late to
> change it; I found out what was missing, it needs process group leader
> assigning from that of list_pipe_job (obviously, right?)  Seems a bit
> weird this never got noticed before, but it's hard to see how it can be
> wrong since this is the way the pipeline always works, regardless of
> whether the ls process or equivalent has exited.  I've tried a few bits
> of job control with multiple different list-pipe-style constructs lying
> around and they still seem to work.

This (39331) broke fg'ing a function for me:

% print $ZSH_PATCHLEVEL 
zsh-5.2-dev-1-401-g4e51079
% f() { $EDITOR } 
% f
<press ^Z in the editor>
zsh: suspended  f
% fg
[1]  + continued  f
zsh: suspended (tty output)  f

After the 'fg' I get another prompt immediately, instead of being
returend to $EDITOR.

Cheers,

Daniel


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Bug related to stdin/always/jobcontrol
  2016-09-22 11:59         ` Daniel Shahaf
@ 2016-09-22 16:38           ` Bart Schaefer
  2016-09-23  1:18             ` Bart Schaefer
  2016-09-25 15:16           ` Bug related to stdin/always/jobcontrol Peter Stephenson
  1 sibling, 1 reply; 20+ messages in thread
From: Bart Schaefer @ 2016-09-22 16:38 UTC (permalink / raw)
  To: zsh-workers

On Sep 22, 11:59am, Daniel Shahaf wrote:
}
} % f() { $EDITOR } 
} % f
} <press ^Z in the editor>
} zsh: suspended  f
} % fg
} [1]  + continued  f
} zsh: suspended (tty output)  f
} 
} After the 'fg' I get another prompt immediately, instead of being
} returend to $EDITOR.

Worse that that, signals go to the parent shell:

torch% fg
[1]  + continued  f
zsh: suspended (tty output)  f
torch% kill -9 %1
Vim: Caught deadly signal HUP
Vim: Finished.
zsh: killed     Src/zsh -f


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Bug related to stdin/always/jobcontrol
  2016-09-22 16:38           ` Bart Schaefer
@ 2016-09-23  1:18             ` Bart Schaefer
  2016-09-23  6:06               ` Daniel Shahaf
  2016-09-25 14:39               ` Peter Stephenson
  0 siblings, 2 replies; 20+ messages in thread
From: Bart Schaefer @ 2016-09-23  1:18 UTC (permalink / raw)
  To: zsh-workers

On Sep 22,  9:38am, Bart Schaefer wrote:
} Subject: Re: Bug related to stdin/always/jobcontrol
}
} On Sep 22, 11:59am, Daniel Shahaf wrote:
} }
} } % f() { $EDITOR } 
} } % f
} } <press ^Z in the editor>
} } zsh: suspended  f
} } % fg
} } [1]  + continued  f
} } zsh: suspended (tty output)  f
} 
} Worse that that, signals go to the parent shell:
} 
} torch% kill -9 %1
} Vim: Caught deadly signal HUP
} Vim: Finished.
} zsh: killed     Src/zsh -f

The signal is being sent from here (signals.c):

755                 if (killpg(jobtab[jn->other].gleader, sig) == -1 && errno != ESRCH)

The problem is that jobtab[jn->other].gleader is 0, so killpg kills the
current process.

Obviously the group leader should never be zero, but I haven't figured
out how to track down where it's [not] being set.  Probably this is
related to the shell function not properly attaching to the tty when
foregrounded.

However, in attempting to figure it out, I found some sixteen-year-old
code that is clearly wrong:

diff --git a/Src/exec.c b/Src/exec.c
index d924148..bf97b5c 100644
--- a/Src/exec.c
+++ b/Src/exec.c
@@ -1702,7 +1702,7 @@ execpline(Estate state, wordcode slcode, int how, int last1)
 			    jobtab[list_pipe_job].other = newjob;
 			    jobtab[list_pipe_job].stat |= STAT_SUPERJOB;
 			    jn->stat |= STAT_SUBJOB | STAT_NOPRINT;
-			    jn->other = pid;
+			    jn->other = list_pipe_job;
 			    jn->gleader = jobtab[list_pipe_job].gleader;
 			}
 			if ((list_pipe || last1) && hasprocs(list_pipe_job))

jn->other is a job table index, not a process ID.  However, (a) I don't
know if list_pip_job is the right value (although a few lines earlier
list_pipe_pid = pid is assigned, so it would seem to makes sense) and
(b) that this has never caused a wild pointer dereference in all this
time seems to indicate that this code is never actually reached.

I don't have any more time to look at this now; hoping this information
is helpful to PWS if he has a chance to look when he wakes up on Friday.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Bug related to stdin/always/jobcontrol
  2016-09-23  1:18             ` Bart Schaefer
@ 2016-09-23  6:06               ` Daniel Shahaf
  2016-09-25 14:39               ` Peter Stephenson
  1 sibling, 0 replies; 20+ messages in thread
From: Daniel Shahaf @ 2016-09-23  6:06 UTC (permalink / raw)
  To: zsh-workers

Bart Schaefer wrote on Thu, Sep 22, 2016 at 18:18:47 -0700:
> However, in attempting to figure it out, I found some sixteen-year-old
> code that is clearly wrong:
> 
> diff --git a/Src/exec.c b/Src/exec.c
> index d924148..bf97b5c 100644
> --- a/Src/exec.c
> +++ b/Src/exec.c
> @@ -1702,7 +1702,7 @@ execpline(Estate state, wordcode slcode, int how, int last1)
>  			    jobtab[list_pipe_job].other = newjob;
>  			    jobtab[list_pipe_job].stat |= STAT_SUPERJOB;
>  			    jn->stat |= STAT_SUBJOB | STAT_NOPRINT;
> -			    jn->other = pid;
> +			    jn->other = list_pipe_job;
>  			    jn->gleader = jobtab[list_pipe_job].gleader;
>  			}
>  			if ((list_pipe || last1) && hasprocs(list_pipe_job))
> 
> jn->other is a job table index, not a process ID.  However, (a) I don't
> know if list_pip_job is the right value (although a few lines earlier
> list_pipe_pid = pid is assigned, so it would seem to makes sense) and
> (b) that this has never caused a wild pointer dereference in all this
> time seems to indicate that this code is never actually reached.

The code does get reached: if you do
.
    % f() vim
    % f
    ^Z
.
at a new shell, it executes the line you changed.  Perhaps nothing reads
the value assigned to jn->other, though?

> I don't have any more time to look at this now; hoping this information
> is helpful to PWS if he has a chance to look when he wakes up on Friday.
> 


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Bug related to stdin/always/jobcontrol
  2016-09-23  1:18             ` Bart Schaefer
  2016-09-23  6:06               ` Daniel Shahaf
@ 2016-09-25 14:39               ` Peter Stephenson
  2016-09-25 22:54                 ` struct job.other (was Re: Bug related to stdin/always/jobcontrol) Bart Schaefer
  1 sibling, 1 reply; 20+ messages in thread
From: Peter Stephenson @ 2016-09-25 14:39 UTC (permalink / raw)
  To: zsh-workers

On Thu, 22 Sep 2016 18:18:47 -0700
Bart Schaefer <schaefer@brasslantern.com> wrote:
> However, in attempting to figure it out, I found some sixteen-year-old
> code that is clearly wrong:
> 
> diff --git a/Src/exec.c b/Src/exec.c
> index d924148..bf97b5c 100644
> --- a/Src/exec.c
> +++ b/Src/exec.c
> @@ -1702,7 +1702,7 @@ execpline(Estate state, wordcode slcode, int how, int last1)
>  			    jobtab[list_pipe_job].other = newjob;
>  			    jobtab[list_pipe_job].stat |= STAT_SUPERJOB;
>  			    jn->stat |= STAT_SUBJOB | STAT_NOPRINT;
> -			    jn->other = pid;
> +			    jn->other = list_pipe_job;
>  			    jn->gleader = jobtab[list_pipe_job].gleader;
>  			}
>  			if ((list_pipe || last1) && hasprocs(list_pipe_job))
> 
> jn->other is a job table index, not a process ID.

The code's correct as it stands (modulo problems setting the values it's
using) --- read my previous explanations and the updated comment in
zsh.h about "other" which now explicitly states this is a pid if the job
is a SUBJOB.  "other" could perhaps be a union to indicate this.

pws


^ permalink raw reply	[flat|nested] 20+ messages in thread

* struct job.other (was Re: Bug related to stdin/always/jobcontrol)
  2016-09-25 14:39               ` Peter Stephenson
@ 2016-09-25 22:54                 ` Bart Schaefer
  2016-09-26 11:20                   ` Peter Stephenson
  0 siblings, 1 reply; 20+ messages in thread
From: Bart Schaefer @ 2016-09-25 22:54 UTC (permalink / raw)
  To: zsh-workers

On Sep 25,  3:39pm, Peter Stephenson wrote:
}
} > jn->other is a job table index, not a process ID.
} 
} The code's correct as it stands (modulo problems setting the values it's
} using) -- read my previous explanations and the updated comment in zsh.h

OK, thanks ... how about this:

diff --git a/Src/exec.c b/Src/exec.c
index 4e89340..c27c41c 100644
--- a/Src/exec.c
+++ b/Src/exec.c
@@ -1702,7 +1702,7 @@ execpline(Estate state, wordcode slcode, int how, int last1)
 			    jobtab[list_pipe_job].other = newjob;
 			    jobtab[list_pipe_job].stat |= STAT_SUPERJOB;
 			    jn->stat |= STAT_SUBJOB | STAT_NOPRINT;
-			    jn->other = pid;
+			    jn->other = list_pipe_pid;	/* see zsh.h */
 			    if (hasprocs(list_pipe_job))
 				jn->gleader = jobtab[list_pipe_job].gleader;
 			}


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: struct job.other (was Re: Bug related to stdin/always/jobcontrol)
  2016-09-25 22:54                 ` struct job.other (was Re: Bug related to stdin/always/jobcontrol) Bart Schaefer
@ 2016-09-26 11:20                   ` Peter Stephenson
  2016-09-26 16:33                     ` Bart Schaefer
  0 siblings, 1 reply; 20+ messages in thread
From: Peter Stephenson @ 2016-09-26 11:20 UTC (permalink / raw)
  To: zsh-workers

On Sun, 25 Sep 2016 15:54:36 -0700
Bart Schaefer <schaefer@brasslantern.com> wrote:
> On Sep 25,  3:39pm, Peter Stephenson wrote:
> }
> } > jn->other is a job table index, not a process ID.
> } 
> } The code's correct as it stands (modulo problems setting the values it's
> } using) -- read my previous explanations and the updated comment in zsh.h
> 
> OK, thanks ... how about this:
> 
> diff --git a/Src/exec.c b/Src/exec.c
> index 4e89340..c27c41c 100644
> --- a/Src/exec.c
> +++ b/Src/exec.c
> @@ -1702,7 +1702,7 @@ execpline(Estate state, wordcode slcode, int how, int last1)
>  			    jobtab[list_pipe_job].other = newjob;
>  			    jobtab[list_pipe_job].stat |= STAT_SUPERJOB;
>  			    jn->stat |= STAT_SUBJOB | STAT_NOPRINT;
> -			    jn->other = pid;
> +			    jn->other = list_pipe_pid;	/* see zsh.h */
>  			    if (hasprocs(list_pipe_job))
>  				jn->gleader = jobtab[list_pipe_job].gleader;
>  			}
> 

That's OK; I assume there's no race where list_pipe_pid is set to
something else since signals are blocked.

Same thing happens at line 1597, which is the code I added for the new
case with "ls | func".

pws


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: struct job.other (was Re: Bug related to stdin/always/jobcontrol)
  2016-09-26 11:20                   ` Peter Stephenson
@ 2016-09-26 16:33                     ` Bart Schaefer
  0 siblings, 0 replies; 20+ messages in thread
From: Bart Schaefer @ 2016-09-26 16:33 UTC (permalink / raw)
  To: zsh-workers

On Sep 26, 12:20pm, Peter Stephenson wrote:
} Subject: Re: struct job.other (was Re: Bug related to stdin/always/jobcont
}
} On Sun, 25 Sep 2016 15:54:36 -0700
} Bart Schaefer <schaefer@brasslantern.com> wrote:
} > -			    jn->other = pid;
} > +			    jn->other = list_pipe_pid;	/* see zsh.h */
} 
} That's OK; I assume there's no race where list_pipe_pid is set to
} something else since signals are blocked.

If there is such a race, then list_pipe_job is also going to be wrong
in all of the surrounding lines, so I think the above is at least not
more broken.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Bug related to stdin/always/jobcontrol
  2016-09-22 11:59         ` Daniel Shahaf
  2016-09-22 16:38           ` Bart Schaefer
@ 2016-09-25 15:16           ` Peter Stephenson
  1 sibling, 0 replies; 20+ messages in thread
From: Peter Stephenson @ 2016-09-25 15:16 UTC (permalink / raw)
  To: zsh-workers

On Thu, 22 Sep 2016 11:59:21 +0000
Daniel Shahaf <d.s@daniel.shahaf.name> wrote:
> This (39331) broke fg'ing a function for me:
> 
> % print $ZSH_PATCHLEVEL 
> zsh-5.2-dev-1-401-g4e51079
> % f() { $EDITOR } 
> % f
> <press ^Z in the editor>
> zsh: suspended  f
> % fg
> [1]  + continued  f
> zsh: suspended (tty output)  f
> 
> After the 'fg' I get another prompt immediately, instead of being
> returend to $EDITOR.

I think this is a fairly straightforward additional race fix, luckily, at
least to get us to the point where known problems are fixed.

In the cases I was looking at before, the processes associated with the
SUPERJOB already existed, and there was no further opportunity of fixing
up the pgrp for the SUBJOB.  The fact those processes already existed
also determined the pgrp for the SUBJOB, which is what's being set here
that caused the case above to be mishandled.

In this case, there's no process associated with the SUPERJOB yet, but
that's OK: everything gets handled properly when there is one.  I think
the gleader of the SUBJOB doesn't need any further special handling here
(everything seems to work, anyway).  Process handling and SUPERJOB
creation are murkily separate, so I don't think this is (necessarily)
buggy.

There could well be more aspects as I certainly haven't got my mind
round everything --- in particular, I don't know for sure that the fact
the shell has registered processes associated with the superjob is
actually tied to the subjob's pgrp, which is what this test assumes,
just that that appears to be so in the cases I've tried.

pws

diff --git a/Src/exec.c b/Src/exec.c
index d924148..a5086c3 100644
--- a/Src/exec.c
+++ b/Src/exec.c
@@ -1703,7 +1703,8 @@ execpline(Estate state, wordcode slcode, int how, int last1)
 			    jobtab[list_pipe_job].stat |= STAT_SUPERJOB;
 			    jn->stat |= STAT_SUBJOB | STAT_NOPRINT;
 			    jn->other = pid;
-			    jn->gleader = jobtab[list_pipe_job].gleader;
+			    if (hasprocs(list_pipe_job))
+				jn->gleader = jobtab[list_pipe_job].gleader;
 			}
 			if ((list_pipe || last1) && hasprocs(list_pipe_job))
 			    killpg(jobtab[list_pipe_job].gleader, SIGSTOP);

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2016-09-26 16:33 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-09-02 18:39 Bug related to stdin/always/jobcontrol Christian Neukirchen
2016-09-05 15:04 ` Christian Neukirchen
2016-09-05 15:42 ` Peter Stephenson
     [not found]   ` <CGME20160914173110eucas1p1f0567e319ae439b975b19f4e55676fed@eucas1p1.samsung.com>
2016-09-14 17:31     ` Peter Stephenson
2016-09-14 21:35       ` Peter Stephenson
2016-09-15  3:24         ` Bart Schaefer
2016-09-15  8:27           ` Peter Stephenson
2016-09-15 10:33             ` Peter Stephenson
2016-09-15 17:40               ` Peter Stephenson
2016-09-16  1:08                 ` Bart Schaefer
2016-09-16 12:02                   ` Peter Stephenson
2016-09-22 11:59         ` Daniel Shahaf
2016-09-22 16:38           ` Bart Schaefer
2016-09-23  1:18             ` Bart Schaefer
2016-09-23  6:06               ` Daniel Shahaf
2016-09-25 14:39               ` Peter Stephenson
2016-09-25 22:54                 ` struct job.other (was Re: Bug related to stdin/always/jobcontrol) Bart Schaefer
2016-09-26 11:20                   ` Peter Stephenson
2016-09-26 16:33                     ` Bart Schaefer
2016-09-25 15:16           ` Bug related to stdin/always/jobcontrol Peter Stephenson

Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/zsh/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).