How to misplace an entire pipeline

zsh-workers
 help / color / mirror / code / Atom feed

* How to misplace an entire pipeline
@ 2011-08-06  3:31 Bart Schaefer
  2011-08-07 17:50 ` Peter Stephenson
  0 siblings, 1 reply; 16+ messages in thread
From: Bart Schaefer @ 2011-08-06  3:31 UTC (permalink / raw)
  To: zsh-workers

Suspend a pipeline whose tail is a builtin that has already exited, and
the whole job gets lost, orphaning the left hand side:

torch% print $ZSH_VERSION $ZSH_PATCHLEVEL
4.3.12-dev-1 1.5412
torch% sleep 10 | true
torch% jobs
torch% ps ax | grep sleep
32474 pts/2    T      0:00 sleep 10
32480 pts/2    S+     0:00 grep sleep
torch% 

Worse, suspend a pipeline whose tail is a builtin that blocks on I/O:

torch% sleep 10 | sleep 20 | read

The builtin does not get suspended, so the shell is stuck; fortunately in
this case one can interrupt the builtin, again leaving the left side in
limbo:

torch% ps ax | grep sleep
32493 pts/2    T      0:00 sleep 10
32494 pts/2    T      0:00 sleep 20
32501 pts/2    S+     0:00 grep sleep

This used to work, sort of:

torch% print $ZSH_VERSION $ZSH_PATCHLEVEL
4.2.0
torch% sleep 10 | sleep 20 | true

zsh: suspended  sleep 10 | sleep 20
torch% jobs
[1]  + suspended  sleep 10 | sleep 20
torch% 

Note that the builtin is gone but the rest of the pipeline remains a job.

-- 


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: How to misplace an entire pipeline
  2011-08-06  3:31 How to misplace an entire pipeline Bart Schaefer
@ 2011-08-07 17:50 ` Peter Stephenson
  2011-08-07 21:43   ` Bart Schaefer
  0 siblings, 1 reply; 16+ messages in thread
From: Peter Stephenson @ 2011-08-07 17:50 UTC (permalink / raw)
  To: zsh-workers

On Fri, 05 Aug 2011 20:31:09 -0700
Bart Schaefer <schaefer@brasslantern.com> wrote:
> Suspend a pipeline whose tail is a builtin that has already exited, and
> the whole job gets lost, orphaning the left hand side:

This sounds like the extra-special code that takes account of the fact
that the right hand side of a pipline is run in the current job but
allows you by weird tricks to treat it as a separate job anyway.  The
suspension is supposed to cause a fork, so you forfeit the ability to
have the result within the current shell in return for being able to put
the job into the background.  What it's supposed to do if the part
executing in the current shell is already finished I have no idea;
presumably then it has no need to fork, though whether there's a special
case for the special case is another question.  This is code I don't go
near; I'm not even convinced in this case that Sven ever completely
understood it and I remember trying it years ago and being puzzled by
the results.  Search for the quotation from the Nibelungenlied.

-- 
Peter Stephenson <p.w.stephenson@ntlworld.com>
Web page now at http://homepage.ntlworld.com/p.w.stephenson/

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: How to misplace an entire pipeline
  2011-08-07 17:50 ` Peter Stephenson
@ 2011-08-07 21:43   ` Bart Schaefer
  2011-08-08  4:05     ` Bart Schaefer
  0 siblings, 1 reply; 16+ messages in thread
From: Bart Schaefer @ 2011-08-07 21:43 UTC (permalink / raw)
  To: zsh-workers

On Aug 7,  6:50pm, Peter Stephenson wrote:
} 
} This is code I don't go near

Obviously somebody went near it at some point not as long ago as Sven,
because it behaved differently in 4.2.x.

Zsh does still know about the pipeline, because if you explicitly wait
for the subjobs by PID it does block rather than saying "not a child of
this shell".  findproc() is able to see them (so it's not related to
the recent go-round with that.)

(By the way, arguably "wait" ought to continue the job if it's stopped,
which it does if you wait for it by job number but does not if you wait
for it by PID.)

Starting from

torch% sleep 10 | sleep 20 | true

Then hit ^Z, then

torch% ps ax | grep slee
27868 pts/2    T      0:00 sleep 10
27869 pts/2    T      0:00 sleep 20
27873 pts/2    S+     0:00 grep slee
torch% wait 27868

Here we are in bin_fg right after the findproc():

(gdb) p *j
$4 = {gleader = 27868, other = 0, stat = 1139, pwd = 0x0, procs = 0x9f83070, 
  auxprocs = 0x0, filelist = 0x0, stty_in_env = 0, ty = 0x0}
(gdb) p *p
$5 = {next = 0x9f83128, pid = 27868, 
  text = "sleep 10", '\0' <repeats 71 times>, status = 5247, ti = {ru_utime = {
      tv_sec = 0, tv_usec = 0}, ru_stime = {tv_sec = 0, tv_usec = 1999}, 
    ru_maxrss = 0, ru_ixrss = 0, ru_idrss = 0, ru_isrss = 0, ru_minflt = 242, 
    ru_majflt = 0, ru_nswap = 0, ru_inblock = 0, ru_oublock = 0, 
    ru_msgsnd = 0, ru_msgrcv = 0, ru_nsignals = 0, ru_nvcsw = 3, 
    ru_nivcsw = 0}, bgtime = {tv_sec = 1312751280, tv_usec = 828721}, 
  endtime = {tv_sec = 1312751282, tv_usec = 474423}}

If I wait for the "sleep 20" instead:

(gdb) p *j
$2 = {gleader = 27868, other = 0, stat = 1139, pwd = 0x0, procs = 0x9f83070, 
  auxprocs = 0x0, filelist = 0x0, stty_in_env = 0, ty = 0x0}
(gdb) p *p
$3 = {next = 0x0, pid = 27869, text = "sleep 20", '\0' <repeats 71 times>, 
  status = 5247, ti = {ru_utime = {tv_sec = 0, tv_usec = 0}, ru_stime = {
      tv_sec = 0, tv_usec = 999}, ru_maxrss = 0, ru_ixrss = 0, ru_idrss = 0, 
    ru_isrss = 0, ru_minflt = 242, ru_majflt = 0, ru_nswap = 0, 
    ru_inblock = 0, ru_oublock = 0, ru_msgsnd = 0, ru_msgrcv = 0, 
    ru_nsignals = 0, ru_nvcsw = 3, ru_nivcsw = 1}, bgtime = {
    tv_sec = 1312751280, tv_usec = 829868}, endtime = {tv_sec = 1312751282, 
    tv_usec = 474440}}

Notice that at some point "sleep 10" was made the process group leader
for the whole pipeline.

Now, here's an interesting tidbit:

torch% jobs %?foo
jobs: job not found: ?foo
torch% jobs %?sleep
jobs: %?sleep: no such job

Note the difference?  The latter message means that getjob() found the
pipeline, but either it's _not_ STAT_INUSE or it _is_ STAT_NOPRINT.  So
I think what we have here is a simple failure to communicate.

Unfortunately I can't chase this any further this afternoon.


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: How to misplace an entire pipeline
  2011-08-07 21:43   ` Bart Schaefer
@ 2011-08-08  4:05     ` Bart Schaefer
  2011-08-08 18:27       ` Peter Stephenson
  0 siblings, 1 reply; 16+ messages in thread
From: Bart Schaefer @ 2011-08-08  4:05 UTC (permalink / raw)
  To: zsh-workers

Two patches inline with other discussion below.

On Aug 7,  2:43pm, Bart Schaefer wrote:
}
} (By the way, arguably "wait" ought to continue the job if it's stopped,
} which it does if you wait for it by job number but does not if you wait
} for it by PID.)

Patch for that follows; I'm not sure the makerunning() is correct:

--- ../zsh-forge/current/Src/jobs.c	2011-06-14 19:54:57.000000000 -0700
+++ Src/jobs.c	2011-08-07 18:47:58.000000000 -0700
@@ -1933,12 +1933,19 @@
 	    Process p;

 	    if (findproc(pid, &j, &p, 0)) {
-		/*
-		 * returns 0 for normal exit, else signal+128
-		 * in which case we should return that status.
-		 */
-		retval = waitforpid(pid, 1);
-		if (!retval)
+		if (j->stat & STAT_STOPPED) {
+		    retval = (killjb(j, SIGCONT) != 0);
+		    if (retval == 0)
+			makerunning(j);
+		}
+		if (retval == 0) {
+		    /*
+		     * returns 0 for normal exit, else signal+128
+		     * in which case we should return that status.
+		     */
+		    retval = waitforpid(pid, 1);
+		}
+		if (retval == 0)
 		    retval = lastval2;
 	    } else if (isset(POSIXJOBS) &&
 		       pid == lastpid && lastpid_status >= 0L) {

} Now, here's an interesting tidbit:
} 
} torch% jobs %?foo
} jobs: job not found: ?foo
} torch% jobs %?sleep
} jobs: %?sleep: no such job
} 
} Note the difference?  The latter message means that getjob() found the
} pipeline, but either it's _not_ STAT_INUSE or it _is_ STAT_NOPRINT.  So
} I think what we have here is a simple failure to communicate.

The following is clearly not a complete fix and maybe is even wrong if
a different problem is fixed elsewhere, but this at least allows the
suspended pipeline to be manipulated with jobs/fg/bg/wait.

--- ../zsh-forge/current/Src/exec.c	2011-07-27 01:13:48.000000000 -0700
+++ Src/exec.c	2011-08-07 19:07:59.000000000 -0700
@@ -2845,7 +2845,9 @@
 	/* This is a current shell procedure that didn't need to fork.    *
 	 * This includes current shell procedures that are being exec'ed, *
 	 * as well as null execs.                                         */
-	jobtab[thisjob].stat |= STAT_CURSH|STAT_NOPRINT;
+	jobtab[thisjob].stat |= STAT_CURSH;
+	if (!jobtab[thisjob].procs)
+	    jobtab[thisjob].stat |= STAT_NOPRINT;
     } else {
 	/* This is an exec (real or fake) for an external command.    *
 	 * Note that any form of exec means that the subshell is fake *

Even with this patch, "... | read" followed by ^Z results in "read" left
waiting for a pipeline that has been stopped.  What seems to be going on
here is that a job table entry was created for the pipeline and the
forked-off left side made the group leader, but then that job table
entry is overloaded to represent the current shell builtin that is also
running as the right-hand-side.

When the terminal driver generates a TSTP it hits the forked-off left
side (via the group leader).  This comes through zhandler() into
update_job() which does notice that the job is STAT_CURSH, but in this
case list_pipe == 0 because "read" is an ordinary builtin, not a loop
or other compound construct; so STAT_CURSH is ignored, and the group
leader is attached to the tty (I believe that's a no-op as it already
is attached).

The other side-effect of this snippet:

	if (jn->stat & STAT_CURSH)
	    inforeground = 1;
	else if (job == thisjob) {
	    lastval = val;
	    inforeground = 2;
	}

is that inforeground != 2, so at the end of update_job() when checking
to see if the current shell should pretend to have been signaled, the
test fails; but that doesn't matter for TSTP because only INT and QUIT
are handled specially.

When "read" is the tail of the pipe, the above all happens behind the
scenes and then the I/O system call gets restarted, which is how the
shell ends up stuck.  I'm not sure how to escape from that, except
maybe to have zhandler() kill the shell with a different signal from
which the system call will not recover.

When something like "true" is the tail of the pipe, we return into
execpline at line 1500 (from waitjobs()), where list_pipe_job is set
but list_pipe and list_pipe_child are not.  If all three were nonzero,
a dummy shell would be forked off as PWS described to act as the
suspended job, but instead execpline() simply returns because the
last job in the pipeline has exited.

The only obvious thing I can think to do here is to note in zhandler()
that we have STAT_CURSH but not list_pipe, and therefore SIGCONT the
left-hand-side immediately and return as if no signal had occurred
(possibly printing a warning about not being able to suspend the job,
which is what happens elsewhere if pipe() or fork() fails).  However,
that could lead to a serious busy-loop if somehow TTIN or TTOU was
the signal instead of TSTP.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: How to misplace an entire pipeline
  2011-08-08  4:05     ` Bart Schaefer
@ 2011-08-08 18:27       ` Peter Stephenson
  2011-08-09  6:10         ` Bart Schaefer
  0 siblings, 1 reply; 16+ messages in thread
From: Peter Stephenson @ 2011-08-08 18:27 UTC (permalink / raw)
  To: zsh-workers

On Sun, 07 Aug 2011 21:05:07 -0700
Bart Schaefer <schaefer@brasslantern.com> wrote:
> The following is clearly not a complete fix and maybe is even wrong if
> a different problem is fixed elsewhere, but this at least allows the
> suspended pipeline to be manipulated with jobs/fg/bg/wait.
> 
> --- ../zsh-forge/current/Src/exec.c	2011-07-27 01:13:48.000000000 -0700
> +++ Src/exec.c	2011-08-07 19:07:59.000000000 -0700
> @@ -2845,7 +2845,9 @@
>  	/* This is a current shell procedure that didn't need to fork.    *
>  	 * This includes current shell procedures that are being exec'ed, *
>  	 * as well as null execs.                                         */
> -	jobtab[thisjob].stat |= STAT_CURSH|STAT_NOPRINT;
> +	jobtab[thisjob].stat |= STAT_CURSH;
> +	if (!jobtab[thisjob].procs)
> +	    jobtab[thisjob].stat |= STAT_NOPRINT;
>      } else {
>  	/* This is an exec (real or fake) for an external command.    *
>  	 * Note that any form of exec means that the subshell is fake *

Looks fairly plausible, anyway.
 
> When "read" is the tail of the pipe, the above all happens behind the
> scenes and then the I/O system call gets restarted, which is how the
> shell ends up stuck.  I'm not sure how to escape from that, except
> maybe to have zhandler() kill the shell with a different signal from
> which the system call will not recover.

I suppose so.  I can hardly believe this ever worked.
 
> When something like "true" is the tail of the pipe, we return into
> execpline at line 1500 (from waitjobs()), where list_pipe_job is set
> but list_pipe and list_pipe_child are not.  If all three were nonzero,
> a dummy shell would be forked off as PWS described to act as the
> suspended job, but instead execpline() simply returns because the
> last job in the pipeline has exited.
> 
> The only obvious thing I can think to do here is to note in zhandler()
> that we have STAT_CURSH but not list_pipe, and therefore SIGCONT the
> left-hand-side immediately and return as if no signal had occurred
> (possibly printing a warning about not being able to suspend the job,
> which is what happens elsewhere if pipe() or fork() fails).  However,
> that could lead to a serious busy-loop if somehow TTIN or TTOU was
> the signal instead of TSTP.

But we can test if it's TSTP (or STOP)?

There are so many special cases here --- there's one for each state the
current shell might be in during the process, each of which would be
handled straighforwardly if the processing was in another shell --- that
even if it sometimes worked before it's not that surprising if it's
sensitive to knock-on effects.  I suppose it would be nice to be able to
test the things that seem to be working, at least, which would need some
care and probably a bit of extra test framework.

-- 
Peter Stephenson <p.w.stephenson@ntlworld.com>
Web page now at http://homepage.ntlworld.com/p.w.stephenson/


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: How to misplace an entire pipeline
  2011-08-08 18:27       ` Peter Stephenson
@ 2011-08-09  6:10         ` Bart Schaefer
  2011-08-09 20:19           ` Peter Stephenson
  0 siblings, 1 reply; 16+ messages in thread
From: Bart Schaefer @ 2011-08-09  6:10 UTC (permalink / raw)
  To: zsh-workers

On Aug 8,  7:27pm, Peter Stephenson wrote:
} Subject: Re: How to misplace an entire pipeline
}
} On Sun, 07 Aug 2011 21:05:07 -0700
} Bart Schaefer <schaefer@brasslantern.com> wrote:
} > -	jobtab[thisjob].stat |= STAT_CURSH|STAT_NOPRINT;
} > +	jobtab[thisjob].stat |= STAT_CURSH;
} > +	if (!jobtab[thisjob].procs)
} > +	    jobtab[thisjob].stat |= STAT_NOPRINT;
} 
} Looks fairly plausible, anyway.

Should that be "if (hasprocs(thisjob))" instead, do you think?

} > When "read" is the tail of the pipe, the above all happens behind the
} > scenes and then the I/O system call gets restarted, which is how the
} > shell ends up stuck.  I'm not sure how to escape from that, except
} > maybe to have zhandler() kill the shell with a different signal from
} > which the system call will not recover.
} 
} I suppose so.  I can hardly believe this ever worked.

It didn't, exactly.  This particular one behaves almost the same in
4.3.9 as it does in recent dev releases.  The main difference is that
in 4.3.9 the ^Z echoes to the terminal; in 4.3.12 there's no feedback
at all.  I haven't figured out how/why that changed.

} > The only obvious thing I can think to do here is to note in zhandler()
} > that we have STAT_CURSH but not list_pipe, and therefore SIGCONT the
} > left-hand-side immediately and return as if no signal had occurred
} > (possibly printing a warning about not being able to suspend the job,
} > which is what happens elsewhere if pipe() or fork() fails).  However,
} > that could lead to a serious busy-loop if somehow TTIN or TTOU was
} > the signal instead of TSTP.
} 
} But we can test if it's TSTP (or STOP)?

Yes, when WSTOPSIG() is defined.  The parent shell never gets the
original signal, it only gets the SIGCHLD and can then examine the
return value from wait3() or whatever system call was available.

SIGSTOP can't be caught or blocked so in that case it's likely the
parent shell has also stopped.  I'm not sure it'd be good behavior
for the shell to auto-resume a process that was hit with STOP, even
if it could.

So, some more about what's going on ...

Near the end of execpline2() there's this snippet:

	/* if another execpline() is invoked because the command is *
	 * a list it must know that we're already in a pipeline     */
	cmdpush(CS_PIPE);
	list_pipe = 1;
	execpline2(state, *state->pc++, how, pipes[0], output, last1);
	list_pipe = old_list_pipe;
	cmdpop();

Somewhere in the call stack beyond this execpline2(), execbuiltin() is
called.  If that returns immediately we climb out of execpline2() and
clear list_pipe and wait for the pipeline at execpline() line 1500.
However, in the case of "read", execbuiltin() blocks on the I/O, so
list_pipe remains true, and when the TSTP is received, update_job()
proceeds as if execpline() is going to fork.  

So the following hack does the right thing in the case of piping to
true where execbuiltin() has returned, but does not do the right thing
in the case of read.  What needs to be tested in place of (!list_pipe)
to determine that the tail of the current pipeline is a simple shell
builtin?

In fact even that may not be enough, maybe this needs to know if the
current job is a simple shell builtin -- it might be blocked on a
"while read; do ..." or on a "wait" in the middle of a loop, etc.

--- ../zsh-forge/current/Src/signals.c	2011-08-06 11:13:23.000000000 -0700
+++ Src/signals.c	2011-08-08 22:28:11.000000000 -0700
@@ -490,14 +490,20 @@
 	 * update it.
 	 */
 	if (findproc(pid, &jn, &pn, 0)) {
+	    if (!list_pipe && (jn->stat & STAT_CURSH) &&
+		WIFSTOPPED(status) && WSTOPSIG(status) == SIGTSTP) {
+		killjb(jn, SIGCONT);
+		zwarn("job can't be suspended");
+	    } else {
 #if defined(HAVE_WAIT3) && defined(HAVE_GETRUSAGE)
-	    struct timezone dummy_tz;
-	    gettimeofday(&pn->endtime, &dummy_tz);
-	    pn->status = status;
-	    pn->ti = ru;
+		struct timezone dummy_tz;
+		gettimeofday(&pn->endtime, &dummy_tz);
+		pn->status = status;
+		pn->ti = ru;
 #else
-	    update_process(pn, status);
+		update_process(pn, status);
 #endif
+	    }
 	    update_job(jn);
 	} else if (findproc(pid, &jn, &pn, 1)) {
 	    pn->status = status;

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: How to misplace an entire pipeline
  2011-08-09  6:10         ` Bart Schaefer
@ 2011-08-09 20:19           ` Peter Stephenson
  2011-08-13 18:52             ` Bart Schaefer
  0 siblings, 1 reply; 16+ messages in thread
From: Peter Stephenson @ 2011-08-09 20:19 UTC (permalink / raw)
  To: zsh-workers

On Mon, 08 Aug 2011 23:10:32 -0700
Bart Schaefer <schaefer@brasslantern.com> wrote:
> On Aug 8,  7:27pm, Peter Stephenson wrote:
> } Subject: Re: How to misplace an entire pipeline
> }
> } On Sun, 07 Aug 2011 21:05:07 -0700
> } Bart Schaefer <schaefer@brasslantern.com> wrote:
> } > -	jobtab[thisjob].stat |= STAT_CURSH|STAT_NOPRINT;
> } > +	jobtab[thisjob].stat |= STAT_CURSH;
> } > +	if (!jobtab[thisjob].procs)
> } > +	    jobtab[thisjob].stat |= STAT_NOPRINT;
> } 
> } Looks fairly plausible, anyway.
> 
> Should that be "if (hasprocs(thisjob))" instead, do you think?

To a first approximation, the user isn't directly interested in the aux
procs, but there might still be some knock-on effect.  So until we find
otherwise, I would guess not.
 
> So the following hack does the right thing in the case of piping to
> true where execbuiltin() has returned, but does not do the right thing
> in the case of read.  What needs to be tested in place of (!list_pipe)
> to determine that the tail of the current pipeline is a simple shell
> builtin?

I don't think we've remembered that fact, but it's available (as
is_builtin) in execcmd(), so could be stored before the builtin is
called.  What I don't know about is where to put that information (is it
as simple as using thisjob?), and what might invalidate it.

> In fact even that may not be enough, maybe this needs to know if the
> current job is a simple shell builtin -- it might be blocked on a
> "while read; do ..." or on a "wait" in the middle of a loop, etc.

I suppose it depends on what cases the list_pipe logic would be
triggered.

-- 
Peter Stephenson <p.w.stephenson@ntlworld.com>
Web page now at http://homepage.ntlworld.com/p.w.stephenson/


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: How to misplace an entire pipeline
  2011-08-09 20:19           ` Peter Stephenson
@ 2011-08-13 18:52             ` Bart Schaefer
  2011-08-13 20:15               ` Bart Schaefer
  2011-09-12 13:51               ` Alexey I. Froloff
  0 siblings, 2 replies; 16+ messages in thread
From: Bart Schaefer @ 2011-08-13 18:52 UTC (permalink / raw)
  To: zsh-workers

On Aug 9,  9:19pm, Peter Stephenson wrote:
} Subject: Re: How to misplace an entire pipeline
}
} > } > +	jobtab[thisjob].stat |= STAT_CURSH;
} > } > +	if (!jobtab[thisjob].procs)
} > } > +	    jobtab[thisjob].stat |= STAT_NOPRINT;
} > 
} > Should that be "if (hasprocs(thisjob))" instead, do you think?
} 
} To a first approximation, the user isn't directly interested in the aux
} procs, but there might still be some knock-on effect.  So until we find
} otherwise, I would guess not.

Assuming that the aux procs are not subject to tty process group signals, 
it should be harmless to ignore them.

I believe the patch below would make it possible to back out the above,
because when the pipeline can't be stopped it's impossible to examine
the job table anyway.  However, I think it's not a problem to leave it,
and there may be a few cases where it affects $jobtexts et al. in a
useful way.

} > What needs to be tested in place of (!list_pipe) to determine that
} > the tail of the current pipeline is a simple shell builtin?
} 
} I don't think we've remembered that fact, but it's available (as
} is_builtin) in execcmd(), so could be stored before the builtin is
} called.

It appears sufficient to add this as another STAT_* bitflag in the
job structure.  It might actually make sense to make list_pipe into
a bitflag as well, and mark every job in the table for which it was
true at the time the job began, but I'm not prepared for that dive.

} > In fact even that may not be enough, maybe this needs to know if the
} > current job is a simple shell builtin -- it might be blocked on a
} > "while read; do ..." or on a "wait" in the middle of a loop, etc.
} 
} I suppose it depends on what cases the list_pipe logic would be
} triggered.

I fooled around with a few cases and I think I caught this one too.
At this point list_pipe is true but the job receiving the stop signal
is not in the process list of the current job (thisjob), so it's
necessary to check whether the current job is a builtin.  Unless you
can think of a case where thisjob would not correctly identify the
foreground job at the time the signal is handled?

I'm still a little nervous about coughing up that zwarn() in the
signal handler but I can't find a better place for it.  One drawback
here is that if the tail of the pipeline is a loop that's rapidly
alternating between a builtin and an external command, ^Z might stop
the loop sometimes and issue the warning other times, seemingly at
random.

diff -u ../zsh-forge/current/Src/exec.c ./Src/exec.c
--- ../zsh-forge/current/Src/exec.c	2011-08-09 20:01:14.000000000 -0700
+++ ./Src/exec.c	2011-08-13 11:00:36.000000000 -0700
@@ -2848,6 +2848,8 @@
 	jobtab[thisjob].stat |= STAT_CURSH;
 	if (!jobtab[thisjob].procs)
 	    jobtab[thisjob].stat |= STAT_NOPRINT;
+	if (is_builtin)
+	  jobtab[thisjob].stat |= STAT_BUILTIN;
     } else {
 	/* This is an exec (real or fake) for an external command.    *
 	 * Note that any form of exec means that the subshell is fake *
diff -u ../zsh-forge/current/Src/signals.c ./Src/signals.c
--- ../zsh-forge/current/Src/signals.c	2011-08-06 11:13:23.000000000 -0700
+++ ./Src/signals.c	2011-08-11 09:55:26.000000000 -0700
@@ -490,14 +490,21 @@
 	 * update it.
 	 */
 	if (findproc(pid, &jn, &pn, 0)) {
+	    if (((jn->stat & STAT_BUILTIN) ||
+		 (list_pipe && (jobtab[thisjob].stat & STAT_BUILTIN))) &&
+		WIFSTOPPED(status) && WSTOPSIG(status) == SIGTSTP) {
+		killjb(jn, SIGCONT);
+		zwarn("job can't be suspended");
+	    } else {
 #if defined(HAVE_WAIT3) && defined(HAVE_GETRUSAGE)
-	    struct timezone dummy_tz;
-	    gettimeofday(&pn->endtime, &dummy_tz);
-	    pn->status = status;
-	    pn->ti = ru;
+		struct timezone dummy_tz;
+		gettimeofday(&pn->endtime, &dummy_tz);
+		pn->status = status;
+		pn->ti = ru;
 #else
-	    update_process(pn, status);
+		update_process(pn, status);
 #endif
+	    }
 	    update_job(jn);
 	} else if (findproc(pid, &jn, &pn, 1)) {
 	    pn->status = status;
diff -u ../zsh-forge/current/Src/zsh.h ./Src/zsh.h
--- ../zsh-forge/current/Src/zsh.h	2011-05-13 21:08:04.000000000 -0700
+++ ./Src/zsh.h	2011-08-10 09:10:35.000000000 -0700
@@ -907,6 +907,8 @@
 #define STAT_ATTACH	(0x1000) /* delay reattaching shell to tty       */
 #define STAT_SUBLEADER  (0x2000) /* is super-job, but leader is sub-shell */

+#define STAT_BUILTIN    (0x4000) /* job at tail of pipeline is a builtin */
+
 #define SP_RUNNING -1		/* fake status for jobs currently running */

 struct timeinfo {

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: How to misplace an entire pipeline
  2011-08-13 18:52             ` Bart Schaefer
@ 2011-08-13 20:15               ` Bart Schaefer
  2011-09-12 13:51               ` Alexey I. Froloff
  1 sibling, 0 replies; 16+ messages in thread
From: Bart Schaefer @ 2011-08-13 20:15 UTC (permalink / raw)
  To: zsh-workers

Here's a not-new bug (occurs in, e.g., 4.2.0) that I found while trying
to test the preceding patch for side-effects.

torch% sleep 200 | sleep 300 | while :; do read '?GO:' < /dev/tty; done
(rapid infinite loop printing "GO:" repeatedly)
^Z
zsh: suspended  sleep 200 | sleep 300 | 
zsh: running    while :; do; read '?GO:' < /dev/tty; done
torch% 

Note that it says the "while" loop is running, even though control has
returned to the top-level prompt.

If you bring this job back into the foreground, it now properly blocks
on the "read":

torch% fg
[1]  + continued  sleep 200 | sleep 300 | while :; do; read '?GO:' < /dev/tty;
done
GO:

Again, my patch didn't change this behavior, it's been this way for at
least several revisions.

It works somewhat better with "read -q".

In 4.2.0, at the prompt printed by read, ^Z is silently ignored.  Here is
what happens with ^C instead:

torch% sleep 200 | sleep 300 | while :; do read -q '?GO:' ; done  
GO:
torch% jobs
[1]    running    sleep 200 | sleep 300
torch% fg
fg: no current job
torch% %1
[1]  + running    sleep 200 | sleep 300

(These are now in the foreground.)

With my patch from 29677, 4.3.12-dev-1 still silently ignores the ^Z,
but now:

torch% sleep 200 | sleep 300 | while :; do read -q '?GO:' ; done  
GO:%
torch% 
[1]    interrupt  sleep 200 | sleep 300
torch% 

Which seems to make more sense.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: How to misplace an entire pipeline
  2011-08-13 18:52             ` Bart Schaefer
  2011-08-13 20:15               ` Bart Schaefer
@ 2011-09-12 13:51               ` Alexey I. Froloff
  2011-09-12 16:03                 ` Bart Schaefer
  1 sibling, 1 reply; 16+ messages in thread
From: Alexey I. Froloff @ 2011-09-12 13:51 UTC (permalink / raw)
  To: zsh-workers

[-- Attachment #1: Type: text/plain, Size: 1572 bytes --]

On Sat, Aug 13, 2011 at 11:52:08AM -0700, Bart Schaefer wrote:
>  	if (findproc(pid, &jn, &pn, 0)) {
> +	    if (((jn->stat & STAT_BUILTIN) ||
> +		 (list_pipe && (jobtab[thisjob].stat & STAT_BUILTIN))) &&
> +		WIFSTOPPED(status) && WSTOPSIG(status) == SIGTSTP) {
> +		killjb(jn, SIGCONT);
> +		zwarn("job can't be suspended");
> +	    } else {
Sometimes, zsh segfaults here while doing B03print.ztst in POSIX
locale with thisjob equals to -1.  Not 100% reproducable, though.

$ LC_ALL=C zsh +Z -f ./ztst.zsh ./B03print.ztst
./B03print.ztst: starting.
[1]    20076 segmentation fault (core dumped)  LC_ALL=C zsh +Z -f ./ztst.zsh ./B03print.ztst
...

Core was generated by `zsh +Z -f ./ztst.zsh ./B03print.ztst'.
Program terminated with signal 11, Segmentation fault.
#0  0x00000000004745b1 in wait_for_processes () at signals.c:494
494                      (list_pipe && (jobtab[thisjob].stat & STAT_BUILTIN))) &&
(gdb) list
489              * Find the process and job containing this pid and
490              * update it.
491              */
492             if (findproc(pid, &jn, &pn, 0)) {
493                 if (((jn->stat & STAT_BUILTIN) ||
494                      (list_pipe && (jobtab[thisjob].stat & STAT_BUILTIN))) &&
495                     WIFSTOPPED(status) && WSTOPSIG(status) == SIGTSTP) {
496                     killjb(jn, SIGCONT);
497                     zwarn("job can't be suspended");
498                 } else {
(gdb) p thisjob
$1 = -1

-- 
Regards,    --
Sir Raorn.   --- http://thousandsofhate.blogspot.com/

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: How to misplace an entire pipeline
  2011-09-12 13:51               ` Alexey I. Froloff
@ 2011-09-12 16:03                 ` Bart Schaefer
  2011-09-12 16:18                   ` Bart Schaefer
  2011-09-12 16:27                   ` Alexey I. Froloff
  0 siblings, 2 replies; 16+ messages in thread
From: Bart Schaefer @ 2011-09-12 16:03 UTC (permalink / raw)
  To: zsh-workers

On Sep 12,  5:51pm, Alexey I. Froloff wrote:
}
} Sometimes, zsh segfaults here while doing B03print.ztst in POSIX
} locale with thisjob equals to -1.  Not 100% reproducable, though.

This is why I wondered in the original patch message:
: At this point list_pipe is true but the job receiving the stop signal
: is not in the process list of the current job (thisjob), so it's
: necessary to check whether the current job is a builtin.  Unless you
: can think of a case where thisjob would not correctly identify the
: foreground job at the time the signal is handled?

If I understand correctly, thisjob == -1 means there is no foreground
job.  I'm scratching my head over how list_pipe can be true when there
is no job in the foreground.  Could you please run

    ZTST_verbose=2 make TESTNUM=B03 check

to verify which test is causing the segmentation fault?  I suspect it's
the very last test with "print -lO ... | while read ..." but I'd like
to be sure.  (OTOH it may not be possible to reproduce the fault with
verbose output, but it's worth a try.)

I further suspect the signal is arriving during foo=$line[i] or some
other part of the inner loop where no actual command is being run, but
list_pipe remains true because of the outer loop.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: How to misplace an entire pipeline
  2011-09-12 16:03                 ` Bart Schaefer
@ 2011-09-12 16:18                   ` Bart Schaefer
  2011-09-12 16:35                     ` Peter Stephenson
  2011-09-12 16:27                   ` Alexey I. Froloff
  1 sibling, 1 reply; 16+ messages in thread
From: Bart Schaefer @ 2011-09-12 16:18 UTC (permalink / raw)
  To: zsh-workers

On Sep 12,  9:03am, Bart Schaefer wrote:
}
} If I understand correctly, thisjob == -1 means there is no foreground
} job.  I'm scratching my head over how list_pipe can be true when there
} is no job in the foreground.

In particular I can't decide whether the fix is to do this:

494c494,496
< 		 (list_pipe && (jobtab[thisjob].stat & STAT_BUILTIN))) &&
---
> 		 (list_pipe &&
> 		  (thisjob == -1 ||
> 		   (jobtab[thisjob].stat & STAT_BUILTIN)))) &&

Or this:

494c494,496
< 		 (list_pipe && (jobtab[thisjob].stat & STAT_BUILTIN))) &&
---
> 		 (list_pipe &&
> 		  (thisjob != -1 &&
> 		   (jobtab[thisjob].stat & STAT_BUILTIN)))) &&

The first way treats no-foreground-job as if a shell builtin were in the
foreground.  The second way treats no-foreground-job as if an external
command were in the foreground.  I tend to think the former is the way
to go here -- if there's no job in the foreground, then the parent shell
must be in the foreground, and hence nothing is prepared to restart the
signalled job if it gets stopped.  However, I'm not that confident that
I know all the cases where thisjob == -1 may be true.

Any help here?  If not we're going with plan (A).

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: How to misplace an entire pipeline
  2011-09-12 16:18                   ` Bart Schaefer
@ 2011-09-12 16:35                     ` Peter Stephenson
  2011-09-15 14:57                       ` Alexey I. Froloff
  0 siblings, 1 reply; 16+ messages in thread
From: Peter Stephenson @ 2011-09-12 16:35 UTC (permalink / raw)
  To: zsh-workers

On Mon, 12 Sep 2011 09:18:55 -0700
Bart Schaefer <schaefer@brasslantern.com> wrote:
> On Sep 12,  9:03am, Bart Schaefer wrote:
> } If I understand correctly, thisjob == -1 means there is no foreground
> } job.  I'm scratching my head over how list_pipe can be true when there
> } is no job in the foreground.

If we can't decide this, there's no obvious way of choosing between the
alternative fixes, so you should probably just pick one.

-- 
Peter Stephenson <pws@csr.com>            Software Engineer
Tel: +44 (0)1223 692070                   Cambridge Silicon Radio Limited
Churchill House, Cambridge Business Park, Cowley Road, Cambridge, CB4 0WZ, UK

Member of the CSR plc group of companies. CSR plc registered in England and Wales, registered number 4187346, registered office Churchill House, Cambridge Business Park, Cowley Road, Cambridge, CB4 0WZ, United Kingdom
More information can be found at www.csr.com. Follow CSR on Twitter at http://twitter.com/CSR_PLC and read our blog at www.csr.com/blog

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: How to misplace an entire pipeline
  2011-09-12 16:35                     ` Peter Stephenson
@ 2011-09-15 14:57                       ` Alexey I. Froloff
  2011-09-16 16:16                         ` Bart Schaefer
  0 siblings, 1 reply; 16+ messages in thread
From: Alexey I. Froloff @ 2011-09-15 14:57 UTC (permalink / raw)
  To: zsh-workers

[-- Attachment #1: Type: text/plain, Size: 366 bytes --]

On Mon, Sep 12, 2011 at 05:35:27PM +0100, Peter Stephenson wrote:
> If we can't decide this, there's no obvious way of choosing between the
> alternative fixes, so you should probably just pick one.
So, have you decided what to do with this? If coin toss can help,
I just got heads.

-- 
Regards,    --
Sir Raorn.   --- http://thousandsofhate.blogspot.com/

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: How to misplace an entire pipeline
  2011-09-15 14:57                       ` Alexey I. Froloff
@ 2011-09-16 16:16                         ` Bart Schaefer
  0 siblings, 0 replies; 16+ messages in thread
From: Bart Schaefer @ 2011-09-16 16:16 UTC (permalink / raw)
  To: Zsh hackers list

I've been traveling for the last several days (still am, in fact;
writing this on a train through New England) but will commit "the
first way" when I return home next week, unless someone else wants to
grab the diff and do it for me.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: How to misplace an entire pipeline
  2011-09-12 16:03                 ` Bart Schaefer
  2011-09-12 16:18                   ` Bart Schaefer
@ 2011-09-12 16:27                   ` Alexey I. Froloff
  1 sibling, 0 replies; 16+ messages in thread
From: Alexey I. Froloff @ 2011-09-12 16:27 UTC (permalink / raw)
  To: zsh-workers

[-- Attachment #1: Type: text/plain, Size: 1865 bytes --]

On Mon, Sep 12, 2011 at 09:03:53AM -0700, Bart Schaefer wrote:
> to verify which test is causing the segmentation fault?  I suspect it's
> the very last test with "print -lO ... | while read ..." but I'd like
> to be sure.  (OTOH it may not be possible to reproduce the fault with
> verbose output, but it's worth a try.)

Test successful.
ZTST_test: looking for new test
ZTST_test: examining line:

ZTST_test: examining line:
 print -lO $'a' $'a\0' $'a\0b' $'a\0b\0' $'a\0b\0a' $'a\0b\0b' $'a\0c' |
ZTST_getchunk: read code chunk:
 print -lO $'a' $'a\0' $'a\0b' $'a\0b\0' $'a\0b\0a' $'a\0b\0b' $'a\0c' |
 while read -r line; do
    for (( i = 1; i <= ${#line}; i++ )); do
      foo=$line[i]
      printf "%02x" $(( #foo ))
    done
    print
 done
ZTST_test: examining line:
>610063
ZTST_getredir: read redir for '>':
610063
6100620062
6100620061
61006200
610062
6100
61
ZTST_test: examining line:

Running test: sorting with embedded nulls
ZTST_test: expecting status: 0
Input: /usr/src/tmp/zsh.ztst.in.2764, output: /usr/src/tmp/zsh.ztst.out.2764, error: /usr/src/tmp/zsh.ztst.terr.2764
**************************************
0 successful test scripts, 1 failure, 0 skipped
**************************************
make[1]: *** [check] Error 1
make[1]: Leaving directory `/usr/src/RPM/BUILD/zsh-4.3.12/Test'
make: *** [check] Error 2


> I further suspect the signal is arriving during foo=$line[i] or some
> other part of the inner loop where no actual command is being run, but
> list_pipe remains true because of the outer loop.
I think there's a race condition somewhere.  I've managed to
reproduce this error on an average loaded system with decent swap
usage.  Fail chance is about 15-20%% on my system and >80% on
build server.

-- 
Regards,    --
Sir Raorn.   --- http://thousandsofhate.blogspot.com/

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2011-09-16 16:16 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-08-06  3:31 How to misplace an entire pipeline Bart Schaefer
2011-08-07 17:50 ` Peter Stephenson
2011-08-07 21:43   ` Bart Schaefer
2011-08-08  4:05     ` Bart Schaefer
2011-08-08 18:27       ` Peter Stephenson
2011-08-09  6:10         ` Bart Schaefer
2011-08-09 20:19           ` Peter Stephenson
2011-08-13 18:52             ` Bart Schaefer
2011-08-13 20:15               ` Bart Schaefer
2011-09-12 13:51               ` Alexey I. Froloff
2011-09-12 16:03                 ` Bart Schaefer
2011-09-12 16:18                   ` Bart Schaefer
2011-09-12 16:35                     ` Peter Stephenson
2011-09-15 14:57                       ` Alexey I. Froloff
2011-09-16 16:16                         ` Bart Schaefer
2011-09-12 16:27                   ` Alexey I. Froloff

Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/zsh/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).