interrupt handling bug (again?)

zsh-workers
 help / color / mirror / code / Atom feed

* interrupt handling bug (again?)
@ 2017-06-06 19:08 Mikael Magnusson
  2017-06-24 19:03 ` Bart Schaefer
  0 siblings, 1 reply; 9+ messages in thread
From: Mikael Magnusson @ 2017-06-06 19:08 UTC (permalink / raw)
  To: zsh workers

if you do this
% for a in 1 2 3; do xterm; done
then hit ctrl-z in that term and bg it, do stuff and at some point hit
ctrl-c, the backgrounded for loop will be interrupted after the
running xterm exits. This happens even if you disown it, but not if
you run it with & or &|.

-- 
Mikael Magnusson

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: interrupt handling bug (again?)
  2017-06-06 19:08 interrupt handling bug (again?) Mikael Magnusson
@ 2017-06-24 19:03 ` Bart Schaefer
  2017-06-30 13:33   ` Peter Stephenson
  2017-07-03 15:09   ` Peter Stephenson
  0 siblings, 2 replies; 9+ messages in thread
From: Bart Schaefer @ 2017-06-24 19:03 UTC (permalink / raw)
  To: zsh workers; +Cc: Mikael Magnusson

On Jun 6,  9:08pm, Mikael Magnusson wrote:
}
} % for a in 1 2 3; do xterm; done
} then hit ctrl-z in that term and bg it, do stuff and at some point hit
} ctrl-c, the backgrounded for loop will be interrupted

I'm not 100% sure but this is most likely a TTY process group issue.

If I change "xterm" to

% for a in 1 2 3; do; sleep 10; echo $a; done

then I can reproduce the behavior on "bg".  However, if I stop, bg,
*and* disown, I end up with the sleep resumed, but the subshell that
handles the loop is still stopped.  Which is an entirely different
but related bug.

} This happens even if you disown it, but not if
} you run it with & or &|.

Here's what I think is going on:

When you run with & or &|, it's known "up front" that the job will be
in the background, so the entire loop is immediately disconnected
from the terminal and with &! also discarded from the job table.

If instead you start in the foreground, the loop is in the parent
shell, so by definition is attached to the terminal.  This is very
important because it means that "xterm" (or in my case "sleep") is
the foreground job.

When you ^Z, that suspends the foreground job.  The parent zsh gets
a wait-status of stopped, but it knows it's in a loop, so it forks
and also suspends the new subshell to be able to keep the loop state
without suspending the parent.

This is where things get weird.  When you "bg" that only resumes the
previous foreground job, it does not resume the forked subshell (if
it did, the next command in the loop would immediately start).  That
means the loop-subhell is still in the job table.  It stays there
(still stopped) until the previous foreground job finishes, at which
point the parent continues and backgrounds it to resume looping.

However, if you interrupt before the foreground job is done, the
parent faithfully propagates the signal to the entry in the job
table, and that kills the loop when it finally does restart.  If you
instead disown before the foreground job is done, the job table entry
for the foreground job is deleted, so the parent never notices that
it has exited and never resumes the loop at all.

That latter bit is going to be nearly impossible to fix.  Only the
original parent shell can manage both the foreground job and the
suspended subshell, which means it *can't* disown both of them if
the loop is to remain blocked until the opportune moment.  I think
the only answer will be to refuse to disown.  We already have a
warning for jobs that can't be suspended, and one for disowning a job
that is going to remain stopped.

The former issue can probably be fixed by skipping those subshell
job entries when propagating the TTY signal.  We must be able to
find those entries already to resume them when the current loop
foreground job exits, so this shouldn't be impossible.

I don't have time to look into this any further at this point.  PWS?

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: interrupt handling bug (again?)
  2017-06-24 19:03 ` Bart Schaefer
@ 2017-06-30 13:33   ` Peter Stephenson
  2017-06-30 21:16     ` Peter Stephenson
  2017-07-03 15:09   ` Peter Stephenson
  1 sibling, 1 reply; 9+ messages in thread
From: Peter Stephenson @ 2017-06-30 13:33 UTC (permalink / raw)
  To: zsh workers

On Sat, 24 Jun 2017 12:03:10 -0700
Bart Schaefer <schaefer@brasslantern.com> wrote:

> On Jun 6,  9:08pm, Mikael Magnusson wrote:
> }
> } % for a in 1 2 3; do xterm; done
> } then hit ctrl-z in that term and bg it, do stuff and at some point hit
> } ctrl-c, the backgrounded for loop will be interrupted
> 
> I'm not 100% sure but this is most likely a TTY process group issue.
> 
> If I change "xterm" to
> 
> % for a in 1 2 3; do; sleep 10; echo $a; done
> 
> then I can reproduce the behavior on "bg".

Yes, this version is very straightforward to see.

> ...if you interrupt before the foreground job is done, the
> parent faithfully propagates the signal to the entry in the job
> table, and that kills the loop when it finally does restart.

I think that is happening, but I'm not sure where.

The shell forked to run the rest of the loop is running as a SUPERJOB
with a different PGID from the parent shell (I checked the latter
though haven't gone into the SUBJOB / SUPERJOB code again yet). I think
that means it should never need to get the SIGINT. The parent shell
knows (or can deduce simply by means of the job table) it's now not
associated with a foreground SUBJOB, so I don't see why it shouldn't be
able to handle this case properly.

Haven't tracked down the specific code, though.

pws

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: interrupt handling bug (again?)
  2017-06-30 13:33   ` Peter Stephenson
@ 2017-06-30 21:16     ` Peter Stephenson
  2017-07-02 19:32       ` Peter Stephenson
  0 siblings, 1 reply; 9+ messages in thread
From: Peter Stephenson @ 2017-06-30 21:16 UTC (permalink / raw)
  To: zsh workers

On Fri, 30 Jun 2017 14:33:47 +0100
Peter Stephenson <p.stephenson@samsung.com> wrote:

> On Sat, 24 Jun 2017 12:03:10 -0700
> Bart Schaefer <schaefer@brasslantern.com> wrote:
> 
> > On Jun 6,  9:08pm, Mikael Magnusson wrote:
> > }
> > } % for a in 1 2 3; do xterm; done
> > } then hit ctrl-z in that term and bg it, do stuff and at some point hit
> > } ctrl-c, the backgrounded for loop will be interrupted
> > ...if you interrupt before the foreground job is done, the
> > parent faithfully propagates the signal to the entry in the job
> > table, and that kills the loop when it finally does restart.
> 
> I think that is happening, but I'm not sure where.

It's check_cursh_sig(SIGINT) from the interrupt handler.  It propagates
to jobs marked as STAT_CURSH.

I think when the SUPERJOB is put into the background it should no longer
have STAT_CURSH status.  Does that sound reasonable?  I'm not sure why
it would have it anyway, should it perhaps be removed when we mark it as
STAT_SUPERJOB, which is kind of decurrentshellising it in any case?

Anyway, this minimal fix seems to do the right thing for the case in
question.  It might have side effects, though.  More specificaly, if it
didn't have side effects that would be some kind of record.

pws

diff --git a/Src/jobs.c b/Src/jobs.c
index d1b98ac..09a8bab 100644
--- a/Src/jobs.c
+++ b/Src/jobs.c
@@ -2303,6 +2303,7 @@ bin_fg(char *name, char **argv, Options ops, int func)
 			}
 			pn = next;
 		    }
+		    jobtab[job].stat &= ~STAT_CURSH;
 		}
 	    } else if (func == BIN_BG) {
 		/* Silly to bg a job already running. */

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: interrupt handling bug (again?)
  2017-06-30 21:16     ` Peter Stephenson
@ 2017-07-02 19:32       ` Peter Stephenson
  2017-07-02 20:05         ` Bart Schaefer
  0 siblings, 1 reply; 9+ messages in thread
From: Peter Stephenson @ 2017-07-02 19:32 UTC (permalink / raw)
  To: zsh workers

On Fri, 30 Jun 2017 22:16:10 +0100
Peter Stephenson <p.w.stephenson@ntlworld.com> wrote:
> I think when the SUPERJOB is put into the background it should no longer
> have STAT_CURSH status.  Does that sound reasonable?  I'm not sure why
> it would have it anyway, should it perhaps be removed when we mark it as
> STAT_SUPERJOB, which is kind of decurrentshellising it in any case?

Done this way, tied to something else we don't want when something's
backgrouned, it's probably good enough for now.

pws

diff --git a/Src/jobs.c b/Src/jobs.c
index d1b98ac..32f7daa 100644
--- a/Src/jobs.c
+++ b/Src/jobs.c
@@ -2288,8 +2288,10 @@ bin_fg(char *name, char **argv, Options ops, int func)
 	case BIN_FG:
 	case BIN_BG:
 	case BIN_WAIT:
-	    if (func == BIN_BG)
+	    if (func == BIN_BG) {
 		jobtab[job].stat |= STAT_NOSTTY;
+		jobtab[job].stat &= ~STAT_CURSH;
+	    }
 	    if ((stopped = (jobtab[job].stat & STAT_STOPPED))) {
 		makerunning(jobtab + job);
 		if (func == BIN_BG) {


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: interrupt handling bug (again?)
  2017-07-02 19:32       ` Peter Stephenson
@ 2017-07-02 20:05         ` Bart Schaefer
  2017-07-02 20:22           ` Peter Stephenson
  0 siblings, 1 reply; 9+ messages in thread
From: Bart Schaefer @ 2017-07-02 20:05 UTC (permalink / raw)
  To: Peter Stephenson; +Cc: zsh workers

Thanks for digging into this.  I'm not git-enabled at the moment, is
41386 instead of or as well as 41384?

Now we just have to deal with (not) disowning those jobs.  I suppose
we could delay the disown until they are resumed, or something.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: interrupt handling bug (again?)
  2017-07-02 20:05         ` Bart Schaefer
@ 2017-07-02 20:22           ` Peter Stephenson
  0 siblings, 0 replies; 9+ messages in thread
From: Peter Stephenson @ 2017-07-02 20:22 UTC (permalink / raw)
  To: zsh workers

On Sun, 2 Jul 2017 13:05:18 -0700
Bart Schaefer <schaefer@brasslantern.com> wrote:
> Thanks for digging into this.  I'm not git-enabled at the moment, is
> 41386 instead of or as well as 41384?

They're alternatives.  The second is just slightly neater but not
fundameentally different.

pws


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: interrupt handling bug (again?)
  2017-06-24 19:03 ` Bart Schaefer
  2017-06-30 13:33   ` Peter Stephenson
@ 2017-07-03 15:09   ` Peter Stephenson
  2017-07-10  9:36     ` Peter Stephenson
  1 sibling, 1 reply; 9+ messages in thread
From: Peter Stephenson @ 2017-07-03 15:09 UTC (permalink / raw)
  To: Zsh Hackers' List

Backgrounded shell code with subproceses:

On Sat, 24 Jun 2017 12:03:10 -0700
Bart Schaefer <schaefer@brasslantern.com> wrote:
> If you
> instead disown before the foreground job is done, the job table entry
> for the foreground job is deleted, so the parent never notices that
> it has exited and never resumes the loop at all.

I agree with this.

However, I think we can do slightly better by delaying the disown to the
point where we send SIGCONT to the superjob (the background copy of the
shell process) after finding it has no associated processes in the
current shell. At that point it will start its own processes, so we
never need to interact with it again.

This seems to work, and I've documented the effect.  Is it weird enough
that we should print a warning from "disown", or is that just going to
cause more confusion than it removes (warnings of a technical nature
often do, in my experience)?  I reckon simply pointing people at
the documentation if they ask is probably good enough.

BTW, I think I don't need to check if there are actually processes
associated with the superjob, since if there aren't (in which case it's
marked was WASSUPER instead of SUPERJOB) disowning immediately has no
bad effects.

pws

diff --git a/Doc/Zsh/jobs.yo b/Doc/Zsh/jobs.yo
index 6262dd2..44e0a44 100644
--- a/Doc/Zsh/jobs.yo
+++ b/Doc/Zsh/jobs.yo
@@ -49,6 +49,12 @@ in the parent shell.  Thus the behaviour is different from the case
 where the function was never suspended.  Zsh is different from many
 other shells in this regard.
 
+One additional side effect is that use of tt(disown) with a job
+created by suspending shell code in this fashion is delayed: the
+job can only be disowned once any process started from the parent
+shell has terminated.  At that point, the job disappears silently
+from the process list.
+
 The same behaviour is found when the shell is executing code as the
 right hand side of a pipeline or any complex shell construct such as
 tt(if), tt(for), etc., in order that the entire block of code
diff --git a/Src/jobs.c b/Src/jobs.c
index 32f7daa..66dfb5a 100644
--- a/Src/jobs.c
+++ b/Src/jobs.c
@@ -277,6 +277,10 @@ handle_sub(int job, int fg)
 		(!jn->procs->next || cp || jn->procs->pid != jn->gleader))
 		attachtty(jn->gleader);
 	    kill(sj->other, SIGCONT);
+	    if (jn->stat & STAT_DISOWN)
+	    {
+		deletejob(jn, 1);
+	    }
 	}
 	curjob = jn - jobtab;
     } else if (sj->stat & STAT_STOPPED) {
@@ -2375,6 +2379,10 @@ bin_fg(char *name, char **argv, Options ops, int func)
 	    printjob(job + (oldjobtab ? oldjobtab : jobtab), lng, 2);
 	    break;
 	case BIN_DISOWN:
+	    if (jobtab[job].stat & STAT_SUPERJOB) {
+		jobtab[job].stat |= STAT_DISOWN;
+		continue;
+	    }
 	    if (jobtab[job].stat & STAT_STOPPED) {
 		char buf[20], *pids = "";
 
diff --git a/Src/zsh.h b/Src/zsh.h
index 137b2a5..a5b4d8f 100644
--- a/Src/zsh.h
+++ b/Src/zsh.h
@@ -1029,6 +1029,7 @@ struct job {
 #define STAT_BUILTIN    (0x4000) /* job at tail of pipeline is a builtin */
 #define STAT_SUBJOB_ORPHANED (0x8000)
                                  /* STAT_SUBJOB with STAT_SUPERJOB exited */
+#define STAT_DISOWN     (0x10000) /* STAT_SUPERJOB with disown pending */
 
 #define SP_RUNNING -1		/* fake status for jobs currently running */
 


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: interrupt handling bug (again?)
  2017-07-03 15:09   ` Peter Stephenson
@ 2017-07-10  9:36     ` Peter Stephenson
  0 siblings, 0 replies; 9+ messages in thread
From: Peter Stephenson @ 2017-07-10  9:36 UTC (permalink / raw)
  To: Zsh Hackers' List

On Mon, 3 Jul 2017 16:09:33 +0100
Peter Stephenson <p.stephenson@samsung.com> wrote:
> Backgrounded shell code with subproceses:
> 
> On Sat, 24 Jun 2017 12:03:10 -0700
> Bart Schaefer <schaefer@brasslantern.com> wrote:
> > If you
> > instead disown before the foreground job is done, the job table entry
> > for the foreground job is deleted, so the parent never notices that
> > it has exited and never resumes the loop at all.
> 
> I agree with this.
> 
> However, I think we can do slightly better by delaying the disown to the
> point where we send SIGCONT to the superjob (the background copy of the
> shell process) after finding it has no associated processes in the
> current shell. At that point it will start its own processes, so we
> never need to interact with it again.

It's been going through my mind that we can also add STAT_NOPRINT to the
job at the first point to prevent it showing up in the job list in the
mean time.  But this might be too clever by half, causing confusion, for
example, if you exit the parent shell (a common reason for wanting to
disown a job) before it's had a chance to restart the superjob.  So I
don't think I'm going to do this.

pws


^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2017-07-10  9:46 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-06-06 19:08 interrupt handling bug (again?) Mikael Magnusson
2017-06-24 19:03 ` Bart Schaefer
2017-06-30 13:33   ` Peter Stephenson
2017-06-30 21:16     ` Peter Stephenson
2017-07-02 19:32       ` Peter Stephenson
2017-07-02 20:05         ` Bart Schaefer
2017-07-02 20:22           ` Peter Stephenson
2017-07-03 15:09   ` Peter Stephenson
2017-07-10  9:36     ` Peter Stephenson

Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/zsh/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).