* interrupt handling bug (again?) @ 2017-06-06 19:08 Mikael Magnusson 2017-06-24 19:03 ` Bart Schaefer 0 siblings, 1 reply; 9+ messages in thread From: Mikael Magnusson @ 2017-06-06 19:08 UTC (permalink / raw) To: zsh workers if you do this % for a in 1 2 3; do xterm; done then hit ctrl-z in that term and bg it, do stuff and at some point hit ctrl-c, the backgrounded for loop will be interrupted after the running xterm exits. This happens even if you disown it, but not if you run it with & or &|. -- Mikael Magnusson ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: interrupt handling bug (again?) 2017-06-06 19:08 interrupt handling bug (again?) Mikael Magnusson @ 2017-06-24 19:03 ` Bart Schaefer 2017-06-30 13:33 ` Peter Stephenson 2017-07-03 15:09 ` Peter Stephenson 0 siblings, 2 replies; 9+ messages in thread From: Bart Schaefer @ 2017-06-24 19:03 UTC (permalink / raw) To: zsh workers; +Cc: Mikael Magnusson On Jun 6, 9:08pm, Mikael Magnusson wrote: } } % for a in 1 2 3; do xterm; done } then hit ctrl-z in that term and bg it, do stuff and at some point hit } ctrl-c, the backgrounded for loop will be interrupted I'm not 100% sure but this is most likely a TTY process group issue. If I change "xterm" to % for a in 1 2 3; do; sleep 10; echo $a; done then I can reproduce the behavior on "bg". However, if I stop, bg, *and* disown, I end up with the sleep resumed, but the subshell that handles the loop is still stopped. Which is an entirely different but related bug. } This happens even if you disown it, but not if } you run it with & or &|. Here's what I think is going on: When you run with & or &|, it's known "up front" that the job will be in the background, so the entire loop is immediately disconnected from the terminal and with &! also discarded from the job table. If instead you start in the foreground, the loop is in the parent shell, so by definition is attached to the terminal. This is very important because it means that "xterm" (or in my case "sleep") is the foreground job. When you ^Z, that suspends the foreground job. The parent zsh gets a wait-status of stopped, but it knows it's in a loop, so it forks and also suspends the new subshell to be able to keep the loop state without suspending the parent. This is where things get weird. When you "bg" that only resumes the previous foreground job, it does not resume the forked subshell (if it did, the next command in the loop would immediately start). That means the loop-subhell is still in the job table. It stays there (still stopped) until the previous foreground job finishes, at which point the parent continues and backgrounds it to resume looping. However, if you interrupt before the foreground job is done, the parent faithfully propagates the signal to the entry in the job table, and that kills the loop when it finally does restart. If you instead disown before the foreground job is done, the job table entry for the foreground job is deleted, so the parent never notices that it has exited and never resumes the loop at all. That latter bit is going to be nearly impossible to fix. Only the original parent shell can manage both the foreground job and the suspended subshell, which means it *can't* disown both of them if the loop is to remain blocked until the opportune moment. I think the only answer will be to refuse to disown. We already have a warning for jobs that can't be suspended, and one for disowning a job that is going to remain stopped. The former issue can probably be fixed by skipping those subshell job entries when propagating the TTY signal. We must be able to find those entries already to resume them when the current loop foreground job exits, so this shouldn't be impossible. I don't have time to look into this any further at this point. PWS? ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: interrupt handling bug (again?) 2017-06-24 19:03 ` Bart Schaefer @ 2017-06-30 13:33 ` Peter Stephenson 2017-06-30 21:16 ` Peter Stephenson 2017-07-03 15:09 ` Peter Stephenson 1 sibling, 1 reply; 9+ messages in thread From: Peter Stephenson @ 2017-06-30 13:33 UTC (permalink / raw) To: zsh workers On Sat, 24 Jun 2017 12:03:10 -0700 Bart Schaefer <schaefer@brasslantern.com> wrote: > On Jun 6, 9:08pm, Mikael Magnusson wrote: > } > } % for a in 1 2 3; do xterm; done > } then hit ctrl-z in that term and bg it, do stuff and at some point hit > } ctrl-c, the backgrounded for loop will be interrupted > > I'm not 100% sure but this is most likely a TTY process group issue. > > If I change "xterm" to > > % for a in 1 2 3; do; sleep 10; echo $a; done > > then I can reproduce the behavior on "bg". Yes, this version is very straightforward to see. > ...if you interrupt before the foreground job is done, the > parent faithfully propagates the signal to the entry in the job > table, and that kills the loop when it finally does restart. I think that is happening, but I'm not sure where. The shell forked to run the rest of the loop is running as a SUPERJOB with a different PGID from the parent shell (I checked the latter though haven't gone into the SUBJOB / SUPERJOB code again yet). I think that means it should never need to get the SIGINT. The parent shell knows (or can deduce simply by means of the job table) it's now not associated with a foreground SUBJOB, so I don't see why it shouldn't be able to handle this case properly. Haven't tracked down the specific code, though. pws ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: interrupt handling bug (again?) 2017-06-30 13:33 ` Peter Stephenson @ 2017-06-30 21:16 ` Peter Stephenson 2017-07-02 19:32 ` Peter Stephenson 0 siblings, 1 reply; 9+ messages in thread From: Peter Stephenson @ 2017-06-30 21:16 UTC (permalink / raw) To: zsh workers On Fri, 30 Jun 2017 14:33:47 +0100 Peter Stephenson <p.stephenson@samsung.com> wrote: > On Sat, 24 Jun 2017 12:03:10 -0700 > Bart Schaefer <schaefer@brasslantern.com> wrote: > > > On Jun 6, 9:08pm, Mikael Magnusson wrote: > > } > > } % for a in 1 2 3; do xterm; done > > } then hit ctrl-z in that term and bg it, do stuff and at some point hit > > } ctrl-c, the backgrounded for loop will be interrupted > > ...if you interrupt before the foreground job is done, the > > parent faithfully propagates the signal to the entry in the job > > table, and that kills the loop when it finally does restart. > > I think that is happening, but I'm not sure where. It's check_cursh_sig(SIGINT) from the interrupt handler. It propagates to jobs marked as STAT_CURSH. I think when the SUPERJOB is put into the background it should no longer have STAT_CURSH status. Does that sound reasonable? I'm not sure why it would have it anyway, should it perhaps be removed when we mark it as STAT_SUPERJOB, which is kind of decurrentshellising it in any case? Anyway, this minimal fix seems to do the right thing for the case in question. It might have side effects, though. More specificaly, if it didn't have side effects that would be some kind of record. pws diff --git a/Src/jobs.c b/Src/jobs.c index d1b98ac..09a8bab 100644 --- a/Src/jobs.c +++ b/Src/jobs.c @@ -2303,6 +2303,7 @@ bin_fg(char *name, char **argv, Options ops, int func) } pn = next; } + jobtab[job].stat &= ~STAT_CURSH; } } else if (func == BIN_BG) { /* Silly to bg a job already running. */ ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: interrupt handling bug (again?) 2017-06-30 21:16 ` Peter Stephenson @ 2017-07-02 19:32 ` Peter Stephenson 2017-07-02 20:05 ` Bart Schaefer 0 siblings, 1 reply; 9+ messages in thread From: Peter Stephenson @ 2017-07-02 19:32 UTC (permalink / raw) To: zsh workers On Fri, 30 Jun 2017 22:16:10 +0100 Peter Stephenson <p.w.stephenson@ntlworld.com> wrote: > I think when the SUPERJOB is put into the background it should no longer > have STAT_CURSH status. Does that sound reasonable? I'm not sure why > it would have it anyway, should it perhaps be removed when we mark it as > STAT_SUPERJOB, which is kind of decurrentshellising it in any case? Done this way, tied to something else we don't want when something's backgrouned, it's probably good enough for now. pws diff --git a/Src/jobs.c b/Src/jobs.c index d1b98ac..32f7daa 100644 --- a/Src/jobs.c +++ b/Src/jobs.c @@ -2288,8 +2288,10 @@ bin_fg(char *name, char **argv, Options ops, int func) case BIN_FG: case BIN_BG: case BIN_WAIT: - if (func == BIN_BG) + if (func == BIN_BG) { jobtab[job].stat |= STAT_NOSTTY; + jobtab[job].stat &= ~STAT_CURSH; + } if ((stopped = (jobtab[job].stat & STAT_STOPPED))) { makerunning(jobtab + job); if (func == BIN_BG) { ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: interrupt handling bug (again?) 2017-07-02 19:32 ` Peter Stephenson @ 2017-07-02 20:05 ` Bart Schaefer 2017-07-02 20:22 ` Peter Stephenson 0 siblings, 1 reply; 9+ messages in thread From: Bart Schaefer @ 2017-07-02 20:05 UTC (permalink / raw) To: Peter Stephenson; +Cc: zsh workers Thanks for digging into this. I'm not git-enabled at the moment, is 41386 instead of or as well as 41384? Now we just have to deal with (not) disowning those jobs. I suppose we could delay the disown until they are resumed, or something. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: interrupt handling bug (again?) 2017-07-02 20:05 ` Bart Schaefer @ 2017-07-02 20:22 ` Peter Stephenson 0 siblings, 0 replies; 9+ messages in thread From: Peter Stephenson @ 2017-07-02 20:22 UTC (permalink / raw) To: zsh workers On Sun, 2 Jul 2017 13:05:18 -0700 Bart Schaefer <schaefer@brasslantern.com> wrote: > Thanks for digging into this. I'm not git-enabled at the moment, is > 41386 instead of or as well as 41384? They're alternatives. The second is just slightly neater but not fundameentally different. pws ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: interrupt handling bug (again?) 2017-06-24 19:03 ` Bart Schaefer 2017-06-30 13:33 ` Peter Stephenson @ 2017-07-03 15:09 ` Peter Stephenson 2017-07-10 9:36 ` Peter Stephenson 1 sibling, 1 reply; 9+ messages in thread From: Peter Stephenson @ 2017-07-03 15:09 UTC (permalink / raw) To: Zsh Hackers' List Backgrounded shell code with subproceses: On Sat, 24 Jun 2017 12:03:10 -0700 Bart Schaefer <schaefer@brasslantern.com> wrote: > If you > instead disown before the foreground job is done, the job table entry > for the foreground job is deleted, so the parent never notices that > it has exited and never resumes the loop at all. I agree with this. However, I think we can do slightly better by delaying the disown to the point where we send SIGCONT to the superjob (the background copy of the shell process) after finding it has no associated processes in the current shell. At that point it will start its own processes, so we never need to interact with it again. This seems to work, and I've documented the effect. Is it weird enough that we should print a warning from "disown", or is that just going to cause more confusion than it removes (warnings of a technical nature often do, in my experience)? I reckon simply pointing people at the documentation if they ask is probably good enough. BTW, I think I don't need to check if there are actually processes associated with the superjob, since if there aren't (in which case it's marked was WASSUPER instead of SUPERJOB) disowning immediately has no bad effects. pws diff --git a/Doc/Zsh/jobs.yo b/Doc/Zsh/jobs.yo index 6262dd2..44e0a44 100644 --- a/Doc/Zsh/jobs.yo +++ b/Doc/Zsh/jobs.yo @@ -49,6 +49,12 @@ in the parent shell. Thus the behaviour is different from the case where the function was never suspended. Zsh is different from many other shells in this regard. +One additional side effect is that use of tt(disown) with a job +created by suspending shell code in this fashion is delayed: the +job can only be disowned once any process started from the parent +shell has terminated. At that point, the job disappears silently +from the process list. + The same behaviour is found when the shell is executing code as the right hand side of a pipeline or any complex shell construct such as tt(if), tt(for), etc., in order that the entire block of code diff --git a/Src/jobs.c b/Src/jobs.c index 32f7daa..66dfb5a 100644 --- a/Src/jobs.c +++ b/Src/jobs.c @@ -277,6 +277,10 @@ handle_sub(int job, int fg) (!jn->procs->next || cp || jn->procs->pid != jn->gleader)) attachtty(jn->gleader); kill(sj->other, SIGCONT); + if (jn->stat & STAT_DISOWN) + { + deletejob(jn, 1); + } } curjob = jn - jobtab; } else if (sj->stat & STAT_STOPPED) { @@ -2375,6 +2379,10 @@ bin_fg(char *name, char **argv, Options ops, int func) printjob(job + (oldjobtab ? oldjobtab : jobtab), lng, 2); break; case BIN_DISOWN: + if (jobtab[job].stat & STAT_SUPERJOB) { + jobtab[job].stat |= STAT_DISOWN; + continue; + } if (jobtab[job].stat & STAT_STOPPED) { char buf[20], *pids = ""; diff --git a/Src/zsh.h b/Src/zsh.h index 137b2a5..a5b4d8f 100644 --- a/Src/zsh.h +++ b/Src/zsh.h @@ -1029,6 +1029,7 @@ struct job { #define STAT_BUILTIN (0x4000) /* job at tail of pipeline is a builtin */ #define STAT_SUBJOB_ORPHANED (0x8000) /* STAT_SUBJOB with STAT_SUPERJOB exited */ +#define STAT_DISOWN (0x10000) /* STAT_SUPERJOB with disown pending */ #define SP_RUNNING -1 /* fake status for jobs currently running */ ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: interrupt handling bug (again?) 2017-07-03 15:09 ` Peter Stephenson @ 2017-07-10 9:36 ` Peter Stephenson 0 siblings, 0 replies; 9+ messages in thread From: Peter Stephenson @ 2017-07-10 9:36 UTC (permalink / raw) To: Zsh Hackers' List On Mon, 3 Jul 2017 16:09:33 +0100 Peter Stephenson <p.stephenson@samsung.com> wrote: > Backgrounded shell code with subproceses: > > On Sat, 24 Jun 2017 12:03:10 -0700 > Bart Schaefer <schaefer@brasslantern.com> wrote: > > If you > > instead disown before the foreground job is done, the job table entry > > for the foreground job is deleted, so the parent never notices that > > it has exited and never resumes the loop at all. > > I agree with this. > > However, I think we can do slightly better by delaying the disown to the > point where we send SIGCONT to the superjob (the background copy of the > shell process) after finding it has no associated processes in the > current shell. At that point it will start its own processes, so we > never need to interact with it again. It's been going through my mind that we can also add STAT_NOPRINT to the job at the first point to prevent it showing up in the job list in the mean time. But this might be too clever by half, causing confusion, for example, if you exit the parent shell (a common reason for wanting to disown a job) before it's had a chance to restart the superjob. So I don't think I'm going to do this. pws ^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2017-07-10 9:46 UTC | newest] Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2017-06-06 19:08 interrupt handling bug (again?) Mikael Magnusson 2017-06-24 19:03 ` Bart Schaefer 2017-06-30 13:33 ` Peter Stephenson 2017-06-30 21:16 ` Peter Stephenson 2017-07-02 19:32 ` Peter Stephenson 2017-07-02 20:05 ` Bart Schaefer 2017-07-02 20:22 ` Peter Stephenson 2017-07-03 15:09 ` Peter Stephenson 2017-07-10 9:36 ` Peter Stephenson
Code repositories for project(s) associated with this public inbox https://git.vuxu.org/mirror/zsh/ This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).