zsh-workers
* [PATCH] Do not send duplicate signals when MONITOR is set
@ 2021-06-07 17:27 Erik Paulson
  2021-06-07 18:45 ` Bart Schaefer
  0 siblings, 1 reply; 6+ messages in thread
From: Erik Paulson @ 2021-06-07 17:27 UTC (permalink / raw)
  To: zsh-workers

When job control is enabled, killjb() sends the signal to the job's
process group via killpg() and then falls through into a loop that
traverses the job's process list and sends the same signal to each
process. This causes the signal to always be sent twice.

This patch adds a return after the killpg() call to avoid sending the
signal again.
---

I run emacs as a daemon and use the emacsclient program to connect to
it. I noticed that when I suspended the emacsclient program and
resumed it in zsh, the program would sporadically crash. After digging
into the code, I realized that emacsclient was receiving two SIGCONTs,
which caused it to send a malformed command to the daemon. While this
is definitely a problem with emacsclient, it doesn't feel right that
Zsh is sending two SIGCONTs.

I found that this return used to be present, but was removed in
https://www.zsh.org/mla/workers/2018/msg01338.html while addressing
another emacs issue. It looks to me to be an oversight, but I cannot
tell as I am not well versed in the Zsh codebase or job control. I
know that my issue goes away with this patch, and I cannot reproduce
the original issue in the linked mail thread with it either.

Note that on testing with Linux, it seems the kernel will suppress the
second signal; in order to get a test program to detect it, I have to
step through the code with the debugger. On OSX, where I originally
detected this problem, I reliably get two signals delivered each time.

 Src/signals.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/Src/signals.c b/Src/signals.c
index 2c540f38f..5c787e2a8 100644
--- a/Src/signals.c
+++ b/Src/signals.c
@@ -810,6 +810,7 @@ killjb(Job jn, int sig)
 	    err = killpg(jn->gleader, sig);
 	    if (sig == SIGCONT && err != -1)
 		makerunning(jn);
+	    return err;
 	}
     }
     for (pn = jn->procs; pn; pn = pn->next) {
-- 
2.31.1




* Re: [PATCH] Do not send duplicate signals when MONITOR is set
  2021-06-07 17:27 [PATCH] Do not send duplicate signals when MONITOR is set Erik Paulson
@ 2021-06-07 18:45 ` Bart Schaefer
  2021-06-14 19:19   ` Peter Stephenson
  0 siblings, 1 reply; 6+ messages in thread
From: Bart Schaefer @ 2021-06-07 18:45 UTC (permalink / raw)
  To: Erik Paulson; +Cc: Zsh hackers list

On Mon, Jun 7, 2021 at 10:28 AM Erik Paulson <epaulson10@gmail.com> wrote:
>
> I run emacs as a daemon and use the emacsclient program to connect to
> it. I noticed that when I suspended the emacsclient program and
> resumed it in zsh, the program would sporadically crash. After digging
> into the code, I realized that emacsclient was receiving two SIGCONTs,
> which caused it to send a malformed command to the daemon.
>
> I found that this return used to be present, but was removed in
> https://www.zsh.org/mla/workers/2018/msg01338.html while addressing
> another emacs issue.

I don't think it was removed ... similar code was added in two
separate places, but the "return" was only added in one of those.

Your patch adds that return in the second case.

The difference is that in the first case, the SIGCONT is received by a
job that is marked STAT_SUPERJOB and in the second case it's received
by a different job.

I believe this means that in the former case the superjob is in the
foreground and in the second case, it isn't -- rather one of its
subjobs is.  In the first instance zsh sends the signal to all the
subjobs and then to the process group.  In the second case it sends
the signal to the process group first and then falls into the loop
sending the signal to any subjobs that still appear to be stopped.

In any case I think a potential problem with placing an unconditional
"return" where your patch does, is that signals other than SIGCONT
probably still need to be delivered to the subjobs.  PWS, any input
here?

> Note that on testing with Linux, it seems the kernel will suppress the
> second signal; in order to get a test program to detect it, I have to
> step through the code with the debugger. On OSX, where I originally
> detected this problem, I reliably get two signals delivered each time.

This is probably a process scheduling difference rather than a signal
being suppressed, e.g., on Linux the order of events is
1) zsh sends signal to process group
2) process group copies signal to all processes
3) those processes resume
4) zsh proceeds into makerunning() and clears the STAT_STOPPED flag
5) that makes the loop a no-op

Whereas on OSX,
1) zsh sends signal to process group
2) zsh proceeds into makerunning() so STAT_STOPPED is left in place
3) process group copies signal to all processes
4) the loop sends a second SIGCONT
5) those processes resume and get a double SIGCONT

(2 & 3 might be simultaneous or in either order)



* Re: [PATCH] Do not send duplicate signals when MONITOR is set
  2021-06-07 18:45 ` Bart Schaefer
@ 2021-06-14 19:19   ` Peter Stephenson
  2021-07-18 22:55     ` Lawrence Velázquez
  0 siblings, 1 reply; 6+ messages in thread
From: Peter Stephenson @ 2021-06-14 19:19 UTC (permalink / raw)
  To: zsh-workers

On Mon, 2021-06-07 at 11:45 -0700, Bart Schaefer wrote:
> On Mon, Jun 7, 2021 at 10:28 AM Erik Paulson <epaulson10@gmail.com> wrote:
> > 
> > I run emacs as a daemon and use the emacsclient program to connect to
> > it. I noticed that when I suspended the emacsclient program and
> > resumed it in zsh, the program would sporadically crash. After digging
> > into the code, I realized that emacsclient was receiving two SIGCONTs,
> > which caused it to send a malformed command to the daemon.
> > 
> > I found that this return used to be present, but was removed in
> > https://www.zsh.org/mla/workers/2018/msg01338.html while addressing
> > another emacs issue.
> 
> I don't think it was removed ... similar code was added in two
> separate places, but the "return" was only added in one of those.
> 
> Your patch adds that return in the second case.
> 
> The difference is that in the first case, the SIGCONT is received by a
> job that is marked STAT_SUPERJOB and in the second case it's received
> by a different job.
> 
> I believe this means that in the former case the superjob is in the
> foreground and in the second case, it isn't -- rather one of its
> subjobs is.  In the first instance zsh sends the signal to all the
> subjobs and then to the process group.  In the second case it sends
> the signal to the process group first and then falls into the loop
> sending the signal to any subjobs that still appear to be stopped.
> 
> In any case I think a potential problem with placing an unconditional
> "return" where your patch does, is that signals other than SIGCONT
> probably still need to be delivered to the subjobs.  PWS, any input
> here?

Hmm, it's not clear to me in what cases you'd need to deliver both
killpg() to the group leader and then kill to the processes.  You might
hope if the shell was doing the right thing the first would be enough.
But there might be special cases.

I would hazard that as SIGCONT is probably the most difficult case ---
the only one where you specifically want the process to be running
afterwards --- if this patch improves things there, it's probably not
doing a lot of harm in most cases.

It's not impossible there are some oddities where process groups aren't
set up quite how you want that this might affect, but I doubt we're
going to spot them just by staring.

pws




* Re: [PATCH] Do not send duplicate signals when MONITOR is set
  2021-06-14 19:19   ` Peter Stephenson
@ 2021-07-18 22:55     ` Lawrence Velázquez
  2021-07-19 10:00       ` Peter Stephenson
  0 siblings, 1 reply; 6+ messages in thread
From: Lawrence Velázquez @ 2021-07-18 22:55 UTC (permalink / raw)
  To: zsh-workers

On Mon, Jun 14, 2021, at 3:19 PM, Peter Stephenson wrote:
> On Mon, 2021-06-07 at 11:45 -0700, Bart Schaefer wrote:
> > On Mon, Jun 7, 2021 at 10:28 AM Erik Paulson <epaulson10@gmail.com> wrote:
> > > 
> > > I run emacs as a daemon and use the emacsclient program to connect to
> > > it. I noticed that when I suspended the emacsclient program and
> > > resumed it in zsh, the program would sporadically crash. After digging
> > > into the code, I realized that emacsclient was receiving two SIGCONTs,
> > > which caused it to send a malformed command to the daemon.
> > > 
> > > I found that this return used to be present, but was removed in
> > > https://www.zsh.org/mla/workers/2018/msg01338.html while addressing
> > > another emacs issue.
> > 
> > I don't think it was removed ... similar code was added in two
> > separate places, but the "return" was only added in one of those.
> > 
> > Your patch adds that return in the second case.
> > 
> > The difference is that in the first case, the SIGCONT is received by a
> > job that is marked STAT_SUPERJOB and in the second case it's received
> > by a different job.
> > 
> > I believe this means that in the former case the superjob is in the
> > foreground and in the second case, it isn't -- rather one of its
> > subjobs is.  In the first instance zsh sends the signal to all the
> > subjobs and then to the process group.  In the second case it sends
> > the signal to the process group first and then falls into the loop
> > sending the signal to any subjobs that still appear to be stopped.
> > 
> > In any case I think a potential problem with placing an unconditional
> > "return" where your patch does, is that signals other than SIGCONT
> > probably still need to be delivered to the subjobs.  PWS, any input
> > here?
> 
> Hmm, it's not clear to me in what cases you'd need to deliver both
> killpg() to the group leader and then kill to the processes.  You might
> hope if the shell was doing the right thing the first would be enough.
> But there might be special cases.
> 
> I would hazard that as SIGCONT is probably the most difficult case ---
> the only one where you specifically want the process to be running
> afterwards --- if this patch improves things there, it's probably not
> doing a lot of harm in most cases.
> 
> It's not impossible there are some oddities where process groups aren't
> set up quite how you want that this might affect, but I doubt we're
> going to spot them just by staring.

Anything else on this?

-- 
vq



* Re: [PATCH] Do not send duplicate signals when MONITOR is set
  2021-07-18 22:55     ` Lawrence Velázquez
@ 2021-07-19 10:00       ` Peter Stephenson
  2021-07-23 20:11         ` Peter Stephenson
  0 siblings, 1 reply; 6+ messages in thread
From: Peter Stephenson @ 2021-07-19 10:00 UTC (permalink / raw)
  To: Lawrence Velázquez, zsh-workers

> On 18 July 2021 at 23:55 Lawrence Velázquez <larryv@zsh.org> wrote:
> On Mon, Jun 14, 2021, at 3:19 PM, Peter Stephenson wrote:
> > On Mon, 2021-06-07 at 11:45 -0700, Bart Schaefer wrote:
> > > On Mon, Jun 7, 2021 at 10:28 AM Erik Paulson <epaulson10@gmail.com> wrote:
> > > > 
> > > > I run emacs as a daemon and use the emacsclient program to connect to
> > > > it. I noticed that when I suspended the emacsclient program and
> > > > resumed it in zsh, the program would sporadically crash. After digging
> > > > into the code, I realized that emacsclient was receiving two SIGCONTs,
> > > > which caused it to send a malformed command to the daemon.
> > > > 
> > > > I found that this return used to be present, but was removed in
> > > > https://www.zsh.org/mla/workers/2018/msg01338.html while addressing
> > > > another emacs issue.
>...
> > I would hazard that as SIGCONT is probably the most difficult case ---
> > the only one where you specifically want the process to be running
> > afterwards --- if this patch improves things there, it's probably not
> > doing a lot of harm in most cases.
>...
> Anything else on this?

Unless anyone can point to a fundamental error in my summary above, I
suggest we apply this and see what happens.  At the absolute least,
I'm pretty sure we're not going to get any further without trying it
out in earnest, and looking for remaining oddities.

pws



* Re: [PATCH] Do not send duplicate signals when MONITOR is set
  2021-07-19 10:00       ` Peter Stephenson
@ 2021-07-23 20:11         ` Peter Stephenson
  0 siblings, 0 replies; 6+ messages in thread
From: Peter Stephenson @ 2021-07-23 20:11 UTC (permalink / raw)
  To: zsh-workers

On Mon, 2021-07-19 at 11:00 +0100, Peter Stephenson wrote:
> > On 18 July 2021 at 23:55 Lawrence Velázquez <larryv@zsh.org> wrote:
> > On Mon, Jun 14, 2021, at 3:19 PM, Peter Stephenson wrote:
> > > On Mon, 2021-06-07 at 11:45 -0700, Bart Schaefer wrote:
> > > > On Mon, Jun 7, 2021 at 10:28 AM Erik Paulson <epaulson10@gmail.com> wrote:
> > > > > 
> > > > > I run emacs as a daemon and use the emacsclient program to connect to
> > > > > it. I noticed that when I suspended the emacsclient program and
> > > > > resumed it in zsh, the program would sporadically crash. After digging
> > > > > into the code, I realized that emacsclient was receiving two SIGCONTs,
> > > > > which caused it to send a malformed command to the daemon.
> > > > > 
> > > > > I found that this return used to be present, but was removed in
> > > > > https://www.zsh.org/mla/workers/2018/msg01338.html while addressing
> > > > > another emacs issue.
> > 
> > ...
> > > I would hazard that as SIGCONT is probably the most difficult case ---
> > > the only one where you specifically want the process to be running
> > > afterwards --- if this patch improves things there, it's probably not
> > > doing a lot of harm in most cases.
> > 
> > ...
> > Anything else on this?
> 
> Unless anyone can point to a fundamental error in my summary above, I
> suggest we apply this and see what happens.  At the absolute least,
> I'm pretty sure we're not going to get any further without trying it
> out in earnest, and looking for remaining oddities.

I've committed this, so we should keep a look out for missing signals,
though they're going to be in very obscure contexts even if there are
any.

pws


