Possible signal handling issues

zsh-workers

* Possible signal handling issues
@ 2013-12-28 23:02 Bart Schaefer
  2014-01-02 21:55 ` Peter Stephenson
From: Bart Schaefer @ 2013-12-28 23:02 UTC
  To: zsh-workers

Both of these have been around since at least 4.2.0.  Consider this script:

--- snip ---
sleep 20 &
TRAPINT() { set -x; kill -INT $$ }
wait
--- snip ---

Run that in the foreground, kill it with ctrl+c, and watch the infinite
loop.  Something to do with the "wait" command allows the INT to be re-
queued for handling even when it is sent from inside an INT trap.  The
signal_suspend() in zwaitjob() is constantly re-interrupted and never
returns.

I'm uncertain whether this next one is actually a bug.  This is from Chris
Johnson's question on zsh-users:

--- snip ---
sleep 10000 &
longpid=$!

sleepkill() {
  sleep 20
  print "timed out"
  kill $longpid
}

sleepkill &
sleeppid=$!

TRAPINT() {
  set -x
  kill $longpid
  #kill $sleeppid      # test 1
  #kill -- -$sleeppid  # test 2
  #kill -HUP -$$       # test 3
}
sleep 30
--- snip ---

I avoided using "wait" as the last line there to show it's not related to
the previous bug.  If we run that script and then interrupt with ctrl+c,
the entire sleepkill function is left running in the background.  That's
probably correct because the monitor option is off in scripts so the hup
option does not apply.

Uncomment test 1.  Now the subshell started for sleepkill is terminated,
but "sleep 20" is left running.  This seems very odd to me, because that
sleep was never backgrounded relative to the sleepkill function.  Remove
test 1 and uncomment test 2 and you get a "no such process" error, which
indicates that sleepkill is not a "process group leader" so none of its
children receive its signals.  Uncomment test 3 which signals the whole
process group of the original script, and everything gets killed off.

Should the subshell be propagating that signal in the first test?

* Re: Possible signal handling issues
  2013-12-28 23:02 Possible signal handling issues Bart Schaefer
@ 2014-01-02 21:55 ` Peter Stephenson
  2014-01-02 22:40   ` Peter Stephenson
From: Peter Stephenson @ 2014-01-02 21:55 UTC
  To: zsh-workers

On Sat, 28 Dec 2013 15:02:34 -0800
Bart Schaefer <schaefer@brasslantern.com> wrote:
> Both of these have been around since at least 4.2.0.  Consider this script:
> 
> --- snip ---
> sleep 20 &
> TRAPINT() { set -x; kill -INT $$ }
> wait
> --- snip ---
> 
> Run that in the foreground, kill it with ctrl+c, and watch the infinite
> loop.  Something to do with the "wait" command allows the INT to be re-
> queued for handling even when it is sent from inside an INT trap.  The
> signal_suspend() in zwaitjob() is constantly re-interrupted and never
> returns.

The following doesn't get us much further, but I'm not sure executing
"wait" is the key thing here: I think it might be more to do with the
fact that the job was first started in the background.  The following is
less than conclusive, however, since "fg" shares a lot of code with "wait".

I first tried modifying the above to

set -x # for earlier (de?)mystification
sleep 20 &
TRAPINT() { set -x; kill -INT $$ }
fg

and then running with "zsh -fi" (something in my startup files is
causing a hang without the -f, which is irrelevant, but that's why I
went on and tried the other version below before I found that out).  I
got

../zwaitjob2.sh:3:> sleep 20
../zwaitjob2.sh:5:> fg %sleep
[1]  + 1792 running    sleep 20

^C reproducibly gives

+TRAPINT:0> set -x
+TRAPINT:0> kill -INT 2004
+TRAPINT:0> set -x
+TRAPINT:0> kill -INT 2004

Then I tried:

set -x
print $$
setopt monitor
sleep 20 &
TRAPINT() { set -x; kill -INT $$ }
fg %sleep

Initially I got

+../zwaitjob3.sh:2> print 1907
1907
+../zwaitjob3.sh:3> setopt monitor
[1] 1910
+../zwaitjob3.sh:4> sleep 20
+../zwaitjob3.sh:6> fg %sleep
[1]  + running    sleep 20

This time ^C or "kill -INT 1907" doesn't do anything.  I'm not sure what's
going on here.  However, if I send "kill -INT 1910" (killing the forked
process) from outside I see some variable number of repetitions of

+TRAPINT:0> kill -INT 1922
+TRAPINT:0> set -x

and then the shell exits.

Consequently, this looks to me like some intrinsic race that happens to
be particularly reproducible in the "wait" case.  However, this still
seems to be different to the second problem.

I wondered whether the race was due to the places where signals were
being queued and unqueued, but haven't got anywhere down that route, and
I don't know why this is different when the process wasn't
backgrounded.  I have a vague memory we do something about blocking ^C
when starting a job in the background, though?

pws

* Re: Possible signal handling issues
  2014-01-02 21:55 ` Peter Stephenson
@ 2014-01-02 22:40   ` Peter Stephenson
  2014-01-02 22:53     ` Peter Stephenson
From: Peter Stephenson @ 2014-01-02 22:40 UTC
  To: zsh-workers

On Thu, 2 Jan 2014 21:55:05 +0000
Peter Stephenson <p.w.stephenson@ntlworld.com> wrote:
> Consequently, this looks to me like some intrinsic race that happens to
> be particularly reproducible in the "wait" case.

Consider, for example:

sleep 20 &
TRAPINT() { set -x; kill -INT $$; sleep 1 }
wait

Now I get the output from the trap as long as I wait for the sleep to
finish, but the function doesn't exit.  I presume that's because SIGINT
is now arriving while the trap is still running (and the fact that ^C
within a second has no effect is consistent).  Given that traps are
intrinsically asynchronous, that's not necessarily related to a bug in
the shell.  However, it's making the analysis even more complicated.

I further note that if I run sleep 20 normally in the foreground
(without the sleep in the trap) I get:

TRAPINT:0:> kill -INT 25339
TRAPINT:0:> kill -INT 25339

exactly twice.  That was consistent with zsh -i with an fg, but not
setopt monitor with an fg.  I'm guessing (but it's just a guess) that
the first time the trap exits quickly so the INT hits just afterwards
but the second time there's enough processing around that the signal
arrives before the trap has exited (or more precisely before the
internal framing of the trap has finished).

I'm getting less convinced this is anything more than a
you-only-have-yourself-to-blame oddity...

pws

* Re: Possible signal handling issues
  2014-01-02 22:40   ` Peter Stephenson
@ 2014-01-02 22:53     ` Peter Stephenson
  2014-01-03  7:52       ` Bart Schaefer
From: Peter Stephenson @ 2014-01-02 22:53 UTC
  To: zsh-workers

On Thu, 2 Jan 2014 22:40:31 +0000
Peter Stephenson <p.w.stephenson@ntlworld.com> wrote:
>...
> TRAPINT() { set -x; kill -INT $$; }
>...
> I further note that if I run sleep 20 normally in the foreground
> (without the sleep in the trap) I get:
> 
> TRAPINT:0:> kill -INT 25339
> TRAPINT:0:> kill -INT 25339
> 
> exactly twice.

Just to finish off this train of thought:  I'm sure Bart remembers this,
but it's worth reminding home viewers that if TRAPINT returns status
0, which it usually will with the kill last, then the job controller assumes
the signal was already handled, so doesn't take any action to pass the
signal on to the job itself.  If I add "return 1" to the end of
TRAPINT() I only ever see the output once before the job ends.

I think I've now lost interest in this...

pws

* Re: Possible signal handling issues
  2014-01-02 22:53     ` Peter Stephenson
@ 2014-01-03  7:52       ` Bart Schaefer
  2014-01-03 17:44         ` Peter Stephenson
From: Bart Schaefer @ 2014-01-03  7:52 UTC
  To: zsh-workers

On Jan 2, 10:53pm, Peter Stephenson wrote:
}
} Just to finish off this train of thought:  I'm sure Bart remembers this,
} but it's worth reminding home viewers that if TRAPINT returns status
} 0, which it usually will with the kill last, then the job controller assumes
} the signal was already handled, so doesn't take any action to pass the
} signal on to the job itself.  If I add "return 1" to the end of
} TRAPINT() I only ever see the output once before the job ends.

I wasn't specifically considering that, but I think it's irrelevant. [*]
What I was pointing out is that it shouldn't be possible for signal N
to be delivered during the trap handler for that same signal, but the
way zsh "queues" signal handlers means that the TRAPN() function is not
usually called "during" the delivery of the signal.

} I think I've now lost interest in this...

Fair enough.

[*] It does suggest that Chris Johnson's script could possibly be improved
by replacing "kill -HUP $$" with a simple "return 1".  Haven't tried it.

* Re: Possible signal handling issues
  2014-01-03  7:52       ` Bart Schaefer
@ 2014-01-03 17:44         ` Peter Stephenson
From: Peter Stephenson @ 2014-01-03 17:44 UTC
  To: zsh-workers

On Thu, 02 Jan 2014 23:52:42 -0800
Bart Schaefer <schaefer@brasslantern.com> wrote:
> What I was pointing out is that it shouldn't be possible for signal N
> to be delivered during the trap handler for that same signal, but the
> way zsh "queues" signal handlers means that the TRAPN() function is not
> usually called "during" the delivery of the signal.

Hmm... My experience suggests (but doesn't conclusively show) that the
signal doesn't get delivered *during* the trap handler (since if you
prolong its life artificially the signal is ignored), but may
sometimes be delivered *afterwards*, and that the latter is what's
causing the funny effects.  As there's no way of ensuring a signal gets
delivered to where it's going promptly, since they're by their nature
asynchronous, it therefore didn't strike me as particularly significant
that in that case you could trigger the trap to run again.  I can't
think of anything I'd want to do to make it better, is another way of
putting it.

pws

Thread overview: 6 messages
2013-12-28 23:02 Possible signal handling issues Bart Schaefer
2014-01-02 21:55 ` Peter Stephenson
2014-01-02 22:40   ` Peter Stephenson
2014-01-02 22:53     ` Peter Stephenson
2014-01-03  7:52       ` Bart Schaefer
2014-01-03 17:44         ` Peter Stephenson

Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/zsh/
