zsh-workers
 help / color / mirror / code / Atom feed
* signal mask bug?
@ 2017-02-15 22:17 Danek Duvall
  2017-02-16  4:10 ` Bart Schaefer
  0 siblings, 1 reply; 5+ messages in thread
From: Danek Duvall @ 2017-02-15 22:17 UTC (permalink / raw)
  To: zsh-workers

A co-worker of mine who's quite adept at finding the strangest bugs
discovered this:

    % sleep 30 | cat
    ^Z
    zsh: suspended  sleep 30 | cat
    % bg
    [1]  + continued  sleep 30 | cat
    % jobs
    [1]  + running    sleep 30 | 
           unknown signal (core dumped)                 cat

Of course, nothing actually dumped core, suggesting it's just a reporting
problem.

I see this on Solaris (5.0.7, 5.2, 5.3.1) and on Linux (5.0.2).

Danek


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: signal mask bug?
  2017-02-15 22:17 signal mask bug? Danek Duvall
@ 2017-02-16  4:10 ` Bart Schaefer
  2017-02-16  9:48   ` Peter Stephenson
  0 siblings, 1 reply; 5+ messages in thread
From: Bart Schaefer @ 2017-02-16  4:10 UTC (permalink / raw)
  To: zsh-workers

On Feb 15,  2:17pm, Danek Duvall wrote:
}
}     % jobs
}     [1]  + running    sleep 30 | 
}            unknown signal (core dumped)                 cat
} 
} Of course, nothing actually dumped core, suggesting it's just a reporting
} problem.

Hm.  This indicates that WCOREDUMP() is returning true for whatever the
job status of the cat process is.

However, presumably the status of that job should have been set to
SP_RUNNING at the same time it was for "sleep".

Which it is -- makerunning() assigns SP_RUNNING to pn->status of "cat"
during "bg".

But then when we reach prinjob() from the "jobs" command, pn->status of
that job has changed from -1 to 65535.

This happens at signals.c:525 in wait_for_processes(), when the status
returned from wait3(&status) [line 457] is assigned to it.

I suspect the status returned from wait3 is invalid in this case, but
I'm not sure.  What would a status of 65535 mean?  Or should it be
masked in some way that it isn't?


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: signal mask bug?
  2017-02-16  4:10 ` Bart Schaefer
@ 2017-02-16  9:48   ` Peter Stephenson
  2017-02-16 20:18     ` Danek Duvall
  0 siblings, 1 reply; 5+ messages in thread
From: Peter Stephenson @ 2017-02-16  9:48 UTC (permalink / raw)
  To: zsh-workers

On Wed, 15 Feb 2017 20:10:44 -0800
Bart Schaefer <schaefer@brasslantern.com> wrote:
> But then when we reach prinjob() from the "jobs" command, pn->status of
> that job has changed from -1 to 65535.
> 
> This happens at signals.c:525 in wait_for_processes(), when the status
> returned from wait3(&status) [line 457] is assigned to it.

Or somewhere or other it's going through a cast to unsigned short, but
that doesn't seem all that likely in the signal code, particularly POSIX
style on a 32- or 64-bit architecture.

If it's Solaris *and* Linux it seems unlikely the status itself is
doing anything funny, though, and that's passed back as int *...

pws


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: signal mask bug?
  2017-02-16  9:48   ` Peter Stephenson
@ 2017-02-16 20:18     ` Danek Duvall
  2017-02-23 21:39       ` Bart Schaefer
  0 siblings, 1 reply; 5+ messages in thread
From: Danek Duvall @ 2017-02-16 20:18 UTC (permalink / raw)
  To: Peter Stephenson; +Cc: zsh-workers

On Thu, Feb 16, 2017 at 09:48:36AM +0000, Peter Stephenson wrote:

> On Wed, 15 Feb 2017 20:10:44 -0800
> Bart Schaefer <schaefer@brasslantern.com> wrote:
> > But then when we reach prinjob() from the "jobs" command, pn->status of
> > that job has changed from -1 to 65535.
> > 
> > This happens at signals.c:525 in wait_for_processes(), when the status
> > returned from wait3(&status) [line 457] is assigned to it.
> 
> Or somewhere or other it's going through a cast to unsigned short, but
> that doesn't seem all that likely in the signal code, particularly POSIX
> style on a 32- or 64-bit architecture.
> 
> If it's Solaris *and* Linux it seems unlikely the status itself is
> doing anything funny, though, and that's passed back as int *...

So my colleague did some digging and found that the process is marked with
WCONTFLG, and that once that happens, you have to check WIFCONTINUED()
before anything else, because WCONTFLG overwrites all the bits.  He found
that the following patch worked for him:

    --- a/Src/signals.c   2015-09-04 13:38:13.000000000 -0700
    +++ b/Src/signals.c   2017-02-16 11:37:26.714503575 -0800
    @@ -510,6 +510,11 @@
                    struct timezone dummy_tz;
                    gettimeofday(&pn->endtime, &dummy_tz);
                    pn->status = status;
    +#ifdef WIFCONTINUED
    +               if (WIFCONTINUED(status))
    +                       pn->status = SP_RUNNING;
    +#endif
    +
                    pn->ti = ru;
     #else
                    update_process(pn, status);

though it occurred to me that it might be better placed in the loop in
printjob(), which already has an override for pn->status.

Thanks,
Danek


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: signal mask bug?
  2017-02-16 20:18     ` Danek Duvall
@ 2017-02-23 21:39       ` Bart Schaefer
  0 siblings, 0 replies; 5+ messages in thread
From: Bart Schaefer @ 2017-02-23 21:39 UTC (permalink / raw)
  To: zsh-workers

On Feb 16, 12:18pm, Danek Duvall wrote:
}
} So my colleague did some digging and found that the process is marked with
} WCONTFLG, and that once that happens, you have to check WIFCONTINUED()
} before anything else, because WCONTFLG overwrites all the bits.

Thanks.  Slight variation on the patch:

diff --git a/Src/signals.c b/Src/signals.c
index a717677..68a7ae3 100644
--- a/Src/signals.c
+++ b/Src/signals.c
@@ -522,6 +522,11 @@ wait_for_processes(void)
 #if defined(HAVE_WAIT3) && defined(HAVE_GETRUSAGE)
 		struct timezone dummy_tz;
 		gettimeofday(&pn->endtime, &dummy_tz);
+#ifdef WIFCONTINUED
+		if (WIFCONTINUED(status))
+		    pn->status = SP_RUNNING;
+		else
+#endif
 		pn->status = status;
 		pn->ti = ru;
 #else


There might be an argument for checking whether the status is already
SP_RUNNING but the above should cover the case of a stopped job that
is sent a SIGCONT from "elsewhere" (i.e., not via fg or bg from the
controlling shell).


^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2017-02-23 21:40 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-02-15 22:17 signal mask bug? Danek Duvall
2017-02-16  4:10 ` Bart Schaefer
2017-02-16  9:48   ` Peter Stephenson
2017-02-16 20:18     ` Danek Duvall
2017-02-23 21:39       ` Bart Schaefer

Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/zsh/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).