zsh-workers
 help / color / mirror / code / Atom feed
* zargs with -P intermittently failing in zsh 5.9 and macOS
@ 2022-05-24 16:09 Eric Nielsen
  2022-05-24 17:32 ` Bart Schaefer
  0 siblings, 1 reply; 18+ messages in thread
From: Eric Nielsen @ 2022-05-24 16:09 UTC (permalink / raw)
  To: zsh-workers

[-- Attachment #1: Type: text/plain, Size: 1195 bytes --]

Hi, I'm seeing zargs with -P intermittently fail with in zsh 5.9 in
macOS. Here's the information I could gather so far.

Minimal reproducible example:

% f() { curl -Sso /dev/null 'https://example.com'; return 0 }
% autoload -Uz zargs
% setopt xtrace
% zargs -n 1 -P 0 -- {1..13} -- f

Expected the status code to be 0. Most of the times I get 123 instead.
The xtrace output shows:

+(eval):6> case 19 (<1-125>|128)
+(eval):8> ret=123

at least once, even though f always returned 0. Added the extra return 0
in f to force that return code.

If f is simplified, I don't see the error. If the -P 0 is removed, I
don't see the error. If the range is shortened, the error will be less
frequent. Even with {1..13} it happens to sometimes succeed in my Mid
2015 2.2 GHz Quad-Core Intel Core i7.

This is not reproducible with earlier versions of zsh or with the
/bin/zsh bundled with macOS (zsh version 5.8.1 in macOS 12.4). This is
reproducible with zsh 5.9 installed both via homebrew or macports. This
is so far not reproducible in other OSs.

Information comes from https://github.com/Homebrew/homebrew-
core/issues/101937 and https://github.com/zimfw/zimfw/discussions/459.

Kind regards,
Eric

[-- Attachment #2: Type: text/html, Size: 3737 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: zargs with -P intermittently failing in zsh 5.9 and macOS
  2022-05-24 16:09 zargs with -P intermittently failing in zsh 5.9 and macOS Eric Nielsen
@ 2022-05-24 17:32 ` Bart Schaefer
  2022-05-24 20:21   ` Eric Nielsen
  2022-05-26  8:57   ` Jun T
  0 siblings, 2 replies; 18+ messages in thread
From: Bart Schaefer @ 2022-05-24 17:32 UTC (permalink / raw)
  To: Eric Nielsen; +Cc: zsh-workers

On Tue, May 24, 2022 at 9:09 AM Eric Nielsen <ericbn@hey.com> wrote:
>
> Hi, I'm seeing zargs with -P intermittently fail with in zsh 5.9 in macOS.
> This is not reproducible with earlier versions of zsh or with the /bin/zsh bundled with macOS

zargs was re-coded as of the 5.9 release so that it waits for
individual jobs instead of using a single "wait" for all of them.
This means it can detect nonzero exit in cases where earlier revisions
missed it.

> +(eval):6> case 19 (<1-125>|128)
> +(eval):8> ret=123
>
> at least once, even though f always returned 0.

Something is causing the subshell that was running f to exit with
status 19 even though f returned 0.


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: zargs with -P intermittently failing in zsh 5.9 and macOS
  2022-05-24 17:32 ` Bart Schaefer
@ 2022-05-24 20:21   ` Eric Nielsen
  2022-05-24 20:35     ` Bart Schaefer
  2022-05-26  8:57   ` Jun T
  1 sibling, 1 reply; 18+ messages in thread
From: Eric Nielsen @ 2022-05-24 20:21 UTC (permalink / raw)
  To: Bart Schaefer; +Cc: zsh-workers

[-- Attachment #1: Type: text/plain, Size: 627 bytes --]

On May 24, 2022, Bart Schaefer <schaefer@brasslantern.com> wrote:
> zargs was re-coded as of the 5.9 release so that it waits for
> individual jobs instead of using a single "wait" for all of them.
> This means it can detect nonzero exit in cases where earlier revisions
> missed it.

Got it. So we're now seeing a more precise return code than before 5.9.

> Something is causing the subshell that was running f to exit with
> status 19 even though f returned 0.

Any idea what could be returning 19? Looks like it's something outside
the scope of f. At least the code is consistently 19, when it happens.

Kind regards,
Eric

[-- Attachment #2: Type: text/html, Size: 3160 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: zargs with -P intermittently failing in zsh 5.9 and macOS
  2022-05-24 20:21   ` Eric Nielsen
@ 2022-05-24 20:35     ` Bart Schaefer
  0 siblings, 0 replies; 18+ messages in thread
From: Bart Schaefer @ 2022-05-24 20:35 UTC (permalink / raw)
  To: Eric Nielsen; +Cc: zsh-workers

On Tue, May 24, 2022 at 1:21 PM Eric Nielsen <ericbn@hey.com> wrote:
>
> Any idea what could be returning 19?

Unfortunately, no, and my Mac is currently offline.


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: zargs with -P intermittently failing in zsh 5.9 and macOS
  2022-05-24 17:32 ` Bart Schaefer
  2022-05-24 20:21   ` Eric Nielsen
@ 2022-05-26  8:57   ` Jun T
  2022-05-26 11:21     ` Jun. T
  2022-05-27  3:39     ` Bart Schaefer
  1 sibling, 2 replies; 18+ messages in thread
From: Jun T @ 2022-05-26  8:57 UTC (permalink / raw)
  To: zsh-workers


> 2022/05/25 2:32, Bart Schaefer <schaefer@brasslantern.com> wrote:
> 
>> +(eval):6> case 19 (<1-125>|128)
>> +(eval):8> ret=123
>> 
>> at least once, even though f always returned 0.
> 
> Something is causing the subshell that was running f to exit with
> status 19 even though f returned 0.

The problem can be reproduced by the following simplified script:

#!/bin/zsh
_zajobs=()
(   # works OK if not run in a subshell
    np=10   # try increasing this if you do not get $? = 19
    for ((i=0; i<$np; ++i)); do
        curl -so /dev/null 'https://example.com' & _zajobs+=( $! )
    done

    #sleep 1    # works OK if 'sleep 1' is added here
    wait        # works OK if this line is commented out
    for p in $_zajobs; do
        wait $p
        echo $p $?
    done
)

If I run this script on my Mac, a few (one or two in most cases, zero or
three sometimes) of the ten "wait $p" give $? = 19.

But all the "wait $p" return 0 if
(1) remove the "wait" before the "for p in $_zajobs", or
(2) add "sleep 1" before the "wait", or
(3) run whole script in the main shell, i.e., remove the enclosing
"(" and ")", or
(4) replace "curl -so /dev/null 'https://example.com'" by "sleep 1",
or
(5) run the script on Linux (I haven't tried other BSDs).

(1)(2) may indicate that the first "wait" (without $p) can't correctly
wait for some of the child procs...?


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: zargs with -P intermittently failing in zsh 5.9 and macOS
  2022-05-26  8:57   ` Jun T
@ 2022-05-26 11:21     ` Jun. T
  2022-05-27  3:39     ` Bart Schaefer
  1 sibling, 0 replies; 18+ messages in thread
From: Jun. T @ 2022-05-26 11:21 UTC (permalink / raw)
  To: zsh-workers


> 2022/05/26 17:57, Jun T <takimoto-j@kba.biglobe.ne.jp> wrote:
> 
>        curl -so /dev/null 'https://example.com' & _zajobs+=( $! )

Sorry, curl fails (connection refused) if I test on my another Mac
in my home.

#!/bin/zsh
f () { curl -so /dev/null 'https://example.com'; return 0 }
_zajobs=()
(   # works OK if not run in a subshell
    np=20   #  try increasing this if you do not get $? = 19
    for ((i=0; i<$np; ++i)); do
        f & _zajobs+=( $! )
    done

    #sleep 1    # works OK if 'sleep 1' is added here
    wait        # works OK if this line is commented out
    for p in $_zajobs; do
        wait $p
        echo $p $?
    done
)

With this f(), the first pid in $_zajobs has the highest probability
to give $? = 19. If I replace f() by
f {} { date > /dev/null }
that takes much shorter time, and set np=20, then I get $? = 19 only
sometimes (not frequently), and the last (or the second last) pid in
$_zajobs has high probability to fail.



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: zargs with -P intermittently failing in zsh 5.9 and macOS
  2022-05-26  8:57   ` Jun T
  2022-05-26 11:21     ` Jun. T
@ 2022-05-27  3:39     ` Bart Schaefer
  2022-05-27  4:18       ` Lawrence Velázquez
  2022-05-27  5:29       ` Jun T
  1 sibling, 2 replies; 18+ messages in thread
From: Bart Schaefer @ 2022-05-27  3:39 UTC (permalink / raw)
  To: Jun T; +Cc: Zsh hackers list

On Thu, May 26, 2022 at 1:58 AM Jun T <takimoto-j@kba.biglobe.ne.jp> wrote:
>
> > 2022/05/25 2:32, Bart Schaefer <schaefer@brasslantern.com> wrote:
> >
> > Something is causing the subshell that was running f to exit with
> > status 19 even though f returned 0.
>
> The problem can be reproduced by the following simplified script:

Thanks.  Is this limited to newer MacOS, like the nullexec test
discussed in workers/50164 ?

> (   # works OK if not run in a subshell

Interesting.  It may actually be possible to remove the subshell from
zargs now (see below).

>     np=10   # try increasing this if you do not get $? = 19
>     for ((i=0; i<$np; ++i)); do
>         curl -so /dev/null 'https://example.com' & _zajobs+=( $! )
>     done

I don't suppose you've found any indication of what 19 actually means, there.

>     #sleep 1    # works OK if 'sleep 1' is added here
>     wait        # works OK if this line is commented out

Hm.  If that wait is removed, the subshell is probably not necessary.
It's there because of a lingering concern that if we didn't first wait
for all jobs to finish before starting the individual waits, we might
get race conditions. It seems like perhaps the opposite is actually
the case.

> But all the "wait $p" return 0 if
> (1) remove the "wait" before the "for p in $_zajobs", or
> (2) add "sleep 1" before the "wait", or
>
> (1)(2) may indicate that the first "wait" (without $p) can't correctly
> wait for some of the child procs...?

That would be ... strange ... and would mean we might have to be on
the lookout for "wait" failing in other cases too.


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: zargs with -P intermittently failing in zsh 5.9 and macOS
  2022-05-27  3:39     ` Bart Schaefer
@ 2022-05-27  4:18       ` Lawrence Velázquez
  2022-05-27  5:29       ` Jun T
  1 sibling, 0 replies; 18+ messages in thread
From: Lawrence Velázquez @ 2022-05-27  4:18 UTC (permalink / raw)
  To: Bart Schaefer, Jun T; +Cc: zsh-workers

On Thu, May 26, 2022, at 11:39 PM, Bart Schaefer wrote:
> On Thu, May 26, 2022 at 1:58 AM Jun T <takimoto-j@kba.biglobe.ne.jp> wrote:
>>
>> > 2022/05/25 2:32, Bart Schaefer <schaefer@brasslantern.com> wrote:
>> >
>> > Something is causing the subshell that was running f to exit with
>> > status 19 even though f returned 0.
>>
>> The problem can be reproduced by the following simplified script:
>
> Thanks.  Is this limited to newer MacOS, like the nullexec test
> discussed in workers/50164 ?

I can reproduce the problem on Mojave (10.14.6).

-- 
vq


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: zargs with -P intermittently failing in zsh 5.9 and macOS
  2022-05-27  3:39     ` Bart Schaefer
  2022-05-27  4:18       ` Lawrence Velázquez
@ 2022-05-27  5:29       ` Jun T
  2022-05-27 16:10         ` Bart Schaefer
  1 sibling, 1 reply; 18+ messages in thread
From: Jun T @ 2022-05-27  5:29 UTC (permalink / raw)
  To: zsh-workers


> 2022/05/27 12:39, Bart Schaefer <schaefer@brasslantern.com> wrote:
> 
> I don't suppose you've found any indication of what 19 actually means, there.

I think I've found it now; 19 = SIGCONT.

If in subshell, zwaitjob(job) calls killjb(jn, SIGONT) (jobs.c:1657).

When wait_for_processes() is called (in signal handler?), wait3(&state)
_sometimes_ returns with WIFCONTINUED(status) set to true.
Then wait_for_processes() calls addbgstatus(pid, val) with
val=WEXITSTATUS(status)=19.

Later the process exits normally, and addbgstatus(pid, 0) is called.
This means bgstatus_list has two entries for the pid.

When 'wait pid' calls getbgstatus(pid), it finds the entry with status=19
and returns it.

A simple fix would be, at line 584 of signals.c,
(A) call addbgstatus() only if WIFCONTINUED() is not true.

But, what should we do if WIFSTOPPED() is true?

The comment before addbgstatus() (jobs.c, line 2211) says:
   Record the status of a background process that exited so we
   can execute the builtin wait for it.

So I guess we need to call addbgstatus() only if the process has actually
exited (or killed). If this is the case, better solution would be
(B) call addbgstatus() only if WIFEXITED() or WIFSIGNALED() is true.
This it the patch below.

Or, if we want to call addbgstatus() unconditionally,
(C) modify addbgstatus() so that if there is already an entry for
the pid then update it instead of adding a new entry.

I feel (B) is the best way, but not sure. Any idea?


>>    #sleep 1    # works OK if 'sleep 1' is added here
>>    wait        # works OK if this line is commented out
> 
> Hm.  If that wait is removed, the subshell is probably not necessary.
> It's there because of a lingering concern that if we didn't first wait
> for all jobs to finish before starting the individual waits, we might
> get race conditions. It seems like perhaps the opposite is actually
> the case.

I think 'wait pid' for all the pids in $_zajobs would be enough, but
not 100 % sure. zargs need not be modified if the addbgstatus-problem
is fixed (in some way), but if the wait/subshell is not necessary then
it is better removed.

It can happen that a user starts (in a script, probably) many background
jobs and want to wait for only part of the jobs. If waiting for each
job does not work, the user may call it a bug.
# of cause if it is hard to fix, then it would be better to include a
# workaround in zargs.


diff --git a/Src/signals.c b/Src/signals.c
index 5c787e2a8..8096993cd 100644
--- a/Src/signals.c
+++ b/Src/signals.c
@@ -575,14 +575,11 @@ wait_for_processes(void)
 	 * and is not equal to the current foreground job.
 	 */
 	if (jn && !(jn->stat & (STAT_CURSH|STAT_BUILTIN)) &&
-	    jn - jobtab != thisjob) {
-	    int val = (WIFSIGNALED(status) ?
-		   0200 | WTERMSIG(status) :
-		   (WIFSTOPPED(status) ?
-		    0200 | WEXITSTATUS(status) :
-		    WEXITSTATUS(status)));
-	    addbgstatus(pid, val);
-	}
+	    jn - jobtab != thisjob &&
+	    /* record the status only if the process has exited */
+	    (WIFEXITED(status) || WIFSIGNALED(status)))
+	    addbgstatus(pid, WIFSIGNALED(status) ? 0200 | WTERMSIG(status)
+						: WEXITSTATUS(status));
 
 	unqueue_signals();
     }





^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: zargs with -P intermittently failing in zsh 5.9 and macOS
  2022-05-27  5:29       ` Jun T
@ 2022-05-27 16:10         ` Bart Schaefer
  2022-05-27 17:25           ` Jun. T
  0 siblings, 1 reply; 18+ messages in thread
From: Bart Schaefer @ 2022-05-27 16:10 UTC (permalink / raw)
  To: Jun T; +Cc: Zsh hackers list

On Thu, May 26, 2022 at 10:30 PM Jun T <takimoto-j@kba.biglobe.ne.jp> wrote:
>
> When wait_for_processes() is called (in signal handler?), wait3(&state)
> _sometimes_ returns with WIFCONTINUED(status) set to true.
> Then wait_for_processes() calls addbgstatus(pid, val) with
> val=WEXITSTATUS(status)=19.
>
> Later the process exits normally, and addbgstatus(pid, 0) is called.
> This means bgstatus_list has two entries for the pid.
>
> When 'wait pid' calls getbgstatus(pid), it finds the entry with status=19
> and returns it.

Even if it's correct to have two entries, it's wrong to return the
least recent one.

> A simple fix would be, at line 584 of signals.c,
> (A) call addbgstatus() only if WIFCONTINUED() is not true.

I think that's insufficient ... we should be recording the most recent
state, per your question about WIFSTOPPED.

> So I guess we need to call addbgstatus() only if the process has actually
> exited (or killed). If this is the case, better solution would be
> (B) call addbgstatus() only if WIFEXITED() or WIFSIGNALED() is true.
> This it the patch below.

Does that really work?  I would expect WIFSIGNALED to be true when
each of WIFSTOPPED and WIFCONTINUED is also true, so it wouldn't
change anything?  (There isn't any way to stop/continue without using
a signal that I can think of.)

> Or, if we want to call addbgstatus() unconditionally,
> (C) modify addbgstatus() so that if there is already an entry for
> the pid then update it instead of adding a new entry.

I think either that, or

(D) Make sure getbgstatus() always returns the most recent entry.

But ... there isn't any way for a user / script to examine more than
one job status per PID, so unless we're using it internally somewhere
(?) I would think (C) is best.


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: zargs with -P intermittently failing in zsh 5.9 and macOS
  2022-05-27 16:10         ` Bart Schaefer
@ 2022-05-27 17:25           ` Jun. T
  2022-05-29  3:30             ` Bart Schaefer
  0 siblings, 1 reply; 18+ messages in thread
From: Jun. T @ 2022-05-27 17:25 UTC (permalink / raw)
  To: zsh-workers


> 2022/05/28 1:10、Bart Schaefer <schaefer@brasslantern.com>のメール:
> 
> On Thu, May 26, 2022 at 10:30 PM Jun T <takimoto-j@kba.biglobe.ne.jp> wrote:
>> 
>> 
>> When 'wait pid' calls getbgstatus(pid), it finds the entry with status=19
>> and returns it.
> 
> Even if it's correct to have two entries, it's wrong to return the
> least recent one.

Yes, of course, so we need to fix it.

>> So I guess we need to call addbgstatus() only if the process has actually
>> exited (or killed). If this is the case, better solution would be
>> (B) call addbgstatus() only if WIFEXITED() or WIFSIGNALED() is true.
>> This it the patch below.
> 
> Does that really work?  I would expect WIFSIGNALED to be true when
> each of WIFSTOPPED and WIFCONTINUED is also true, so it wouldn't
> change anything?  (There isn't any way to stop/continue without using
> a signal that I can think of.)

I've tried A, B, C, and it seems all of them work in the sense that
it solves the current zargs's problem, and 'make check' passes.
But I think A is insufficient, as you write.

I was thinking B is enough, because:
WIFEXITED: has exited spontaneously
WIFSIGNALED: has been killed by a signal
WIFSTOPPED: still alive but now stoeepd
WIFCONTINUED: has been stopped but now continued
So these 4 are mutually exclusive.
In the case of EXITED/SIGNALED, further call of wait3() will not report
the status of the process. So if we call addbgstatus() only for
EXITED/SIGNALED, then it records the final state of the process just
onece. How do you think?  Do you have a Mac for testing?

Or we can use both B and C, and add a DPUTS() if addbgstatus() is
called more than once for a single pid.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: zargs with -P intermittently failing in zsh 5.9 and macOS
  2022-05-27 17:25           ` Jun. T
@ 2022-05-29  3:30             ` Bart Schaefer
  2022-05-29 14:47               ` Jun. T
  0 siblings, 1 reply; 18+ messages in thread
From: Bart Schaefer @ 2022-05-29  3:30 UTC (permalink / raw)
  To: Jun. T; +Cc: Zsh hackers list

On Fri, May 27, 2022 at 10:26 AM Jun. T <takimoto-j@kba.biglobe.ne.jp> wrote:
>
> I was thinking B is enough, because:
> WIFEXITED: has exited spontaneously
> WIFSIGNALED: has been killed by a signal
> WIFSTOPPED: still alive but now stoeepd
> WIFCONTINUED: has been stopped but now continued
> So these 4 are mutually exclusive.

OK.  However, reading through your patch again ... the changed code
kept track of stopped jobs.  Has that never been necessary?  Perhaps
the right thing to do is to continue recording the state on WIFSTOPPED
but to change to clearing the saved status on WIFCONTINUED?

> Or we can use both B and C, and add a DPUTS() if addbgstatus() is
> called more than once for a single pid.

That would be the "defensive programming" approach.


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: zargs with -P intermittently failing in zsh 5.9 and macOS
  2022-05-29  3:30             ` Bart Schaefer
@ 2022-05-29 14:47               ` Jun. T
  2022-05-29 20:07                 ` Bart Schaefer
  2022-06-12 16:43                 ` Eric Nielsen
  0 siblings, 2 replies; 18+ messages in thread
From: Jun. T @ 2022-05-29 14:47 UTC (permalink / raw)
  To: zsh-workers


> 2022/05/29 12:30, Bart Schaefer <schaefer@brasslantern.com> wrote:
> 
> OK. However, reading through your patch again ... the changed code
> kept track of stopped jobs. Has that never been necessary? Perhaps
> the right thing to do is to continue recording the state on WIFSTOPPED
> but to change to clearing the saved status on WIFCONTINUED?


If I understand correctly, {add,get}bgstatus() exists only for the wait
builtin. The only thing that needs be recored in bgstatus_list is the
exist status (including killed by a signal) of the process. Status of a
running/stopped process is kept elsewhere (jobtab[].procs).

Below is the patch including both B (signals.c) and C (jobs.c).
singnals.s is as the previous patch, but the ternary operator is
replaced by if/else.


diff --git a/Src/jobs.c b/Src/jobs.c
index a91ef787f..a8d6aeded 100644
--- a/Src/jobs.c
+++ b/Src/jobs.c
@@ -2221,6 +2221,7 @@ addbgstatus(pid_t pid, int status)
 {
     static long child_max;
     Bgstatus bgstatus_entry;
+    LinkNode node;
 
     if (!child_max) {
 #ifdef _SC_CHILD_MAX
@@ -2244,6 +2245,18 @@ addbgstatus(pid_t pid, int status)
 	if (!bgstatus_list)
 	    return;
     }
+    /* See if an entry already exists for the pid */
+    for (node = firstnode(bgstatus_list); node; incnode(node)) {
+	bgstatus_entry = (Bgstatus)getdata(node);
+	if (bgstatus_entry->pid == pid) {
+	    /* In theory this should not happen because addbgstatus() is
+	     * called only once when the process exits or gets killed. */
+	    DPUTS(1, "addbgstatus is called more than once");
+	    bgstatus_entry->status = status;
+	    return;
+	}
+    }
+    /* Add an entry for the pid */
     if (bgstatus_count == child_max) {
 	/* Overflow.  List is in order, remove first */
 	rembgstatus(firstnode(bgstatus_list));
diff --git a/Src/signals.c b/Src/signals.c
index 5c787e2a8..a61368554 100644
--- a/Src/signals.c
+++ b/Src/signals.c
@@ -576,12 +576,10 @@ wait_for_processes(void)
 	 */
 	if (jn && !(jn->stat & (STAT_CURSH|STAT_BUILTIN)) &&
 	    jn - jobtab != thisjob) {
-	    int val = (WIFSIGNALED(status) ?
-		   0200 | WTERMSIG(status) :
-		   (WIFSTOPPED(status) ?
-		    0200 | WEXITSTATUS(status) :
-		    WEXITSTATUS(status)));
-	    addbgstatus(pid, val);
+	    if (WIFEXITED(status))
+		addbgstatus(pid, WEXITSTATUS(status));
+	    else if (WIFSIGNALED(status))
+		addbgstatus(pid, 0200 | WTERMSIG(status));
 	}
 
 	unqueue_signals();




^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: zargs with -P intermittently failing in zsh 5.9 and macOS
  2022-05-29 14:47               ` Jun. T
@ 2022-05-29 20:07                 ` Bart Schaefer
  2022-05-30 11:16                   ` Jun T
  2022-06-12 16:43                 ` Eric Nielsen
  1 sibling, 1 reply; 18+ messages in thread
From: Bart Schaefer @ 2022-05-29 20:07 UTC (permalink / raw)
  To: Jun. T; +Cc: Zsh hackers list

Thank you.  A couple of concluding remarks ...

On Sun, May 29, 2022 at 7:48 AM Jun. T <takimoto-j@kba.biglobe.ne.jp> wrote:
>
> +    /* See if an entry already exists for the pid */

Would it be worthwhile to put that entire thing in #ifdef DEBUG ?
Perhaps not, given that the bgstatus_list is unlikely to have many
nodes.

> -           int val = (WIFSIGNALED(status) ?
> -                  0200 | WTERMSIG(status) :
> -                  (WIFSTOPPED(status) ?
> -                   0200 | WEXITSTATUS(status) :
> -                   WEXITSTATUS(status)));

Irrelevant to the patch, but on the original issue, I keep looking at
this and wondering how we got an exit code of 19 rather than 19+128
which would have immediately pointed to SIGCONT.  Is this a difference
in the way MacOS defines WEXITSTATUS(), or what?


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: zargs with -P intermittently failing in zsh 5.9 and macOS
  2022-05-29 20:07                 ` Bart Schaefer
@ 2022-05-30 11:16                   ` Jun T
  2022-06-07  6:51                     ` Jun T
  0 siblings, 1 reply; 18+ messages in thread
From: Jun T @ 2022-05-30 11:16 UTC (permalink / raw)
  To: zsh-workers

This post is rather long, and divided into (1)-(5).

(1)

> 2022/05/30 5:07, Bart Schaefer <schaefer@brasslantern.com> wrote:
> 
> Irrelevant to the patch, but on the original issue, I keep looking at
> this and wondering how we got an exit code of 19 rather than 19+128
> which would have immediately pointed to SIGCONT.  Is this a difference
> in the way MacOS defines WEXITSTATUS(), or what?


On both macOS and Linux, WEXITSTATUS(status) is equivalent to
(((status) & 0xff00) >> 8). This is so either WIFEXITED() is true or not.
But on both OSes, wait(2) man page says that WEXITSTATUS() can be used only
when WIFEXITED() is true. When WIFCONTINUED() is true we should have used
"0200 | SIGCONT".

(2) Why zrags has the problem only on macOS

The reason seems to be in the way SIGCONT is handled (in the kernel?).
On macOS, if the parent zsh sends SIGCONT to a background child and
calls wait3(), then wait3() returns with WIFCONTINUED even if the child
has not been stopped. This is _quite_ strange, and does not happen on
Linux. You may try the following C program:

#include <stdio.h>
#include <unistd.h>
#include <signal.h>
#include <sys/wait.h>
#include <sys/resource.h>

int main()
{
	pid_t pid = fork();
	if (pid < 0)
		return -1;
	if (pid > 0) {
		pid_t wpid;
		int state;
		struct rusage ru;
		kill(pid, SIGCONT);
		wpid = wait3(&state, WUNTRACED|WCONTINUED, &ru);
		printf("wait3: %d 0x%04x\n", wpid, state);
		kill(pid, SIGKILL);
	} else {
		sleep(1);
	}
	return 0;
}

On Linux, it outputs, after waiting for 1 second:
wait3: 76819 0x0000
but on macOS it immediately outputs:
wait3: 84313 0x137f
For status=0x137f, WIFCONTINUED(status)=true,
and WEXITSTATUS(x)=0x13=19

(3) zsh on Linux also has a problem

On Linux (and macOS), zsh (without the patch) has the following
problem:

% sleep 20 &
[1] 78570
% kill -STOP 78570
[1]  + 78570 suspended (signal)  sleep 20
% kill -CONT 78570
%
(after 20 seconds)
[1]  + 78570 done       sleep 20
% wait 78570
% echo $?
147        # should be 0

147 = 0200 + SIGSTOP(=19)    (on macOS we get 145)

(4) Another minor problem

In the current zsh (without the patch), when WIFSTOPPED()=true,
addbgstatus() records "0200 | WEXITSTATUS(status)" in bgstatus_list.

But wait(2) manpage says WEXITSTATUS() can be used only if WIFEXITED()
is true, and when WIFSTOPPED() is true WSTOPSIG() should be used.
On both Linux and macOS (and probably on other OSes) WSTOPSIG() is
equivalent to WEXITSTATUS(), so there was no observable problem.

grep WEXITSTATUS in the zsh source tree shows that there are two
more uses of WEXITSTATUS() (in jobs.c) which are better replaced
by WSTOPSIG(), as in the patch shown in (5) below.

And:

>> +    /* See if an entry already exists for the pid */
> 
> Would it be worthwhile to put that entire thing in #ifdef DEBUG ?


Slow down is negligible, as you write, but I think chances are _very_
low that addbgusage() is called more than once. So I enclosed it
by #ifdef DEBUG as you suggested.

Also added a test for the problem (3) above.


(5) Revised patch with test

diff --git a/Src/jobs.c b/Src/jobs.c
index a91ef787f..80893cebf 100644
--- a/Src/jobs.c
+++ b/Src/jobs.c
@@ -414,7 +414,7 @@ storepipestats(Job jn, int inforeground, int fixlastval)
 	jpipestats[i] = (WIFSIGNALED(p->status) ?
 			 0200 | WTERMSIG(p->status) :
 			 (WIFSTOPPED(p->status) ?
-			  0200 | WEXITSTATUS(p->status) :
+			  0200 | WSTOPSIG(p->status) :
 			  WEXITSTATUS(p->status)));
 	if (jpipestats[i])
 	    pipefail = jpipestats[i];
@@ -471,7 +471,7 @@ update_job(Job jn)
 	    val = (WIFSIGNALED(pn->status) ?
 		   0200 | WTERMSIG(pn->status) :
 		   (WIFSTOPPED(pn->status) ?
-		    0200 | WEXITSTATUS(pn->status) :
+		    0200 | WSTOPSIG(pn->status) :
 		    WEXITSTATUS(pn->status)));
 	    signalled = WIFSIGNALED(pn->status);
 	}
@@ -2221,6 +2221,7 @@ addbgstatus(pid_t pid, int status)
 {
     static long child_max;
     Bgstatus bgstatus_entry;
+    LinkNode node;
 
     if (!child_max) {
 #ifdef _SC_CHILD_MAX
@@ -2244,6 +2245,21 @@ addbgstatus(pid_t pid, int status)
 	if (!bgstatus_list)
 	    return;
     }
+#ifdef DEBUG
+    /* See if an entry already exists for the pid */
+    for (node = firstnode(bgstatus_list); node; incnode(node)) {
+	bgstatus_entry = (Bgstatus)getdata(node);
+	if (bgstatus_entry->pid == pid) {
+	    /* In theory this should not happen because addbgstatus() is
+	     * called only once when the process exits or gets killed. */
+	    dputs("addbgstatus called again: status %d -> %d",
+		    bgstatus_entry->status, status);
+	    bgstatus_entry->status = status;
+	    return;
+	}
+    }
+#endif
+    /* Add an entry for the pid */
     if (bgstatus_count == child_max) {
 	/* Overflow.  List is in order, remove first */
 	rembgstatus(firstnode(bgstatus_list));
diff --git a/Src/signals.c b/Src/signals.c
index 5c787e2a8..a61368554 100644
--- a/Src/signals.c
+++ b/Src/signals.c
@@ -576,12 +576,10 @@ wait_for_processes(void)
 	 */
 	if (jn && !(jn->stat & (STAT_CURSH|STAT_BUILTIN)) &&
 	    jn - jobtab != thisjob) {
-	    int val = (WIFSIGNALED(status) ?
-		   0200 | WTERMSIG(status) :
-		   (WIFSTOPPED(status) ?
-		    0200 | WEXITSTATUS(status) :
-		    WEXITSTATUS(status)));
-	    addbgstatus(pid, val);
+	    if (WIFEXITED(status))
+		addbgstatus(pid, WEXITSTATUS(status));
+	    else if (WIFSIGNALED(status))
+		addbgstatus(pid, 0200 | WTERMSIG(status));
 	}
 
 	unqueue_signals();
diff --git a/Test/A05execution.ztst b/Test/A05execution.ztst
index d95ee363c..b257ddf2c 100644
--- a/Test/A05execution.ztst
+++ b/Test/A05execution.ztst
@@ -396,6 +396,13 @@ F:anonymous function, and a descriptor leak when backgrounding a pipeline
 # TBD: the 0 above is believed to be bogus and should also be turned
 # into 127 when the ccorresponding bug is fixed in the main shell.
 
+  sleep 1 & pid=$!
+  kill -STOP $pid
+  sleep 1
+  kill -CONT $pid
+  wait $pid
+0:wait for stopped and continued process
+
 # Without the outer subshell, the test harness reports the pre-46060 behaviour
 # as "skipped" rather than "failed".
  (( exit 130 ) | { sleep 1; echo hello })





^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: zargs with -P intermittently failing in zsh 5.9 and macOS
  2022-05-30 11:16                   ` Jun T
@ 2022-06-07  6:51                     ` Jun T
  2022-06-09  2:56                       ` Bart Schaefer
  0 siblings, 1 reply; 18+ messages in thread
From: Jun T @ 2022-06-07  6:51 UTC (permalink / raw)
  To: zsh-workers


> 2022/05/30 20:16, Jun T <takimoto-j@kba.biglobe.ne.jp> wrote:
> 
> (5) Revised patch with test

It seems this test "sometimes" fails on macOS:

--- /tmp/zsh.ztst.40838/ztst.err	2022-06-07 12:45:24.000000000 +0900
+++ /tmp/zsh.ztst.40838/ztst.terr	2022-06-07 12:45:25.000000000 +0900
@@ -0,0 +1 @@
+(eval):kill:4: kill 47169 failed: no such process
Test ./A05execution.ztst failed: error output differs from expected as shown above for:
  sleep 1 & pid=$!
  kill -STOP $pid
  sleep 1
  kill -CONT $pid
  wait $pid
Was testing: wait for stopped and continued process

The failed 'kill' is the second one: 'kill -CONT $pid'.
If I remove the 'sleep 1' between the two 'kill's then the test
always passes on macOS. But this is not a solution, because then
the test passes even without the patch for signals.c (at least on
Linux). The only way I can think of to make it pass on macOS (but
fails on Linux without the fix for signals.c) is to increase
the background sleep time to 'sleep 2'.

Is there any better solution? Or is this test necessary?

Patch for jobs.c below is just to silence the warning.


diff --git a/Src/jobs.c b/Src/jobs.c
index aa32f4e80..e0e453ed8 100644
--- a/Src/jobs.c
+++ b/Src/jobs.c
@@ -2221,7 +2221,9 @@ addbgstatus(pid_t pid, int status)
 {
     static long child_max;
     Bgstatus bgstatus_entry;
+#ifdef DEBUG
     LinkNode node;
+#endif
 
     if (!child_max) {
 #ifdef _SC_CHILD_MAX
diff --git a/Test/A05execution.ztst b/Test/A05execution.ztst
index b257ddf2c..bcadc6d56 100644
--- a/Test/A05execution.ztst
+++ b/Test/A05execution.ztst
@@ -396,7 +396,7 @@ F:anonymous function, and a descriptor leak when backgrounding a pipeline
 # TBD: the 0 above is believed to be bogus and should also be turned
 # into 127 when the ccorresponding bug is fixed in the main shell.
 
-  sleep 1 & pid=$!
+  sleep 2 & pid=$!
   kill -STOP $pid
   sleep 1
   kill -CONT $pid





^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: zargs with -P intermittently failing in zsh 5.9 and macOS
  2022-06-07  6:51                     ` Jun T
@ 2022-06-09  2:56                       ` Bart Schaefer
  0 siblings, 0 replies; 18+ messages in thread
From: Bart Schaefer @ 2022-06-09  2:56 UTC (permalink / raw)
  To: Jun T; +Cc: Zsh hackers list

On Mon, Jun 6, 2022 at 11:52 PM Jun T <takimoto-j@kba.biglobe.ne.jp> wrote:
>
> It seems this test "sometimes" fails on macOS:

I think increasing the sleep to 2 seconds is fine.  The test is about
stopping and starting the job; the choice of 1 second was just to
avoid delaying the whole test thread unnecessarily.


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: zargs with -P intermittently failing in zsh 5.9 and macOS
  2022-05-29 14:47               ` Jun. T
  2022-05-29 20:07                 ` Bart Schaefer
@ 2022-06-12 16:43                 ` Eric Nielsen
  1 sibling, 0 replies; 18+ messages in thread
From: Eric Nielsen @ 2022-06-12 16:43 UTC (permalink / raw)
  To: zsh-workers

[-- Attachment #1: Type: text/plain, Size: 293 bytes --]

On May 29, 2022, Jun T <takimoto-j@kba.biglobe.ne.jp> wrote:
> Below is the patch including both B (signals.c) and C (jobs.c).
> singnals.s is as the previous patch, but the ternary operator is
> replaced by if/else.
Just want to thank Jun and Bart for looking into this and fixing the
issue!

[-- Attachment #2: Type: text/html, Size: 2792 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2022-06-12 16:43 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-05-24 16:09 zargs with -P intermittently failing in zsh 5.9 and macOS Eric Nielsen
2022-05-24 17:32 ` Bart Schaefer
2022-05-24 20:21   ` Eric Nielsen
2022-05-24 20:35     ` Bart Schaefer
2022-05-26  8:57   ` Jun T
2022-05-26 11:21     ` Jun. T
2022-05-27  3:39     ` Bart Schaefer
2022-05-27  4:18       ` Lawrence Velázquez
2022-05-27  5:29       ` Jun T
2022-05-27 16:10         ` Bart Schaefer
2022-05-27 17:25           ` Jun. T
2022-05-29  3:30             ` Bart Schaefer
2022-05-29 14:47               ` Jun. T
2022-05-29 20:07                 ` Bart Schaefer
2022-05-30 11:16                   ` Jun T
2022-06-07  6:51                     ` Jun T
2022-06-09  2:56                       ` Bart Schaefer
2022-06-12 16:43                 ` Eric Nielsen

Code repositories for project(s) associated with this inbox:

	https://git.vuxu.org/mirror/zsh/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).