* zargs with -P intermittently failing in zsh 5.9 and macOS @ 2022-05-24 16:09 Eric Nielsen 2022-05-24 17:32 ` Bart Schaefer 0 siblings, 1 reply; 18+ messages in thread From: Eric Nielsen @ 2022-05-24 16:09 UTC (permalink / raw) To: zsh-workers [-- Attachment #1: Type: text/plain, Size: 1195 bytes --] Hi, I'm seeing zargs with -P intermittently fail with in zsh 5.9 in macOS. Here's the information I could gather so far. Minimal reproducible example: % f() { curl -Sso /dev/null 'https://example.com'; return 0 } % autoload -Uz zargs % setopt xtrace % zargs -n 1 -P 0 -- {1..13} -- f Expected the status code to be 0. Most of the times I get 123 instead. The xtrace output shows: +(eval):6> case 19 (<1-125>|128) +(eval):8> ret=123 at least once, even though f always returned 0. Added the extra return 0 in f to force that return code. If f is simplified, I don't see the error. If the -P 0 is removed, I don't see the error. If the range is shortened, the error will be less frequent. Even with {1..13} it happens to sometimes succeed in my Mid 2015 2.2 GHz Quad-Core Intel Core i7. This is not reproducible with earlier versions of zsh or with the /bin/zsh bundled with macOS (zsh version 5.8.1 in macOS 12.4). This is reproducible with zsh 5.9 installed both via homebrew or macports. This is so far not reproducible in other OSs. Information comes from https://github.com/Homebrew/homebrew- core/issues/101937 and https://github.com/zimfw/zimfw/discussions/459. Kind regards, Eric [-- Attachment #2: Type: text/html, Size: 3737 bytes --] ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: zargs with -P intermittently failing in zsh 5.9 and macOS 2022-05-24 16:09 zargs with -P intermittently failing in zsh 5.9 and macOS Eric Nielsen @ 2022-05-24 17:32 ` Bart Schaefer 2022-05-24 20:21 ` Eric Nielsen 2022-05-26 8:57 ` Jun T 0 siblings, 2 replies; 18+ messages in thread From: Bart Schaefer @ 2022-05-24 17:32 UTC (permalink / raw) To: Eric Nielsen; +Cc: zsh-workers On Tue, May 24, 2022 at 9:09 AM Eric Nielsen <ericbn@hey.com> wrote: > > Hi, I'm seeing zargs with -P intermittently fail with in zsh 5.9 in macOS. > This is not reproducible with earlier versions of zsh or with the /bin/zsh bundled with macOS zargs was re-coded as of the 5.9 release so that it waits for individual jobs instead of using a single "wait" for all of them. This means it can detect nonzero exit in cases where earlier revisions missed it. > +(eval):6> case 19 (<1-125>|128) > +(eval):8> ret=123 > > at least once, even though f always returned 0. Something is causing the subshell that was running f to exit with status 19 even though f returned 0. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: zargs with -P intermittently failing in zsh 5.9 and macOS 2022-05-24 17:32 ` Bart Schaefer @ 2022-05-24 20:21 ` Eric Nielsen 2022-05-24 20:35 ` Bart Schaefer 2022-05-26 8:57 ` Jun T 1 sibling, 1 reply; 18+ messages in thread From: Eric Nielsen @ 2022-05-24 20:21 UTC (permalink / raw) To: Bart Schaefer; +Cc: zsh-workers [-- Attachment #1: Type: text/plain, Size: 627 bytes --] On May 24, 2022, Bart Schaefer <schaefer@brasslantern.com> wrote: > zargs was re-coded as of the 5.9 release so that it waits for > individual jobs instead of using a single "wait" for all of them. > This means it can detect nonzero exit in cases where earlier revisions > missed it. Got it. So we're now seeing a more precise return code than before 5.9. > Something is causing the subshell that was running f to exit with > status 19 even though f returned 0. Any idea what could be returning 19? Looks like it's something outside the scope of f. At least the code is consistently 19, when it happens. Kind regards, Eric [-- Attachment #2: Type: text/html, Size: 3160 bytes --] ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: zargs with -P intermittently failing in zsh 5.9 and macOS 2022-05-24 20:21 ` Eric Nielsen @ 2022-05-24 20:35 ` Bart Schaefer 0 siblings, 0 replies; 18+ messages in thread From: Bart Schaefer @ 2022-05-24 20:35 UTC (permalink / raw) To: Eric Nielsen; +Cc: zsh-workers On Tue, May 24, 2022 at 1:21 PM Eric Nielsen <ericbn@hey.com> wrote: > > Any idea what could be returning 19? Unfortunately, no, and my Mac is currently offline. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: zargs with -P intermittently failing in zsh 5.9 and macOS 2022-05-24 17:32 ` Bart Schaefer 2022-05-24 20:21 ` Eric Nielsen @ 2022-05-26 8:57 ` Jun T 2022-05-26 11:21 ` Jun. T 2022-05-27 3:39 ` Bart Schaefer 1 sibling, 2 replies; 18+ messages in thread From: Jun T @ 2022-05-26 8:57 UTC (permalink / raw) To: zsh-workers > 2022/05/25 2:32, Bart Schaefer <schaefer@brasslantern.com> wrote: > >> +(eval):6> case 19 (<1-125>|128) >> +(eval):8> ret=123 >> >> at least once, even though f always returned 0. > > Something is causing the subshell that was running f to exit with > status 19 even though f returned 0. The problem can be reproduced by the following simplified script: #!/bin/zsh _zajobs=() ( # works OK if not run in a subshell np=10 # try increasing this if you do not get $? = 19 for ((i=0; i<$np; ++i)); do curl -so /dev/null 'https://example.com' & _zajobs+=( $! ) done #sleep 1 # works OK if 'sleep 1' is added here wait # works OK if this line is commented out for p in $_zajobs; do wait $p echo $p $? done ) If I run this script on my Mac, a few (one or two in most cases, zero or three sometimes) of the ten "wait $p" give $? = 19. But all the "wait $p" return 0 if (1) remove the "wait" before the "for p in $_zajobs", or (2) add "sleep 1" before the "wait", or (3) run whole script in the main shell, i.e., remove the enclosing "(" and ")", or (4) replace "curl -so /dev/null 'https://example.com'" by "sleep 1", or (5) run the script on Linux (I haven't tried other BSDs). (1)(2) may indicate that the first "wait" (without $p) can't correctly wait for some of the child procs...? ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: zargs with -P intermittently failing in zsh 5.9 and macOS 2022-05-26 8:57 ` Jun T @ 2022-05-26 11:21 ` Jun. T 2022-05-27 3:39 ` Bart Schaefer 1 sibling, 0 replies; 18+ messages in thread From: Jun. T @ 2022-05-26 11:21 UTC (permalink / raw) To: zsh-workers > 2022/05/26 17:57, Jun T <takimoto-j@kba.biglobe.ne.jp> wrote: > > curl -so /dev/null 'https://example.com' & _zajobs+=( $! ) Sorry, curl fails (connection refused) if I test on my another Mac in my home. #!/bin/zsh f () { curl -so /dev/null 'https://example.com'; return 0 } _zajobs=() ( # works OK if not run in a subshell np=20 # try increasing this if you do not get $? = 19 for ((i=0; i<$np; ++i)); do f & _zajobs+=( $! ) done #sleep 1 # works OK if 'sleep 1' is added here wait # works OK if this line is commented out for p in $_zajobs; do wait $p echo $p $? done ) With this f(), the first pid in $_zajobs has the highest probability to give $? = 19. If I replace f() by f {} { date > /dev/null } that takes much shorter time, and set np=20, then I get $? = 19 only sometimes (not frequently), and the last (or the second last) pid in $_zajobs has high probability to fail. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: zargs with -P intermittently failing in zsh 5.9 and macOS 2022-05-26 8:57 ` Jun T 2022-05-26 11:21 ` Jun. T @ 2022-05-27 3:39 ` Bart Schaefer 2022-05-27 4:18 ` Lawrence Velázquez 2022-05-27 5:29 ` Jun T 1 sibling, 2 replies; 18+ messages in thread From: Bart Schaefer @ 2022-05-27 3:39 UTC (permalink / raw) To: Jun T; +Cc: Zsh hackers list On Thu, May 26, 2022 at 1:58 AM Jun T <takimoto-j@kba.biglobe.ne.jp> wrote: > > > 2022/05/25 2:32, Bart Schaefer <schaefer@brasslantern.com> wrote: > > > > Something is causing the subshell that was running f to exit with > > status 19 even though f returned 0. > > The problem can be reproduced by the following simplified script: Thanks. Is this limited to newer MacOS, like the nullexec test discussed in workers/50164 ? > ( # works OK if not run in a subshell Interesting. It may actually be possible to remove the subshell from zargs now (see below). > np=10 # try increasing this if you do not get $? = 19 > for ((i=0; i<$np; ++i)); do > curl -so /dev/null 'https://example.com' & _zajobs+=( $! ) > done I don't suppose you've found any indication of what 19 actually means, there. > #sleep 1 # works OK if 'sleep 1' is added here > wait # works OK if this line is commented out Hm. If that wait is removed, the subshell is probably not necessary. It's there because of a lingering concern that if we didn't first wait for all jobs to finish before starting the individual waits, we might get race conditions. It seems like perhaps the opposite is actually the case. > But all the "wait $p" return 0 if > (1) remove the "wait" before the "for p in $_zajobs", or > (2) add "sleep 1" before the "wait", or > > (1)(2) may indicate that the first "wait" (without $p) can't correctly > wait for some of the child procs...? That would be ... strange ... and would mean we might have to be on the lookout for "wait" failing in other cases too. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: zargs with -P intermittently failing in zsh 5.9 and macOS 2022-05-27 3:39 ` Bart Schaefer @ 2022-05-27 4:18 ` Lawrence Velázquez 2022-05-27 5:29 ` Jun T 1 sibling, 0 replies; 18+ messages in thread From: Lawrence Velázquez @ 2022-05-27 4:18 UTC (permalink / raw) To: Bart Schaefer, Jun T; +Cc: zsh-workers On Thu, May 26, 2022, at 11:39 PM, Bart Schaefer wrote: > On Thu, May 26, 2022 at 1:58 AM Jun T <takimoto-j@kba.biglobe.ne.jp> wrote: >> >> > 2022/05/25 2:32, Bart Schaefer <schaefer@brasslantern.com> wrote: >> > >> > Something is causing the subshell that was running f to exit with >> > status 19 even though f returned 0. >> >> The problem can be reproduced by the following simplified script: > > Thanks. Is this limited to newer MacOS, like the nullexec test > discussed in workers/50164 ? I can reproduce the problem on Mojave (10.14.6). -- vq ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: zargs with -P intermittently failing in zsh 5.9 and macOS 2022-05-27 3:39 ` Bart Schaefer 2022-05-27 4:18 ` Lawrence Velázquez @ 2022-05-27 5:29 ` Jun T 2022-05-27 16:10 ` Bart Schaefer 1 sibling, 1 reply; 18+ messages in thread From: Jun T @ 2022-05-27 5:29 UTC (permalink / raw) To: zsh-workers > 2022/05/27 12:39, Bart Schaefer <schaefer@brasslantern.com> wrote: > > I don't suppose you've found any indication of what 19 actually means, there. I think I've found it now; 19 = SIGCONT. If in subshell, zwaitjob(job) calls killjb(jn, SIGONT) (jobs.c:1657). When wait_for_processes() is called (in signal handler?), wait3(&state) _sometimes_ returns with WIFCONTINUED(status) set to true. Then wait_for_processes() calls addbgstatus(pid, val) with val=WEXITSTATUS(status)=19. Later the process exits normally, and addbgstatus(pid, 0) is called. This means bgstatus_list has two entries for the pid. When 'wait pid' calls getbgstatus(pid), it finds the entry with status=19 and returns it. A simple fix would be, at line 584 of signals.c, (A) call addbgstatus() only if WIFCONTINUED() is not true. But, what should we do if WIFSTOPPED() is true? The comment before addbgstatus() (jobs.c, line 2211) says: Record the status of a background process that exited so we can execute the builtin wait for it. So I guess we need to call addbgstatus() only if the process has actually exited (or killed). If this is the case, better solution would be (B) call addbgstatus() only if WIFEXITED() or WIFSIGNALED() is true. This it the patch below. Or, if we want to call addbgstatus() unconditionally, (C) modify addbgstatus() so that if there is already an entry for the pid then update it instead of adding a new entry. I feel (B) is the best way, but not sure. Any idea? >> #sleep 1 # works OK if 'sleep 1' is added here >> wait # works OK if this line is commented out > > Hm. If that wait is removed, the subshell is probably not necessary. > It's there because of a lingering concern that if we didn't first wait > for all jobs to finish before starting the individual waits, we might > get race conditions. It seems like perhaps the opposite is actually > the case. I think 'wait pid' for all the pids in $_zajobs would be enough, but not 100 % sure. zargs need not be modified if the addbgstatus-problem is fixed (in some way), but if the wait/subshell is not necessary then it is better removed. It can happen that a user starts (in a script, probably) many background jobs and want to wait for only part of the jobs. If waiting for each job does not work, the user may call it a bug. # of cause if it is hard to fix, then it would be better to include a # workaround in zargs. diff --git a/Src/signals.c b/Src/signals.c index 5c787e2a8..8096993cd 100644 --- a/Src/signals.c +++ b/Src/signals.c @@ -575,14 +575,11 @@ wait_for_processes(void) * and is not equal to the current foreground job. */ if (jn && !(jn->stat & (STAT_CURSH|STAT_BUILTIN)) && - jn - jobtab != thisjob) { - int val = (WIFSIGNALED(status) ? - 0200 | WTERMSIG(status) : - (WIFSTOPPED(status) ? - 0200 | WEXITSTATUS(status) : - WEXITSTATUS(status))); - addbgstatus(pid, val); - } + jn - jobtab != thisjob && + /* record the status only if the process has exited */ + (WIFEXITED(status) || WIFSIGNALED(status))) + addbgstatus(pid, WIFSIGNALED(status) ? 0200 | WTERMSIG(status) + : WEXITSTATUS(status)); unqueue_signals(); } ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: zargs with -P intermittently failing in zsh 5.9 and macOS 2022-05-27 5:29 ` Jun T @ 2022-05-27 16:10 ` Bart Schaefer 2022-05-27 17:25 ` Jun. T 0 siblings, 1 reply; 18+ messages in thread From: Bart Schaefer @ 2022-05-27 16:10 UTC (permalink / raw) To: Jun T; +Cc: Zsh hackers list On Thu, May 26, 2022 at 10:30 PM Jun T <takimoto-j@kba.biglobe.ne.jp> wrote: > > When wait_for_processes() is called (in signal handler?), wait3(&state) > _sometimes_ returns with WIFCONTINUED(status) set to true. > Then wait_for_processes() calls addbgstatus(pid, val) with > val=WEXITSTATUS(status)=19. > > Later the process exits normally, and addbgstatus(pid, 0) is called. > This means bgstatus_list has two entries for the pid. > > When 'wait pid' calls getbgstatus(pid), it finds the entry with status=19 > and returns it. Even if it's correct to have two entries, it's wrong to return the least recent one. > A simple fix would be, at line 584 of signals.c, > (A) call addbgstatus() only if WIFCONTINUED() is not true. I think that's insufficient ... we should be recording the most recent state, per your question about WIFSTOPPED. > So I guess we need to call addbgstatus() only if the process has actually > exited (or killed). If this is the case, better solution would be > (B) call addbgstatus() only if WIFEXITED() or WIFSIGNALED() is true. > This it the patch below. Does that really work? I would expect WIFSIGNALED to be true when each of WIFSTOPPED and WIFCONTINUED is also true, so it wouldn't change anything? (There isn't any way to stop/continue without using a signal that I can think of.) > Or, if we want to call addbgstatus() unconditionally, > (C) modify addbgstatus() so that if there is already an entry for > the pid then update it instead of adding a new entry. I think either that, or (D) Make sure getbgstatus() always returns the most recent entry. But ... there isn't any way for a user / script to examine more than one job status per PID, so unless we're using it internally somewhere (?) I would think (C) is best. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: zargs with -P intermittently failing in zsh 5.9 and macOS 2022-05-27 16:10 ` Bart Schaefer @ 2022-05-27 17:25 ` Jun. T 2022-05-29 3:30 ` Bart Schaefer 0 siblings, 1 reply; 18+ messages in thread From: Jun. T @ 2022-05-27 17:25 UTC (permalink / raw) To: zsh-workers > 2022/05/28 1:10、Bart Schaefer <schaefer@brasslantern.com>のメール: > > On Thu, May 26, 2022 at 10:30 PM Jun T <takimoto-j@kba.biglobe.ne.jp> wrote: >> >> >> When 'wait pid' calls getbgstatus(pid), it finds the entry with status=19 >> and returns it. > > Even if it's correct to have two entries, it's wrong to return the > least recent one. Yes, of course, so we need to fix it. >> So I guess we need to call addbgstatus() only if the process has actually >> exited (or killed). If this is the case, better solution would be >> (B) call addbgstatus() only if WIFEXITED() or WIFSIGNALED() is true. >> This it the patch below. > > Does that really work? I would expect WIFSIGNALED to be true when > each of WIFSTOPPED and WIFCONTINUED is also true, so it wouldn't > change anything? (There isn't any way to stop/continue without using > a signal that I can think of.) I've tried A, B, C, and it seems all of them work in the sense that it solves the current zargs's problem, and 'make check' passes. But I think A is insufficient, as you write. I was thinking B is enough, because: WIFEXITED: has exited spontaneously WIFSIGNALED: has been killed by a signal WIFSTOPPED: still alive but now stoeepd WIFCONTINUED: has been stopped but now continued So these 4 are mutually exclusive. In the case of EXITED/SIGNALED, further call of wait3() will not report the status of the process. So if we call addbgstatus() only for EXITED/SIGNALED, then it records the final state of the process just onece. How do you think? Do you have a Mac for testing? Or we can use both B and C, and add a DPUTS() if addbgstatus() is called more than once for a single pid. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: zargs with -P intermittently failing in zsh 5.9 and macOS 2022-05-27 17:25 ` Jun. T @ 2022-05-29 3:30 ` Bart Schaefer 2022-05-29 14:47 ` Jun. T 0 siblings, 1 reply; 18+ messages in thread From: Bart Schaefer @ 2022-05-29 3:30 UTC (permalink / raw) To: Jun. T; +Cc: Zsh hackers list On Fri, May 27, 2022 at 10:26 AM Jun. T <takimoto-j@kba.biglobe.ne.jp> wrote: > > I was thinking B is enough, because: > WIFEXITED: has exited spontaneously > WIFSIGNALED: has been killed by a signal > WIFSTOPPED: still alive but now stoeepd > WIFCONTINUED: has been stopped but now continued > So these 4 are mutually exclusive. OK. However, reading through your patch again ... the changed code kept track of stopped jobs. Has that never been necessary? Perhaps the right thing to do is to continue recording the state on WIFSTOPPED but to change to clearing the saved status on WIFCONTINUED? > Or we can use both B and C, and add a DPUTS() if addbgstatus() is > called more than once for a single pid. That would be the "defensive programming" approach. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: zargs with -P intermittently failing in zsh 5.9 and macOS 2022-05-29 3:30 ` Bart Schaefer @ 2022-05-29 14:47 ` Jun. T 2022-05-29 20:07 ` Bart Schaefer 2022-06-12 16:43 ` Eric Nielsen 0 siblings, 2 replies; 18+ messages in thread From: Jun. T @ 2022-05-29 14:47 UTC (permalink / raw) To: zsh-workers > 2022/05/29 12:30, Bart Schaefer <schaefer@brasslantern.com> wrote: > > OK. However, reading through your patch again ... the changed code > kept track of stopped jobs. Has that never been necessary? Perhaps > the right thing to do is to continue recording the state on WIFSTOPPED > but to change to clearing the saved status on WIFCONTINUED? If I understand correctly, {add,get}bgstatus() exists only for the wait builtin. The only thing that needs be recored in bgstatus_list is the exist status (including killed by a signal) of the process. Status of a running/stopped process is kept elsewhere (jobtab[].procs). Below is the patch including both B (signals.c) and C (jobs.c). singnals.s is as the previous patch, but the ternary operator is replaced by if/else. diff --git a/Src/jobs.c b/Src/jobs.c index a91ef787f..a8d6aeded 100644 --- a/Src/jobs.c +++ b/Src/jobs.c @@ -2221,6 +2221,7 @@ addbgstatus(pid_t pid, int status) { static long child_max; Bgstatus bgstatus_entry; + LinkNode node; if (!child_max) { #ifdef _SC_CHILD_MAX @@ -2244,6 +2245,18 @@ addbgstatus(pid_t pid, int status) if (!bgstatus_list) return; } + /* See if an entry already exists for the pid */ + for (node = firstnode(bgstatus_list); node; incnode(node)) { + bgstatus_entry = (Bgstatus)getdata(node); + if (bgstatus_entry->pid == pid) { + /* In theory this should not happen because addbgstatus() is + * called only once when the process exits or gets killed. */ + DPUTS(1, "addbgstatus is called more than once"); + bgstatus_entry->status = status; + return; + } + } + /* Add an entry for the pid */ if (bgstatus_count == child_max) { /* Overflow. List is in order, remove first */ rembgstatus(firstnode(bgstatus_list)); diff --git a/Src/signals.c b/Src/signals.c index 5c787e2a8..a61368554 100644 --- a/Src/signals.c +++ b/Src/signals.c @@ -576,12 +576,10 @@ wait_for_processes(void) */ if (jn && !(jn->stat & (STAT_CURSH|STAT_BUILTIN)) && jn - jobtab != thisjob) { - int val = (WIFSIGNALED(status) ? - 0200 | WTERMSIG(status) : - (WIFSTOPPED(status) ? - 0200 | WEXITSTATUS(status) : - WEXITSTATUS(status))); - addbgstatus(pid, val); + if (WIFEXITED(status)) + addbgstatus(pid, WEXITSTATUS(status)); + else if (WIFSIGNALED(status)) + addbgstatus(pid, 0200 | WTERMSIG(status)); } unqueue_signals(); ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: zargs with -P intermittently failing in zsh 5.9 and macOS 2022-05-29 14:47 ` Jun. T @ 2022-05-29 20:07 ` Bart Schaefer 2022-05-30 11:16 ` Jun T 2022-06-12 16:43 ` Eric Nielsen 1 sibling, 1 reply; 18+ messages in thread From: Bart Schaefer @ 2022-05-29 20:07 UTC (permalink / raw) To: Jun. T; +Cc: Zsh hackers list Thank you. A couple of concluding remarks ... On Sun, May 29, 2022 at 7:48 AM Jun. T <takimoto-j@kba.biglobe.ne.jp> wrote: > > + /* See if an entry already exists for the pid */ Would it be worthwhile to put that entire thing in #ifdef DEBUG ? Perhaps not, given that the bgstatus_list is unlikely to have many nodes. > - int val = (WIFSIGNALED(status) ? > - 0200 | WTERMSIG(status) : > - (WIFSTOPPED(status) ? > - 0200 | WEXITSTATUS(status) : > - WEXITSTATUS(status))); Irrelevant to the patch, but on the original issue, I keep looking at this and wondering how we got an exit code of 19 rather than 19+128 which would have immediately pointed to SIGCONT. Is this a difference in the way MacOS defines WEXITSTATUS(), or what? ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: zargs with -P intermittently failing in zsh 5.9 and macOS 2022-05-29 20:07 ` Bart Schaefer @ 2022-05-30 11:16 ` Jun T 2022-06-07 6:51 ` Jun T 0 siblings, 1 reply; 18+ messages in thread From: Jun T @ 2022-05-30 11:16 UTC (permalink / raw) To: zsh-workers This post is rather long, and divided into (1)-(5). (1) > 2022/05/30 5:07, Bart Schaefer <schaefer@brasslantern.com> wrote: > > Irrelevant to the patch, but on the original issue, I keep looking at > this and wondering how we got an exit code of 19 rather than 19+128 > which would have immediately pointed to SIGCONT. Is this a difference > in the way MacOS defines WEXITSTATUS(), or what? On both macOS and Linux, WEXITSTATUS(status) is equivalent to (((status) & 0xff00) >> 8). This is so either WIFEXITED() is true or not. But on both OSes, wait(2) man page says that WEXITSTATUS() can be used only when WIFEXITED() is true. When WIFCONTINUED() is true we should have used "0200 | SIGCONT". (2) Why zrags has the problem only on macOS The reason seems to be in the way SIGCONT is handled (in the kernel?). On macOS, if the parent zsh sends SIGCONT to a background child and calls wait3(), then wait3() returns with WIFCONTINUED even if the child has not been stopped. This is _quite_ strange, and does not happen on Linux. You may try the following C program: #include <stdio.h> #include <unistd.h> #include <signal.h> #include <sys/wait.h> #include <sys/resource.h> int main() { pid_t pid = fork(); if (pid < 0) return -1; if (pid > 0) { pid_t wpid; int state; struct rusage ru; kill(pid, SIGCONT); wpid = wait3(&state, WUNTRACED|WCONTINUED, &ru); printf("wait3: %d 0x%04x\n", wpid, state); kill(pid, SIGKILL); } else { sleep(1); } return 0; } On Linux, it outputs, after waiting for 1 second: wait3: 76819 0x0000 but on macOS it immediately outputs: wait3: 84313 0x137f For status=0x137f, WIFCONTINUED(status)=true, and WEXITSTATUS(x)=0x13=19 (3) zsh on Linux also has a problem On Linux (and macOS), zsh (without the patch) has the following problem: % sleep 20 & [1] 78570 % kill -STOP 78570 [1] + 78570 suspended (signal) sleep 20 % kill -CONT 78570 % (after 20 seconds) [1] + 78570 done sleep 20 % wait 78570 % echo $? 147 # should be 0 147 = 0200 + SIGSTOP(=19) (on macOS we get 145) (4) Another minor problem In the current zsh (without the patch), when WIFSTOPPED()=true, addbgstatus() records "0200 | WEXITSTATUS(status)" in bgstatus_list. But wait(2) manpage says WEXITSTATUS() can be used only if WIFEXITED() is true, and when WIFSTOPPED() is true WSTOPSIG() should be used. On both Linux and macOS (and probably on other OSes) WSTOPSIG() is equivalent to WEXITSTATUS(), so there was no observable problem. grep WEXITSTATUS in the zsh source tree shows that there are two more uses of WEXITSTATUS() (in jobs.c) which are better replaced by WSTOPSIG(), as in the patch shown in (5) below. And: >> + /* See if an entry already exists for the pid */ > > Would it be worthwhile to put that entire thing in #ifdef DEBUG ? Slow down is negligible, as you write, but I think chances are _very_ low that addbgusage() is called more than once. So I enclosed it by #ifdef DEBUG as you suggested. Also added a test for the problem (3) above. (5) Revised patch with test diff --git a/Src/jobs.c b/Src/jobs.c index a91ef787f..80893cebf 100644 --- a/Src/jobs.c +++ b/Src/jobs.c @@ -414,7 +414,7 @@ storepipestats(Job jn, int inforeground, int fixlastval) jpipestats[i] = (WIFSIGNALED(p->status) ? 0200 | WTERMSIG(p->status) : (WIFSTOPPED(p->status) ? - 0200 | WEXITSTATUS(p->status) : + 0200 | WSTOPSIG(p->status) : WEXITSTATUS(p->status))); if (jpipestats[i]) pipefail = jpipestats[i]; @@ -471,7 +471,7 @@ update_job(Job jn) val = (WIFSIGNALED(pn->status) ? 0200 | WTERMSIG(pn->status) : (WIFSTOPPED(pn->status) ? - 0200 | WEXITSTATUS(pn->status) : + 0200 | WSTOPSIG(pn->status) : WEXITSTATUS(pn->status))); signalled = WIFSIGNALED(pn->status); } @@ -2221,6 +2221,7 @@ addbgstatus(pid_t pid, int status) { static long child_max; Bgstatus bgstatus_entry; + LinkNode node; if (!child_max) { #ifdef _SC_CHILD_MAX @@ -2244,6 +2245,21 @@ addbgstatus(pid_t pid, int status) if (!bgstatus_list) return; } +#ifdef DEBUG + /* See if an entry already exists for the pid */ + for (node = firstnode(bgstatus_list); node; incnode(node)) { + bgstatus_entry = (Bgstatus)getdata(node); + if (bgstatus_entry->pid == pid) { + /* In theory this should not happen because addbgstatus() is + * called only once when the process exits or gets killed. */ + dputs("addbgstatus called again: status %d -> %d", + bgstatus_entry->status, status); + bgstatus_entry->status = status; + return; + } + } +#endif + /* Add an entry for the pid */ if (bgstatus_count == child_max) { /* Overflow. List is in order, remove first */ rembgstatus(firstnode(bgstatus_list)); diff --git a/Src/signals.c b/Src/signals.c index 5c787e2a8..a61368554 100644 --- a/Src/signals.c +++ b/Src/signals.c @@ -576,12 +576,10 @@ wait_for_processes(void) */ if (jn && !(jn->stat & (STAT_CURSH|STAT_BUILTIN)) && jn - jobtab != thisjob) { - int val = (WIFSIGNALED(status) ? - 0200 | WTERMSIG(status) : - (WIFSTOPPED(status) ? - 0200 | WEXITSTATUS(status) : - WEXITSTATUS(status))); - addbgstatus(pid, val); + if (WIFEXITED(status)) + addbgstatus(pid, WEXITSTATUS(status)); + else if (WIFSIGNALED(status)) + addbgstatus(pid, 0200 | WTERMSIG(status)); } unqueue_signals(); diff --git a/Test/A05execution.ztst b/Test/A05execution.ztst index d95ee363c..b257ddf2c 100644 --- a/Test/A05execution.ztst +++ b/Test/A05execution.ztst @@ -396,6 +396,13 @@ F:anonymous function, and a descriptor leak when backgrounding a pipeline # TBD: the 0 above is believed to be bogus and should also be turned # into 127 when the ccorresponding bug is fixed in the main shell. + sleep 1 & pid=$! + kill -STOP $pid + sleep 1 + kill -CONT $pid + wait $pid +0:wait for stopped and continued process + # Without the outer subshell, the test harness reports the pre-46060 behaviour # as "skipped" rather than "failed". (( exit 130 ) | { sleep 1; echo hello }) ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: zargs with -P intermittently failing in zsh 5.9 and macOS 2022-05-30 11:16 ` Jun T @ 2022-06-07 6:51 ` Jun T 2022-06-09 2:56 ` Bart Schaefer 0 siblings, 1 reply; 18+ messages in thread From: Jun T @ 2022-06-07 6:51 UTC (permalink / raw) To: zsh-workers > 2022/05/30 20:16, Jun T <takimoto-j@kba.biglobe.ne.jp> wrote: > > (5) Revised patch with test It seems this test "sometimes" fails on macOS: --- /tmp/zsh.ztst.40838/ztst.err 2022-06-07 12:45:24.000000000 +0900 +++ /tmp/zsh.ztst.40838/ztst.terr 2022-06-07 12:45:25.000000000 +0900 @@ -0,0 +1 @@ +(eval):kill:4: kill 47169 failed: no such process Test ./A05execution.ztst failed: error output differs from expected as shown above for: sleep 1 & pid=$! kill -STOP $pid sleep 1 kill -CONT $pid wait $pid Was testing: wait for stopped and continued process The failed 'kill' is the second one: 'kill -CONT $pid'. If I remove the 'sleep 1' between the two 'kill's then the test always passes on macOS. But this is not a solution, because then the test passes even without the patch for signals.c (at least on Linux). The only way I can think of to make it pass on macOS (but fails on Linux without the fix for signals.c) is to increase the background sleep time to 'sleep 2'. Is there any better solution? Or is this test necessary? Patch for jobs.c below is just to silence the warning. diff --git a/Src/jobs.c b/Src/jobs.c index aa32f4e80..e0e453ed8 100644 --- a/Src/jobs.c +++ b/Src/jobs.c @@ -2221,7 +2221,9 @@ addbgstatus(pid_t pid, int status) { static long child_max; Bgstatus bgstatus_entry; +#ifdef DEBUG LinkNode node; +#endif if (!child_max) { #ifdef _SC_CHILD_MAX diff --git a/Test/A05execution.ztst b/Test/A05execution.ztst index b257ddf2c..bcadc6d56 100644 --- a/Test/A05execution.ztst +++ b/Test/A05execution.ztst @@ -396,7 +396,7 @@ F:anonymous function, and a descriptor leak when backgrounding a pipeline # TBD: the 0 above is believed to be bogus and should also be turned # into 127 when the ccorresponding bug is fixed in the main shell. - sleep 1 & pid=$! + sleep 2 & pid=$! kill -STOP $pid sleep 1 kill -CONT $pid ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: zargs with -P intermittently failing in zsh 5.9 and macOS 2022-06-07 6:51 ` Jun T @ 2022-06-09 2:56 ` Bart Schaefer 0 siblings, 0 replies; 18+ messages in thread From: Bart Schaefer @ 2022-06-09 2:56 UTC (permalink / raw) To: Jun T; +Cc: Zsh hackers list On Mon, Jun 6, 2022 at 11:52 PM Jun T <takimoto-j@kba.biglobe.ne.jp> wrote: > > It seems this test "sometimes" fails on macOS: I think increasing the sleep to 2 seconds is fine. The test is about stopping and starting the job; the choice of 1 second was just to avoid delaying the whole test thread unnecessarily. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: zargs with -P intermittently failing in zsh 5.9 and macOS 2022-05-29 14:47 ` Jun. T 2022-05-29 20:07 ` Bart Schaefer @ 2022-06-12 16:43 ` Eric Nielsen 1 sibling, 0 replies; 18+ messages in thread From: Eric Nielsen @ 2022-06-12 16:43 UTC (permalink / raw) To: zsh-workers [-- Attachment #1: Type: text/plain, Size: 293 bytes --] On May 29, 2022, Jun T <takimoto-j@kba.biglobe.ne.jp> wrote: > Below is the patch including both B (signals.c) and C (jobs.c). > singnals.s is as the previous patch, but the ternary operator is > replaced by if/else. Just want to thank Jun and Bart for looking into this and fixing the issue! [-- Attachment #2: Type: text/html, Size: 2792 bytes --] ^ permalink raw reply [flat|nested] 18+ messages in thread
end of thread, other threads:[~2022-06-12 16:43 UTC | newest] Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2022-05-24 16:09 zargs with -P intermittently failing in zsh 5.9 and macOS Eric Nielsen 2022-05-24 17:32 ` Bart Schaefer 2022-05-24 20:21 ` Eric Nielsen 2022-05-24 20:35 ` Bart Schaefer 2022-05-26 8:57 ` Jun T 2022-05-26 11:21 ` Jun. T 2022-05-27 3:39 ` Bart Schaefer 2022-05-27 4:18 ` Lawrence Velázquez 2022-05-27 5:29 ` Jun T 2022-05-27 16:10 ` Bart Schaefer 2022-05-27 17:25 ` Jun. T 2022-05-29 3:30 ` Bart Schaefer 2022-05-29 14:47 ` Jun. T 2022-05-29 20:07 ` Bart Schaefer 2022-05-30 11:16 ` Jun T 2022-06-07 6:51 ` Jun T 2022-06-09 2:56 ` Bart Schaefer 2022-06-12 16:43 ` Eric Nielsen
Code repositories for project(s) associated with this public inbox https://git.vuxu.org/mirror/zsh/ This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).