* problem with 'ls | less' shell function @ 2022-10-17 9:17 Thomas Klausner 2022-10-17 14:50 ` Mikael Magnusson 2022-10-19 9:33 ` Jun T 0 siblings, 2 replies; 28+ messages in thread From: Thomas Klausner @ 2022-10-17 9:17 UTC (permalink / raw) To: zsh-workers Hi! I recently noticed a problem in zsh 5.9 (as built from pkgsrc) on NetBSD 9.99.100. Since I didn't notice it before it could be related to a change in NetBSD (I'm following the latest version), but I've been told that the issue can be reproduced on Ubuntu 19.04 and FreeBSD 13.1 too; but not in zsh 5.8.1, nor in most other shells though. The discussion on the NetBSD mailing list can be read in this thread: https://mail-index.netbsd.org/current-users/2022/10/12/msg043076.html but I'll summarize the issue I see in zsh here. I have a shell function I've been using for ages: dir() { ls -al "$@" | less; } Recently, when I tried suspending this with CTRL-Z and then resuming it with 'fg', I get: $ dir (CTRL-Z) zsh: done ls -al "$@" | zsh: suspended $ fg [1] + done ls -al "$@" | continued zsh: done ls -al "$@" | zsh: suspended (tty output) zsh: done ls -al "$@" | zsh: suspended (tty output) The same thing works in NetBSD's ksh: $ fg ls -al "$@" | less (CTRL-Z) [1] + Done ls -al "$@" | Stopped less or in bash $ fg ls -al "$@" | less (CTRL-Z) [1]+ Stopped ls -al "$@" | less If I use '/bin/ls' in the shell function instead of 'ls', it works fine. Any ideas what the issue could be? Cheers, Thomas (I'll try to send this without subscribing first, since this seems to be the address intended for bug reports.) ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: problem with 'ls | less' shell function 2022-10-17 9:17 problem with 'ls | less' shell function Thomas Klausner @ 2022-10-17 14:50 ` Mikael Magnusson 2022-10-17 15:36 ` Thomas Klausner 2022-10-19 9:33 ` Jun T 1 sibling, 1 reply; 28+ messages in thread From: Mikael Magnusson @ 2022-10-17 14:50 UTC (permalink / raw) To: Thomas Klausner; +Cc: zsh-workers On 10/17/22, Thomas Klausner <wiz@gatalith.at> wrote: > Hi! > > I recently noticed a problem in zsh 5.9 (as built from pkgsrc) on > NetBSD 9.99.100. Since I didn't notice it before it could be related > to a change in NetBSD (I'm following the latest version), but I've > been told that the issue can be reproduced on Ubuntu 19.04 and FreeBSD > 13.1 too; but not in zsh 5.8.1, nor in most other shells though. > > The discussion on the NetBSD mailing list can be read in this thread: > https://mail-index.netbsd.org/current-users/2022/10/12/msg043076.html > but I'll summarize the issue I see in zsh here. > > I have a shell function I've been using for ages: > > dir() { ls -al "$@" | less; } > > Recently, when I tried suspending this with CTRL-Z and then resuming > it with 'fg', I get: > > $ dir > (CTRL-Z) > zsh: done ls -al "$@" | > zsh: suspended > $ fg > [1] + done ls -al "$@" | > continued > zsh: done ls -al "$@" | > zsh: suspended (tty output) > zsh: done ls -al "$@" | > zsh: suspended (tty output) > > The same thing works in NetBSD's ksh: > > $ fg > ls -al "$@" | less > (CTRL-Z) > [1] + Done ls -al "$@" | > Stopped less > > or in bash > > $ fg > ls -al "$@" | less > (CTRL-Z) > > [1]+ Stopped ls -al "$@" | less > > If I use '/bin/ls' in the shell function instead of 'ls', it works > fine. > > Any ideas what the issue could be? The last bit implies that 'ls' is an alias or function, can you check the output of 'which ls'? -- Mikael Magnusson ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: problem with 'ls | less' shell function 2022-10-17 14:50 ` Mikael Magnusson @ 2022-10-17 15:36 ` Thomas Klausner 2022-10-19 8:26 ` zsugabubus 0 siblings, 1 reply; 28+ messages in thread From: Thomas Klausner @ 2022-10-17 15:36 UTC (permalink / raw) To: Mikael Magnusson; +Cc: zsh-workers On Mon, Oct 17, 2022 at 04:50:20PM +0200, Mikael Magnusson wrote: > On 10/17/22, Thomas Klausner <wiz@gatalith.at> wrote: > > Hi! > > > > I recently noticed a problem in zsh 5.9 (as built from pkgsrc) on > > NetBSD 9.99.100. Since I didn't notice it before it could be related > > to a change in NetBSD (I'm following the latest version), but I've > > been told that the issue can be reproduced on Ubuntu 19.04 and FreeBSD > > 13.1 too; but not in zsh 5.8.1, nor in most other shells though. > > > > The discussion on the NetBSD mailing list can be read in this thread: > > https://mail-index.netbsd.org/current-users/2022/10/12/msg043076.html > > but I'll summarize the issue I see in zsh here. > > > > I have a shell function I've been using for ages: > > > > dir() { ls -al "$@" | less; } > > > > Recently, when I tried suspending this with CTRL-Z and then resuming > > it with 'fg', I get: > > > > $ dir > > (CTRL-Z) > > zsh: done ls -al "$@" | > > zsh: suspended > > $ fg > > [1] + done ls -al "$@" | > > continued > > zsh: done ls -al "$@" | > > zsh: suspended (tty output) > > zsh: done ls -al "$@" | > > zsh: suspended (tty output) > > > > The same thing works in NetBSD's ksh: > > > > $ fg > > ls -al "$@" | less > > (CTRL-Z) > > [1] + Done ls -al "$@" | > > Stopped less > > > > or in bash > > > > $ fg > > ls -al "$@" | less > > (CTRL-Z) > > > > [1]+ Stopped ls -al "$@" | less > > > > If I use '/bin/ls' in the shell function instead of 'ls', it works > > fine. > > > > Any ideas what the issue could be? > > The last bit implies that 'ls' is an alias or function, can you check > the output of 'which ls'? No, it isn't: $ which ls /bin/ls $ type ls ls is /bin/ls Compare to $ type ll ll is an alias for ls -al $ type dir dir is a shell function from /home/wiz/.zshrc But I take back the statement that it works with /bin/ls in the dir function definition, it shows the same broken behaviour. Not sure why I thought it worked before, sorry for that red herring. Thomas ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: problem with 'ls | less' shell function 2022-10-17 15:36 ` Thomas Klausner @ 2022-10-19 8:26 ` zsugabubus 2022-10-19 10:04 ` Jun T ` (2 more replies) 0 siblings, 3 replies; 28+ messages in thread From: zsugabubus @ 2022-10-19 8:26 UTC (permalink / raw) To: Thomas Klausner; +Cc: zsh-workers On Mon, Oct 17, 2022 at 05:36:20PM +0200, Thomas Klausner wrote: > On Mon, Oct 17, 2022 at 04:50:20PM +0200, Mikael Magnusson wrote: > > On 10/17/22, Thomas Klausner <wiz@gatalith.at> wrote: > > > Hi! > > > > > > I recently noticed a problem in zsh 5.9 (as built from pkgsrc) on > > > NetBSD 9.99.100. Since I didn't notice it before it could be related > > > to a change in NetBSD (I'm following the latest version), but I've > > > been told that the issue can be reproduced on Ubuntu 19.04 and FreeBSD > > > 13.1 too; but not in zsh 5.8.1, nor in most other shells though. > > > > > > The discussion on the NetBSD mailing list can be read in this thread: > > > https://mail-index.netbsd.org/current-users/2022/10/12/msg043076.html > > > but I'll summarize the issue I see in zsh here. > > > > > > I have a shell function I've been using for ages: > > > > > > dir() { ls -al "$@" | less; } > > > > > > Recently, when I tried suspending this with CTRL-Z and then resuming > > > it with 'fg', I get: > > > > > > $ dir > > > (CTRL-Z) > > > zsh: done ls -al "$@" | > > > zsh: suspended > > > $ fg > > > [1] + done ls -al "$@" | > > > continued > > > zsh: done ls -al "$@" | > > > zsh: suspended (tty output) > > > zsh: done ls -al "$@" | > > > zsh: suspended (tty output) > > > > > > The same thing works in NetBSD's ksh: > > > > > > $ fg > > > ls -al "$@" | less > > > (CTRL-Z) > > > [1] + Done ls -al "$@" | > > > Stopped less > > > > > > or in bash > > > > > > $ fg > > > ls -al "$@" | less > > > (CTRL-Z) > > > > > > [1]+ Stopped ls -al "$@" | less > > > > > > If I use '/bin/ls' in the shell function instead of 'ls', it works > > > fine. > > > > > > Any ideas what the issue could be? > > > > The last bit implies that 'ls' is an alias or function, can you check > > the output of 'which ls'? > > No, it isn't: > > $ which ls > /bin/ls > $ type ls > ls is /bin/ls > > Compare to > > $ type ll > ll is an alias for ls -al > $ type dir > dir is a shell function from /home/wiz/.zshrc > > But I take back the statement that it works with /bin/ls in the dir > function definition, it shows the same broken behaviour. Not sure why > I thought it worked before, sorry for that red herring. > Thomas > I think what happens is that zsh fails to correctly set the foreground process group in `fg`. `less` is not in the foreground pgrp that's why it immediately gets suspended by SIGTTIN after it receives SIGCONT. In this example it is more obvious that `sleep` is not in the foreground pgrp (confirm with ps): $ f(){echo|sleep 60} $ f ^Z $ fg ^Z^Z^C^C^C^C^C^\ 15bf8ace168a86d0fae90b10e9f706baddd4c0bf is the first bad commit 50134: Tweak process group handling to prevent unkillable pipelines -- zsugabubus ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: problem with 'ls | less' shell function 2022-10-19 8:26 ` zsugabubus @ 2022-10-19 10:04 ` Jun T 2022-10-19 13:36 ` Jun. T 2022-10-20 0:10 ` Bart Schaefer 2022-10-21 5:45 ` Jun T 2 siblings, 1 reply; 28+ messages in thread From: Jun T @ 2022-10-19 10:04 UTC (permalink / raw) To: zsh-workers; +Cc: Thomas Klausner Sorry, I didn't notice this post: > 2022/10/19 17:26, zsugabubus@national.shitposting.agency wrote: > > $ f(){echo|sleep 60} > $ f > ^Z > $ fg > ^Z^Z^C^C^C^C^C^\ Thnaks. Now I understand that we need to use function to reproduce the problem. % ls | less # works fine but % dir () { ls | less } % dir # this has the problem > 15bf8ace168a86d0fae90b10e9f706baddd4c0bf is the first bad commit > 50134: Tweak process group handling to prevent unkillable pipelines Thanks again. > I think what happens is that zsh fails to correctly set the foreground > process group in `fg`. `less` is not in the foreground pgrp that's why > it immediately gets suspended by SIGTTIN after it receives SIGCONT. Yes, and I guess it is related with the code in wait_for_processes(), signals.c, lines 540-557. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: problem with 'ls | less' shell function 2022-10-19 10:04 ` Jun T @ 2022-10-19 13:36 ` Jun. T 0 siblings, 0 replies; 28+ messages in thread From: Jun. T @ 2022-10-19 13:36 UTC (permalink / raw) To: zsh-workers > 2022/10/19 19:04, Jun T <takimoto-j@kba.biglobe.ne.jp> wrote: > > Yes, and I guess it is related with the code in wait_for_processes(), > signals.c, lines 540-557. No, sorry again. This is not related. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: problem with 'ls | less' shell function 2022-10-19 8:26 ` zsugabubus 2022-10-19 10:04 ` Jun T @ 2022-10-20 0:10 ` Bart Schaefer 2022-10-20 16:26 ` Peter Stephenson 2022-10-21 5:45 ` Jun T 2 siblings, 1 reply; 28+ messages in thread From: Bart Schaefer @ 2022-10-20 0:10 UTC (permalink / raw) To: zsh-workers On Wed, Oct 19, 2022 at 1:28 AM <zsugabubus@national.shitposting.agency> wrote: > > I think what happens is that zsh fails to correctly set the foreground > process group in `fg`. `less` is not in the foreground pgrp that's why > it immediately gets suspended by SIGTTIN after it receives SIGCONT. First sentence partly wrong, second sentence right. When a shell function is suspended with ^Z, the parent shell forks again to create a process for the shell function, but doesn't change the process group of the pipeline. When "fg" brings the shell function process back, that process is made group leader, but it then needs to reset the group leader again to the pid of its own foreground job, which it does not ... or, conversely, upon suspend the parent shell needs to avoid changing the group leader of the new process. At this point I'm not sure which of those needs to happen or where. It's all in or around the call to addproc() at exec.c line 1734. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: problem with 'ls | less' shell function 2022-10-20 0:10 ` Bart Schaefer @ 2022-10-20 16:26 ` Peter Stephenson 0 siblings, 0 replies; 28+ messages in thread From: Peter Stephenson @ 2022-10-20 16:26 UTC (permalink / raw) To: zsh-workers > On 20/10/2022 01:10 Bart Schaefer <schaefer@brasslantern.com> wrote: > > > On Wed, Oct 19, 2022 at 1:28 AM <zsugabubus@national.shitposting.agency> wrote: > > > > I think what happens is that zsh fails to correctly set the foreground > > process group in `fg`. `less` is not in the foreground pgrp that's why > > it immediately gets suspended by SIGTTIN after it receives SIGCONT. > > First sentence partly wrong, second sentence right. > > When a shell function is suspended with ^Z, the parent shell forks > again to create a process for the shell function, but doesn't change > the process group of the pipeline. When "fg" brings the shell > function process back, that process is made group leader, but it then > needs to reset the group leader again to the pid of its own foreground > job, which it does not ... or, conversely, upon suspend the parent > shell needs to avoid changing the group leader of the new process. > > At this point I'm not sure which of those needs to happen or where. > It's all in or around the call to addproc() at exec.c line 1734. I'd be quite surprised if you didn't need to be looking at this comment at line 1872. This isn't necessarily the same thing but must surely be related... /* * At this point, we used to attach this process * to the process group of list_pipe_job (the * new superjob) any time that was still available. * That caused problems in at least two * cases because this forked shell was then * suspended with the right hand side of the * pipeline, and the SIGSTOP below suspended * it a second time when it was continued. * * It's therefore not clear entirely why you'd ever * do anything other than the following, but no * doubt we'll find out... */ pws ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: problem with 'ls | less' shell function 2022-10-19 8:26 ` zsugabubus 2022-10-19 10:04 ` Jun T 2022-10-20 0:10 ` Bart Schaefer @ 2022-10-21 5:45 ` Jun T 2022-10-21 7:40 ` Jun T 2 siblings, 1 reply; 28+ messages in thread From: Jun T @ 2022-10-21 5:45 UTC (permalink / raw) To: zsh-workers > 2022/10/19 17:26, zsugabubus@national.shitposting.agency wrote: > > 15bf8ace168a86d0fae90b10e9f706baddd4c0bf is the first bad commit > 50134: Tweak process group handling to prevent unkillable pipelines This is for solving the problem that pipelines like the following can't be interrupted by ^C: zsh% { /bin/sleep 10 ; /bin/sleep 11; echo done 1; } | { /bin/sleep 12 ; /bin/sleep 13; echo done 2; } After the commit ^C works. But if we suspend/resume it by ^Z/fg, the pipeline never finishes. We get the output 'done 2', but not 'done 1', nor the next prompt (^C works). After we get 'done 2', if we run 'ps -Oppid,pgid,tpgid' in another terminal: PID PPID PGID TPGID S TTY TIME COMMAND 2447 .... 2447 3644 S pts/0 00:00:00 zsh 3644 2447 3644 3644 R pts/0 00:00:37 zsh 3645 3644 3644 3644 Z pts/0 00:00:00 [sleep] <defunct> 2447 is the main zsh, 3644 is the subshell for the left hand side of the pipeline, and 3645 is 'sleep 10'. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: problem with 'ls | less' shell function 2022-10-21 5:45 ` Jun T @ 2022-10-21 7:40 ` Jun T 2022-10-21 21:22 ` Bart Schaefer 0 siblings, 1 reply; 28+ messages in thread From: Jun T @ 2022-10-21 7:40 UTC (permalink / raw) To: zsh-workers > 2022/10/21 14:45, Jun T <takimoto-j@kba.biglobe.ne.jp> wrote: > > 2447 is the main zsh, 3644 is the subshell for the left hand side > of the pipeline, and 3645 is 'sleep 10'. And the subshell is using 100% of a CPU core, in the loop for (; !nowait;) { in function execpline(), mostly lines 1779-1797 in exec.c. The subshell have missed the SIGCHLD? ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: problem with 'ls | less' shell function 2022-10-21 7:40 ` Jun T @ 2022-10-21 21:22 ` Bart Schaefer 2022-10-21 21:24 ` Bart Schaefer 0 siblings, 1 reply; 28+ messages in thread From: Bart Schaefer @ 2022-10-21 21:22 UTC (permalink / raw) To: Jun T; +Cc: zsh-workers On Fri, Oct 21, 2022 at 12:41 AM Jun T <takimoto-j@kba.biglobe.ne.jp> wrote: > > The subshell have missed the SIGCHLD? The subshell will never get the SIGCHLD, because it is a sibling of the pipeline, not its parent. It's up to the original shell to assign the correct process group, which means the new subshell for the function has to be in the same process group as the foreground job at the time the STOP/TSTP was received. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: problem with 'ls | less' shell function 2022-10-21 21:22 ` Bart Schaefer @ 2022-10-21 21:24 ` Bart Schaefer 2022-10-22 23:22 ` Bart Schaefer 0 siblings, 1 reply; 28+ messages in thread From: Bart Schaefer @ 2022-10-21 21:24 UTC (permalink / raw) To: Jun T; +Cc: zsh-workers On Fri, Oct 21, 2022 at 2:22 PM Bart Schaefer <schaefer@brasslantern.com> wrote: > > which means the new subshell for the > function has to be in the same process group as the foreground job at > the time the STOP/TSTP was received. Or more accurately ... the parent needs to assign the tty pgroup to the foreground job, and the forked child has to wait for that process to exit and then grab the pgroup (or be assigned it by the parent). There's a lot of juggling here. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: problem with 'ls | less' shell function 2022-10-21 21:24 ` Bart Schaefer @ 2022-10-22 23:22 ` Bart Schaefer 2022-10-22 23:43 ` Bart Schaefer 0 siblings, 1 reply; 28+ messages in thread From: Bart Schaefer @ 2022-10-22 23:22 UTC (permalink / raw) To: Jun T; +Cc: zsh-workers OK, perhaps we're getting somewhere. Still working with suspend of dir() { ls -lrRt | less } In bin_fg(), line 2564, we attachtty(jobtab[jobtab[job].other].gleader)) which is using the correct PID. We then enter waitjobs() which calls waitonejob() for that PID (so far so good). However, that returns immediately, because at that point the pipeline gets SIGTTOU -- which is confusing, because it should be attached to the tty. In any case, waitjobs() then falls through to waiting for the function subshell, which is the also the right thing. It's a bit hard to follow here because sometimes the "other" field of the jobtab struct is an offset into jobtab, and sometimes its an actual PID, but the parent shell's internal record-keeping seems to be correct. However, it must be the case that something is sending SIGCONT to "less" before the attachtty() has a chance to ... actually change the process group? If I stop at the point where the SIGCONT is sent, and call getprp() from the debugger, the process group still belongs to the parent shell despite the apparently successful call to attachtty(). ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: problem with 'ls | less' shell function 2022-10-22 23:22 ` Bart Schaefer @ 2022-10-22 23:43 ` Bart Schaefer 2022-11-03 23:10 ` Bart Schaefer 0 siblings, 1 reply; 28+ messages in thread From: Bart Schaefer @ 2022-10-22 23:43 UTC (permalink / raw) To: Jun T; +Cc: zsh-workers On Sat, Oct 22, 2022 at 4:22 PM Bart Schaefer <schaefer@brasslantern.com> wrote: > > However, it must be the case that something is sending > SIGCONT to "less" before the attachtty() has a chance to ... actually > change the process group? If I stop at the point where the SIGCONT is > sent, and call getprp() from the debugger, the process group still > belongs to the parent shell despite the apparently successful call to > attachtty(). If I instrument attachtty(): [1] + continued ls -lrRt | zsh: ep = 0 and last_attached_pgrp = 276698 in 276698 zsh: last_attached_pgrp now 276716 in 276698, gettygrp = 276716 zsh: suspended (tty output) ls -lrRt | zsh: suspended (tty output) ls -lrRt | So as far as zsh knows, the attachtty() was successful, but if the debugger is telling the truth, something is resetting the tty pgrp again without attachtty() being called. I'm out of time to chase this today. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: problem with 'ls | less' shell function 2022-10-22 23:43 ` Bart Schaefer @ 2022-11-03 23:10 ` Bart Schaefer 2022-11-04 6:09 ` [PATCH] " Bart Schaefer 0 siblings, 1 reply; 28+ messages in thread From: Bart Schaefer @ 2022-11-03 23:10 UTC (permalink / raw) To: Jun T; +Cc: zsh-workers On Sat, Oct 22, 2022 at 4:43 PM Bart Schaefer <schaefer@brasslantern.com> wrote: > > So as far as zsh knows, the attachtty() was successful, but if the > debugger is telling the truth, something is resetting the tty pgrp > again without attachtty() being called. A little further ... I instrumented jobs.c:makerunning() -- % dir() { ls -lrRt | less } % dir ^Z % fg makerunning: PID 79731 job 1 jn->stat: 9427 makerunning: PID 79731 job 2 jn->stat: 370 [1] + continued ls -lrRt | makerunning: PID 79731 job 1 jn->stat: 9425 makerunning: PID 79731 job 2 jn->stat: 368 zsh: suspended (tty output) ls -lrRt | zsh: suspended (tty output) ls -lrRt | With an anonymous function instead of a named one, this doesn't happen and "fg" works: % () { ls -lrRt | less } ^Z % fg makerunning: PID 79731 job 3 jn->stat: 82 [3] - continued ls -lrRt | makerunning: PID 79731 job 3 jn->stat: 80 9427 is CHANGED|STOPPED|LOCKED|INUSE|SUPERJOB|CURSH|SUBLEADER 370 is STOPPED|LOCKED|NOPRINT|INUSE|SUBJOB 82 is STOPPED|LOCKED|INUSE So we can remove at least one makerunning(), I think from bin_fg(). But I then instrumented update_job() where "continued..." is printed, and: % fg [1] + suspended ls -lrRt | makerunning: PID 687 job 1 jn->stat: 9427 makerunning: PID 687 job 2 jn->stat: 370 zsh: wait_for_processes after SIGCHLD zsh: update_job signals.c L559 zsh: update_job signals.c L559 update_job: stopping gleader zsh: suspended (tty output) ls -lrRt | zsh: suspended (tty output) ls -lrRt | Aha. So there's a race condition of sorts, just one that we always lose. We send a SIGCONT to the process group for the forked job, which wakes up both the subjob and the superjob, but when we get the SIGCHLD for the subjob we decide that its superjob is running not because we just restarted it, but because it is still waiting to be stopped from the previous SIGTSTP. I haven't worked out if this means there's an order of signal delivery issue or just that we botched some internal state. ^ permalink raw reply [flat|nested] 28+ messages in thread
* [PATCH] Re: problem with 'ls | less' shell function 2022-11-03 23:10 ` Bart Schaefer @ 2022-11-04 6:09 ` Bart Schaefer 2022-11-04 15:10 ` [PATCH] " Jun T 0 siblings, 1 reply; 28+ messages in thread From: Bart Schaefer @ 2022-11-04 6:09 UTC (permalink / raw) To: Zsh hackers list [-- Attachment #1: Type: text/plain, Size: 4001 bytes --] On Thu, Nov 3, 2022 at 4:10 PM Bart Schaefer <schaefer@brasslantern.com> wrote: > > We send a SIGCONT to the process group for the forked job, > which wakes up both the subjob and the superjob, but when we get the > SIGCHLD for the subjob we decide that its superjob is running not > because we just restarted it, but because it is still waiting to be > stopped from the previous SIGTSTP. I find we have to take this all the way back to workers/50105 and the "unkillable pipeline". The earlier example given was this: { /bin/sleep 10 ; /bin/sleep 20; } | { /bin/sleep 30 ; /bin/sleep 40; } I "fixed" this with workers/50134 by avoiding a reset of the process group leader when the job is not in the current shell, but it turns out that's not actually the problem. In the above example, two subshells are forked for either side of the pipe and "/bin/sleep 30" becomes the foreground job. Upon interrupting this, "/bin/sleep 40" begins to execute and the right-hand subshell immediately exits, because only the exit value of that sleep matters to the parent. The patch in 50134 makes "/bin/sleep 40" its own group leader in that case, allowing further keyboard-generated signals to reach it. Without 50134, the sleep effectively becomes an orphan; the parent is waiting, but for the wrong job, and the job that is running is immune to signals. There are two problems with this. First, arguably the "sleep 40" should not start at all, because the pipeline was interrupted. (This is in fact what happens in "sh" emulation thanks to workers/47794.) Second, in the "ls | less" function example from this thread, upon ^Z it's wrong to set the group leader to the newly-forked subshell because that results in the aforementioned race condition -- upon "fg", "ls" keeps getting stopped before "less" can get started, which causes the "oops, subjob is still stopped, better stop the superjob" confusion. I believe the reason interrupting "sleep 30" doesn't abort the whole pipeline is because the { ... } expression is in an interactive shell, so the two sleeps get interpreted as separately "jobbable". Running the whole thing in a non-interactive shell (zsh -fs) works fine. In fact % { /bin/sleep 10 ; /bin/sleep 20; } | { /bin/sleep 30 ; /bin/sleep 40; } ^Z % fg ^C % also works fine and does end the pipeline, but if you "fg" without immediate ^C, once "sleep 40" starts running we're right back in the invulnerable orphan state. There are two fixes needed here. One, never set the group leader to a process that's about to exit. Sadly, as far as I can tell, the parent doesn't know whether that's going to happen, and attempting to check with kill(gleader, 0) or similar just leads to a different race condition. Two, stop the whole { ... } construct if any job within it gets interrupted. In both cases 50134 should then be reverted. The attached patch seems to satisfactorily implement the second fix. There's an anecdote I may have mentioned before wherein a mechanic is called upon to fix a piece of machinery. When the job is finished he presents a bill for $1000. "What was the problem?" "Oh, a bolt had come out, I just replaced it." "What? $1000 for a single bolt?" "Sorry, let me correct that." The mechanic takes back his invoice, scribbles a few words, and hands it back. It now reads: "New bolt: $1. Finding out where to put it: $999" This patch is a $1000 bolt. Some no-op whitespace and comments included. Unfortunately, that leaves us with another example in 50105 still unfixed: ( zsh -fc 'print a few words; /bin/sleep 10' ) | { head -n 1 } Here we end up with the current shell as the foreground job because "head" has exited, and as an interactive shell it is ignoring signals. It needs to bring the left-hand-side into the foreground so it can be killed. I haven't tracked that down (nor figured out why 50134 works around it, except to believe it's two bugs canceling each other). Also, yes, I'm on vacation, and it's been raining all day here. [-- Attachment #2: pipejobs.txt --] [-- Type: text/plain, Size: 1644 bytes --] diff --git a/Src/jobs.c b/Src/jobs.c index 707374297..2c4f41547 100644 --- a/Src/jobs.c +++ b/Src/jobs.c @@ -566,6 +566,12 @@ update_job(Job jn) * when the job is finally deleted. */ jn->stat |= STAT_ATTACH; + /* + * If we're in shell jobs on the right side of a pipeline + * we should treat it like a job in the current shell. + */ + if (inforeground == 2) + inforeground = 1; } /* If we have `foo|while true; (( x++ )); done', and hit * ^C, we have to stop the loop, too. */ @@ -1488,10 +1494,7 @@ addproc(pid_t pid, char *text, int aux, struct timeval *bgtime, * set it for that, too. */ if (gleader != -1) { - if (jobtab[thisjob].stat & STAT_CURSH) - jobtab[thisjob].gleader = gleader; - else - jobtab[thisjob].gleader = pid; + jobtab[thisjob].gleader = gleader; if (list_pipe_job_used != -1) jobtab[list_pipe_job_used].gleader = gleader; /* @@ -1500,7 +1503,7 @@ addproc(pid_t pid, char *text, int aux, struct timeval *bgtime, */ last_attached_pgrp = gleader; } else if (!jobtab[thisjob].gleader) - jobtab[thisjob].gleader = pid; + jobtab[thisjob].gleader = pid; /* attach this process to end of process list of current job */ pnlist = &jobtab[thisjob].procs; } @@ -2506,6 +2509,7 @@ bin_fg(char *name, char **argv, Options ops, int func) jobtab[job].stat &= ~STAT_CURSH; } if ((stopped = (jobtab[job].stat & STAT_STOPPED))) { + /* WIFCONTINUED will makerunning() again at killjb() */ makerunning(jobtab + job); if (func == BIN_BG) { /* Set $! to indicate this was backgrounded */ ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH] problem with 'ls | less' shell function 2022-11-04 6:09 ` [PATCH] " Bart Schaefer @ 2022-11-04 15:10 ` Jun T 2022-11-05 0:09 ` Bart Schaefer 0 siblings, 1 reply; 28+ messages in thread From: Jun T @ 2022-11-04 15:10 UTC (permalink / raw) To: zsh-workers > 2022/11/04 15:09, Bart Schaefer <schaefer@brasslantern.com> wrote: > > Unfortunately, that leaves us with another example in 50105 still unfixed: > > ( zsh -fc 'print a few words; /bin/sleep 10' ) | { head -n 1 } The main zsh (zsh0, pid=pgid=100) forks two times; the 1st one exec 'zsh -fc' (zsh1, pid=101), the 2nd one 'head' (pid=102). Both are in the process group pgid=101, and the group becomes foreground. But it seems these two are in different jobs (because of { head } ?). Then zsh1 exec 'sleep 100' (without forking). Now 'sleep 100' has pid=101 and pgid=101. When 'head' exits, zsh0 calls update_job(). In this function: 548 /* is this job in the foreground of an interactive shell? */ 549 if (mypgrp != pgrp && inforeground && 550 (jn->gleader == pgrp || 551 (pgrp > 1 && 552 (kill(-pgrp, 0) == -1 && errno == ESRCH)))) { mypgrp=100, pgrp=101, jn->gleader=101, and the if(...) holds, and the line 568 jn->stat |= STAT_ATTACH; is executed. Since { head } is a separate job and head has exited, the job will be removed, and attachtty(mypgrp=100) will be called when the job is removed. Then pgid=101 (=sleep 100) lose tty. But why just "jn->gleader == pgrp" is enough to assume that the foreground job has finished? If the line 550 above is replaced by 550 ( then ^C works for [1] ( zsh -fc 'print a few words; /bin/sleep 10' ) | { head -n 1 } But now we need two ^C's to kill [2] { sleep 10; sleep 20; } | { sleep 30; sleep 40; } Another code that looks suspicious to me is (also in jobs.c) 478 if (pn->pid == jn->gleader) /* if this process is process group leader */ 479 status = pn->status; 480 } The 'status' is later used at (line numbers after the patch pipejobs.txt): 641 if (inforeground == 2 && isset(MONITOR) && WIFSIGNALED(status)) { When 'sleep 30' is killed, pn->pid=(pid of sleep 30), but it is not equal to jn->gleader (= probably 'sleep 10'?). Then if() at line 641 does not hold, and 'sleep 40' will be started. If the line 478 is replaced by 478 if (WIFSIGNALED(pn->status) || pn->pid == jn->gleader) then [2] can be killed by a single ^C. But I guess it will have bad side effects. And ^Z/fg (still) does not work for [2]. Sorry, I have no time now to investigate further. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH] problem with 'ls | less' shell function 2022-11-04 15:10 ` [PATCH] " Jun T @ 2022-11-05 0:09 ` Bart Schaefer 2022-11-06 19:12 ` Bart Schaefer 0 siblings, 1 reply; 28+ messages in thread From: Bart Schaefer @ 2022-11-05 0:09 UTC (permalink / raw) To: Jun T; +Cc: zsh-workers [-- Attachment #1: Type: text/plain, Size: 1267 bytes --] On Fri, Nov 4, 2022 at 8:12 AM Jun T <takimoto-j@kba.biglobe.ne.jp> wrote: > > 548 /* is this job in the foreground of an interactive shell? */ > 549 if (mypgrp != pgrp && inforeground && > 550 (jn->gleader == pgrp || > 551 (pgrp > 1 && > 552 (kill(-pgrp, 0) == -1 && errno == ESRCH)))) { > > But why just "jn->gleader == pgrp" is enough to assume that the foreground > job has finished? > [...] > Another code that looks suspicious to me is (also in jobs.c) > > 478 if (pn->pid == jn->gleader) /* if this process is process group leader */ > 479 status = pn->status; > 480 } > > [...] > If the line 478 is replaced by > > 478 if (WIFSIGNALED(pn->status) || pn->pid == jn->gleader) > > then [2] can be killed by a single ^C. Thank you! This is the clue I needed. On line 476 is 476 signalled = WIFSIGNALED(pn->status); If we change line 550 to be 550 ((jn->gleader == pgrp && signalled) || then all three examples start working. Second patch attached, replaces the one from workers/50862. Tests pass, which is not surprising since all of this depends on signaling interactive shells, but I have no idea how to write a regression test for that. [-- Attachment #2: pipejobs2.txt --] [-- Type: text/plain, Size: 2422 bytes --] diff --git a/Src/jobs.c b/Src/jobs.c index 707374297..76c762ee5 100644 --- a/Src/jobs.c +++ b/Src/jobs.c @@ -544,16 +544,14 @@ update_job(Job jn) if (isset(MONITOR)) { pid_t pgrp = gettygrp(); /* get process group of tty */ + int deadpgrp = (mypgrp != pgrp && inforeground && pgrp > 1 && + kill(-pgrp, 0) == -1 && errno == ESRCH); /* is this job in the foreground of an interactive shell? */ if (mypgrp != pgrp && inforeground && - (jn->gleader == pgrp || - (pgrp > 1 && - (kill(-pgrp, 0) == -1 && errno == ESRCH)))) { + ((jn->gleader == pgrp && signalled) || deadpgrp)) { if (list_pipe) { - if (somestopped || (pgrp > 1 && - kill(-pgrp, 0) == -1 && - errno == ESRCH)) { + if (somestopped || deadpgrp) { attachtty(mypgrp); /* check window size and adjust if necessary */ adjustwinsize(0); @@ -566,6 +564,12 @@ update_job(Job jn) * when the job is finally deleted. */ jn->stat |= STAT_ATTACH; + /* + * If we're in shell jobs on the right side of a pipeline + * we should treat it like a job in the current shell. + */ + if (inforeground == 2) + inforeground = 1; } /* If we have `foo|while true; (( x++ )); done', and hit * ^C, we have to stop the loop, too. */ @@ -1488,10 +1492,7 @@ addproc(pid_t pid, char *text, int aux, struct timeval *bgtime, * set it for that, too. */ if (gleader != -1) { - if (jobtab[thisjob].stat & STAT_CURSH) - jobtab[thisjob].gleader = gleader; - else - jobtab[thisjob].gleader = pid; + jobtab[thisjob].gleader = gleader; if (list_pipe_job_used != -1) jobtab[list_pipe_job_used].gleader = gleader; /* @@ -1500,7 +1501,7 @@ addproc(pid_t pid, char *text, int aux, struct timeval *bgtime, */ last_attached_pgrp = gleader; } else if (!jobtab[thisjob].gleader) - jobtab[thisjob].gleader = pid; + jobtab[thisjob].gleader = pid; /* attach this process to end of process list of current job */ pnlist = &jobtab[thisjob].procs; } @@ -2506,6 +2507,7 @@ bin_fg(char *name, char **argv, Options ops, int func) jobtab[job].stat &= ~STAT_CURSH; } if ((stopped = (jobtab[job].stat & STAT_STOPPED))) { + /* WIFCONTINUED will makerunning() again at killjb() */ makerunning(jobtab + job); if (func == BIN_BG) { /* Set $! to indicate this was backgrounded */ ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH] problem with 'ls | less' shell function 2022-11-05 0:09 ` Bart Schaefer @ 2022-11-06 19:12 ` Bart Schaefer 2022-11-07 8:43 ` Jun T 0 siblings, 1 reply; 28+ messages in thread From: Bart Schaefer @ 2022-11-06 19:12 UTC (permalink / raw) To: Jun T; +Cc: zsh-workers On Fri, Nov 4, 2022 at 5:09 PM Bart Schaefer <schaefer@brasslantern.com> wrote: > > Second patch attached, replaces the one from workers/50862. This "works for me" so I'm going to push it to encourage wider testing. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH] problem with 'ls | less' shell function 2022-11-06 19:12 ` Bart Schaefer @ 2022-11-07 8:43 ` Jun T 2022-11-07 19:33 ` Bart Schaefer 0 siblings, 1 reply; 28+ messages in thread From: Jun T @ 2022-11-07 8:43 UTC (permalink / raw) To: zsh-workers > 2022/11/07 4:12, Bart Schaefer <schaefer@brasslantern.com> wrote: > > On Fri, Nov 4, 2022 at 5:09 PM Bart Schaefer <schaefer@brasslantern.com> wrote: >> >> Second patch attached, replaces the one from workers/50862. > > This "works for me" so I'm going to push it to encourage wider testing. [1] ( zsh -fc 'print a few words; /bin/sleep 10'; ) [2] { sleep 10; sleep 11; } | { sleep 20; sleep 21; } [3] dir () { ls | less; }; dir Now both ^C and ^Z/fg fork for [1], ^C works for [2], and ^Z/fg works for [3] ({3] can't be killed by ^C but it's OK). But ^Z/fg still doesn't work for [2]; the pipeline never finishes. This was not working in zsh-5.8.1, zsh-5.9 and zsh-5.9.1-test (before worker/50874), so this is not a regression. worker/50803 > 2022/10/22 6:22, Bart Schaefer <schaefer@brasslantern.com> wrote: > > On Fri, Oct 21, 2022 at 12:41 AM Jun T <takimoto-j@kba.biglobe.ne.jp> wrote: >> >> The subshell have missed the SIGCHLD? > > The subshell will never get the SIGCHLD, because it is a sibling of > the pipeline, not its parent. Do you mean the subshell for the right hand side of the pipe? In the case of [2], the main zsh (zsh0) always fork() a subshell (zsh1) for the left hand side of the pipe, and zsh1 fork/exec 'sleep 10'. By hitting ^Z, zsh0 fork() another subshell (zsh2) for the right hand side of the pipe, but it seems zsh2 is not causing the problem. After 'fg', zsh1 (not zsh2) is using 100% of a CPU core, and when 'sleep 10' exits it is left as "defunct" (not waited for). Strace output suggests that zsh1 does not receive SIGCHLD when 'sleep 10' finishes. Maybe SIGCHLD is blocked?? (I'm not sure). zsh1 is repeatedly calling hasprocs(list_pid_job) at line 1790 in exec.c: 1789 if (!updated && 1790 list_pipe_job && hasprocs(list_pipe_job) && 1791 !(jobtab[list_pipe_job].stat & STAT_STOPPED)) { 1792 int q = queue_signal_level(); 1793 child_unblock(); 1794 child_block(); 1795 dont_queue_signals(); 1796 restore_queue_signals(q); 1797 } When zsh1 is executing this code, thisjob=2 and list_pipe_job=1, and hasprocs(list_pipe_job) returns 0 because jobtab[1].procs is NULL. I guess 'sleep 10' is in thisjob. If I blindly replace the line 1790 by 1790 list_pipe_job && then 'sleep 10' does not left as "defunct" and the pipeline finishes when all the sleeps exit. But zsh1 still uses 100% of CPU before 'sleep 10' exits (replace 'sleep 10' by 'sleep 30' to see this more easily), so this is obviously not the correct solution. # I haven't yet understood (and maybe will never understand) the # list_pipe_xxxx. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH] problem with 'ls | less' shell function 2022-11-07 8:43 ` Jun T @ 2022-11-07 19:33 ` Bart Schaefer 2022-11-07 19:44 ` Roman Perepelitsa 2022-11-08 0:44 ` Bart Schaefer 0 siblings, 2 replies; 28+ messages in thread From: Bart Schaefer @ 2022-11-07 19:33 UTC (permalink / raw) To: Jun T; +Cc: zsh-workers On Mon, Nov 7, 2022 at 12:44 AM Jun T <takimoto-j@kba.biglobe.ne.jp> wrote: > > [1] ( zsh -fc 'print a few words; /bin/sleep 10'; ) Missing the | { head -n -1 } here, but I think still confirmed working? > [2] { sleep 10; sleep 11; } | { sleep 20; sleep 21; } > [3] dir () { ls | less; }; dir > > Now both ^C and ^Z/fg fork for [1], ^C works for [2], > and ^Z/fg works for [3] ({3] can't be killed by ^C but it's OK). > > But ^Z/fg still doesn't work for [2]; the pipeline never finishes. [...] > In the case of [2], the main zsh (zsh0) always fork() a subshell (zsh1) > for the left hand side of the pipe, and zsh1 fork/exec 'sleep 10'. > By hitting ^Z, zsh0 fork() another subshell (zsh2) for the right hand > side of the pipe, but it seems zsh2 is not causing the problem. Hmm. If I examine "pstree" I find: zsh(43919)---zsh(43978)---sleep(43980) 43978 is your "zsh2" and is busy-waiting with 100% CPU at #0 0x000055ca5bc4c042 in execpline (state=0x7ffc311ce770, slcode=4098, how=2, last1=0) at exec.c:1789 #1 0x000055ca5bc4abc3 in execlist (state=0x7ffc311ce770, dont_change_job=1, exiting=1) at exec.c:1444 #2 0x000055ca5bc477d7 in execcursh (state=0x7ffc311ce770, do_exec=1) at exec.c:450 43919 is "zsh1" and is here: #0 0x00007fcadf04045c in __GI___sigsuspend (set=0x7ffc311cd9b0) at ../sysdeps/unix/sysv/linux/sigsuspend.c:26 #1 0x000055ca5bcbf050 in signal_suspend (sig=17, wait_cmd=0) at signals.c:393 #2 0x000055ca5bc7c31a in zwaitjob (job=1, wait_cmd=0) at jobs.c:1629 > After 'fg', zsh1 (not zsh2) is using 100% of a CPU core, and > when 'sleep 10' exits it is left as "defunct" (not waited for). pstree for that zombie sleep shows it's a child of 43919 (zsh1) and hasn't been cleaned up yet because zsh1 is waiting on zsh2. zsh2 believes it's going to get signaled for the sleep, but it won't. > > 2022/10/22 6:22, Bart Schaefer <schaefer@brasslantern.com> wrote: > > > > On Fri, Oct 21, 2022 at 12:41 AM Jun T <takimoto-j@kba.biglobe.ne.jp> wrote: > >> > >> The subshell have missed the SIGCHLD? > > > > The subshell will never get the SIGCHLD, because it is a sibling of > > the pipeline, not its parent. > > Do you mean the subshell for the right hand side of the pipe? Yes, that is what I mean. The zombie sleep and zsh2 are siblings. > Strace output suggests that zsh1 does not receive SIGCHLD when > 'sleep 10' finishes. Maybe SIGCHLD is blocked?? (I'm not sure). #2 0x000055ca5bc7c31a in zwaitjob (job=1, wait_cmd=0) at jobs.c:1629 job=1 is zsh2: (gdb) p jobtab[1].procs[0] $2 = {next = 0x55ca5dca7690, pid = 43978, waitjob() in zsh1 will never return because zsh2 is in an infinite loop waiting for a child that doesn't exist. > zsh1 is repeatedly calling hasprocs(list_pid_job) at line 1790 in > exec.c: This is actually zsh2. As you noted, hasprocs() returns zero, so we fall through to 1902 else if (subsh && jn->stat & STAT_STOPPED) 1903 thisjob = newjob; 1904 else 1905 break; The condition at 1902 is true because jn->stat refers to the zombie sleep, which WAS stopped when zsh2 was forked (but is not actually even a job of this new subshell). So we go around the loop again at 1778 for (; !nowait;) { When zsh2 was forked during "sleep 20" , we should have hit this 1899 break; so we must have re-entered at line 1778 after starting the "sleep 21" and are now incorrectly waiting for the left side of the pipeline. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH] problem with 'ls | less' shell function 2022-11-07 19:33 ` Bart Schaefer @ 2022-11-07 19:44 ` Roman Perepelitsa 2022-11-08 0:46 ` Bart Schaefer 2022-11-08 0:44 ` Bart Schaefer 1 sibling, 1 reply; 28+ messages in thread From: Roman Perepelitsa @ 2022-11-07 19:44 UTC (permalink / raw) To: Bart Schaefer; +Cc: Jun T, zsh-workers On Mon, Nov 7, 2022 at 8:33 PM Bart Schaefer <schaefer@brasslantern.com> wrote: > > On Mon, Nov 7, 2022 at 12:44 AM Jun T <takimoto-j@kba.biglobe.ne.jp> wrote: > > > > [1] ( zsh -fc 'print a few words; /bin/sleep 10'; ) The old bug that's been bothering me might be related. This hangs: zsh -fic '( setopt monitor; true | true )' A few more details here: https://www.zsh.org/mla/workers//2020/msg00471.html Do you think your patch will fix this? Roman. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH] problem with 'ls | less' shell function 2022-11-07 19:44 ` Roman Perepelitsa @ 2022-11-08 0:46 ` Bart Schaefer 0 siblings, 0 replies; 28+ messages in thread From: Bart Schaefer @ 2022-11-08 0:46 UTC (permalink / raw) To: Roman Perepelitsa; +Cc: Jun T, zsh-workers On Mon, Nov 7, 2022 at 11:44 AM Roman Perepelitsa <roman.perepelitsa@gmail.com> wrote: > > zsh -fic '( setopt monitor; true | true )' > > Do you think your patch will fix this? I don't the the patch I've already pushed will fix that, no. IIRC there's a race condition with the above example because the first "true" exits before the pipeline to the second "true" has even been set up. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH] problem with 'ls | less' shell function 2022-11-07 19:33 ` Bart Schaefer 2022-11-07 19:44 ` Roman Perepelitsa @ 2022-11-08 0:44 ` Bart Schaefer 2022-11-08 5:03 ` Bart Schaefer 1 sibling, 1 reply; 28+ messages in thread From: Bart Schaefer @ 2022-11-08 0:44 UTC (permalink / raw) To: Jun T; +Cc: zsh-workers On Mon, Nov 7, 2022 at 11:33 AM Bart Schaefer <schaefer@brasslantern.com> wrote: > > > [2] { sleep 10; sleep 11; } | { sleep 20; sleep 21; } [...] > Hmm. If I examine "pstree" I find: > > zsh(43919)---zsh(43978)---sleep(43980) > > 43978 is your "zsh2" and is busy-waiting with 100% CPU at OK, this just gets weirder. There are different results depending on whether you hit ^Z during the first "sleep 10", during the second "sleep 11", or during "sleep 21". There are also different results if you change it to { sleep 5; sleep 6; } | { sleep 20; sleep 21; } and hit ^Z during "sleep 20". So some of what I said in workers/50901 doesn't always apply, though the location of the busy-loop is always the same. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH] problem with 'ls | less' shell function 2022-11-08 0:44 ` Bart Schaefer @ 2022-11-08 5:03 ` Bart Schaefer 2022-11-09 5:03 ` Bart Schaefer 0 siblings, 1 reply; 28+ messages in thread From: Bart Schaefer @ 2022-11-08 5:03 UTC (permalink / raw) To: Jun T; +Cc: zsh-workers On Mon, Nov 7, 2022 at 4:44 PM Bart Schaefer <schaefer@brasslantern.com> wrote: > > the location of the busy-loop is always > the same. I'm sure someone is going to explain to me why the following is wrong. If we're in the subshell and we just came back from being suspended, why would we not resume the current job? What happens with the patch below is that in the right side of the pipeline, after the new subshell is created upon ^Z, "fg" resumes the previous foreground job (the earlier job in the right-side brace construct) and the new subshell remains stopped until that job finishes, at which point it is resumed and proceeds with the next job in the braces. There are still some oddball race conditions. Depending on exactly when the ^Z is sent, and particularly after the left-side job has finished, the resumed right-side can become unkillable again. But that doesn't always happen. diff --git a/Src/exec.c b/Src/exec.c index c8bcf4ee5..b0f42ae67 100644 --- a/Src/exec.c +++ b/Src/exec.c @@ -1899,8 +1899,12 @@ execpline(Estate state, wordcode slcode, int how, int last1) break; } } - else if (subsh && jn->stat & STAT_STOPPED) - thisjob = newjob; + else if (subsh && jn->stat & STAT_STOPPED) { + if (thisjob == newjob) + makerunning(jn); + else + thisjob = newjob; + } else break; } ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH] problem with 'ls | less' shell function 2022-11-08 5:03 ` Bart Schaefer @ 2022-11-09 5:03 ` Bart Schaefer 0 siblings, 0 replies; 28+ messages in thread From: Bart Schaefer @ 2022-11-09 5:03 UTC (permalink / raw) To: Jun T; +Cc: zsh-workers [-- Attachment #1: Type: text/plain, Size: 1196 bytes --] On Mon, Nov 7, 2022 at 9:03 PM Bart Schaefer <schaefer@brasslantern.com> wrote: > > I'm sure someone is going to explain to me why the following is wrong. Nobody has, yet. However, after thinking for a while about this: > What happens with the patch below is that in the right side of the > pipeline, after the new subshell is created upon ^Z, "fg" resumes the > previous foreground job (the earlier job in the right-side brace > construct) and the new subshell remains stopped until that job > finishes, at which point it is resumed and proceeds with the next job > in the braces. ... it occurred to me that it might be accomplishing a superset of this from 50874: + /* + * If we're in shell jobs on the right side of a pipeline + * we should treat it like a job in the current shell. + */ + if (inforeground == 2) + inforeground = 1; ... and indeed, if I remove that bit of 50874 I'm still able to do ^Z/fg and ^C in all the previously hanging or re-suspending examples. I like that there doesn't seem to be as much of a magic bullet as that seemed to be. Patch attached. [-- Attachment #2: pipejobs3.txt --] [-- Type: text/plain, Size: 993 bytes --] diff --git a/Src/exec.c b/Src/exec.c index 2422dae91..d4e681887 100644 --- a/Src/exec.c +++ b/Src/exec.c @@ -1899,8 +1899,12 @@ execpline(Estate state, wordcode slcode, int how, int last1) break; } } - else if (subsh && jn->stat & STAT_STOPPED) - thisjob = newjob; + else if (subsh && jn->stat & STAT_STOPPED) { + if (thisjob == newjob) + makerunning(jn); + else + thisjob = newjob; + } else break; } diff --git a/Src/jobs.c b/Src/jobs.c index 76c762ee5..4863962b9 100644 --- a/Src/jobs.c +++ b/Src/jobs.c @@ -564,12 +564,6 @@ update_job(Job jn) * when the job is finally deleted. */ jn->stat |= STAT_ATTACH; - /* - * If we're in shell jobs on the right side of a pipeline - * we should treat it like a job in the current shell. - */ - if (inforeground == 2) - inforeground = 1; } /* If we have `foo|while true; (( x++ )); done', and hit * ^C, we have to stop the loop, too. */ ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: problem with 'ls | less' shell function 2022-10-17 9:17 problem with 'ls | less' shell function Thomas Klausner 2022-10-17 14:50 ` Mikael Magnusson @ 2022-10-19 9:33 ` Jun T 2022-10-19 10:01 ` Jun T 1 sibling, 1 reply; 28+ messages in thread From: Jun T @ 2022-10-19 9:33 UTC (permalink / raw) To: zsh-workers; +Cc: Thomas Klausner Is there anyone here who can reproduce this problem? > 2022/10/17 18:17, Thomas Klausner <wiz@gatalith.at> wrote: > > I recently noticed a problem in zsh 5.9 (as built from pkgsrc) on > NetBSD 9.99.100. Since I didn't notice it before it could be related > to a change in NetBSD (I'm following the latest version), I can't (and will not be able to) test on NetBSD 9.99. I tested on NetBSD 9.3, but couldn't reproduce the problem. > but I've > been told that the issue can be reproduced on Ubuntu 19.04 and FreeBSD > 13.1 too; I also tested on FreeBSD 13.1, but ^Z/fg worked fine. I don't have Ubuntu-19.04, but any of LTS (18.04/20.04/22.04) works fine. Does the problem occur every time you suspend a pipeline, or just occasionally? (I tested only a few times) I'm testing on virtual machines for which I allocate only two CPU cores. Usually two cores are enough to reproduce this kind of problems (probably related with a kind of race conditions), but haven't tested with more cores. > but not in zsh 5.8.1, Can you try 'git bisect'? Or at least checkout some specific commits (the one just before the suspicious commits) and test them? ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: problem with 'ls | less' shell function 2022-10-19 9:33 ` Jun T @ 2022-10-19 10:01 ` Jun T 0 siblings, 0 replies; 28+ messages in thread From: Jun T @ 2022-10-19 10:01 UTC (permalink / raw) To: zsh-workers; +Cc: Thomas Klausner Sorry, please disregard my following post; > 2022/10/19 18:33, Jun T <takimoto-j@kba.biglobe.ne.jp> wrote: > > Is there anyone here who can reproduce this problem? ^ permalink raw reply [flat|nested] 28+ messages in thread
end of thread, other threads:[~2022-11-09 5:04 UTC | newest] Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2022-10-17 9:17 problem with 'ls | less' shell function Thomas Klausner 2022-10-17 14:50 ` Mikael Magnusson 2022-10-17 15:36 ` Thomas Klausner 2022-10-19 8:26 ` zsugabubus 2022-10-19 10:04 ` Jun T 2022-10-19 13:36 ` Jun. T 2022-10-20 0:10 ` Bart Schaefer 2022-10-20 16:26 ` Peter Stephenson 2022-10-21 5:45 ` Jun T 2022-10-21 7:40 ` Jun T 2022-10-21 21:22 ` Bart Schaefer 2022-10-21 21:24 ` Bart Schaefer 2022-10-22 23:22 ` Bart Schaefer 2022-10-22 23:43 ` Bart Schaefer 2022-11-03 23:10 ` Bart Schaefer 2022-11-04 6:09 ` [PATCH] " Bart Schaefer 2022-11-04 15:10 ` [PATCH] " Jun T 2022-11-05 0:09 ` Bart Schaefer 2022-11-06 19:12 ` Bart Schaefer 2022-11-07 8:43 ` Jun T 2022-11-07 19:33 ` Bart Schaefer 2022-11-07 19:44 ` Roman Perepelitsa 2022-11-08 0:46 ` Bart Schaefer 2022-11-08 0:44 ` Bart Schaefer 2022-11-08 5:03 ` Bart Schaefer 2022-11-09 5:03 ` Bart Schaefer 2022-10-19 9:33 ` Jun T 2022-10-19 10:01 ` Jun T
Code repositories for project(s) associated with this public inbox https://git.vuxu.org/mirror/zsh/ This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).