zsh-users
 help / color / mirror / code / Atom feed
* Re: zsh mysteriously suspending job with sudo
       [not found] <CAG4d5Bg=vkj6PYYGVyNO3pWTh=RkT=2rTFdE_hPBj1xxz6YTSw__29611.299622205$1582420251$gmane$org@mail.gmail.com>
@ 2020-02-23 14:03 ` Stephane Chazelas
  2020-02-23 18:53   ` Ronan Pigott
  2020-02-23 20:18   ` Peter Stephenson
  0 siblings, 2 replies; 8+ messages in thread
From: Stephane Chazelas @ 2020-02-23 14:03 UTC (permalink / raw)
  To: Ronan Pigott; +Cc: Zsh Users

2020-02-22 18:09:13 -0700, Ronan Pigott:
[...]
> $ sudo true
> [sudo] password for ronan:
> $ pacman -Qttdq | sudo pacman -Rns -
> [...]
> :: Do you want to remove these packages? [Y/n] zsh: done
>  pacman -Qttdq |
> zsh: suspended (tty output)  sudo pacman -Rns -
[...]

Can be reproduced with:

$ sleep 1 | sudo sh -c 'sleep 2; ps -jfHt "$(tty<&2)"; awk "{print \$8}" /proc/self/stat /dev/tty'
UID        PID  PPID  PGID   SID  C STIME TTY          TIME CMD
chazelas 25430  8308 25430 25430  0 13:44 pts/1    00:00:00 /bin/zsh
root     26867 25430 26866 25430  0 13:46 pts/1    00:00:00   sudo sh -c sleep 2; ps -jfHt "$(tty<&2)"; awk "{print \$8}" /proc/self/stat /dev/tty
root     26868 26867 26866 25430  0 13:46 pts/1    00:00:00     sh -c sleep 2; ps -jfHt "$(tty<&2)"; awk "{print \$8}" /proc/self/stat /dev/tty
root     26871 26868 26866 25430  0 13:46 pts/1    00:00:00       ps -jfHt /dev/pts/1
25430
zsh: done                   sleep 1 |
zsh: suspended (tty input)  sudo sh -c

As seen above, at the time "ps" is run (and awk later), the
foreground process group of the terminal is 25430 which is the
pgid of the main shell, not the pgid of the foreground job
(26866), which is why that job gets a SIGTTIN when awk tries to
read from the terminal (or SIGTTOU when pacman does an ioctl to
the terminal).

From "strace", it seems it's because when "sleep 1" (the process
group leader) finishes, zsh does a kill(-26866,0), presumably to
check that the process group is still alive, but that fails with
EPERM as there are processes running as root in that group, and
then zsh changes the foreground process group back to the main
shell's.

So it seems indeed to be a bug in zsh.

I suppose an easy fix would be to check for an errno of ESRCH
when kill(-pgid,0) fails to make sure it's because the process
group is gone. But, here the shell should be able to know that
the job is not gone as sudo, a direct children of the shell has
not returned yet, so there's probably something wrong with the
logic in the first place.

-- 
Stephane

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: zsh mysteriously suspending job with sudo
  2020-02-23 14:03 ` zsh mysteriously suspending job with sudo Stephane Chazelas
@ 2020-02-23 18:53   ` Ronan Pigott
  2020-02-23 23:57     ` Ronan Pigott
  2020-02-23 20:18   ` Peter Stephenson
  1 sibling, 1 reply; 8+ messages in thread
From: Ronan Pigott @ 2020-02-23 18:53 UTC (permalink / raw)
  To: stephane, Zsh Users

[-- Attachment #1: Type: text/plain, Size: 2571 bytes --]

Wow! Thank you Stephane for your analysis.

So you think that a patch checking for ESRCH is not suitable?

I tried poking around in Src/jobs.c in update_job where
it sounds like this issue originates, but I'm not quite sure
how the logic should be revised.

Is there somewhere else where this bug should be reported?

Thanks again for your help!


On Sun, Feb 23, 2020 at 7:03 AM Stephane Chazelas <stephane@chazelas.org>
wrote:

> 2020-02-22 18:09:13 -0700, Ronan Pigott:
> [...]
> > $ sudo true
> > [sudo] password for ronan:
> > $ pacman -Qttdq | sudo pacman -Rns -
> > [...]
> > :: Do you want to remove these packages? [Y/n] zsh: done
> >  pacman -Qttdq |
> > zsh: suspended (tty output)  sudo pacman -Rns -
> [...]
>
> Can be reproduced with:
>
> $ sleep 1 | sudo sh -c 'sleep 2; ps -jfHt "$(tty<&2)"; awk "{print \$8}"
> /proc/self/stat /dev/tty'
> UID        PID  PPID  PGID   SID  C STIME TTY          TIME CMD
> chazelas 25430  8308 25430 25430  0 13:44 pts/1    00:00:00 /bin/zsh
> root     26867 25430 26866 25430  0 13:46 pts/1    00:00:00   sudo sh -c
> sleep 2; ps -jfHt "$(tty<&2)"; awk "{print \$8}" /proc/self/stat /dev/tty
> root     26868 26867 26866 25430  0 13:46 pts/1    00:00:00     sh -c
> sleep 2; ps -jfHt "$(tty<&2)"; awk "{print \$8}" /proc/self/stat /dev/tty
> root     26871 26868 26866 25430  0 13:46 pts/1    00:00:00       ps -jfHt
> /dev/pts/1
> 25430
> zsh: done                   sleep 1 |
> zsh: suspended (tty input)  sudo sh -c
>
> As seen above, at the time "ps" is run (and awk later), the
> foreground process group of the terminal is 25430 which is the
> pgid of the main shell, not the pgid of the foreground job
> (26866), which is why that job gets a SIGTTIN when awk tries to
> read from the terminal (or SIGTTOU when pacman does an ioctl to
> the terminal).
>
> From "strace", it seems it's because when "sleep 1" (the process
> group leader) finishes, zsh does a kill(-26866,0), presumably to
> check that the process group is still alive, but that fails with
> EPERM as there are processes running as root in that group, and
> then zsh changes the foreground process group back to the main
> shell's.
>
> So it seems indeed to be a bug in zsh.
>
> I suppose an easy fix would be to check for an errno of ESRCH
> when kill(-pgid,0) fails to make sure it's because the process
> group is gone. But, here the shell should be able to know that
> the job is not gone as sudo, a direct children of the shell has
> not returned yet, so there's probably something wrong with the
> logic in the first place.
>
> --
> Stephane
>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: zsh mysteriously suspending job with sudo
  2020-02-23 14:03 ` zsh mysteriously suspending job with sudo Stephane Chazelas
  2020-02-23 18:53   ` Ronan Pigott
@ 2020-02-23 20:18   ` Peter Stephenson
  1 sibling, 0 replies; 8+ messages in thread
From: Peter Stephenson @ 2020-02-23 20:18 UTC (permalink / raw)
  To: zsh-users

On Sun, 2020-02-23 at 14:03 +0000, Stephane Chazelas wrote:
> 2020-02-22 18:09:13 -0700, Ronan Pigott:
> Can be reproduced with:
> 
> $ sleep 1 | sudo sh -c 'sleep 2; ps -jfHt "$(tty<&2)"; awk "{print \$8}" /proc/self/stat /dev/tty'
> UID        PID  PPID  PGID   SID  C STIME TTY          TIME CMD
> chazelas 25430  8308 25430 25430  0 13:44 pts/1    00:00:00 /bin/zsh
> root     26867 25430 26866 25430  0 13:46 pts/1    00:00:00   sudo sh -c sleep 2; ps -jfHt "$(tty<&2)"; awk "{print \$8}" /proc/self/stat /dev/tty
> root     26868 26867 26866 25430  0 13:46 pts/1    00:00:00     sh -c sleep 2; ps -jfHt "$(tty<&2)"; awk "{print \$8}" /proc/self/stat /dev/tty
> root     26871 26868 26866 25430  0 13:46 pts/1    00:00:00       ps -jfHt /dev/pts/1
> 25430
> zsh: done                   sleep 1 |
> zsh: suspended (tty input)  sudo sh -c
> 
> As seen above, at the time "ps" is run (and awk later), the
> foreground process group of the terminal is 25430 which is the
> pgid of the main shell, not the pgid of the foreground job
> (26866), which is why that job gets a SIGTTIN when awk tries to
> read from the terminal (or SIGTTOU when pacman does an ioctl to
> the terminal).
> 
> From "strace", it seems it's because when "sleep 1" (the process
> group leader) finishes, zsh does a kill(-26866,0), presumably to
> check that the process group is still alive, but that fails with
> EPERM as there are processes running as root in that group, and
> then zsh changes the foreground process group back to the main
> shell's.
> 
> So it seems indeed to be a bug in zsh.
> 
> I suppose an easy fix would be to check for an errno of ESRCH
> when kill(-pgid,0) fails to make sure it's because the process
> group is gone.

Thanks, this is probably something like the following --- it hits a
number of other places doing something similar which look like they need
the same treatment, though I'm happy to be told otherwise, but the one
in signals.c is the key one here.

I don't know if we really need the different cases of killpg(pgrp, 0)
and kill(-pgprp, 0)?

> But, here the shell should be able to know that
> the job is not gone as sudo, a direct children of the shell has
> not returned yet, so there's probably something wrong with the
> logic in the first place.

I wouldn't have the first clue how to begin checking that there is some
appropriate set of processes associated with the process group still
there that would allow this in a sufficiently race-free manner, but if
anyone does have a clue they are welcome to try.

pws

diff --git a/Src/exec.c b/Src/exec.c
index 50027654a..0d321b757 100644
--- a/Src/exec.c
+++ b/Src/exec.c
@@ -1036,7 +1036,8 @@ entersubsh(int flags, struct entersubsh_ret *retp)
     } else if (thisjob != -1 && (flags & ESUB_PGRP)) {
 	if (jobtab[list_pipe_job].gleader && (list_pipe || list_pipe_child)) {
 	    if (setpgrp(0L, jobtab[list_pipe_job].gleader) == -1 ||
-		killpg(jobtab[list_pipe_job].gleader, 0) == -1) {
+		(killpg(jobtab[list_pipe_job].gleader, 0) == -1  &&
+		errno == ESRCH)) {
 		jobtab[list_pipe_job].gleader =
 		    jobtab[thisjob].gleader = (list_pipe_child ? mypgrp : getpid());
 		setpgrp(0L, jobtab[list_pipe_job].gleader);
diff --git a/Src/jobs.c b/Src/jobs.c
index e7438251e..182e65f39 100644
--- a/Src/jobs.c
+++ b/Src/jobs.c
@@ -283,7 +283,8 @@ handle_sub(int job, int fg)
 
 	    if ((cp = ((WIFEXITED(jn->procs->status) ||
 			WIFSIGNALED(jn->procs->status)) &&
-		       killpg(jn->gleader, 0) == -1))) {
+		       (killpg(jn->gleader, 0) == -1 &&
+			errno == ESRCH)))) {
 		Process p;
 		for (p = jn->procs; p->next; p = p->next);
 		jn->gleader = p->pid;
@@ -541,9 +542,13 @@ update_job(Job jn)
 
 	/* is this job in the foreground of an interactive shell? */
 	if (mypgrp != pgrp && inforeground &&
-	    (jn->gleader == pgrp || (pgrp > 1 && kill(-pgrp, 0) == -1))) {
+	    (jn->gleader == pgrp ||
+	     (pgrp > 1 &&
+	      (kill(-pgrp, 0) == -1 && errno == ESRCH)))) {
 	    if (list_pipe) {
-		if (somestopped || (pgrp > 1 && kill(-pgrp, 0) == -1)) {
+		if (somestopped || (pgrp > 1 &&
+				    kill(-pgrp, 0) == -1 &&
+				    errno == ESRCH)) {
 		    attachtty(mypgrp);
 		    /* check window size and adjust if necessary */
 		    adjustwinsize(0);
@@ -2469,7 +2474,8 @@ bin_fg(char *name, char **argv, Options ops, int func)
 		    if ((jobtab[job].stat & STAT_SUPERJOB) &&
 			((!jobtab[job].procs->next ||
 			  (jobtab[job].stat & STAT_SUBLEADER) ||
-			  killpg(jobtab[job].gleader, 0) == -1)) &&
+			  (killpg(jobtab[job].gleader, 0) == -1  &&
+			  errno == ESRCH))) &&
 			jobtab[jobtab[job].other].gleader)
 			attachtty(jobtab[jobtab[job].other].gleader);
 		    else
diff --git a/Src/signals.c b/Src/signals.c
index 96ff9e9b3..4adf03202 100644
--- a/Src/signals.c
+++ b/Src/signals.c
@@ -539,7 +539,8 @@ wait_for_processes(void)
 #endif
 		if (WIFEXITED(status) &&
 		    pn->pid == jn->gleader &&
-		    killpg(pn->pid, 0) == -1) {
+		    killpg(pn->pid, 0) == -1  &&
+		    errno == ESRCH) {
 		    if (last_attached_pgrp == jn->gleader &&
 			!(jn->stat & STAT_NOSTTY)) {
 			/*


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: zsh mysteriously suspending job with sudo
  2020-02-23 18:53   ` Ronan Pigott
@ 2020-02-23 23:57     ` Ronan Pigott
  2020-05-14 21:15       ` Ronan Pigott
  0 siblings, 1 reply; 8+ messages in thread
From: Ronan Pigott @ 2020-02-23 23:57 UTC (permalink / raw)
  To: Zsh Users

[-- Attachment #1: Type: text/plain, Size: 2829 bytes --]

Brilliant! Thank you Peter for the patch.

I have applied it to my zsh and pacman is working again.

On Sun, Feb 23, 2020 at 11:53 AM Ronan Pigott <rpigott314@gmail.com> wrote:

> Wow! Thank you Stephane for your analysis.
>
> So you think that a patch checking for ESRCH is not suitable?
>
> I tried poking around in Src/jobs.c in update_job where
> it sounds like this issue originates, but I'm not quite sure
> how the logic should be revised.
>
> Is there somewhere else where this bug should be reported?
>
> Thanks again for your help!
>
>
> On Sun, Feb 23, 2020 at 7:03 AM Stephane Chazelas <stephane@chazelas.org>
> wrote:
>
>> 2020-02-22 18:09:13 -0700, Ronan Pigott:
>> [...]
>> > $ sudo true
>> > [sudo] password for ronan:
>> > $ pacman -Qttdq | sudo pacman -Rns -
>> > [...]
>> > :: Do you want to remove these packages? [Y/n] zsh: done
>> >  pacman -Qttdq |
>> > zsh: suspended (tty output)  sudo pacman -Rns -
>> [...]
>>
>> Can be reproduced with:
>>
>> $ sleep 1 | sudo sh -c 'sleep 2; ps -jfHt "$(tty<&2)"; awk "{print \$8}"
>> /proc/self/stat /dev/tty'
>> UID        PID  PPID  PGID   SID  C STIME TTY          TIME CMD
>> chazelas 25430  8308 25430 25430  0 13:44 pts/1    00:00:00 /bin/zsh
>> root     26867 25430 26866 25430  0 13:46 pts/1    00:00:00   sudo sh -c
>> sleep 2; ps -jfHt "$(tty<&2)"; awk "{print \$8}" /proc/self/stat /dev/tty
>> root     26868 26867 26866 25430  0 13:46 pts/1    00:00:00     sh -c
>> sleep 2; ps -jfHt "$(tty<&2)"; awk "{print \$8}" /proc/self/stat /dev/tty
>> root     26871 26868 26866 25430  0 13:46 pts/1    00:00:00       ps
>> -jfHt /dev/pts/1
>> 25430
>> zsh: done                   sleep 1 |
>> zsh: suspended (tty input)  sudo sh -c
>>
>> As seen above, at the time "ps" is run (and awk later), the
>> foreground process group of the terminal is 25430 which is the
>> pgid of the main shell, not the pgid of the foreground job
>> (26866), which is why that job gets a SIGTTIN when awk tries to
>> read from the terminal (or SIGTTOU when pacman does an ioctl to
>> the terminal).
>>
>> From "strace", it seems it's because when "sleep 1" (the process
>> group leader) finishes, zsh does a kill(-26866,0), presumably to
>> check that the process group is still alive, but that fails with
>> EPERM as there are processes running as root in that group, and
>> then zsh changes the foreground process group back to the main
>> shell's.
>>
>> So it seems indeed to be a bug in zsh.
>>
>> I suppose an easy fix would be to check for an errno of ESRCH
>> when kill(-pgid,0) fails to make sure it's because the process
>> group is gone. But, here the shell should be able to know that
>> the job is not gone as sudo, a direct children of the shell has
>> not returned yet, so there's probably something wrong with the
>> logic in the first place.
>>
>> --
>> Stephane
>>
>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: zsh mysteriously suspending job with sudo
  2020-02-23 23:57     ` Ronan Pigott
@ 2020-05-14 21:15       ` Ronan Pigott
  2020-05-15  9:08         ` Peter Stephenson
  0 siblings, 1 reply; 8+ messages in thread
From: Ronan Pigott @ 2020-05-14 21:15 UTC (permalink / raw)
  To: Zsh Users, stephane

[-- Attachment #1: Type: text/plain, Size: 779 bytes --]

Is it possible to apply Peter's ESRCH patch to master?

I've run this patch for a while and I think many pacman users could benefit.

On Sun, Feb 23, 2020 at 4:57 PM Ronan Pigott <rpigott314@gmail.com> wrote:

> Brilliant! Thank you Peter for the patch.
>
> I have applied it to my zsh and pacman is working again.
>
> On Sun, Feb 23, 2020 at 11:53 AM Ronan Pigott <rpigott314@gmail.com>
> wrote:
>
>> Wow! Thank you Stephane for your analysis.
>>
>> So you think that a patch checking for ESRCH is not suitable?
>>
>> I tried poking around in Src/jobs.c in update_job where
>> it sounds like this issue originates, but I'm not quite sure
>> how the logic should be revised.
>>
>> Is there somewhere else where this bug should be reported?
>>
>> Thanks again for your help!
>>
>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: zsh mysteriously suspending job with sudo
  2020-05-14 21:15       ` Ronan Pigott
@ 2020-05-15  9:08         ` Peter Stephenson
  2020-05-15  9:20           ` Ronan Pigott
  0 siblings, 1 reply; 8+ messages in thread
From: Peter Stephenson @ 2020-05-15  9:08 UTC (permalink / raw)
  To: Ronan Pigott, Zsh Users, stephane

> On 14 May 2020 at 22:15 Ronan Pigott <rpigott314@gmail.com> wrote:
> Is it possible to apply Peter's ESRCH patch to master?

It's been there for a while now.

pws

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: zsh mysteriously suspending job with sudo
  2020-05-15  9:08         ` Peter Stephenson
@ 2020-05-15  9:20           ` Ronan Pigott
  0 siblings, 0 replies; 8+ messages in thread
From: Ronan Pigott @ 2020-05-15  9:20 UTC (permalink / raw)
  To: Peter Stephenson, Zsh Users

[-- Attachment #1: Type: text/plain, Size: 321 bytes --]

Oh, indeed it is. Somehow I missed that. Thanks again!

On Fri, May 15, 2020 at 2:08 AM Peter Stephenson <
p.w.stephenson@ntlworld.com> wrote:

> > On 14 May 2020 at 22:15 Ronan Pigott <rpigott314@gmail.com> wrote:
> > Is it possible to apply Peter's ESRCH patch to master?
>
> It's been there for a while now.
>
> pws
>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* zsh mysteriously suspending job with sudo
@ 2020-02-23  1:09 Ronan Pigott
  0 siblings, 0 replies; 8+ messages in thread
From: Ronan Pigott @ 2020-02-23  1:09 UTC (permalink / raw)
  To: Zsh Users

[-- Attachment #1: Type: text/plain, Size: 2405 bytes --]

Hello zsh users,

I've encountered an unusual bug using zsh and I finally have a reliable
reproduction so I'm looking for some insight. The issue is as follows:
===
$ sudo true
[sudo] password for ronan:
$ pacman -Qttdq | sudo pacman -Rns -
[...]
:: Do you want to remove these packages? [Y/n] zsh: done
 pacman -Qttdq |
zsh: suspended (tty output)  sudo pacman -Rns -
===

When I try to pipe a package list for pacman to read from stdin, the job is
mysteriously suspended by zsh when "pacman -Rns -" prompts [Y/n] for
confirmation to continue, before I have a chance to input anything. I have
not
input the SUSP character. The issue is also reproducible from a 'zsh -f' in
a
clean environment.

Trying to 'fg' the suspended job always results in the following error:
===
$ fg[1]  + done       pacman -Qttdq |
       continued  sudo pacman -Rns -
zsh: can't set tty pgrp: no such process
===

I can only reproduce in zsh, not bash. The issue also only occurs when sudo
*does not* prompt for a password, which is why I use a "sudo true" to
temporarily authorize my user before demonstrating the issue.

Bizarrely, the following examples all work as intended:
===
# with an extra file
$ pacman -Qttdq > packages.txt; cat packages.txt | sudo pacman -Rns -
[...]
:: Do you want to remove these packages? [Y/n]
# with an extra cat
$ pacman -Qttdq | cat - | sudo pacman -Rns -
[...]
:: Do you want to remove these packages? [Y/n]
# with process substitution
$ sudo pacman -Rns - < <(pacman -Qttdq)
[...]
:: Do you want to remove these packages? [Y/n]
===

To exonerate pacman, a user from #archlinux helped create a script that
exhibits
similar behavior:
===
$ cat read-and-prompt.sh#!/bin/sh --

exec 3<&1

while IFS= read -r input; do
    printf '%s\n' "$input" >&3
done

exec < /dev/tty
printf 'prompt: ' >&2
read -r _

exec "$@" <&3
$ for a in a b c; do print "$a"; sleep 1; done | sudo ./read-and-prompt.sh
cat
a
b
c
prompt:zsh: done                   for a in a b c; do; print "$a"; sleep 1;
done |
zsh: suspended (tty input)  sudo ./read-and-prompt.sh cat
===

They also report that they cannot reproduce using doas in place of sudo.

Considering the above, I believe my issue is a bug in either sudo or zsh,
hence
why I am asking here.

Why is zsh suspending this job? My understanding is that jobs are suspended
only
in response to a signal, but I do not know where it could be generated in
this
case.

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2020-05-15  9:21 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <CAG4d5Bg=vkj6PYYGVyNO3pWTh=RkT=2rTFdE_hPBj1xxz6YTSw__29611.299622205$1582420251$gmane$org@mail.gmail.com>
2020-02-23 14:03 ` zsh mysteriously suspending job with sudo Stephane Chazelas
2020-02-23 18:53   ` Ronan Pigott
2020-02-23 23:57     ` Ronan Pigott
2020-05-14 21:15       ` Ronan Pigott
2020-05-15  9:08         ` Peter Stephenson
2020-05-15  9:20           ` Ronan Pigott
2020-02-23 20:18   ` Peter Stephenson
2020-02-23  1:09 Ronan Pigott

Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/zsh/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).