zsh-workers
 help / color / mirror / code / Atom feed
* "echo | ps -j $(:) | cat | cat | cat" runs components in different process groups
@ 2018-03-23 16:16 Stephane Chazelas
  2018-03-23 23:36 ` Bart Schaefer
  2018-03-25  7:38 ` Stephane Chazelas
  0 siblings, 2 replies; 41+ messages in thread
From: Stephane Chazelas @ 2018-03-23 16:16 UTC (permalink / raw)
  To: Zsh hackers list

From
https://unix.stackexchange.com/questions/411688/zsh-cant-input-to-terminal-when-piping-stdin-and-stdout-with-variable-command-t

$ echo | ps -j $(:) | cat | cat | cat
  PID  PGID   SID TTY          TIME CMD
 4858  4858  4858 pts/4    00:00:02 zsh
23813 23813  4858 pts/4    00:00:00 zsh
23895 23895  4858 pts/4    00:00:00 ps
23896 23896  4858 pts/4    00:00:00 cat
23897 23897  4858 pts/4    00:00:00 cat
23898 23898  4858 pts/4    00:00:00 cat

See how they all run in different process groups (they should
all be in the same one, same job).

Note that the conditions to reproduce the problem seem to be
quite specific:

- the first pipeline component must be a builtin or compound command
- the second component must have a command substitution in the
  arguments (OK when in targets of redirections).

(tested with git head on Debian amd64)

-- 
Stephane


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: "echo | ps -j $(:) | cat | cat | cat" runs components in different process groups
  2018-03-23 16:16 "echo | ps -j $(:) | cat | cat | cat" runs components in different process groups Stephane Chazelas
@ 2018-03-23 23:36 ` Bart Schaefer
  2018-03-24  5:19   ` Bart Schaefer
  2018-04-10  9:53   ` Peter Stephenson
  2018-03-25  7:38 ` Stephane Chazelas
  1 sibling, 2 replies; 41+ messages in thread
From: Bart Schaefer @ 2018-03-23 23:36 UTC (permalink / raw)
  To: Zsh hackers list

On Mar 23,  4:16pm, Stephane Chazelas wrote:
}
} $ echo | ps -j $(:) | cat | cat | cat
}   PID  PGID   SID TTY          TIME CMD
}  4858  4858  4858 pts/4    00:00:02 zsh
} 23813 23813  4858 pts/4    00:00:00 zsh
} 23895 23895  4858 pts/4    00:00:00 ps
} 23896 23896  4858 pts/4    00:00:00 cat
} 23897 23897  4858 pts/4    00:00:00 cat
} 23898 23898  4858 pts/4    00:00:00 cat
} 
} See how they all run in different process groups (they should
} all be in the same one, same job).

Hmm:
% strace -ff -p 9850 -esetpgid
Process 9850 attached - interrupt to quit
Process 10029 attached
[pid 10029] setpgid(0, 10029)           = 0
Process 10030 attached
Process 10029 detached
[pid  9850] --- SIGCHLD (Child exited) @ 0 (0) ---
Process 10030 detached
--- SIGCHLD (Child exited) @ 0 (0) ---
Process 10031 attached
[pid 10031] setpgid(0, 10029)           = -1 EPERM (Operation not permitted)
[pid 10031] setpgid(0, 10031)           = 0
Process 10032 attached
[pid 10032] setpgid(0, 10029)           = -1 EPERM (Operation not permitted)
[pid 10032] setpgid(0, 10032)           = 0
Process 10031 detached
[pid  9850] --- SIGCHLD (Child exited) @ 0 (0) ---
Process 10032 detached
--- SIGCHLD (Child exited) @ 0 (0) ---

Here 10029 is "echo" (forked to the left, as zsh does) and 10030 is the
process substitution $(:).  10031 is "ps" and 10032 is "cat".

Zsh wants to use "echo" as the group leader because it was the first to
fork, but because $(:) delayed the execution of "ps" the "echo" process
has already exited and cannot be made group leader.  As the fallback the
other processes each get made into its own process group.

So it's a race condition, adding the process substitution just happens
to expose it.

I'm not sure whether it would work to postpone reaping the would-be
group leader when that first SIGCHILD is received; can a zombie lead
a group?

Otherwise we'd have to notice that the group leader is gone and change
to the second forked process, and so on, until we found a leader that
would stick for the remainder of the forks.


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: "echo | ps -j $(:) | cat | cat | cat" runs components in different process groups
  2018-03-23 23:36 ` Bart Schaefer
@ 2018-03-24  5:19   ` Bart Schaefer
  2018-03-24  8:05     ` Joey Pabalinas
  2018-04-10  9:53   ` Peter Stephenson
  1 sibling, 1 reply; 41+ messages in thread
From: Bart Schaefer @ 2018-03-24  5:19 UTC (permalink / raw)
  To: Zsh hackers list

On Mar 23,  4:36pm, Bart Schaefer wrote:
}
} Zsh wants to use "echo" as the group leader because it was the first to
} fork, but because $(:) delayed the execution of "ps" the "echo" process
} has already exited and cannot be made group leader.  As the fallback the
} other processes each get made into its own process group.
} 
} So it's a race condition, adding the process substitution just happens
} to expose it.

It may be this easy?

It fixes the specific example in the subject by causing the zsh parent
shell to re-assume job leader for all subsequent processes in the pipeline
if the current leader is dead.  However, I'm concerned that if the first
leader is reaped after a few processes in a long pipeline are created,
there may still be a new process group asserted for remaining processes.

On the other hand I haven't found an example where that occurs, instead
I get jobs where the process group ID is the PID of the now-dead leader.
It could be I just haven't constructed the right test.   

In any case "make check" still passes and this appears to be better than
the previous situation.

diff --git a/Src/exec.c b/Src/exec.c
index 35b0bb1..52b0eb4 100644
--- a/Src/exec.c
+++ b/Src/exec.c
@@ -3455,6 +3455,7 @@ execcmd_exec(Estate state, Execcmd_params eparams,
 	if (type == WC_SUBSH && !(how & Z_ASYNC))
 	    flags |= ESUB_JOB_CONTROL;
 	filelist = jobtab[thisjob].filelist;
+	list_pipe_child = list_pipe;
 	entersubsh(flags);
 	close(synch[1]);
 	forked = 1;


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: "echo | ps -j $(:) | cat | cat | cat" runs components in different process groups
  2018-03-24  5:19   ` Bart Schaefer
@ 2018-03-24  8:05     ` Joey Pabalinas
  2018-03-24 17:34       ` Stephane Chazelas
  2018-03-24 22:09       ` Bart Schaefer
  0 siblings, 2 replies; 41+ messages in thread
From: Joey Pabalinas @ 2018-03-24  8:05 UTC (permalink / raw)
  To: Bart Schaefer; +Cc: Joey Pabalinas, Stephane Chazelas, Zsh hackers list

[-- Attachment #1: Type: text/plain, Size: 2514 bytes --]

On Fri, Mar 23, 2018 at 10:19:59PM -0700, Bart Schaefer wrote:
> It may be this easy?
> 
> It fixes the specific example in the subject by causing the zsh parent
> shell to re-assume job leader for all subsequent processes in the pipeline
> if the current leader is dead.  However, I'm concerned that if the first
> leader is reaped after a few processes in a long pipeline are created,
> there may still be a new process group asserted for remaining processes.
> 
> On the other hand I haven't found an example where that occurs, instead
> I get jobs where the process group ID is the PID of the now-dead leader.
> It could be I just haven't constructed the right test.

It's kind of an *incredibly* unlikely edge-case but FWIW:

> $ echo | ps -j $(: $(cat)) | cat | cat | cat

The shell _seems_ to completely lock up after applying your patch; on
my zsh 5.4.2 release version there are no problems (aside from the
original process group leader issues).

This is the shell output after I `pkill -9 cat`:

> $ echo | ps -j $(: $(cat)) | cat | cat | cat
> ^C^C^C^C^Z^Z^Z^Z
> PID  PGID   SID TTY          TIME CMD
> 16007 16007 18994 pts/5    00:00:03 zsh
> 17582 17582 18994 pts/5    00:00:01 zsh
> 18994 18994 18994 pts/5    00:00:26 zsh
> 20626 17582 18994 pts/5    00:00:00 ps
> 20627 17582 18994 pts/5    00:00:00 cat
> 20629 17582 18994 pts/5    00:00:00 cat

The shell _appears_ to deadlock and accordingly ignores all user
input such as ^C or ^D (but you can, however, still externally
kill the parent zsh process group leader with any fatal signal
or `kill -9 cat` without issue). It seems it's mostly an issue
stemming from the parent zsh process still being the process
group leader; this causes cat to wait indefinitely for input
while being unable to actually receive any.

In my zsh release version 5.4.2 ^C works as expected, and to
be honest I don't actually know if this is anything worth
being concerned about.

Output from the release version (no need to ^C or ^D or anything):

> $ echo | ps -j $(: $(cat)) | cat | cat | cat
> cat: -: Input/output error
> PID  PGID   SID TTY          TIME CMD
> 21366 21366 27814 pts/12   00:00:00 ps
> 21367 21367 27814 pts/12   00:00:00 cat
> 21368 21368 27814 pts/12   00:00:00 cat
> 27814 27814 27814 pts/12   00:00:03 zsh

The "cat: -: Input/output error" and differing number of entries is kind
interesting, although I would be lying if I said that I understood why.

-- 
Cheers,
Joey Pabalinas

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: "echo | ps -j $(:) | cat | cat | cat" runs components in different process groups
  2018-03-24  8:05     ` Joey Pabalinas
@ 2018-03-24 17:34       ` Stephane Chazelas
  2018-03-24 22:23         ` Bart Schaefer
  2018-03-27 10:17         ` Stephane Chazelas
  2018-03-24 22:09       ` Bart Schaefer
  1 sibling, 2 replies; 41+ messages in thread
From: Stephane Chazelas @ 2018-03-24 17:34 UTC (permalink / raw)
  To: Joey Pabalinas; +Cc: Bart Schaefer, Zsh hackers list

2018-03-23 22:05:14 -1000, Joey Pabalinas:
[...]
> The "cat: -: Input/output error" and differing number of entries is kind
> interesting, although I would be lying if I said that I understood why.
[...]

Yes, I should have mentioned that the initial problem was that
some of the processes in that command line are not running in
foreground, are not in the foreground process group of the
terminal.

The initial problem at unix.stackexchange.com was on something
like

echo x | $(echo vipe) | cat

or

echo x | vipe $(:) | cat

And vipe (running an editor to edit the pipe) is suspended when
trying to read from the terminal while it's not in the
foreground process group of the terminal.

And:

echo x | ps -j $(:) | cat | cat | cat

shows that all those ps/cat processes running in different
process groups instead of just one process group as one would
expect.

Bart explained that it was because zsh uses the first spawned
process (here to run a builtin) for the process group, makes
that the foreground process group, but by the time it spawns the
other processes, that first process has already returned and has
been waited for so zsh can't make the other processes join the
foreground process group.

I think here, zsh should not wait for that process until all the
processes in the jobs have been started and joined the process
group if that's what from a now-zombie process.

>From stracing mksh, it looks like that's what it does. Stracing
ksh93, I can't seem to make it have the first process end before
the other ones are started, I'm not sure why that is.

That in a | b $(c) | d
c is not run in the same process group as the rest is probably a
separate though related issue I'd say.

That means, that one can't ^C

sleep 1 | cat $(sleep 10)

for instance.


-- 
Stephane


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: "echo | ps -j $(:) | cat | cat | cat" runs components in different process groups
  2018-03-24  8:05     ` Joey Pabalinas
  2018-03-24 17:34       ` Stephane Chazelas
@ 2018-03-24 22:09       ` Bart Schaefer
  2018-04-10 11:45         ` Peter Stephenson
  1 sibling, 1 reply; 41+ messages in thread
From: Bart Schaefer @ 2018-03-24 22:09 UTC (permalink / raw)
  To: Zsh hackers list

On Mar 23, 10:05pm, Joey Pabalinas wrote:
}
} > It could be I just haven't constructed the right test.
} 
} It's kind of an *incredibly* unlikely edge-case but FWIW:
} 
} > $ echo | ps -j $(: $(cat)) | cat | cat | cat
} 
} The shell _seems_ to completely lock up after applying your patch; on
} my zsh 5.4.2 release version there are no problems (aside from the
} original process group leader issues).

Hmm.  The parent shell has received a SIGTTIN signal during readoutput()
even though the descriptor from which it is reading is not a TTY.

This is because the tty process group is still that of the deceased job
leader, and $(cat) is in the new process group owned by the parent, so
when $(cat) is stopped the process group stops as well.

The parent needs to reattach to the terminal when reaping the group
leader, so that $(cat) doesn't get SIGTTYIN and so that the parent
begins receiving tty interrupts etc. again.

I'm not yet sure exactly where that should happen.


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: "echo | ps -j $(:) | cat | cat | cat" runs components in different process groups
  2018-03-24 17:34       ` Stephane Chazelas
@ 2018-03-24 22:23         ` Bart Schaefer
  2018-03-25  7:00           ` Stephane Chazelas
  2018-03-25  7:48           ` Stephane Chazelas
  2018-03-27 10:17         ` Stephane Chazelas
  1 sibling, 2 replies; 41+ messages in thread
From: Bart Schaefer @ 2018-03-24 22:23 UTC (permalink / raw)
  To: Zsh hackers list

On Mar 24,  5:34pm, Stephane Chazelas wrote:
} Subject: Re: "echo | ps -j $(:) | cat | cat | cat" runs components in diff
}
} Bart explained that it was because zsh uses the first spawned
} process (here to run a builtin) for the process group, makes
} other processes, that first process has already returned and has
} been waited for so zsh can't make the other processes join the
} foreground process group.
} 
} I think here, zsh should not wait for that process until all the
} processes in the jobs have been started and joined the process
} group if that's what from a now-zombie process.

That's going to require a lot of mucking about, because currently
all process reaping is done in signal handlers.  The shell only
explicitly waits for the rightmost process in a pipeline.  The
SIGCHLD can't be blocked because then the exit of $(cat) in the
second pipe step could not be captured, so the handler *will* be
called when "echo" exits.

Also once that process has exited, setpgrp() won't make it a group
leader any longer.  Zsh would have to block "echo" from *starting*
until the rest of the pipeline was built.

This is all because most shells fork both sides of a pipeline,
whereas zsh attempts to only fork the left side so that a builtin
loop on the right can capture the output into the current shell.


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: "echo | ps -j $(:) | cat | cat | cat" runs components in different process groups
  2018-03-24 22:23         ` Bart Schaefer
@ 2018-03-25  7:00           ` Stephane Chazelas
  2018-03-25  8:26             ` Stephane Chazelas
  2018-03-25  7:48           ` Stephane Chazelas
  1 sibling, 1 reply; 41+ messages in thread
From: Stephane Chazelas @ 2018-03-25  7:00 UTC (permalink / raw)
  To: Bart Schaefer; +Cc: Zsh hackers list

2018-03-24 15:23:57 -0700, Bart Schaefer:
> On Mar 24,  5:34pm, Stephane Chazelas wrote:
> } Subject: Re: "echo | ps -j $(:) | cat | cat | cat" runs components in diff
> }
> } Bart explained that it was because zsh uses the first spawned
> } process (here to run a builtin) for the process group, makes
> } other processes, that first process has already returned and has
> } been waited for so zsh can't make the other processes join the
> } foreground process group.
> } 
> } I think here, zsh should not wait for that process until all the
> } processes in the jobs have been started and joined the process
> } group if that's what from a now-zombie process.
> 
> That's going to require a lot of mucking about, because currently
> all process reaping is done in signal handlers.  The shell only
> explicitly waits for the rightmost process in a pipeline.  The
> SIGCHLD can't be blocked because then the exit of $(cat) in the
> second pipe step could not be captured, so the handler *will* be
> called when "echo" exits.
[...]

Note that the fact that in

a | b $(c) | D

The $(c) is done by the top-level process, that c is a child of
the top-level shell process as opposed to the process that will
eventually execute b is unique to zsh (and is complicating
things here).

That also causes some surprises like in 

$ set +o multios
$ (sleep 1; echo A >&2) | echo B $(sleep 2) >&2 | echo C
A
B
C

("echo C" not started before "sleep 2" finishes) or:

Or (though some would find the zsh behaviour less surprising):

$ n=1; echo $((++n)) >&2 | echo $((++n)) >&2 | echo $n
2
3
3

(2, 2, 1 in any order in other shells)

See also:

$ set +o multios
$ echo A >&2 | echo $((+)) | echo B
A
zsh: bad math expression: operand expected at end of string
[1]    done       echo A >&2

"echo A" run but not "echo B" and message about a background
job returning.

$ sleep 10 | echo $((+)) | echo B
zsh: bad math expression: operand expected at end of string
$ jobs
[1]    running    sleep 10
$ fg
fg: no current job # fg %1 works
$
[1]    done       sleep 10

IIRC, that was discussed not so long ago on the austin-group
mailing list (which process should perform the expansions in
pipelines) though I didn't follow the discussion.

-- 
Stephane


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: "echo | ps -j $(:) | cat | cat | cat" runs components in different process groups
  2018-03-23 16:16 "echo | ps -j $(:) | cat | cat | cat" runs components in different process groups Stephane Chazelas
  2018-03-23 23:36 ` Bart Schaefer
@ 2018-03-25  7:38 ` Stephane Chazelas
  1 sibling, 0 replies; 41+ messages in thread
From: Stephane Chazelas @ 2018-03-25  7:38 UTC (permalink / raw)
  To: Zsh hackers list

2018-03-23 16:16:13 +0000, Stephane Chazelas:
[...]
> - the first pipeline component must be a builtin or compound command

To clarify, as explained by Bart, it doesn't have to be builtin.
It's more easily reproducible with builtins as they exit quickly
but one just needs a command that exits before the command
substitution as in:

sleep 1 | ps -j $(sleep 2) | cat|cat|cat...

Another side effect is that

echo test | tee $(uname) | sleep 10 | cat

cannot be aborted with ^C as only "cat" is in foreground (echo
gone, tee (soon gone too) and sleep on their own process group).
Also, ^Z suspends cat only causing a deadlock.

That's probably a worse problem and could explain the similar
lock ups I sometimes experience in zsh.


> - the second component must have a command substitution in the
>   arguments (OK when in targets of redirections).
[...]

Following-up on my previous email, that means:

$ n=1; echo $((++n)) | cat; echo $n
2
2
$ n=1; echo > $((++n)) | cat; echo $n
1

Which looks a bit inconsistent.

-- 
Stephane


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: "echo | ps -j $(:) | cat | cat | cat" runs components in different process groups
  2018-03-24 22:23         ` Bart Schaefer
  2018-03-25  7:00           ` Stephane Chazelas
@ 2018-03-25  7:48           ` Stephane Chazelas
  2018-03-25 18:09             ` Stephane Chazelas
  1 sibling, 1 reply; 41+ messages in thread
From: Stephane Chazelas @ 2018-03-25  7:48 UTC (permalink / raw)
  To: Bart Schaefer; +Cc: Zsh hackers list

2018-03-24 15:23:57 -0700, Bart Schaefer:
[...]
> Zsh would have to block "echo" from *starting*
> until the rest of the pipeline was built.

Note that it looks as if that's what ksh93 is doing, because
looking at strace -f, and even if I add a hundred |cat, echo's
write() is only done after all the processes have been started.

As I said earlier, while looking at strace output, I don't
understand how that works. Unless I'm missing something, it's as
if it was not done with syscalls. But then what could it be?
Busy loop on shared memory? One would have to look at the code I
suppose.

Note that with ksh93, like in other shells, one can't delay the
creation of the other processes with command substitutions as
they are not done in the main process.

I suppose pipes could be used. Is it not what zsh already does
in other contexts?

-- 
Stephane


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: "echo | ps -j $(:) | cat | cat | cat" runs components in different process groups
  2018-03-25  7:00           ` Stephane Chazelas
@ 2018-03-25  8:26             ` Stephane Chazelas
  0 siblings, 0 replies; 41+ messages in thread
From: Stephane Chazelas @ 2018-03-25  8:26 UTC (permalink / raw)
  To: Bart Schaefer, Zsh hackers list

2018-03-25 08:00:53 +0100, Stephane Chazelas:
[...]
> That also causes some surprises like in 
[...]
> Or (though some would find the zsh behaviour less surprising):
> 
> $ n=1; echo $((++n)) >&2 | echo $((++n)) >&2 | echo $n
> 2
> 3
> 3
[...]

Or:

$ true; true | true $(false) | echo $?
1
$ false; echo $(:) | echo $?
0

-- 
Stephane


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: "echo | ps -j $(:) | cat | cat | cat" runs components in different process groups
  2018-03-25  7:48           ` Stephane Chazelas
@ 2018-03-25 18:09             ` Stephane Chazelas
  0 siblings, 0 replies; 41+ messages in thread
From: Stephane Chazelas @ 2018-03-25 18:09 UTC (permalink / raw)
  To: Bart Schaefer, Zsh hackers list

2018-03-25 08:48:27 +0100, Stephane Chazelas:
[...]
> Note that with ksh93, like in other shells, one can't delay the
> creation of the other processes with command substitutions as
> they are not done in the main process.
[...]

Well, no that's not true. Since like in zsh, the last pipe
component is not evaluated in a child, in:

sleep 1 | { ps -j; sleep 2; ps -j; }

the second ps -j will be started while the original job process
group leader and everybody else in the group is gone, so has to
be in a different group. I find that ksh93 uses the shell's
process group in that case:

ksh93$ alias sleep=/bin/sleep # otherwise sleep is the builtin one
ksh93$ sleep 1 | { ps -j; sleep 2; ps -j; }
  PID  PGID   SID TTY          TIME CMD
 7765  7765  7765 pts/6    00:00:00 zsh
 7771  7771  7765 pts/6    00:00:00 ksh93
20782 20782  7765 pts/6    00:00:00 sleep
20783 20782  7765 pts/6    00:00:00 ps
  PID  PGID   SID TTY          TIME CMD
 7765  7765  7765 pts/6    00:00:00 zsh
 7771  7771  7765 pts/6    00:00:00 ksh93
20785  7771  7765 pts/6    00:00:00 ps


I find that a  sleep 1 | { sleep 2; echo go; sleep 10; } cannot
be suspended after "go". I've seen ksh93 also become very
confused and claiming "no job control" in one situation.

Do we agree that in:

{ a; b; c & } | d $(e) > $(f) | { g $(h); i; }

All those processes should ideally run in the same process group
as they're meant to be part of the same job? Or at least that at
any given time, all running processes in there should be in the
same process group and that process group be the foreground one?

process substitution probably have to be different. AFAICT, in
all shells, they run in background. Some even let you access
them with $! (or $apids in rc) and wait for them. There's also
the exec 3> >(cmd) where clearly cmd has to be in background.

-- 
Stephane


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: "echo | ps -j $(:) | cat | cat | cat" runs components in different process groups
  2018-03-24 17:34       ` Stephane Chazelas
  2018-03-24 22:23         ` Bart Schaefer
@ 2018-03-27 10:17         ` Stephane Chazelas
  1 sibling, 0 replies; 41+ messages in thread
From: Stephane Chazelas @ 2018-03-27 10:17 UTC (permalink / raw)
  To: Joey Pabalinas, Bart Schaefer, Zsh hackers list

2018-03-24 17:34:06 +0000, Stephane Chazelas:
[...]
> That means, that one can't ^C
> 
> sleep 1 | cat $(sleep 10)
> 
> for instance.
[...]

For the record, while

/bin/sleep 1 | cat $(/bin/sleep 10) | cat

can be interrupted in ksh93,

ksh93$ /bin/sleep 1 | cat $(ps -jf>&2) $(/bin/sleep 20)
UID        PID  PPID  PGID   SID  C STIME TTY          TIME CMD
chazelas 16259  5079 16259 16259  0 10:58 pts/7    00:00:00 /bin/zsh
chazelas 16281 16259 16281 16259  0 10:59 pts/7    00:00:00 ksh93 -o emacs
chazelas 16355 16281 16355 16259  0 11:02 pts/7    00:00:00 /bin/sleep 1
chazelas 16356 16281 16281 16259  0 11:02 pts/7    00:00:00 ps -jf
chazelas 16357 16281 16281 16259  0 11:02 pts/7    00:00:00 /bin/sleep 20

Cannot (see how ps -j and sleep 20 run (concurrently!) in a
different process group from sleep 1) like in zsh (and a ^Z
while sleep 1 is running causes a deadlock).

One difference with zsh is that it can't be worked around by
running that last pipe component in a subshell (as ksh93 fakes
subshells instead of forking).

-- 
Stephane


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: "echo | ps -j $(:) | cat | cat | cat" runs components in different process groups
  2018-03-23 23:36 ` Bart Schaefer
  2018-03-24  5:19   ` Bart Schaefer
@ 2018-04-10  9:53   ` Peter Stephenson
  1 sibling, 0 replies; 41+ messages in thread
From: Peter Stephenson @ 2018-04-10  9:53 UTC (permalink / raw)
  To: Zsh hackers list

On Fri, 23 Mar 2018 16:36:14 -0700
Bart Schaefer <schaefer@brasslantern.com> wrote:
> On Mar 23,  4:16pm, Stephane Chazelas wrote:
> }
> } $ echo | ps -j $(:) | cat | cat | cat
> }   PID  PGID   SID TTY          TIME CMD
> }  4858  4858  4858 pts/4    00:00:02 zsh
> } 23813 23813  4858 pts/4    00:00:00 zsh
> } 23895 23895  4858 pts/4    00:00:00 ps
> } 23896 23896  4858 pts/4    00:00:00 cat
> } 23897 23897  4858 pts/4    00:00:00 cat
> } 23898 23898  4858 pts/4    00:00:00 cat
> } 
> } See how they all run in different process groups (they should
> } all be in the same one, same job).
>...
> Zsh wants to use "echo" as the group leader because it was the first
> to fork, but because $(:) delayed the execution of "ps" the "echo"
> process has already exited and cannot be made group leader.  As the
> fallback the other processes each get made into its own process group.
>...
> Otherwise we'd have to notice that the group leader is gone and change
> to the second forked process, and so on, until we found a leader that
> would stick for the remainder of the forks.

The following patch gives me the same with and without the $(:).
Changing gleader ASAP seems the only sane thing to do --- else we end up
in a maze of forks and none of the subshells know what's going on.

But I'm a bit worried that we're going to end up with something
following the original leader and something not?  If that did happen,
does it mean the pgrp is still valid, and if so can we test that without
actually changing it?  Or does it mean there's no zsh-specific problem
because those earlier processes are pgrp orphans in any case?

As the setpgrp() happens in subprocess I don't think there's any future
in recording what processes have set the process group already.  Surely
this must be a "well-known problem" with a "well-known" solution?

I'm not convinced Joey's case is actually (directly) related to this (or
rather, to anything fixed by this or Bart's change --- it's obviously
somewhere in the same area).

% echo | ps -j $(:) | cat | cat | cat
  PID  PGID   SID TTY          TIME CMD
18367 18367 18367 pts/4    00:00:07 zsh
20179 20179 18367 pts/4    00:00:02 zsh
20199 20199 18367 pts/4    00:00:00 ps
20200 20199 18367 pts/4    00:00:00 cat
20201 20199 18367 pts/4    00:00:00 cat
20202 20199 18367 pts/4    00:00:00 cat
% echo | ps -j | cat | cat | cat
  PID  PGID   SID TTY          TIME CMD
18367 18367 18367 pts/4    00:00:07 zsh
20179 20179 18367 pts/4    00:00:02 zsh
20204 20203 18367 pts/4    00:00:00 ps
20205 20203 18367 pts/4    00:00:00 cat
20206 20203 18367 pts/4    00:00:00 cat
20207 20203 18367 pts/4    00:00:00 cat

diff --git a/Src/signals.c b/Src/signals.c
index 94f379e..5917683 100644
--- a/Src/signals.c
+++ b/Src/signals.c
@@ -537,6 +537,18 @@ wait_for_processes(void)
 #else
 		update_process(pn, status);
 #endif
+		if (pn->pid == jn->gleader) {
+		    Process pnn = pn->next;
+		    while (pnn) {
+			if (pnn->status == SP_RUNNING) {
+			    jn->gleader = pnn->pid;
+			    break;
+			}
+			pnn = pnn->next;
+		    }
+		    if (pn->pid == jn->gleader)
+			jn->gleader = 0;
+		}
 	    }
 	    update_job(jn);
 	} else if (findproc(pid, &jn, &pn, 1)) {

pws


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: "echo | ps -j $(:) | cat | cat | cat" runs components in different process groups
  2018-03-24 22:09       ` Bart Schaefer
@ 2018-04-10 11:45         ` Peter Stephenson
  2018-04-10 13:59           ` Peter Stephenson
  0 siblings, 1 reply; 41+ messages in thread
From: Peter Stephenson @ 2018-04-10 11:45 UTC (permalink / raw)
  To: Zsh hackers list

On Sat, 24 Mar 2018 15:09:44 -0700
Bart Schaefer <schaefer@brasslantern.com> wrote:
> On Mar 23, 10:05pm, Joey Pabalinas wrote:
> }
> } > It could be I just haven't constructed the right test.
> } 
> } It's kind of an *incredibly* unlikely edge-case but FWIW:
> } 
> } > $ echo | ps -j $(: $(cat)) | cat | cat | cat
> } 
> } The shell _seems_ to completely lock up after applying your patch;
> on } my zsh 5.4.2 release version there are no problems (aside from
> the } original process group leader issues).
> 
> Hmm.  The parent shell has received a SIGTTIN signal during
> readoutput() even though the descriptor from which it is reading is
> not a TTY.
> 
> This is because the tty process group is still that of the deceased
> job leader, and $(cat) is in the new process group owned by the
> parent, so when $(cat) is stopped the process group stops as well.
> 
> The parent needs to reattach to the terminal when reaping the group
> leader, so that $(cat) doesn't get SIGTTYIN and so that the parent
> begins receiving tty interrupts etc. again.

The following is all a bit tentative...

If we allow a simpler version of the patch I just posted --- where we
always make the next process to start the new process group leader and
shrug our shoulders over anything that's started so far --- then
reattaching looks simple enough.  The following's probably not
sophisticated enough yet since we should only take back the terminal if
the exited group leader was in the foreground.

But looking at ps output while ^C is failing to have any effect suggests
things are worse than that.  I think at this point the parent shell
hasn't actually reaped the echo (it's <defunct>), so hasn't executed
this chunk of code.  That's got something to do with the fact that it's
got child reaping blocked while trying to read output from the the $(:
...) process --- it's executing the same code as the forked shell
waiting for the cat.  However, even simple-minded child unblocking
didn't seem to cause the zombie to go away, at which point I got stuck.

(The following change isn't going anyway near the main repository at the
moment, obviously.)

pws

diff --git a/Src/signals.c b/Src/signals.c
index 94f379e..f86ae54 100644
--- a/Src/signals.c
+++ b/Src/signals.c
@@ -537,6 +537,11 @@ wait_for_processes(void)
 #else
 		update_process(pn, status);
 #endif
+		if (pn->pid == jn->gleader) {
+		    jn->gleader = 0;
+		    /* HERE: check this was actually in foreground... */
+		    attachtty(mypgrp);
+		}
 	    }
 	    update_job(jn);
 	} else if (findproc(pid, &jn, &pn, 1)) {


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: "echo | ps -j $(:) | cat | cat | cat" runs components in different process groups
  2018-04-10 11:45         ` Peter Stephenson
@ 2018-04-10 13:59           ` Peter Stephenson
  2018-04-11 22:10             ` Bart Schaefer
  0 siblings, 1 reply; 41+ messages in thread
From: Peter Stephenson @ 2018-04-10 13:59 UTC (permalink / raw)
  To: Zsh hackers list

On Tue, 10 Apr 2018 12:45:45 +0100
Peter Stephenson <p.stephenson@samsung.com> wrote:

> On Sat, 24 Mar 2018 15:09:44 -0700
> Bart Schaefer <schaefer@brasslantern.com> wrote:
> > On Mar 23, 10:05pm, Joey Pabalinas wrote:
> > }
> > } > It could be I just haven't constructed the right test.
> > } 
> > } It's kind of an *incredibly* unlikely edge-case but FWIW:
> > } 
> > } > $ echo | ps -j $(: $(cat)) | cat | cat | cat
> > } 
> > } The shell _seems_ to completely lock up after applying your patch;
> > on } my zsh 5.4.2 release version there are no problems (aside from
> > the } original process group leader issues).
> 
>... [Stuff about the process group aspects I'm not discussing here] ...

I think all this still applies, but I've nothing new to add, so I've
edited it down to keep the messages manageable.

> But looking at ps output while ^C is failing to have any effect
> suggests things are worse than that.  I think at this point the
> parent shell hasn't actually reaped the echo (it's <defunct>), so
> hasn't executed this chunk of code.  That's got something to do with
> the fact that it's got child reaping blocked while trying to read
> output from the the $(: ...) process --- it's executing the same code
> as the forked shell waiting for the cat.  However, even simple-minded
> child unblocking didn't seem to cause the zombie to go away, at which
> point I got stuck.

OK, the following change --- also too simple for the real world, so
treat as proof of concept --- *does* make this case interruptible.  I
was forgetting that child_unblock() wasn't enough, you need to ensure
signals aren't queued.

So I suppose we need a way of making this safe(r)...

pws

diff --git a/Src/exec.c b/Src/exec.c
index e154d12..c22d37d 100644
--- a/Src/exec.c
+++ b/Src/exec.c
@@ -4543,11 +4543,15 @@ getoutput(char *cmd, int qt)
 	return NULL;
     } else if (pid) {
 	LinkList retval;
+	int q = queue_signal_level();
 
 	zclose(pipes[1]);
+	dont_queue_signals();
+	child_unblock();
 	retval = readoutput(pipes[0], qt, NULL);
 	fdtable[pipes[0]] = FDT_UNUSED;
 	waitforpid(pid, 0);		/* unblocks */
+	restore_queue_signals(q);
 	lastval = cmdoutval;
 	return retval;
     }
diff --git a/Src/signals.c b/Src/signals.c
index 94f379e..f86ae54 100644
--- a/Src/signals.c
+++ b/Src/signals.c
@@ -537,6 +537,11 @@ wait_for_processes(void)
 #else
 		update_process(pn, status);
 #endif
+		if (pn->pid == jn->gleader) {
+		    jn->gleader = 0;
+		    /* HERE: check this was actually in foreground... */
+		    attachtty(mypgrp);
+		}
 	    }
 	    update_job(jn);
 	} else if (findproc(pid, &jn, &pn, 1)) {


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: "echo | ps -j $(:) | cat | cat | cat" runs components in different process groups
  2018-04-10 13:59           ` Peter Stephenson
@ 2018-04-11 22:10             ` Bart Schaefer
  2018-04-12 16:23               ` Peter Stephenson
  0 siblings, 1 reply; 41+ messages in thread
From: Bart Schaefer @ 2018-04-11 22:10 UTC (permalink / raw)
  To: Zsh hackers list

On Apr 10,  2:59pm, Peter Stephenson wrote:
} Subject: Re: "echo | ps -j $(:) | cat | cat | cat" runs components in diff
}
} +		if (pn->pid == jn->gleader) {
} +		    jn->gleader = 0;
} +		    /* HERE: check this was actually in foreground... */
} +		    attachtty(mypgrp);
} +		}

I think attachtty() will simply fail here if mypgrp is not already in
the foreground, but I confess not to be 100% certain.


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: "echo | ps -j $(:) | cat | cat | cat" runs components in different process groups
  2018-04-11 22:10             ` Bart Schaefer
@ 2018-04-12 16:23               ` Peter Stephenson
  2018-04-15 16:23                 ` Stephane Chazelas
  0 siblings, 1 reply; 41+ messages in thread
From: Peter Stephenson @ 2018-04-12 16:23 UTC (permalink / raw)
  To: Zsh hackers list

On Wed, 11 Apr 2018 15:10:24 -0700
Bart Schaefer <schaefer@brasslantern.com> wrote:
> On Apr 10,  2:59pm, Peter Stephenson wrote:
> } Subject: Re: "echo | ps -j $(:) | cat | cat | cat" runs components
> in diff }
> } +		if (pn->pid == jn->gleader) {
> } +		    jn->gleader = 0;
> } +		    /* HERE: check this was actually in
> foreground... */ } +		    attachtty(mypgrp);
> } +		}
> 
> I think attachtty() will simply fail here if mypgrp is not already in
> the foreground, but I confess not to be 100% certain.

The man page suggests it's SIGTTOU time, though this isn't my natural
territory either.

I think the right test is for STAT_NOSTTY, which is the flag associated
with running in the background.  (Paranoid checK: we set this with
"bg" as well as "&".)

Moving the unqueuing inside readoutput() both catches other similar
cases and allows us to continue protecting memory allocation while
allowing signals around the blocking read.

So I think this is a definite improvement, though it's probably not safe
enough to sneak into 5.5.1.

pws

diff --git a/Src/exec.c b/Src/exec.c
index e154d12..65ce348 100644
--- a/Src/exec.c
+++ b/Src/exec.c
@@ -4576,10 +4576,20 @@ readoutput(int in, int qt, int *readerror)
     char *buf, *ptr;
     int bsiz, c, cnt = 0;
     FILE *fin;
+    int q = queue_signal_level();
 
     fin = fdopen(in, "r");
     ret = newlinklist();
     ptr = buf = (char *) hcalloc(bsiz = 64);
+    /*
+     * We need to be sensitive to SIGCHLD else we can be
+     * stuck forever with important processes unreaped.
+     * The case that triggered this was where the exiting
+     * process is group leader of the foreground process and we need
+     * to reclaim the terminal else ^C doesn't work.
+     */
+    dont_queue_signals();
+    child_unblock();
     while ((c = fgetc(fin)) != EOF || errno == EINTR) {
 	if (c == EOF) {
 	    errno = 0;
@@ -4592,13 +4602,18 @@ readoutput(int in, int qt, int *readerror)
 	    cnt++;
 	}
 	if (++cnt >= bsiz) {
-	    char *pp = (char *) hcalloc(bsiz *= 2);
+	    char *pp;
+	    queue_signals();
+	    pp = (char *) hcalloc(bsiz *= 2);
+	    dont_queue_signals();
 
 	    memcpy(pp, buf, cnt - 1);
 	    ptr = (buf = pp) + cnt - 1;
 	}
 	*ptr++ = c;
     }
+    child_block();
+    restore_queue_signals(q);
     if (readerror)
 	*readerror = ferror(fin) ? errno : 0;
     fclose(fin);
diff --git a/Src/signals.c b/Src/signals.c
index 94f379e..2a6aa3f 100644
--- a/Src/signals.c
+++ b/Src/signals.c
@@ -537,6 +537,19 @@ wait_for_processes(void)
 #else
 		update_process(pn, status);
 #endif
+		if (pn->pid == jn->gleader) {
+		    jn->gleader = 0;
+		    if (!(jn->stat & STAT_NOSTTY)) {
+			/*
+			 * This PID was in control of the terminal;
+			 * reclaim terminal now it has exited.
+			 * It's still possible some future forked
+			 * process of this job will become group
+			 * leader, however.
+			 */
+			attachtty(mypgrp);
+		    }
+		}
 	    }
 	    update_job(jn);
 	} else if (findproc(pid, &jn, &pn, 1)) {


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: "echo | ps -j $(:) | cat | cat | cat" runs components in different process groups
  2018-04-12 16:23               ` Peter Stephenson
@ 2018-04-15 16:23                 ` Stephane Chazelas
  2018-04-15 17:38                   ` Bart Schaefer
  0 siblings, 1 reply; 41+ messages in thread
From: Stephane Chazelas @ 2018-04-15 16:23 UTC (permalink / raw)
  To: Peter Stephenson; +Cc: Zsh hackers list

2018-04-12 17:23:42 +0100, Peter Stephenson:
[...]
> So I think this is a definite improvement, though it's probably not safe
> enough to sneak into 5.5.1.
[...]

Thanks, that's definitely an improvement indeed.

There are still a few problems. For instance, it takes 3 ^C to
interrupt:

sleep 10 $(sleep 11) | sleep 12 $(sleep 13) | sleep 14 $(sleep 15) | sleep 16 $(sleep 17)

The first one kills "sleep 11" (running in the shell's process
group), after which "sleep 10" and "sleep 13" start, the second
kills "sleep 10", the 3rd kills "sleep 13" and the command seems
to be aborted (sleep 12 and sleep 14 are not started for some
reason).

^Z still causes an unrecoverable dead-lock (the shell reading
the output of a suspended "sleep 11").

Also note that

echo $(sleep 10) & echo started

Only outputs "started" after 10 seconds.

sleep 10 | echo $((+)) | cat

still starts a pseudo-background job.

-- 
Stephane


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: "echo | ps -j $(:) | cat | cat | cat" runs components in different process groups
  2018-04-15 16:23                 ` Stephane Chazelas
@ 2018-04-15 17:38                   ` Bart Schaefer
  2018-04-15 18:58                     ` Stephane Chazelas
  0 siblings, 1 reply; 41+ messages in thread
From: Bart Schaefer @ 2018-04-15 17:38 UTC (permalink / raw)
  To: Peter Stephenson, Zsh hackers list

[-- Attachment #1: Type: text/plain, Size: 400 bytes --]

On Sun, Apr 15, 2018 at 9:23 AM, Stephane Chazelas <
stephane.chazelas@gmail.com> wrote:

>
> Also note that
>
> echo $(sleep 10) & echo started
>
> Only outputs "started" after 10 seconds.
>

I would say that's actually correct.  $(sleep 10) is *not* a background
job, it's output has to be collected before the first "echo" can run.  The
shell has no intrinsic idea that there won't be any output.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: "echo | ps -j $(:) | cat | cat | cat" runs components in different process groups
  2018-04-15 17:38                   ` Bart Schaefer
@ 2018-04-15 18:58                     ` Stephane Chazelas
  2018-04-17  5:39                       ` Bart Schaefer
  0 siblings, 1 reply; 41+ messages in thread
From: Stephane Chazelas @ 2018-04-15 18:58 UTC (permalink / raw)
  To: Bart Schaefer; +Cc: Peter Stephenson, Zsh hackers list

2018-04-15 10:38:07 -0700, Bart Schaefer:
> On Sun, Apr 15, 2018 at 9:23 AM, Stephane Chazelas <
> stephane.chazelas@gmail.com> wrote:
> 
> >
> > Also note that
> >
> > echo $(sleep 10) & echo started
> >
> > Only outputs "started" after 10 seconds.
> >
> 
> I would say that's actually correct.  $(sleep 10) is *not* a background
> job, it's output has to be collected before the first "echo" can run.  The
> shell has no intrinsic idea that there won't be any output.

But it could do that collection in the child process like all
other shells do.

-- 
Stephane


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: "echo | ps -j $(:) | cat | cat | cat" runs components in different process groups
  2018-04-15 18:58                     ` Stephane Chazelas
@ 2018-04-17  5:39                       ` Bart Schaefer
  2018-04-17  9:19                         ` Peter Stephenson
  0 siblings, 1 reply; 41+ messages in thread
From: Bart Schaefer @ 2018-04-17  5:39 UTC (permalink / raw)
  To: zsh-workers

On Apr 15,  7:58pm, Stephane Chazelas wrote:
}
} > > echo $(sleep 10) & echo started
} > >
} > > Only outputs "started" after 10 seconds.
} > 
} > I would say that's actually correct. $(sleep 10) is *not* a
} > background job, it's output has to be collected
} 
} But it could do that collection in the child process like all
} other shells do.

Hmm.  At first I was going to assert that this would require a lot
of changes to order of operations in exec.c, but then I noticed
this:

% time echo $(sleep 5; echo finished) & echo started
[1] 31864
started
% finished

[1]  + done       time echo $(sleep 5; echo finished)
% 

The builtin "time" (keyword really) does not time other builtins,
because they don't fork.  So it's a silent no-op in this example,
which might be an issue all its own.  However, it still introduces
an extra bit of wordcode.

What it comes down to is that "time", being a keyword, is found at
exec.c:3077 to have no "args" list, so prefork() is not invoked.
Instead the rest of the expression is in the Estate.  The whole
thing gets backgrounded and then exectime() unpacks the state and
re-calls execpline()

When "time" is not there [or has been stepped over via exectime()],
prefork() is immediately called and the substitution is waited for.

So if, when we determine that "&" is the command separator, we could
treat the command in the way the "time" prefix does, this would all
work out without mangling execcmd_exec() and prefork().

However, there are some other unique zsh-isms that would be altered
by this.  For example:

% echo ${foo::=bar} & echo $foo
[1] 31940
bar
% bar

[1]  + done       echo ${foo::=bar}
% unset foo
% time echo ${foo::=bar} & echo $foo
[1] 31943

% bar

[1]  + done       time echo ${foo::=bar}
% 

Note the assignment "sticks" in the current shell in the first case,
because it happens before the fork, but is lost in the second case.
Perhaps the former is a bug as well.


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: "echo | ps -j $(:) | cat | cat | cat" runs components in different process groups
  2018-04-17  5:39                       ` Bart Schaefer
@ 2018-04-17  9:19                         ` Peter Stephenson
  2018-04-17 16:09                           ` Bart Schaefer
  2018-04-18  6:29                           ` Stephane Chazelas
  0 siblings, 2 replies; 41+ messages in thread
From: Peter Stephenson @ 2018-04-17  9:19 UTC (permalink / raw)
  To: zsh-workers

On Mon, 16 Apr 2018 22:39:10 -0700
Bart Schaefer <schaefer@brasslantern.com> wrote:
> So if, when we determine that "&" is the command separator, we could
> treat the command in the way the "time" prefix does, this would all
> work out without mangling execcmd_exec() and prefork().

That's good, I think that's probably worth doing.

> However, there are some other unique zsh-isms that would be altered
> by this.  For example:
> 
> % echo ${foo::=bar} & echo $foo
> [1] 31940
> bar
> % bar
> 
> [1]  + done       echo ${foo::=bar}
> % unset foo
> % time echo ${foo::=bar} & echo $foo
> [1] 31943
> 
> % bar
> 
> [1]  + done       time echo ${foo::=bar}
> % 
> 
> Note the assignment "sticks" in the current shell in the first case,
> because it happens before the fork, but is lost in the second case.
> Perhaps the former is a bug as well.

It certainly looks rather unexpected.  I think most of us staring at it
would assume the entire command line was evaluated in the background.
It's hard to believe anyone is deliberately relying on this behaviour.
I think it would be fine to change with a note about the
incompatibility.

pws


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: "echo | ps -j $(:) | cat | cat | cat" runs components in different process groups
  2018-04-17  9:19                         ` Peter Stephenson
@ 2018-04-17 16:09                           ` Bart Schaefer
  2018-04-17 16:27                             ` Bart Schaefer
  2018-04-17 16:35                             ` Peter Stephenson
  2018-04-18  6:29                           ` Stephane Chazelas
  1 sibling, 2 replies; 41+ messages in thread
From: Bart Schaefer @ 2018-04-17 16:09 UTC (permalink / raw)
  To: zsh-workers

On Apr 17, 10:19am, Peter Stephenson wrote:
} Subject: Re: "echo | ps -j $(:) | cat | cat | cat" runs components in diff
}
} On Mon, 16 Apr 2018 22:39:10 -0700
} Bart Schaefer <schaefer@brasslantern.com> wrote:
} > So if, when we determine that "&" is the command separator, we could
} > treat the command in the way the "time" prefix does, this would all
} > work out without mangling execcmd_exec() and prefork().
} 
} That's good, I think that's probably worth doing.
} 
} I think it would be fine to change with a note about the
} incompatibility.

Glad to hear that, however, I have very few clues about zsh wordcode (or
in particular about how to put something into the wordcode back at the
beginning of the command when we reach the separator at the end) so I'm
not likely to be able to change this myself.

There's also workers/42233 and related thread, if we're already going
to be fiddling with wordcode.  (Limitations on the size of a script it
is possible to zcompile.)


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: "echo | ps -j $(:) | cat | cat | cat" runs components in different process groups
  2018-04-17 16:09                           ` Bart Schaefer
@ 2018-04-17 16:27                             ` Bart Schaefer
  2018-04-17 16:35                             ` Peter Stephenson
  1 sibling, 0 replies; 41+ messages in thread
From: Bart Schaefer @ 2018-04-17 16:27 UTC (permalink / raw)
  To: zsh-workers

On Apr 17,  9:09am, Bart Schaefer wrote:
} Subject: Re: "echo | ps -j $(:) | cat | cat | cat" runs components in diff
}
} On Apr 17, 10:19am, Peter Stephenson wrote:
} } Subject: Re: "echo | ps -j $(:) | cat | cat | cat" runs components in diff
} }
} } On Mon, 16 Apr 2018 22:39:10 -0700
} } Bart Schaefer <schaefer@brasslantern.com> wrote:
} } > So if, when we determine that "&" is the command separator, we could
} } > treat the command in the way the "time" prefix does, this would all
} } > work out without mangling execcmd_exec() and prefork().
} } 
} } That's good, I think that's probably worth doing.

It occurs to me that doing this with "|" as well as "&" might resolve
some other problems, but (testing just by scattering otherwise useless
"time" around) it doesn't seem to fix all the problems noted in the
earlier messages on this thread.


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: "echo | ps -j $(:) | cat | cat | cat" runs components in different process groups
  2018-04-17 16:09                           ` Bart Schaefer
  2018-04-17 16:27                             ` Bart Schaefer
@ 2018-04-17 16:35                             ` Peter Stephenson
  2018-04-17 17:52                               ` Bart Schaefer
  1 sibling, 1 reply; 41+ messages in thread
From: Peter Stephenson @ 2018-04-17 16:35 UTC (permalink / raw)
  To: zsh-workers

On Tue, 17 Apr 2018 09:09:26 -0700
Bart Schaefer <schaefer@brasslantern.com> wrote:
> On Apr 17, 10:19am, Peter Stephenson wrote:
> } Subject: Re: "echo | ps -j $(:) | cat | cat | cat" runs components
> in diff }
> } On Mon, 16 Apr 2018 22:39:10 -0700
> } Bart Schaefer <schaefer@brasslantern.com> wrote:
> } > So if, when we determine that "&" is the command separator, we
> could } > treat the command in the way the "time" prefix does, this
> would all } > work out without mangling execcmd_exec() and prefork().
> } 
> } That's good, I think that's probably worth doing.
> } 
> } I think it would be fine to change with a note about the
> } incompatibility.
> 
> Glad to hear that, however, I have very few clues about zsh wordcode
> (or in particular about how to put something into the wordcode back
> at the beginning of the command when we reach the separator at the
> end) so I'm not likely to be able to change this myself.

It's the whole structure of the parsing rather than the resulting word
code (though of course the effect is encoded there).  When "time" is
parsed we're at the level of a command, but instead of treating the
arguments as a command (which wouldn't work as we need to time am entire
pipeline), the parser recursively parses a nested pipeline (essentially
--- it's actually called "sublist2" in the parser because it can have
"coproc" or "!" in front, which also apply to the whole pipeline).  The
fork then happens at the level of the "time" command handling ---
because the "&" is parsed at the level of the list that includes the
time and its arguments, not the nested pipeline --- leaving the entire
pipeline after it to be run in the subshell from within the execution
environment for the time command (exectime()).

The ampersand is therefore parsed rather late to be able to see you need
this structure.  So it probably needs some other trick --- a different
list marker that causes a special null command akin to time to do the
fork, for example, as it's easy to update word code tokens when the
structure doesn't change.  The logic would then be within the exec code,
based on detecting the new token, rather than by restructuring the word
code.  Not actually very neat internally.

> There's also workers/42233 and related thread, if we're already going
> to be fiddling with wordcode.  (Limitations on the size of a script it
> is possible to zcompile.)

I think we demonstrated fairly conclusively that no one has the faintest
clue what's happening here.

pws


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: "echo | ps -j $(:) | cat | cat | cat" runs components in different process groups
  2018-04-17 16:35                             ` Peter Stephenson
@ 2018-04-17 17:52                               ` Bart Schaefer
  2018-04-19  9:40                                 ` Peter Stephenson
  0 siblings, 1 reply; 41+ messages in thread
From: Bart Schaefer @ 2018-04-17 17:52 UTC (permalink / raw)
  To: zsh-workers

On Apr 17,  5:35pm, Peter Stephenson wrote:
}
} It's the whole structure of the parsing rather than the resulting word
} code (though of course the effect is encoded there).  When "time" is
} parsed we're at the level of a command, but instead of treating the
} arguments as a command (which wouldn't work as we need to time am entire
} pipeline), the parser recursively parses a nested pipeline (essentially
} --- it's actually called "sublist2" in the parser because it can have
} "coproc" or "!" in front, which also apply to the whole pipeline).

And sublist2() still stops at the "&".  Understood so far, I think.

} The ampersand is therefore parsed rather late to be able to see you need
} this structure.  So it probably needs some other trick --- a different
} list marker that causes a special null command akin to time to do the
} fork, for example, as it's easy to update word code tokens when the
} structure doesn't change.

So what we'd need to do (?) is insert a no-op token in the wordcode at
the "front" of the parse, and update it to be an "in background" token
in the event the parse ends at "&" (or "&!" etc.) ...

} The logic would then be within the exec code

... which would ignore the no-op or invoke an additional execcmd_bg() to
exec the pipeline after the fork.

} > There's also workers/42233 and related thread
} 
} I think we demonstrated fairly conclusively that no one has the faintest
} clue what's happening here.

Actually I'm fairly sure that what's happening is that the resulting
wordcode is so large that the tail of it has overflowed the integer
type used as an internal offset for byte positions / program counter.


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: "echo | ps -j $(:) | cat | cat | cat" runs components in different process groups
  2018-04-17  9:19                         ` Peter Stephenson
  2018-04-17 16:09                           ` Bart Schaefer
@ 2018-04-18  6:29                           ` Stephane Chazelas
  2018-04-18 16:33                             ` Bart Schaefer
  1 sibling, 1 reply; 41+ messages in thread
From: Stephane Chazelas @ 2018-04-18  6:29 UTC (permalink / raw)
  To: Peter Stephenson; +Cc: zsh-workers

2018-04-17 10:19:47 +0100, Peter Stephenson:
[...]
> > % echo ${foo::=bar} & echo $foo
> > [1] 31940
> > bar
> > % bar
[...]
> It certainly looks rather unexpected.  I think most of us staring at it
> would assume the entire command line was evaluated in the background.
> It's hard to believe anyone is deliberately relying on this behaviour.
> I think it would be fine to change with a note about the
> incompatibility.
[...]

But then again (as already mentioned in 42527)

$ echo ${foo::=bar} | echo $foo
bar

is equally surprising to me (and at the root of most of the
problems in this thread as the expansions are not done in the
job's process group).

foo | bar

to me is like

foo & bar

except for the pipe() that connects them.

I suspect if you reuse "time" code, that won't address this
case.

-- 
Stephane


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: "echo | ps -j $(:) | cat | cat | cat" runs components in different process groups
  2018-04-18  6:29                           ` Stephane Chazelas
@ 2018-04-18 16:33                             ` Bart Schaefer
  0 siblings, 0 replies; 41+ messages in thread
From: Bart Schaefer @ 2018-04-18 16:33 UTC (permalink / raw)
  To: Peter Stephenson, zsh-workers

On Tue, Apr 17, 2018 at 11:29 PM, Stephane Chazelas
<stephane.chazelas@gmail.com> wrote:
>
> $ echo ${foo::=bar} | echo $foo
> bar
>
> I suspect if you reuse "time" code, that won't address this
> case.

The time parse isn't directly re-usable anyway; it's the general
technique (have a placeholder at the front that can be updated if the
command separator indicates the job will be backgrounded) that's of
interest.  The case above needs a placeholder at the beginning of each
command, whereas the "&" case needs one at the beginning of each
pipeline.  That does mean that

$ echo ${foo::=bar} | echo $foo & echo started

needs two placeholders before the first echo.


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: "echo | ps -j $(:) | cat | cat | cat" runs components in different process groups
  2018-04-17 17:52                               ` Bart Schaefer
@ 2018-04-19  9:40                                 ` Peter Stephenson
  2018-04-20  9:28                                   ` Forking earlier for background code Peter Stephenson
       [not found]                                   ` <F3A62E38-24E2-4A62-8E19-F54C9B81E9E5@kba.biglobe.ne.jp>
  0 siblings, 2 replies; 41+ messages in thread
From: Peter Stephenson @ 2018-04-19  9:40 UTC (permalink / raw)
  To: zsh-workers

>} The ampersand is therefore parsed rather late to be able to see you
>need } this structure.  So it probably needs some other trick --- a
>different } list marker that causes a special null command akin to
>time to do the } fork, for example, as it's easy to update word code
>tokens when the } structure doesn't change.
>
> So what we'd need to do (?) is insert a no-op token in the wordcode at
> the "front" of the parse, and update it to be an "in background" token
> in the event the parse ends at "&" (or "&!" etc.) ...
>
>} The logic would then be within the exec code
>
> ... which would ignore the no-op or invoke an additional execcmd_bg()
> to exec the pipeline after the fork.

This strikes me as baroque enough we might just as well look at fixing
it properly.  This is likely to have obscure side effects of one sort or
another, too, but with them being localised to execcmd_exec() I hope
they'd be a bit more obvious than changing the execution structure...

This is a first go that doesn't appear to be completely broken.  Lots of
rearranging and reindenting, very little novel code.  I'm not sure how
much git users feel like being guinea pigs.

pws


diff --git a/Src/exec.c b/Src/exec.c
index f6c768f..5ea3e0b 100644
--- a/Src/exec.c
+++ b/Src/exec.c
@@ -2707,6 +2707,77 @@ static void execcmd_getargs(LinkList preargs, LinkList args, int expand)
     }
 }
 
+/**/
+static int execcmd_fork(Estate state, int how, int type, Wordcode varspc,
+			LinkList *filelistp, char *text, int oautocont)
+{
+    pid_t pid;
+    int synch[2], flags;
+    char dummy;
+    struct timeval bgtime;
+
+    child_block();
+
+    if (pipe(synch) < 0) {
+	zerr("pipe failed: %e", errno);
+	return -1;
+    } else if ((pid = zfork(&bgtime)) == -1) {
+	close(synch[0]);
+	close(synch[1]);
+	lastval = 1;
+	errflag |= ERRFLAG_ERROR;
+	return -1;
+    }
+    if (pid) {
+	close(synch[1]);
+	read_loop(synch[0], &dummy, 1);
+	close(synch[0]);
+	if (how & Z_ASYNC) {
+	    lastpid = (zlong) pid;
+	} else if (!jobtab[thisjob].stty_in_env && varspc) {
+	    /* search for STTY=... */
+	    Wordcode p = varspc;
+	    wordcode ac;
+
+	    while (wc_code(ac = *p) == WC_ASSIGN) {
+		if (!strcmp(ecrawstr(state->prog, p + 1, NULL), "STTY")) {
+		    jobtab[thisjob].stty_in_env = 1;
+		    break;
+		}
+		p += (WC_ASSIGN_TYPE(ac) == WC_ASSIGN_SCALAR ?
+		      3 : WC_ASSIGN_NUM(ac) + 2);
+	    }
+	}
+	addproc(pid, text, 0, &bgtime);
+	if (oautocont >= 0)
+	    opts[AUTOCONTINUE] = oautocont;
+	pipecleanfilelist(jobtab[thisjob].filelist, 1);
+	return pid;
+    }
+
+    /* pid == 0 */
+    close(synch[0]);
+    flags = ((how & Z_ASYNC) ? ESUB_ASYNC : 0) | ESUB_PGRP;
+    if ((type != WC_SUBSH) && !(how & Z_ASYNC))
+	flags |= ESUB_KEEPTRAP;
+    if (type == WC_SUBSH && !(how & Z_ASYNC))
+	flags |= ESUB_JOB_CONTROL;
+    *filelistp = jobtab[thisjob].filelist;
+    entersubsh(flags);
+    close(synch[1]);
+
+    if (sigtrapped[SIGINT] & ZSIG_IGNORED)
+	holdintr();
+#ifdef HAVE_NICE
+    /* Check if we should run background jobs at a lower priority. */
+    if ((how & Z_ASYNC) && isset(BGNICE))
+	if (nice(5) < 0)
+	    zwarn("nice(5) failed: %e", errno);
+#endif /* HAVE_NICE */
+
+    return 0;
+}
+
 /*
  * Execute a command at the lowest level of the hierarchy.
  */
@@ -3074,6 +3145,21 @@ execcmd_exec(Estate state, Execcmd_params eparams,
     esprefork = (magic_assign ||
 		 (isset(MAGICEQUALSUBST) && type != WC_TYPESET)) ?
 		 PREFORK_TYPESET : 0;
+    if (how & Z_ASYNC) {
+	text = getjobtext(state->prog, eparams->beg);
+	switch (execcmd_fork(state, how, type, varspc, &filelist,
+			     text, oautocont)) {
+	case -1:
+	    goto fatal;
+	case 0:
+	    break;
+	default:
+	    return;
+	}
+	forked = 1;
+    } else
+	text = NULL;
+
     if (args) {
 	if (eparams->htok)
 	    prefork(args, esprefork, NULL);
@@ -3226,11 +3312,9 @@ execcmd_exec(Estate state, Execcmd_params eparams,
     }
 
     /* Get the text associated with this command. */
-    if ((how & Z_ASYNC) ||
+    if (!text &&
 	(!sfcontext && (jobbing || (how & Z_TIMED))))
 	text = getjobtext(state->prog, eparams->beg);
-    else
-	text = NULL;
 
     /*
      * Set up special parameter $_
@@ -3397,99 +3481,45 @@ execcmd_exec(Estate state, Execcmd_params eparams,
      * current shell.                                                         *
      **************************************************************************/
 
-    if ((how & Z_ASYNC) ||
-	(!do_exec &&
-	 (((is_builtin || is_shfunc) && output) ||
-	  (!is_cursh && (last1 != 1 || nsigtrapped || havefiles() ||
-			 fdtable_flocks))))) {
-
-	pid_t pid;
-	int synch[2], flags;
-	char dummy;
-	struct timeval bgtime;
-
-	child_block();
-
-	if (pipe(synch) < 0) {
-	    zerr("pipe failed: %e", errno);
-	    goto fatal;
-	} else if ((pid = zfork(&bgtime)) == -1) {
-	    close(synch[0]);
-	    close(synch[1]);
-	    lastval = 1;
-	    errflag |= ERRFLAG_ERROR;
-	    goto fatal;
-	}
-	if (pid) {
-
-	    close(synch[1]);
-	    read_loop(synch[0], &dummy, 1);
-	    close(synch[0]);
-	    if (how & Z_ASYNC) {
-		lastpid = (zlong) pid;
-	    } else if (!jobtab[thisjob].stty_in_env && varspc) {
-		/* search for STTY=... */
-		Wordcode p = varspc;
-		wordcode ac;
-
-		while (wc_code(ac = *p) == WC_ASSIGN) {
-		    if (!strcmp(ecrawstr(state->prog, p + 1, NULL), "STTY")) {
-			jobtab[thisjob].stty_in_env = 1;
-			break;
-		    }
-		    p += (WC_ASSIGN_TYPE(ac) == WC_ASSIGN_SCALAR ?
-			  3 : WC_ASSIGN_NUM(ac) + 2);
-		}
+    if (!forked) {
+	if (!do_exec &&
+	    (((is_builtin || is_shfunc) && output) ||
+	     (!is_cursh && (last1 != 1 || nsigtrapped || havefiles() ||
+			    fdtable_flocks)))) {
+	    switch (execcmd_fork(state, how, type, varspc, &filelist,
+				 text, oautocont)) {
+	    case -1:
+		goto fatal;
+	    case 0:
+		break;
+	    default:
+		return;
 	    }
-	    addproc(pid, text, 0, &bgtime);
-	    if (oautocont >= 0)
-		opts[AUTOCONTINUE] = oautocont;
-	    pipecleanfilelist(jobtab[thisjob].filelist, 1);
-	    return;
-	}
-	/* pid == 0 */
-	close(synch[0]);
-	flags = ((how & Z_ASYNC) ? ESUB_ASYNC : 0) | ESUB_PGRP;
-	if ((type != WC_SUBSH) && !(how & Z_ASYNC))
-	    flags |= ESUB_KEEPTRAP;
-	if (type == WC_SUBSH && !(how & Z_ASYNC))
-	    flags |= ESUB_JOB_CONTROL;
-	filelist = jobtab[thisjob].filelist;
-	entersubsh(flags);
-	close(synch[1]);
-	forked = 1;
-	if (sigtrapped[SIGINT] & ZSIG_IGNORED)
-	    holdintr();
-#ifdef HAVE_NICE
-	/* Check if we should run background jobs at a lower priority. */
-	if ((how & Z_ASYNC) && isset(BGNICE))
-	    if (nice(5) < 0)
-		zwarn("nice(5) failed: %e", errno);
-#endif /* HAVE_NICE */
-
-    } else if (is_cursh) {
-	/* This is a current shell procedure that didn't need to fork.    *
-	 * This includes current shell procedures that are being exec'ed, *
-	 * as well as null execs.                                         */
-	jobtab[thisjob].stat |= STAT_CURSH;
-	if (!jobtab[thisjob].procs)
-	    jobtab[thisjob].stat |= STAT_NOPRINT;
-	if (is_builtin)
-	  jobtab[thisjob].stat |= STAT_BUILTIN;
-    } else {
-	/* This is an exec (real or fake) for an external command.    *
-	 * Note that any form of exec means that the subshell is fake *
-	 * (but we may be in a subshell already).                     */
-	is_exec = 1;
-	/*
-	 * If we are in a subshell environment anyway, say we're forked,
-	 * even if we're actually not forked because we know the
-	 * subshell is exiting.  This ensures SHLVL reflects the current
-	 * shell, and also optimises out any save/restore we'd need to
-	 * do if we were returning to the main shell.
-	 */
-	if (type == WC_SUBSH)
 	    forked = 1;
+	} else if (is_cursh) {
+	    /* This is a current shell procedure that didn't need to fork.    *
+	     * This includes current shell procedures that are being exec'ed, *
+	     * as well as null execs.                                         */
+	    jobtab[thisjob].stat |= STAT_CURSH;
+	    if (!jobtab[thisjob].procs)
+		jobtab[thisjob].stat |= STAT_NOPRINT;
+	    if (is_builtin)
+		jobtab[thisjob].stat |= STAT_BUILTIN;
+	} else {
+	    /* This is an exec (real or fake) for an external command.    *
+	     * Note that any form of exec means that the subshell is fake *
+	     * (but we may be in a subshell already).                     */
+	    is_exec = 1;
+	    /*
+	     * If we are in a subshell environment anyway, say we're forked,
+	     * even if we're actually not forked because we know the
+	     * subshell is exiting.  This ensures SHLVL reflects the current
+	     * shell, and also optimises out any save/restore we'd need to
+	     * do if we were returning to the main shell.
+	     */
+	    if (type == WC_SUBSH)
+		forked = 1;
+	}
     }
 
     if ((esglob = !(cflags & BINF_NOGLOB)) && args && eparams->htok) {


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Forking earlier for background code
  2018-04-19  9:40                                 ` Peter Stephenson
@ 2018-04-20  9:28                                   ` Peter Stephenson
  2018-04-20 10:01                                     ` Peter Stephenson
  2018-04-23  8:29                                     ` Peter Stephenson
       [not found]                                   ` <F3A62E38-24E2-4A62-8E19-F54C9B81E9E5@kba.biglobe.ne.jp>
  1 sibling, 2 replies; 41+ messages in thread
From: Peter Stephenson @ 2018-04-20  9:28 UTC (permalink / raw)
  To: zsh-workers

I've renamed this part of the previous pipeline + process group thread.

This patch:

> This is a first go that doesn't appear to be completely broken.  Lots
> of rearranging and reindenting, very little novel code.  I'm not sure
> how much git users feel like being guinea pigs.

to fork earlier when a process is being put into the background looks
fairly benign to me, so I'll commit it (with a few more comments) in
order to see what oddities we can track down.

This was the easy case because the test for (how & Z_ASYNC) wasn't
combined with anything else. so factored out trivially.


Deep breath.


It may well be possible to expand the logic to track down other cases
where we know we can know we're going to need to fork, so can do so early
to get fewer side effects (and slightly optimise the main shell).
However, we're going to have to disentangle the code shown below.  At
the moment this relies on us being able to expand the command line to
decide if the command is a builtin or shell.  However, with enough
banging of heads against granite outcrops it might be possible to
rationalise what's going on with "do_exec", "output" and "last1" enough
so this isn't always necessary.

"last1" --- at the end of the exec list and also the pipeline:  it's
effectively an "and" of choices made in execlist() and execpline2().
Note special case below.

"do_exec" --- last1 and doing an exec, so no need to fork.

"output" --- output of pipeline, so zero if not a pipeline (we never
pipe output to fd 0).

The "if (!forked)" test is the result of the new early fork.

One complication is in execpline2() where it says

	/* if we are doing "foo | bar" where foo is a current *
	 * shell command, do foo in a subshell and do the     *
	 * rest of the pipeline in the current shell.         */

I think this may just be an early attempt to get the effects Stephane
wants?  In any case, last1 is passed down to execcmd as 1 here so we
don't fork again (except the logic is complicated enough it's not
obvious to me that's necessarily true).

So I think if "last1" is zero and "output" is non-zero we might be able to
fork early, perhaps?  And possibly if we do that the special code in
execpline2() becomes redundant, maybe?  So then we only need to be
sensitive to "output", perhaps maybe?

(I don't suppose anybody actually understands this?)


    /**************************************************************************
     * Do we need to fork?  We need to fork if:                               *
     * 1) The command is supposed to run in the background.  This             *
     *    case is now handled above (forked = 1 here). (or)                   *
     * 2) There is no `exec' flag, and either:                                *
     *    a) This is a builtin or shell function with output piped somewhere. *
     *    b) This is an external command and we can't do a `fake exec'.       *
     *                                                                        *
     * A `fake exec' is possible if we have all the following conditions:     *
     * 1) last1 flag is 1.  This indicates that the current shell will not    *
     *    be needed after the current command.  This is typically the case    *
     *    when the command is the last stage in a subshell, or is the         *
     *    last command after the option `-c'.                                 *
     * 2) We don't have any traps set.                                        *
     * 3) We don't have any files to delete.                                  *
     *                                                                        *
     * The condition above for a `fake exec' will also work for a current     *
     * shell command such as a builtin, but doesn't really buy us anything    *
     * (doesn't save us a process), since it is already running in the        *
     * current shell.                                                         *
     **************************************************************************/

    if (!forked) {
	if (!do_exec &&
	    (((is_builtin || is_shfunc) && output) ||
	     (!is_cursh && (last1 != 1 || nsigtrapped || havefiles() ||
			    fdtable_flocks)))) {

pws


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Forking earlier for background code
  2018-04-20  9:28                                   ` Forking earlier for background code Peter Stephenson
@ 2018-04-20 10:01                                     ` Peter Stephenson
  2018-04-20 12:54                                       ` Vin Shelton
  2018-04-23  8:29                                     ` Peter Stephenson
  1 sibling, 1 reply; 41+ messages in thread
From: Peter Stephenson @ 2018-04-20 10:01 UTC (permalink / raw)
  To: zsh-workers

(Sorry about multiple messages, but I'm sure you're fascinated...)

On Fri, 20 Apr 2018 10:28:39 +0100
Peter Stephenson <p.stephenson@samsung.com> wrote:
> It may well be possible to expand the logic to track down other cases
> where we know we can know we're going to need to fork, so can do so
> early to get fewer side effects (and slightly optimise the main
> shell).
>...
> So I think if "last1" is zero and "output" is non-zero we might be
> able to fork early, perhaps?

If you want to play along at home with step 1 here, try the following.

As far as I can see, we don't need to take special account of "exec"
here.  For the pipeline to work we always need to fork even if the
command has the "exec" keyword in front (same true of "&", in fact).


diff --git a/Src/exec.c b/Src/exec.c
index 1b622d5..768e3ef 100644
--- a/Src/exec.c
+++ b/Src/exec.c
@@ -3146,10 +3146,11 @@ execcmd_exec(Estate state, Execcmd_params eparams,
     esprefork = (magic_assign ||
 		 (isset(MAGICEQUALSUBST) && type != WC_TYPESET)) ?
 		 PREFORK_TYPESET : 0;
-    if (how & Z_ASYNC) {
+    if ((how & Z_ASYNC) || (output && !last1)) {
 	/*
-	 * If running in the background, we don't need any of
-	 * the rest of this functino to affect the state in the
+	 * If running in the background, or not the last command in a
+	 * pipeline and not already forked, we don't need any of
+	 * the rest of this function to affect the state in the
 	 * main shell, so fork immediately.
 	 *
 	 * In other cases we may need to process the command line

pws


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Forking earlier for background code
  2018-04-20 10:01                                     ` Peter Stephenson
@ 2018-04-20 12:54                                       ` Vin Shelton
       [not found]                                         ` <CGME20180420132008eucas1p1c297624e870975cd892c74254970faab@eucas1p1.samsung.com>
  0 siblings, 1 reply; 41+ messages in thread
From: Vin Shelton @ 2018-04-20 12:54 UTC (permalink / raw)
  To: Peter Stephenson; +Cc: Zsh Hackers' List

[-- Attachment #1: Type: text/plain, Size: 1872 bytes --]

I really think 5.5.2 should go out with only 5.5.1 + the signals.c change
first.

My $.02.

  - Vin

On Fri, Apr 20, 2018 at 6:01 AM, Peter Stephenson <p.stephenson@samsung.com>
wrote:

> (Sorry about multiple messages, but I'm sure you're fascinated...)
>
> On Fri, 20 Apr 2018 10:28:39 +0100
> Peter Stephenson <p.stephenson@samsung.com> wrote:
> > It may well be possible to expand the logic to track down other cases
> > where we know we can know we're going to need to fork, so can do so
> > early to get fewer side effects (and slightly optimise the main
> > shell).
> >...
> > So I think if "last1" is zero and "output" is non-zero we might be
> > able to fork early, perhaps?
>
> If you want to play along at home with step 1 here, try the following.
>
> As far as I can see, we don't need to take special account of "exec"
> here.  For the pipeline to work we always need to fork even if the
> command has the "exec" keyword in front (same true of "&", in fact).
>
>
> diff --git a/Src/exec.c b/Src/exec.c
> index 1b622d5..768e3ef 100644
> --- a/Src/exec.c
> +++ b/Src/exec.c
> @@ -3146,10 +3146,11 @@ execcmd_exec(Estate state, Execcmd_params eparams,
>      esprefork = (magic_assign ||
>                  (isset(MAGICEQUALSUBST) && type != WC_TYPESET)) ?
>                  PREFORK_TYPESET : 0;
> -    if (how & Z_ASYNC) {
> +    if ((how & Z_ASYNC) || (output && !last1)) {
>         /*
> -        * If running in the background, we don't need any of
> -        * the rest of this functino to affect the state in the
> +        * If running in the background, or not the last command in a
> +        * pipeline and not already forked, we don't need any of
> +        * the rest of this function to affect the state in the
>          * main shell, so fork immediately.
>          *
>          * In other cases we may need to process the command line
>
> pws
>

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Forking earlier for background code
       [not found]                                         ` <CGME20180420132008eucas1p1c297624e870975cd892c74254970faab@eucas1p1.samsung.com>
@ 2018-04-20 13:20                                           ` Peter Stephenson
  2018-04-20 16:01                                             ` Vin Shelton
  0 siblings, 1 reply; 41+ messages in thread
From: Peter Stephenson @ 2018-04-20 13:20 UTC (permalink / raw)
  To: Vin Shelton, Zsh Hackers' List

On Fri, 20 Apr 2018 08:54:21 -0400
Vin Shelton <acs@alumni.princeton.edu> wrote:
> I really think 5.5.2 should go out with only 5.5.1 + the signals.c
> change first.

As I explained yesterday (if it was yesterday), 5.5.1 doesn't need it.
It doesn't have the problem you came across.

pws


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Forking earlier for background code
  2018-04-20 13:20                                           ` Peter Stephenson
@ 2018-04-20 16:01                                             ` Vin Shelton
  0 siblings, 0 replies; 41+ messages in thread
From: Vin Shelton @ 2018-04-20 16:01 UTC (permalink / raw)
  To: Peter Stephenson; +Cc: Zsh Hackers' List

[-- Attachment #1: Type: text/plain, Size: 454 bytes --]

Sorry, I misinterpreted what you wrote.

My mistake.

  - Vin

On Fri, Apr 20, 2018 at 9:20 AM, Peter Stephenson <p.stephenson@samsung.com>
wrote:

> On Fri, 20 Apr 2018 08:54:21 -0400
> Vin Shelton <acs@alumni.princeton.edu> wrote:
> > I really think 5.5.2 should go out with only 5.5.1 + the signals.c
> > change first.
>
> As I explained yesterday (if it was yesterday), 5.5.1 doesn't need it.
> It doesn't have the problem you came across.
>
> pws
>

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Forking earlier for background code
  2018-04-20  9:28                                   ` Forking earlier for background code Peter Stephenson
  2018-04-20 10:01                                     ` Peter Stephenson
@ 2018-04-23  8:29                                     ` Peter Stephenson
  2018-05-01  9:23                                       ` Peter Stephenson
  1 sibling, 1 reply; 41+ messages in thread
From: Peter Stephenson @ 2018-04-23  8:29 UTC (permalink / raw)
  To: zsh-workers

On Fri, 20 Apr 2018 10:28:39 +0100
Peter Stephenson <p.stephenson@samsung.com> wrote:
> It may well be possible to expand the logic to track down other cases
> where we know we can know we're going to need to fork, so can do so
> early to get fewer side effects (and slightly optimise the main
> shell).

I've been working on this, and as it's kind of interesting I think I'm
going to push my additional work on a branch called fork_early.  The
commit log entries are fairly descriptive, but here's a summary of what
I've done.

- Move the early fork even earlier --- the code immediately above it was
to do with initial examination of command line arguments, and the
new fork code deals exactly with the cases where we don't need to know
this.

- _exit if we forked and were going to return early from
execcmd_exec().  This shows up particularly in one case with the
preceding change --- namely assignments.

- As suggested, remove the previous code in the caller that did the fork
there if we recognised early this was code to run within the shell but
in an early part of the pipeline.  This is now redundant.  (I'm
wondering if this special code was there so that things like "foo=bar
&", which would have polluted the main shell, didn't push the original
problem beyond the pain barrier --- i.e. it was always just a partial
workaround.)

- That change revealed two other things that needed doing.  First, set
"last1" when we do the early fork so that subsequent code in the forked
shell knows we are going to exit.  This was revealed by the failure of a
test that sets an EXIT trap in the left hand side of a pipeline.
(Existing exit traps are cleared here, but it's possible to set a new
one and it should go off when the forked shell exits.)

- Second, Bart's woraround in zsh-wrokers/32171 for a leaked fd needed
rewriting.  The change is to ensure we always close the pipe fd that's not
needed in the forked code when we fork (i.e. the input side of the pipe
on the LHS --- the output side on the RHS was simpler and always managed
OK).  This seems reasonably transparent so I hope it's not going to lead
us down a blind alley (Bart added an explicit test for the case that was
failing).

We could probably do with some tests, although testing with background
code is a pain.

pws


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: "echo | ps -j $(:) | cat | cat | cat" runs components in different process groups
       [not found]                                   ` <F3A62E38-24E2-4A62-8E19-F54C9B81E9E5@kba.biglobe.ne.jp>
@ 2018-04-23 13:52                                     ` Peter Stephenson
  2018-04-23 14:03                                       ` Peter Stephenson
  0 siblings, 1 reply; 41+ messages in thread
From: Peter Stephenson @ 2018-04-23 13:52 UTC (permalink / raw)
  To: Zsh hackers' list

On Mon, 23 Apr 2018 21:58:52 +0900
Jun T <takimoto-j@kba.biglobe.ne.jp> wrote:
> After the following commit:
> 
> commit 3c74891fcd68d37c1629943f703ac70428e3ce9f
> Author: Peter Stephenson <p.stephenson@samsung.com>
> Date:   Fri Apr 13 12:09:34 2018 +0100
> 
>     42630: Improve process group handling in pipelines.
> 
> a simple pipeline like the following fails:
> 
> zsh% ls | less
> zsh: done                    ls | 
> zsh: suspended (tty output)  less
> 
> 
> With the current master (9e2afb92987d7fd96a838c15b6641cc1b634a825)
> 'tty output' is replaced by 'tty input'.

Somehow nobody's noticed that yet...  I think there's a race, making
this only happen sometimes, presumably with greater frequency on some
systems than others.  That's not going to help matters.  I've managed to
get it to happen a couple of times.

This problem suggesting something basic in the ordering of events.

ls has presumably exited and presumably also been reaped, or we don't
get the new behaviour --- I guess this happened in such a way that less
can't grab the terminal back from the shell.

Hmmm --- I wonder if less was already forked and that's the problem?  So
maybe we should check if there are other processes already forked and
avoid resetting the pgrp leader in that case?  Does anyone actually
know?

pws


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: "echo | ps -j $(:) | cat | cat | cat" runs components in different process groups
  2018-04-23 13:52                                     ` "echo | ps -j $(:) | cat | cat | cat" runs components in different process groups Peter Stephenson
@ 2018-04-23 14:03                                       ` Peter Stephenson
       [not found]                                         ` <CGME20180423140859eucas1p2591bf1422614209979d4890383268c37@eucas1p2.samsung.com>
  2018-04-23 15:29                                         ` Jun T.
  0 siblings, 2 replies; 41+ messages in thread
From: Peter Stephenson @ 2018-04-23 14:03 UTC (permalink / raw)
  To: Zsh hackers' list

On Mon, 23 Apr 2018 14:52:38 +0100
Peter Stephenson <p.stephenson@samsung.com> wrote:
> > zsh% ls | less
> > zsh: done                    ls | 
> > zsh: suspended (tty output)  less
> 
> Hmmm --- I wonder if less was already forked and that's the problem?
> So maybe we should check if there are other processes already forked
> and avoid resetting the pgrp leader in that case?  Does anyone
> actually know?

Actually, I think this is the *other* thing I was wondering about but
didn't get around to taking account of...

See if this makes it reliable.  I don't think it breaks the original
fix.

The main reason I didn't do this is the effect of killpg with signal 0
isn't actually defined in the Linu man pages or as far as I can see in
POSIX. But we do this elsewhere so presumably it's a well-known fact
this works...

pws

diff --git a/Src/signals.c b/Src/signals.c
index 6e12158..f2165c0 100644
--- a/Src/signals.c
+++ b/Src/signals.c
@@ -538,7 +538,8 @@ wait_for_processes(void)
 		update_process(pn, status);
 #endif
 		if (WIFEXITED(status) &&
-		    pn->pid == jn->gleader) {
+		    pn->pid == jn->gleader &&
+		    killpg(pn->pid, 0) == -1) {
 		    jn->gleader = 0;
 		    if (!(jn->stat & STAT_NOSTTY)) {
 			/*


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: "echo | ps -j $(:) | cat | cat | cat" runs components in different process groups
       [not found]                                         ` <CGME20180423140859eucas1p2591bf1422614209979d4890383268c37@eucas1p2.samsung.com>
@ 2018-04-23 14:08                                           ` Peter Stephenson
  0 siblings, 0 replies; 41+ messages in thread
From: Peter Stephenson @ 2018-04-23 14:08 UTC (permalink / raw)
  To: Zsh hackers' list

On Mon, 23 Apr 2018 15:03:12 +0100
Peter Stephenson <p.stephenson@samsung.com> wrote:
> The main reason I didn't do this is the effect of killpg with signal 0
> isn't actually defined in the Linu man pages or as far as I can see in
> POSIX. But we do this elsewhere so presumably it's a well-known fact
> this works...

After enough poking in manuals, this certainly looks pukka on Linux...

"man killpg" sez

       On  Linux, killpg() is implemented as a library function that makes the
       call kill(-pgrp, sig).

and "man 2 kill" sez

       If  sig  is 0, then no signal is sent, but error checking is still per‐
       formed; this can be used to check for the existence of a process ID  or
       process group ID.

Given it's already all over the shell it's obviously the right thing to
do here, too.

pws


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: "echo | ps -j $(:) | cat | cat | cat" runs components in different process groups
  2018-04-23 14:03                                       ` Peter Stephenson
       [not found]                                         ` <CGME20180423140859eucas1p2591bf1422614209979d4890383268c37@eucas1p2.samsung.com>
@ 2018-04-23 15:29                                         ` Jun T.
  1 sibling, 0 replies; 41+ messages in thread
From: Jun T. @ 2018-04-23 15:29 UTC (permalink / raw)
  To: zsh-workers


> 2018/04/23 23:03, Peter Stephenson <p.stephenson@samsung.com> wrote:
> 
> See if this makes it reliable.  I don't think it breaks the original
> fix.

On macOS, it was failing 100% but with the patch it is working normally.


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Forking earlier for background code
  2018-04-23  8:29                                     ` Peter Stephenson
@ 2018-05-01  9:23                                       ` Peter Stephenson
  0 siblings, 0 replies; 41+ messages in thread
From: Peter Stephenson @ 2018-05-01  9:23 UTC (permalink / raw)
  To: zsh-workers

I haven't seen any problems with the changes in the list below over the
last week so I'm going to squash the changes and push the result.

pws

On Mon, 23 Apr 2018 09:29:00 +0100
Peter Stephenson <p.stephenson@samsung.com> wrote:
> - Move the early fork even earlier --- the code immediately above it
> was to do with initial examination of command line arguments, and the
> new fork code deals exactly with the cases where we don't need to know
> this.
> 
> - _exit if we forked and were going to return early from
> execcmd_exec().  This shows up particularly in one case with the
> preceding change --- namely assignments.
> 
> - As suggested, remove the previous code in the caller that did the
> fork there if we recognised early this was code to run within the
> shell but in an early part of the pipeline.  This is now redundant.
> (I'm wondering if this special code was there so that things like
> "foo=bar &", which would have polluted the main shell, didn't push
> the original problem beyond the pain barrier --- i.e. it was always
> just a partial workaround.)
> 
> - That change revealed two other things that needed doing.  First, set
> "last1" when we do the early fork so that subsequent code in the
> forked shell knows we are going to exit.  This was revealed by the
> failure of a test that sets an EXIT trap in the left hand side of a
> pipeline. (Existing exit traps are cleared here, but it's possible to
> set a new one and it should go off when the forked shell exits.)
> 
> - Second, Bart's woraround in zsh-wrokers/32171 for a leaked fd needed
> rewriting.  The change is to ensure we always close the pipe fd
> that's not needed in the forked code when we fork (i.e. the input
> side of the pipe on the LHS --- the output side on the RHS was
> simpler and always managed OK).  This seems reasonably transparent so
> I hope it's not going to lead us down a blind alley (Bart added an
> explicit test for the case that was failing).


^ permalink raw reply	[flat|nested] 41+ messages in thread

end of thread, other threads:[~2018-05-01  9:30 UTC | newest]

Thread overview: 41+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-03-23 16:16 "echo | ps -j $(:) | cat | cat | cat" runs components in different process groups Stephane Chazelas
2018-03-23 23:36 ` Bart Schaefer
2018-03-24  5:19   ` Bart Schaefer
2018-03-24  8:05     ` Joey Pabalinas
2018-03-24 17:34       ` Stephane Chazelas
2018-03-24 22:23         ` Bart Schaefer
2018-03-25  7:00           ` Stephane Chazelas
2018-03-25  8:26             ` Stephane Chazelas
2018-03-25  7:48           ` Stephane Chazelas
2018-03-25 18:09             ` Stephane Chazelas
2018-03-27 10:17         ` Stephane Chazelas
2018-03-24 22:09       ` Bart Schaefer
2018-04-10 11:45         ` Peter Stephenson
2018-04-10 13:59           ` Peter Stephenson
2018-04-11 22:10             ` Bart Schaefer
2018-04-12 16:23               ` Peter Stephenson
2018-04-15 16:23                 ` Stephane Chazelas
2018-04-15 17:38                   ` Bart Schaefer
2018-04-15 18:58                     ` Stephane Chazelas
2018-04-17  5:39                       ` Bart Schaefer
2018-04-17  9:19                         ` Peter Stephenson
2018-04-17 16:09                           ` Bart Schaefer
2018-04-17 16:27                             ` Bart Schaefer
2018-04-17 16:35                             ` Peter Stephenson
2018-04-17 17:52                               ` Bart Schaefer
2018-04-19  9:40                                 ` Peter Stephenson
2018-04-20  9:28                                   ` Forking earlier for background code Peter Stephenson
2018-04-20 10:01                                     ` Peter Stephenson
2018-04-20 12:54                                       ` Vin Shelton
     [not found]                                         ` <CGME20180420132008eucas1p1c297624e870975cd892c74254970faab@eucas1p1.samsung.com>
2018-04-20 13:20                                           ` Peter Stephenson
2018-04-20 16:01                                             ` Vin Shelton
2018-04-23  8:29                                     ` Peter Stephenson
2018-05-01  9:23                                       ` Peter Stephenson
     [not found]                                   ` <F3A62E38-24E2-4A62-8E19-F54C9B81E9E5@kba.biglobe.ne.jp>
2018-04-23 13:52                                     ` "echo | ps -j $(:) | cat | cat | cat" runs components in different process groups Peter Stephenson
2018-04-23 14:03                                       ` Peter Stephenson
     [not found]                                         ` <CGME20180423140859eucas1p2591bf1422614209979d4890383268c37@eucas1p2.samsung.com>
2018-04-23 14:08                                           ` Peter Stephenson
2018-04-23 15:29                                         ` Jun T.
2018-04-18  6:29                           ` Stephane Chazelas
2018-04-18 16:33                             ` Bart Schaefer
2018-04-10  9:53   ` Peter Stephenson
2018-03-25  7:38 ` Stephane Chazelas

Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/zsh/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).