zsh-workers
 help / color / mirror / code / Atom feed
* Re: multios doesn't work with 2>&1
       [not found] ` <131027100137.ZM4100@torch.brasslantern.com>
@ 2013-10-27 17:46   ` Peter Stephenson
  2013-10-27 18:27     ` Bart Schaefer
  2013-10-27 18:11   ` multios doesn't work with 2>&1 Bart Schaefer
  1 sibling, 1 reply; 9+ messages in thread
From: Peter Stephenson @ 2013-10-27 17:46 UTC (permalink / raw)
  To: Zsh Hackers' List

Bart Schaefer wrote:
> If I back out workers/20666 (Jan 2005), then this example works again.

(This is really a zsh-workers thing, too, so I've moved it.)

Crikey, I'd worked out it was already broken in 2007 and I thought I was
doing well...

Playing around here, the other observation I have before looking at the
thread you're talking about is that

echo foo 2>&1 >/dev/null > >(sed 's/foo/bar/')
echo foo >/dev/null 2>&1 > >(sed 's/foo/bar/')
echo foo >/dev/null > >(sed 's/foo/bar/') 2>&1

all work as expected.  This seems inconsistent --- the synchronous /
asynchronous and run-in-shell / fork behaviour is clearly different in
that case, but it's not clear that should affect redirection.

(I'd also worked out that

echo foo >/dev/null 2>&1 | sed 's/foo/bar/'

gives a different bad effect, namely you get the output you want but the
shell hangs, but that maybe another natural consequence of the thread I
haven't read yet.)

pws


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: multios doesn't work with 2>&1
       [not found] ` <131027100137.ZM4100@torch.brasslantern.com>
  2013-10-27 17:46   ` multios doesn't work with 2>&1 Peter Stephenson
@ 2013-10-27 18:11   ` Bart Schaefer
  1 sibling, 0 replies; 9+ messages in thread
From: Bart Schaefer @ 2013-10-27 18:11 UTC (permalink / raw)
  To: zsh-workers

[> workers]

On Oct 27, 10:01am, Bart Schaefer wrote:
}
} }     % echo foo 2>&1 >/dev/null | sed 's/foo/bar/'
} } 
} } should output "bar" if multios has been set. But it outputs nothing
} 
} If there's a way to fix the original complaint (that >&- doesn't really
} close the descriptor when a multio is involved, so unnecessary multios
} result) without the above example remaining broken, I may need some help
} to find it.

Turns out I didn't need help after all.

As usual, it's just a matter of passing around enough information.


diff --git a/Src/exec.c b/Src/exec.c
index d5fe69e..99c7eaa 100644
--- a/Src/exec.c
+++ b/Src/exec.c
@@ -1967,7 +1967,7 @@ clobber_open(struct redir *f)
 
 /**/
 static void
-closemn(struct multio **mfds, int fd)
+closemn(struct multio **mfds, int fd, int type)
 {
     if (fd >= 0 && mfds[fd] && mfds[fd]->ct >= 2) {
 	struct multio *mn = mfds[fd];
@@ -2026,7 +2026,7 @@ closemn(struct multio **mfds, int fd)
 		}
 	}
 	_exit(0);
-    } else if (fd >= 0)
+    } else if (fd >= 0 && type == REDIR_CLOSE)
 	mfds[fd] = NULL;
 }
 
@@ -3093,7 +3093,7 @@ execcmd(Estate state, int input, int output, int how, int last1)
 		    }
 		}
 		if (fn->fd1 < 10)
-		    closemn(mfds, fn->fd1);
+		    closemn(mfds, fn->fd1, REDIR_CLOSE);
 		if (!closed && zclose(fn->fd1) < 0) {
 		    zwarn("failed to close file descriptor %d: %e",
 			  fn->fd1, errno);
@@ -3102,7 +3102,7 @@ execcmd(Estate state, int input, int output, int how, int last1)
 	    case REDIR_MERGEIN:
 	    case REDIR_MERGEOUT:
 		if (fn->fd2 < 10)
-		    closemn(mfds, fn->fd2);
+		    closemn(mfds, fn->fd2, fn->type);
 		if (!checkclobberparam(fn))
 		    fil = -1;
 		else if (fn->fd2 > 9 &&
@@ -3181,7 +3181,7 @@ execcmd(Estate state, int input, int output, int how, int last1)
      * spawning tee/cat processes as necessary.         */
     for (i = 0; i < 10; i++)
 	if (mfds[i] && mfds[i]->ct >= 2)
-	    closemn(mfds, i);
+	    closemn(mfds, i, REDIR_CLOSE);
 
     if (nullexec) {
 	if (nullexec == 1) {


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: multios doesn't work with 2>&1
  2013-10-27 17:46   ` multios doesn't work with 2>&1 Peter Stephenson
@ 2013-10-27 18:27     ` Bart Schaefer
  2013-10-27 19:39       ` Multio deadlock (Re: multios doesn't work with 2>&1) Bart Schaefer
  0 siblings, 1 reply; 9+ messages in thread
From: Bart Schaefer @ 2013-10-27 18:27 UTC (permalink / raw)
  To: Zsh Hackers' List

On Oct 27,  5:46pm, Peter Stephenson wrote:
} Subject: Re: multios doesn't work with 2>&1
}
} Bart Schaefer wrote:
} > If I back out workers/20666 (Jan 2005), then this example works again.
} 
} (This is really a zsh-workers thing, too, so I've moved it.)
} 
} Crikey, I'd worked out it was already broken in 2007 and I thought I was
} doing well...

I went through "git whatchanged Src/exec.c" searching for the string
"multio" and then cross-referenced with Etc/ChangeLog-4.3.

} echo foo >/dev/null 2>&1 | sed 's/foo/bar/'
} 
} gives a different bad effect, namely you get the output you want but the
} shell hangs

This problem is still there after 31912.

The parent shell is in zwaitjob(), as is the shell that spawned sed.
The multio copy-input-to-output job is blocked on read of the pipe,
which is never going to close because the other two jobs are waiting.


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Multio deadlock (Re: multios doesn't work with 2>&1)
  2013-10-27 18:27     ` Bart Schaefer
@ 2013-10-27 19:39       ` Bart Schaefer
  2013-10-27 20:24         ` Bart Schaefer
  2013-10-27 20:33         ` Peter Stephenson
  0 siblings, 2 replies; 9+ messages in thread
From: Bart Schaefer @ 2013-10-27 19:39 UTC (permalink / raw)
  To: Zsh Hackers' List

On Oct 27, 11:27am, Bart Schaefer wrote:
}
} } echo foo >/dev/null 2>&1 | sed 's/foo/bar/'
} } 
} } gives a different bad effect, namely you get the output you want but the
} } shell hangs
} 
} The parent shell is in zwaitjob(), as is the shell that spawned sed.

That should actually say "as is the shell that was forked for echo".
The parent shell is waiting for "sed".

Here's the spot where the "echo" shell is stopped:

	/*
	 * So what's going on here then?  Well, I'm glad you asked.
	 *
	 * If we create multios for use in a subshell we do
	 * this after forking, in this function above.  That
	 * means that the current (sub)process is responsible
	 * for clearing them up.  However, the processes won't
	 * go away until we have closed the fd's talking to them.
	 * Since we're about to exit the shell there's nothing
	 * to stop us closing all fd's (including the ones 0 to 9
	 * that we usually leave alone).
	 *
	 * Then we wait for any processes.  When we forked,
	 * we cleared the jobtable and started a new job just for
	 * any oddments like this, so if there aren't any we won't
	 * need to wait.  The result of not waiting is that
	 * the multios haven't flushed the fd's properly, leading
	 * to obscure missing data.
	 *
	 * It would probably be cleaner to ensure that the
	 * parent shell handled multios, but that requires
	 * some architectural changes which are likely to be
	 * hairy.
	 */
	for (i = 0; i < 10; i++)
	    if (fdtable[i] != FDT_UNUSED)
		close(i);
	closem(FDT_UNUSED);
	if (thisjob != -1)
	    waitjobs();
	_exit(lastval);

Obviously we've not succeeded in closing all the necessary descriptors.
Here's what's still open (PID 16361 == parent, 16383 == echo, 16384 ==	/*
	 * So what's going on here then?  Well, I'm glad you asked.
	 *
	 * If we create multios for use in a subshell we do
	 * this after forking, in this function above.  That
	 * means that the current (sub)process is responsible
	 * for clearing them up.  However, the processes won't
	 * go away until we have closed the fd's talking to them.
	 * Since we're about to exit the shell there's nothing
	 * to stop us closing all fd's (including the ones 0 to 9
	 * that we usually leave alone).
	 *
	 * Then we wait for any processes.  When we forked,
	 * we cleared the jobtable and started a new job just for
	 * any oddments like this, so if there aren't any we won't
	 * need to wait.  The result of not waiting is that
	 * the multios haven't flushed the fd's properly, leading
	 * to obscure missing data.
	 *
	 * It would probably be cleaner to ensure that the
	 * parent shell handled multios, but that requires
	 * some architectural changes which are likely to be
	 * hairy.
	 */
	for (i = 0; i < 10; i++)
	    if (fdtable[i] != FDT_UNUSED)
		close(i);
	closem(FDT_UNUSED);
	if (thisjob != -1)
	    waitjobs();
	_exit(lastval);

Obviously we've not succeeded in closing all the necessary descriptors.
Here's what's still open:

zsh     16361 16361 schaefer    0u   CHR  136,3         5 /dev/pts/3
zsh     16361 16361 schaefer    1u   CHR  136,3         5 /dev/pts/3
zsh     16361 16361 schaefer    2u   CHR  136,3         5 /dev/pts/3
zsh     16361 16361 schaefer   10u   CHR  136,3         5 /dev/pts/3
zsh     16383 16383 schaefer    2w  FIFO    0,7      1227018 pipe
zsh     16384 16383 schaefer   12w  FIFO    0,7      1227016 pipe
zsh     16384 16383 schaefer   13w   CHR    1,3         2056 /dev/null
zsh     16384 16383 schaefer   14r  FIFO    0,7      1227018 pipe
sed     16385 16383 schaefer    0r  FIFO    0,7      1227016 pipe
sed     16385 16383 schaefer    1u   CHR  136,3            5 /dev/pts/3
sed     16385 16383 schaefer    2u   CHR  136,3            5 /dev/pts/3

16361 is the parent, it's clean.  16383 is echo and 16384 is the multio.
The multio is blocked reading fd 14 (1227018 pipe), which it's parent
still has open as stderr because fdtable[2] == FDT_UNUSED.

Does the following look right?  It does fix the deadlock, but we might
call close() on an already closed fd, which it appears this is trying
to avoid (maybe so as not to change errno?).

diff --git a/Src/exec.c b/Src/exec.c
index 99c7eaa..7ac1ad5 100644
--- a/Src/exec.c
+++ b/Src/exec.c
@@ -3372,7 +3372,7 @@ execcmd(Estate state, int input, int output, int how, int last1)
 	 * hairy.
 	 */
 	for (i = 0; i < 10; i++)
-	    if (fdtable[i] != FDT_UNUSED)
+	    if (i < 3 || fdtable[i] != FDT_UNUSED)
 		close(i);
 	closem(FDT_UNUSED);
 	if (thisjob != -1)


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Multio deadlock (Re: multios doesn't work with 2>&1)
  2013-10-27 19:39       ` Multio deadlock (Re: multios doesn't work with 2>&1) Bart Schaefer
@ 2013-10-27 20:24         ` Bart Schaefer
  2013-10-27 20:33         ` Peter Stephenson
  1 sibling, 0 replies; 9+ messages in thread
From: Bart Schaefer @ 2013-10-27 20:24 UTC (permalink / raw)
  To: Zsh Hackers' List

Sorry about the duplication in that last message, something obviously
got pasted twice by mistake.


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Multio deadlock (Re: multios doesn't work with 2>&1)
  2013-10-27 19:39       ` Multio deadlock (Re: multios doesn't work with 2>&1) Bart Schaefer
  2013-10-27 20:24         ` Bart Schaefer
@ 2013-10-27 20:33         ` Peter Stephenson
  2013-10-27 21:16           ` Peter Stephenson
  2013-10-27 22:18           ` Bart Schaefer
  1 sibling, 2 replies; 9+ messages in thread
From: Peter Stephenson @ 2013-10-27 20:33 UTC (permalink / raw)
  To: Zsh Hackers' List

On Sun, 27 Oct 2013 12:39:17 -0700
Bart Schaefer <schaefer@brasslantern.com> wrote:
> Does the following look right?  It does fix the deadlock, but we might
> call close() on an already closed fd, which it appears this is trying
> to avoid (maybe so as not to change errno?).
> 
> diff --git a/Src/exec.c b/Src/exec.c
> index 99c7eaa..7ac1ad5 100644
> --- a/Src/exec.c
> +++ b/Src/exec.c
> @@ -3372,7 +3372,7 @@ execcmd(Estate state, int input, int output, int how, int last1)
>  	 * hairy.
>  	 */
>  	for (i = 0; i < 10; i++)
> -	    if (fdtable[i] != FDT_UNUSED)
> +	    if (i < 3 || fdtable[i] != FDT_UNUSED)
>  		close(i);
>  	closem(FDT_UNUSED);
>  	if (thisjob != -1)

Hmm... why don't we initialise fdtable[0..2] to FDT_INTERNAL at boot?  I
can see we don't, since from a shell soon after start I get

(gdb) p fdtable[0]@11
$6 = "\000\000\000\000\000\000\000\000\000\000\001"

(FDT_UNUSED = 0, FDT_INTERNAL = 1).  However, I've alsoe looked later
and seen all three set to FDT_INTERNAL; not sure where that happens but
presumably something to do with redir.  Shouldn't we just initialise
them properly?

We could in principle detect if they're open by attempting to dup() and
seeing what fd we get.  Not sure if that's worth it, but it might be
cleaner.

pws


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Multio deadlock (Re: multios doesn't work with 2>&1)
  2013-10-27 20:33         ` Peter Stephenson
@ 2013-10-27 21:16           ` Peter Stephenson
  2013-10-27 22:31             ` Bart Schaefer
  2013-10-27 22:18           ` Bart Schaefer
  1 sibling, 1 reply; 9+ messages in thread
From: Peter Stephenson @ 2013-10-27 21:16 UTC (permalink / raw)
  To: Zsh Hackers' List

On Sun, 27 Oct 2013 20:33:47 +0000
Peter Stephenson <p.w.stephenson@ntlworld.com> wrote:
> Hmm... why don't we initialise fdtable[0..2] to FDT_INTERNAL at boot?

Er, because it doesn't mean that, FDT_INTERNAL means "users can't monkey with
this".  We need something to mean "open and users can monkey with it".
Maybe something with a similar function to FDT_EXTERNAL would
work... but we'd need to be careful about the points where things got
closed.  We'd probably have to special case it in closem() when that
gets passed FDT_UNUSED.

Not sure how much this is worth it to avoid a bad close that's ignored anyway.

pws


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Multio deadlock (Re: multios doesn't work with 2>&1)
  2013-10-27 20:33         ` Peter Stephenson
  2013-10-27 21:16           ` Peter Stephenson
@ 2013-10-27 22:18           ` Bart Schaefer
  1 sibling, 0 replies; 9+ messages in thread
From: Bart Schaefer @ 2013-10-27 22:18 UTC (permalink / raw)
  To: Zsh Hackers' List

On Oct 27,  8:33pm, Peter Stephenson wrote:
} Subject: Re: Multio deadlock (Re: multios doesn't work with 2>&1)
}
} >  	for (i = 0; i < 10; i++)
} > -	    if (fdtable[i] != FDT_UNUSED)
} > +	    if (i < 3 || fdtable[i] != FDT_UNUSED)
} >  		close(i);
} 
} Hmm... why don't we initialise fdtable[0..2] to FDT_INTERNAL at boot?

I don't know.  Is FDT_INTERNAL the correct flag?  It supposedly means
that those descriptors are "not visible to other processes" which does
not sound right to me.  I think they should be FDT_EXTERNAL by default?

} (FDT_UNUSED = 0, FDT_INTERNAL = 1).  However, I've alsoe looked later
} and seen all three set to FDT_INTERNAL; not sure where that happens but
} presumably something to do with redir.

It happens in utils.c:redup() but I'm not sure it's strictly correct.
FDT_INTERNAL means they'll be "closed on exec" in execcmd() which is
not right in the general case, but maybe it is when they've been dup'd.

} Shouldn't we just initialise them properly?

That would be better, yes, but they get toggled between FDT_UNUSED and
FDT_INTERNAL every time there's a redirection, first when the original
fd is closed [zclose()] and then when restored again afterward [redup()].

So the question is, in the deadlock case why is fd 2 open but still
marked FDT_UNUSED?  Who dup'd onto it without going through redup()?
Or is that not what happened?  (Its really difficult to trace this
with gdb because all of this happens in one of the child processes.)

See patch below, done essentially on a guess.

Interesting aside:  If fdtable[x] starts out FDT_EXTERNAL and is then
saved with movefd() and restored with redup(), its state changes to
FDT_INTERNAL because movefd() does not preserve the original state [it
marks everything internal], but redup() copies the state.

} We could in principle detect if they're open by attempting to dup() and
} seeing what fd we get.  Not sure if that's worth it, but it might be
} cleaner.

You mean, do that in order to determine whether they are already "unused"
when the shell starts up?  I don't think that's necessary.  If you mean
do it to avoid close(), it won't help, errno will still get frobbed.

The following does appear to prevent the deadlock, but probably is not
yet sufficient because it doesn't check that fdtable[] is big enough.


diff --git a/Src/exec.c b/Src/exec.c
index 99c7eaa..2e57443 100644
--- a/Src/exec.c
+++ b/Src/exec.c
@@ -3124,6 +3124,7 @@ execcmd(Estate state, int input, int output, int how, int last1)
 		    if(fd == -2)
 			fd = (fn->type == REDIR_MERGEOUT) ? coprocout : coprocin;
 		    fil = dup(fd);
+		    fdtable[fil] = FDT_INTERNAL;
 		}
 		if (fil == -1) {
 		    char fdstr[4];
@@ -3150,9 +3151,10 @@ execcmd(Estate state, int input, int output, int how, int last1)
 			       O_WRONLY | O_APPEND | O_CREAT | O_NOCTTY, 0666);
 		else
 		    fil = clobber_open(fn);
-		if(fil != -1 && IS_ERROR_REDIR(fn->type))
+		if(fil != -1 && IS_ERROR_REDIR(fn->type)) {
 		    dfil = dup(fil);
-		else
+		    fdtable[dfil] = FDT_INTERNAL;
+		} else
 		    dfil = 0;
 		if (fil == -1 || dfil == -1) {
 		    if(fil != -1)


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Multio deadlock (Re: multios doesn't work with 2>&1)
  2013-10-27 21:16           ` Peter Stephenson
@ 2013-10-27 22:31             ` Bart Schaefer
  0 siblings, 0 replies; 9+ messages in thread
From: Bart Schaefer @ 2013-10-27 22:31 UTC (permalink / raw)
  To: Zsh Hackers' List

On Oct 27,  9:16pm, Peter Stephenson wrote:
} Subject: Re: Multio deadlock (Re: multios doesn't work with 2>&1)
}
} Er, because it doesn't mean that, FDT_INTERNAL means "users can't
} monkey with this".

See my other message RE asymmetry of movefd/redup.

} We need something to mean "open and users can monkey with it".

Except that obviously FDT_INTERNAL doesn't really prevent user monkeying,
because fdtable[0..2] spend a lot of time in FDT_INTERNAL state without
breaking any redirections.  So either users could in fact monkey with an
FDT_INTERNAL fd if they were able to guess the FD number, or the monkeying
is barred some other way.

In any case this patch finally does seem to resolve the deadlock in a
safe manner; it can't have been correct that those dup() calls could be
allowed to return an FD in 0..2 range.

I threw in the init.c bit just for completeness, as mentioned that state
is rapidly stomped on if any redirections of 0..2 are done.

diff --git a/Src/exec.c b/Src/exec.c
index 99c7eaa..df915e1 100644
--- a/Src/exec.c
+++ b/Src/exec.c
@@ -3123,7 +3123,7 @@ execcmd(Estate state, int input, int output, int how, int last1)
 		    int fd = fn->fd2;
 		    if(fd == -2)
 			fd = (fn->type == REDIR_MERGEOUT) ? coprocout : coprocin;
-		    fil = dup(fd);
+		    fil = movefd(dup(fd));
 		}
 		if (fil == -1) {
 		    char fdstr[4];
@@ -3151,7 +3151,7 @@ execcmd(Estate state, int input, int output, int how, int last1)
 		else
 		    fil = clobber_open(fn);
 		if(fil != -1 && IS_ERROR_REDIR(fn->type))
-		    dfil = dup(fil);
+		    dfil = movefd(dup(fil));
 		else
 		    dfil = 0;
 		if (fil == -1 || dfil == -1) {
diff --git a/Src/init.c b/Src/init.c
index 01a969d..7032ff8 100644
--- a/Src/init.c
+++ b/Src/init.c
@@ -1584,6 +1584,7 @@ zsh_main(UNUSED(int argc), char **argv)
 
     fdtable_size = zopenmax();
     fdtable = zshcalloc(fdtable_size*sizeof(*fdtable));
+    fdtable[0] = fdtable[1] = fdtable[2] = FDT_EXTERNAL;
 
     createoptiontable();
     emulate(zsh_name, 1, &emulation, opts);   /* initialises most options */


^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2013-10-27 22:31 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20131027145917.GA5509@localhost.localdomain>
     [not found] ` <131027100137.ZM4100@torch.brasslantern.com>
2013-10-27 17:46   ` multios doesn't work with 2>&1 Peter Stephenson
2013-10-27 18:27     ` Bart Schaefer
2013-10-27 19:39       ` Multio deadlock (Re: multios doesn't work with 2>&1) Bart Schaefer
2013-10-27 20:24         ` Bart Schaefer
2013-10-27 20:33         ` Peter Stephenson
2013-10-27 21:16           ` Peter Stephenson
2013-10-27 22:31             ` Bart Schaefer
2013-10-27 22:18           ` Bart Schaefer
2013-10-27 18:11   ` multios doesn't work with 2>&1 Bart Schaefer

Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/zsh/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).