zsh-workers
 help / color / mirror / code / Atom feed
* Subshell with multios causes hang
@ 2007-05-22 11:21 John Buddery
  2007-05-22 17:29 ` Peter Stephenson
  0 siblings, 1 reply; 6+ messages in thread
From: John Buddery @ 2007-05-22 11:21 UTC (permalink / raw)
  To: Zsh-Workers

Hi, since upgrading from 2.4.5 to 2.4.6 I find that one of my functions
which uses a multios redirect on a subshell list is hanging. I tried
4.3.4 as well with no luck.

Essentially I run the equivalent of:

   ( echo hello ) >| /tmp/out >| /tmp/out2

and in an interactive shell (or any with job control) this hangs.

Digging a little I find the change between 2.4.5 and 2.4.6 which causes
this was the fix to clearjobtab() in jobs.c to make it actually clear
the job table. What happens is this:

    entersubsh() calls clearjobtab() which clears the job table.
        Note thisjob == 1 at this point.

    The multios are applied, starting a subprocess in closemn().

    closemn() registers the pid of the subprocess with addproc()
       This updates the auxprocs list for thisjob (still == 1).
       Note that the stat of thisjob is 0 (not in use), since the jobtab
       was cleared, so this seems wrong.

    execpline is called to run the subshell list, and calls initjob().
       The new job is given number 1, since this is the first free slot.
       Note that the pid for the multios is still in the auxprocs list
       for job 1, this seems very wrong.

    When the echo has finished, execpline() proceeds to wait for the
    auxproc pid, since this is listed against the current job. This
    hangs, since the multios process is still reading the unclosed
    pipe.

All of the following fixes solve this problem, but I don't know what
else they break:

    Not clearing the job table in clearjobtab() - works, but just seems
wrong, and a step backwards.

    Preserving the entry for "thisjob" in clearjobtab() - not much
better, it might be a subjob and it's parent is no longer there, and
it's pid lists might not be valid.

    Setting thisjob = -1 in clearjobtab(), since there is no current
job, and making addproc() ignore the addition of aux processes if
thisjob == -1. This also seems wrong, as we are completely loosing the
pid information for the multios, so for example we can't kill it.

    Setting thisjob = 1 in clearjobtab (if it was >= 0), and setting
jobtab[thisjob].stat = STAT_INUSE after clearing jobtab. This is what I
ended up with, but is it a valid thing to do ?

Thanks for any help, and for reading this stupidly long post...


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Subshell with multios causes hang
  2007-05-22 11:21 Subshell with multios causes hang John Buddery
@ 2007-05-22 17:29 ` Peter Stephenson
  2007-05-23 10:12   ` Peter Stephenson
  0 siblings, 1 reply; 6+ messages in thread
From: Peter Stephenson @ 2007-05-22 17:29 UTC (permalink / raw)
  To: Zsh-Workers

On Tue, 22 May 2007 12:21:43 +0100
John Buddery <jvb@cyberscience.com> wrote:
> Hi, since upgrading from 2.4.5 to 2.4.6 I find that one of my
> functions which uses a multios redirect on a subshell list is
> hanging. I tried 4.3.4 as well with no luck.
> 
> Essentially I run the equivalent of:
> 
>    ( echo hello ) >| /tmp/out >| /tmp/out2
> 
> and in an interactive shell (or any with job control) this hangs.
>...
> All of the following fixes solve this problem, but I don't know what
> else they break:
>..
>     Setting thisjob = -1 in clearjobtab(), since there is no current
> job, and making addproc() ignore the addition of aux processes if
> thisjob == -1. This also seems wrong, as we are completely loosing the
> pid information for the multios, so for example we can't kill it.
> 
>     Setting thisjob = 1 in clearjobtab (if it was >= 0), and setting
> jobtab[thisjob].stat = STAT_INUSE after clearing jobtab. This is what
> I ended up with, but is it a valid thing to do ?
>...

Thanks for the detailed analysis, which will have saved me hours.

There's clearly something of a design flaw here: we're using (an effect
of) job control when no job control is present.  However, the shell
does use the so-called job table for this purpose (managing processes
even if they're not strictly associated with a job), so we have to live
with it.

In that spirit what I'd *like* to suggest is something close to what you
came up with: set thisjob to -1 in clearjobtab() (it's sure as heck
invalid), and then when we need a job table entry in closemn(), detect
that thisjob is -1 and initialise a new job.

Problem 1: this happens before execpline() runs in the subshell, which
grabs a different job table entry.  The one generated by closemn() is
forgotten.  We can fix this by setting a temporary job number saying
"use me! use me!".  This isn't very nice but doesn't involve
redesigning the shell from scratch.

Problem 2: this is where it gets really nasty to the extent that I'm
worried I must be missing something basic about multios.  We now do the
"echo" in the subshell, and on return to execpline() wait for the
auxiliary process handling the multios to exit.  But it's never going
to!  It's waiting for end-of-file on the data it's reading from the
subshell that's waiting for it.  Because we attached the multios
process after the fork, we have deadlock.

Wossgoingon?  How do multios ever work?  Is there some call to close the
shell fd's (giving the EOF the aux proc is waiting for) that hasn't
quite been handled at that point, but usually has?

Possible clue: last1 is 1 in this version of execpline(), indicating
we're about to leave the shell.  The auxprocs are the only reason we
can't.  So there must be some solution...

I'll carry on looking at this when I get a chance, but for now I'm
confused enough to go to the beer festival.

-- 
Peter Stephenson <pws@csr.com>                  Software Engineer
CSR PLC, Churchill House, Cambridge Business Park, Cowley Road
Cambridge, CB4 0WZ, UK                          Tel: +44 (0)1223 692070


To access the latest news from CSR copy this link into a web browser:  http://www.csr.com/email_sig.php

To get further information regarding CSR, please visit our Investor Relations page at http://ir.csr.com/csr/about/overview


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Subshell with multios causes hang
  2007-05-22 17:29 ` Peter Stephenson
@ 2007-05-23 10:12   ` Peter Stephenson
  2007-05-23 11:25     ` John Buddery
  2007-05-23 15:00     ` Bart Schaefer
  0 siblings, 2 replies; 6+ messages in thread
From: Peter Stephenson @ 2007-05-23 10:12 UTC (permalink / raw)
  To: Zsh-Workers

On Tue, 22 May 2007 18:29:25 +0100
Peter Stephenson <pws@csr.com> wrote:
> On Tue, 22 May 2007 12:21:43 +0100
> John Buddery <jvb@cyberscience.com> wrote:
> > Essentially I run the equivalent of:
> > 
> >    ( echo hello ) >| /tmp/out >| /tmp/out2
> > 
> > and in an interactive shell (or any with job control) this hangs.
>...
> Wossgoingon?
>...
> I'll carry on looking at this when I get a chance, but for now I'm
> confused enough to go to the beer festival.

Success, I think.  The answer started to come to me in the queue.  John
sent me another email; the basic problem seems to be the obscure
interaction between forking, job control and multios in this case.

execpline() is not the right place to handle this.  The pipeline
contains the contents of the subshell; after that, we return to
execcmd() where we forked for the subshell, which is the appropriate
level.  At this point, we're done with the subshell, so we simply
_exit().  That's where things need fixing up to ensure the multios are
happy.

So what I've done is:

- When we clear the job table, initialise a fresh job.  This is
solely used for anything that happens to want a job in the subshell.
This doesn't rely on any accidental hangovers from the main shell.
The execpline() will still initialise its own new job, but as
I argued above that's the right thing to do.  execpline() saves
and restores thisjob, so we get the value we want back in execcmd().

- When we _exit() after handling the subshell, close all fd's.  This is
bound be to be OK given that we're about to bail out.  (The logic only
applies if we have forked, obviously.)

- Then wait for any processes in the current job.  Usually there will be
none, and this goes straight through.  If there are multios we wait for
those processes.

This ensures that the multio processes are properly waited for.  I don't
think that ever happened before, even when this apparently worked (as
John noted, largely by accident).

This doesn't break any of the existing tests; I've added a new one.

I don't think it's straightforward to set up the multios before we fork,
since globbing is allowed in multios and that's done after forking.
However, it's all so complicated that reasoning may be too simplistic
(there are multiple forks involved).

Index: Src/exec.c
===================================================================
RCS file: /cvsroot/zsh/zsh/Src/exec.c,v
retrieving revision 1.111
diff -u -r1.111 exec.c
--- Src/exec.c	8 May 2007 10:02:59 -0000	1.111
+++ Src/exec.c	23 May 2007 10:08:49 -0000
@@ -1098,10 +1098,10 @@
     child_block();
 
     /*
-     * Get free entry in job table and initialize it.
-     * This is currently the only call to initjob(), so this
-     * is also the only place where we can expand the job table
-     * under us.
+     * Get free entry in job table and initialize it.  This is
currently
+     * the only call to initjob() (apart from a minor exception in
+     * clearjobtab()), so this is also the only place where we can
+     * expand the job table under us.
      */
     if ((thisjob = newjob = initjob()) == -1) {
 	child_unblock();
@@ -2816,8 +2816,38 @@
     }
 
   err:
-    if (forked)
+    if (forked) {
+	/*
+	 * So what's going on here then?  Well, I'm glad you asked.
+	 *
+	 * If we create multios for use in a subshell we do
+	 * this after forking, in this function above.  That
+	 * means that the current (sub)process is responsible
+	 * for clearing them up.  However, the processes won't
+	 * go away until we have closed the fd's talking to them.
+	 * Since we're about to exit the shell there's nothing
+	 * to stop us closing all fd's (including the ones 0 to 9
+	 * that we usually leave alone).
+	 *
+	 * Then we wait for any processes.  When we forked,
+	 * we cleared the jobtable and started a new job just for
+	 * any oddments like this, so if there aren't any we won't
+	 * need to wait.  The result of not waiting is that
+	 * the multios haven't flushed the fd's properly, leading
+	 * to obscure missing data.
+	 *
+	 * It would probably be cleaner to ensure that the
+	 * parent shell handled multios, but that requires
+	 * some architectural changes which are likely to be
+	 * hairy.
+	 */
+	for (i = 0; i < 10; i++)
+	    if (fdtable[i] != FDT_UNUSED)
+		close(i);
+	closem(FDT_UNUSED);
+	waitjobs();
 	_exit(lastval);
+    }
     fixfds(save);
 
  done:
Index: Src/jobs.c
===================================================================
RCS file: /cvsroot/zsh/zsh/Src/jobs.c,v
retrieving revision 1.56
diff -u -r1.56 jobs.c
--- Src/jobs.c	27 Mar 2007 15:16:57 -0000	1.56
+++ Src/jobs.c	23 May 2007 10:08:49 -0000
@@ -1296,6 +1296,15 @@
 
     memset(jobtab, 0, jobtabsize * sizeof(struct job)); /* zero out
table */ maxjob = 0;
+
+    /*
+     * Although we don't have job control in subshells, we
+     * sometimes needs control structures for other purposes such
+     * as multios.  Grab a job for this purpose; any will do
+     * since we've freed them all up (so there's no question
+     * of problems with the job table size here).
+     */
+    thisjob = initjob();
 }
 
 static int initnewjob(int i)
Index: Test/E01options.ztst
===================================================================
RCS file: /cvsroot/zsh/zsh/Test/E01options.ztst,v
retrieving revision 1.17
diff -u -r1.17 E01options.ztst
--- Test/E01options.ztst	1 Nov 2006 12:25:22 -0000	1.17
+++ Test/E01options.ztst	23 May 2007 10:08:49 -0000
@@ -655,6 +655,18 @@
 >This is in1
 >This is in2
 
+# This is trickier than it looks.  There's a hack at the end of
+# execcmd() to catch the multio processes attached to the
+# subshell, which otherwise sort of get lost in the general turmoil.
+# Without that, the multios aren't synchronous with the subshell
+# or the main shell starting the "cat", so the output files appear
+# empty.
+  setopt multios
+  ( echo hello ) >multio_out1 >multio_out2 && cat multio_out*
+0:Multios attached to a subshell
+>hello
+>hello
+
 # tried this with other things, but not on its own, so much.
   unsetopt nomatch
   print with nonomatch: flooble*

-- 
Peter Stephenson <pws@csr.com>                  Software Engineer
CSR PLC, Churchill House, Cambridge Business Park, Cowley Road
Cambridge, CB4 0WZ, UK                          Tel: +44 (0)1223 692070


To access the latest news from CSR copy this link into a web browser:  http://www.csr.com/email_sig.php

To get further information regarding CSR, please visit our Investor Relations page at http://ir.csr.com/csr/about/overview


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Subshell with multios causes hang
  2007-05-23 10:12   ` Peter Stephenson
@ 2007-05-23 11:25     ` John Buddery
  2007-05-23 11:50       ` Peter Stephenson
  2007-05-23 15:00     ` Bart Schaefer
  1 sibling, 1 reply; 6+ messages in thread
From: John Buddery @ 2007-05-23 11:25 UTC (permalink / raw)
  To: Peter Stephenson; +Cc: Zsh-Workers

On Wed, 2007-05-23 at 11:12 +0100, Peter Stephenson wrote:
> On Tue, 22 May 2007 18:29:25 +0100
> Peter Stephenson <pws@csr.com> wrote:
> > On Tue, 22 May 2007 12:21:43 +0100
> > John Buddery <jvb@cyberscience.com> wrote:
> > > Essentially I run the equivalent of:
> > > 
> > >    ( echo hello ) >| /tmp/out >| /tmp/out2
> > > 
> > > and in an interactive shell (or any with job control) this hangs.
> >...
> > Wossgoingon?
> >...
> > I'll carry on looking at this when I get a chance, but for now I'm
> > confused enough to go to the beer festival.
> 
> Success, I think.  The answer started to come to me in the queue.  John
> sent me another email; the basic problem seems to be the obscure
> interaction between forking, job control and multios in this case.
> 

Excellent, works for me - thanks very much!

If you're interested while you're dealing with multios, there was one
other fault I noticed when digging around: with an input multios, there
is a race condition after the fork in closemn() between the addproc()
and the child process exiting. If the child multios exits (having sent
it's output to the buffered pipe) before the addproc(), the shell hangs
later when it tries to wait for the already-exited pid.

This can be reproduced fairly reliably with the example you posted last
year:

% echo This is the file >file
% fn() { cat; }
% fn <$(echo file file)
This is the file
This is the file
(hang at this point)

One way to fix it would be to block the child signal until the child pid was registered (patch below).
Admittedly it's a bit obscure, though - not that my last example wasn't...

Thanks,

John

Index: Src/exec.c
===================================================================
RCS file: /cvsroot/zsh/zsh/Src/exec.c,v
retrieving revision 1.111
diff -u -r1.111 exec.c
--- Src/exec.c  8 May 2007 10:02:59 -0000       1.111
+++ Src/exec.c  23 May 2007 11:09:53 -0000
@@ -1549,20 +1549,24 @@
        pid_t pid;
        struct timeval bgtime;
 
+        child_block();
        if ((pid = zfork(&bgtime))) {
            for (i = 0; i < mn->ct; i++)
                zclose(mn->fds[i]);
            zclose(mn->pipe);
            if (pid == -1) { 
                mfds[fd] = NULL;
+                child_unblock();
                return;
            }
            mn->ct = 1;
            mn->fds[0] = fd;
            addproc(pid, NULL, 1, &bgtime);
+            child_unblock();
            return;
        }
        /* pid == 0 */
+        child_unblock();
        closeallelse(mn);
        if (mn->rflag) {
            /* tee process */




^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Subshell with multios causes hang
  2007-05-23 11:25     ` John Buddery
@ 2007-05-23 11:50       ` Peter Stephenson
  0 siblings, 0 replies; 6+ messages in thread
From: Peter Stephenson @ 2007-05-23 11:50 UTC (permalink / raw)
  To: Zsh-Workers

On Wed, 23 May 2007 12:25:13 +0100
John Buddery <jvb@cyberscience.com> wrote:
> % echo This is the file >file
> % fn() { cat; }
> % fn <$(echo file file)
> This is the file
> This is the file
> (hang at this point)
> 
> One way to fix it would be to block the child signal until the child
> pid was registered (patch below).

That seems entirely reasonable and unproblematic.  I've committed it (with
a comment and the case above in another test).

Sorry about the wrapping in the previous post---I've tracked down
claws-mail's autowrap feature and turned it off (though these things have
a habit of coming back from the dead...)

-- 
Peter Stephenson <pws@csr.com>                  Software Engineer
CSR PLC, Churchill House, Cambridge Business Park, Cowley Road
Cambridge, CB4 0WZ, UK                          Tel: +44 (0)1223 692070


To access the latest news from CSR copy this link into a web browser:  http://www.csr.com/email_sig.php

To get further information regarding CSR, please visit our Investor Relations page at http://ir.csr.com/csr/about/overview


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Subshell with multios causes hang
  2007-05-23 10:12   ` Peter Stephenson
  2007-05-23 11:25     ` John Buddery
@ 2007-05-23 15:00     ` Bart Schaefer
  1 sibling, 0 replies; 6+ messages in thread
From: Bart Schaefer @ 2007-05-23 15:00 UTC (permalink / raw)
  To: Zsh-Workers

On May 23, 11:12am, Peter Stephenson wrote:
}
} So what I've done is:

This all sounds sane to me.

I also agree with John's child blocking tweak for the race condition.

} I don't think it's straightforward to set up the multios before we
} fork, since globbing is allowed in multios and that's done after
} forking.

I don't think it's workable to push this up a level in any case. You
can always construct an example where "up a level" is still a subshell
and the problem returns.  To try to have the topmost shell "lift" all
the globs and multios out of the subshells would introduce sequencing
problems with complex commands that expect to be able to create a file
and then glob its name or redirect from it.


^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2007-05-23 15:01 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-05-22 11:21 Subshell with multios causes hang John Buddery
2007-05-22 17:29 ` Peter Stephenson
2007-05-23 10:12   ` Peter Stephenson
2007-05-23 11:25     ` John Buddery
2007-05-23 11:50       ` Peter Stephenson
2007-05-23 15:00     ` Bart Schaefer

Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/zsh/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).