* Subshell with multios causes hang @ 2007-05-22 11:21 John Buddery 2007-05-22 17:29 ` Peter Stephenson 0 siblings, 1 reply; 6+ messages in thread From: John Buddery @ 2007-05-22 11:21 UTC (permalink / raw) To: Zsh-Workers Hi, since upgrading from 2.4.5 to 2.4.6 I find that one of my functions which uses a multios redirect on a subshell list is hanging. I tried 4.3.4 as well with no luck. Essentially I run the equivalent of: ( echo hello ) >| /tmp/out >| /tmp/out2 and in an interactive shell (or any with job control) this hangs. Digging a little I find the change between 2.4.5 and 2.4.6 which causes this was the fix to clearjobtab() in jobs.c to make it actually clear the job table. What happens is this: entersubsh() calls clearjobtab() which clears the job table. Note thisjob == 1 at this point. The multios are applied, starting a subprocess in closemn(). closemn() registers the pid of the subprocess with addproc() This updates the auxprocs list for thisjob (still == 1). Note that the stat of thisjob is 0 (not in use), since the jobtab was cleared, so this seems wrong. execpline is called to run the subshell list, and calls initjob(). The new job is given number 1, since this is the first free slot. Note that the pid for the multios is still in the auxprocs list for job 1, this seems very wrong. When the echo has finished, execpline() proceeds to wait for the auxproc pid, since this is listed against the current job. This hangs, since the multios process is still reading the unclosed pipe. All of the following fixes solve this problem, but I don't know what else they break: Not clearing the job table in clearjobtab() - works, but just seems wrong, and a step backwards. Preserving the entry for "thisjob" in clearjobtab() - not much better, it might be a subjob and it's parent is no longer there, and it's pid lists might not be valid. Setting thisjob = -1 in clearjobtab(), since there is no current job, and making addproc() ignore the addition of aux processes if thisjob == -1. This also seems wrong, as we are completely loosing the pid information for the multios, so for example we can't kill it. Setting thisjob = 1 in clearjobtab (if it was >= 0), and setting jobtab[thisjob].stat = STAT_INUSE after clearing jobtab. This is what I ended up with, but is it a valid thing to do ? Thanks for any help, and for reading this stupidly long post... ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Subshell with multios causes hang 2007-05-22 11:21 Subshell with multios causes hang John Buddery @ 2007-05-22 17:29 ` Peter Stephenson 2007-05-23 10:12 ` Peter Stephenson 0 siblings, 1 reply; 6+ messages in thread From: Peter Stephenson @ 2007-05-22 17:29 UTC (permalink / raw) To: Zsh-Workers On Tue, 22 May 2007 12:21:43 +0100 John Buddery <jvb@cyberscience.com> wrote: > Hi, since upgrading from 2.4.5 to 2.4.6 I find that one of my > functions which uses a multios redirect on a subshell list is > hanging. I tried 4.3.4 as well with no luck. > > Essentially I run the equivalent of: > > ( echo hello ) >| /tmp/out >| /tmp/out2 > > and in an interactive shell (or any with job control) this hangs. >... > All of the following fixes solve this problem, but I don't know what > else they break: >.. > Setting thisjob = -1 in clearjobtab(), since there is no current > job, and making addproc() ignore the addition of aux processes if > thisjob == -1. This also seems wrong, as we are completely loosing the > pid information for the multios, so for example we can't kill it. > > Setting thisjob = 1 in clearjobtab (if it was >= 0), and setting > jobtab[thisjob].stat = STAT_INUSE after clearing jobtab. This is what > I ended up with, but is it a valid thing to do ? >... Thanks for the detailed analysis, which will have saved me hours. There's clearly something of a design flaw here: we're using (an effect of) job control when no job control is present. However, the shell does use the so-called job table for this purpose (managing processes even if they're not strictly associated with a job), so we have to live with it. In that spirit what I'd *like* to suggest is something close to what you came up with: set thisjob to -1 in clearjobtab() (it's sure as heck invalid), and then when we need a job table entry in closemn(), detect that thisjob is -1 and initialise a new job. Problem 1: this happens before execpline() runs in the subshell, which grabs a different job table entry. The one generated by closemn() is forgotten. We can fix this by setting a temporary job number saying "use me! use me!". This isn't very nice but doesn't involve redesigning the shell from scratch. Problem 2: this is where it gets really nasty to the extent that I'm worried I must be missing something basic about multios. We now do the "echo" in the subshell, and on return to execpline() wait for the auxiliary process handling the multios to exit. But it's never going to! It's waiting for end-of-file on the data it's reading from the subshell that's waiting for it. Because we attached the multios process after the fork, we have deadlock. Wossgoingon? How do multios ever work? Is there some call to close the shell fd's (giving the EOF the aux proc is waiting for) that hasn't quite been handled at that point, but usually has? Possible clue: last1 is 1 in this version of execpline(), indicating we're about to leave the shell. The auxprocs are the only reason we can't. So there must be some solution... I'll carry on looking at this when I get a chance, but for now I'm confused enough to go to the beer festival. -- Peter Stephenson <pws@csr.com> Software Engineer CSR PLC, Churchill House, Cambridge Business Park, Cowley Road Cambridge, CB4 0WZ, UK Tel: +44 (0)1223 692070 To access the latest news from CSR copy this link into a web browser: http://www.csr.com/email_sig.php To get further information regarding CSR, please visit our Investor Relations page at http://ir.csr.com/csr/about/overview ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Subshell with multios causes hang 2007-05-22 17:29 ` Peter Stephenson @ 2007-05-23 10:12 ` Peter Stephenson 2007-05-23 11:25 ` John Buddery 2007-05-23 15:00 ` Bart Schaefer 0 siblings, 2 replies; 6+ messages in thread From: Peter Stephenson @ 2007-05-23 10:12 UTC (permalink / raw) To: Zsh-Workers On Tue, 22 May 2007 18:29:25 +0100 Peter Stephenson <pws@csr.com> wrote: > On Tue, 22 May 2007 12:21:43 +0100 > John Buddery <jvb@cyberscience.com> wrote: > > Essentially I run the equivalent of: > > > > ( echo hello ) >| /tmp/out >| /tmp/out2 > > > > and in an interactive shell (or any with job control) this hangs. >... > Wossgoingon? >... > I'll carry on looking at this when I get a chance, but for now I'm > confused enough to go to the beer festival. Success, I think. The answer started to come to me in the queue. John sent me another email; the basic problem seems to be the obscure interaction between forking, job control and multios in this case. execpline() is not the right place to handle this. The pipeline contains the contents of the subshell; after that, we return to execcmd() where we forked for the subshell, which is the appropriate level. At this point, we're done with the subshell, so we simply _exit(). That's where things need fixing up to ensure the multios are happy. So what I've done is: - When we clear the job table, initialise a fresh job. This is solely used for anything that happens to want a job in the subshell. This doesn't rely on any accidental hangovers from the main shell. The execpline() will still initialise its own new job, but as I argued above that's the right thing to do. execpline() saves and restores thisjob, so we get the value we want back in execcmd(). - When we _exit() after handling the subshell, close all fd's. This is bound be to be OK given that we're about to bail out. (The logic only applies if we have forked, obviously.) - Then wait for any processes in the current job. Usually there will be none, and this goes straight through. If there are multios we wait for those processes. This ensures that the multio processes are properly waited for. I don't think that ever happened before, even when this apparently worked (as John noted, largely by accident). This doesn't break any of the existing tests; I've added a new one. I don't think it's straightforward to set up the multios before we fork, since globbing is allowed in multios and that's done after forking. However, it's all so complicated that reasoning may be too simplistic (there are multiple forks involved). Index: Src/exec.c =================================================================== RCS file: /cvsroot/zsh/zsh/Src/exec.c,v retrieving revision 1.111 diff -u -r1.111 exec.c --- Src/exec.c 8 May 2007 10:02:59 -0000 1.111 +++ Src/exec.c 23 May 2007 10:08:49 -0000 @@ -1098,10 +1098,10 @@ child_block(); /* - * Get free entry in job table and initialize it. - * This is currently the only call to initjob(), so this - * is also the only place where we can expand the job table - * under us. + * Get free entry in job table and initialize it. This is currently + * the only call to initjob() (apart from a minor exception in + * clearjobtab()), so this is also the only place where we can + * expand the job table under us. */ if ((thisjob = newjob = initjob()) == -1) { child_unblock(); @@ -2816,8 +2816,38 @@ } err: - if (forked) + if (forked) { + /* + * So what's going on here then? Well, I'm glad you asked. + * + * If we create multios for use in a subshell we do + * this after forking, in this function above. That + * means that the current (sub)process is responsible + * for clearing them up. However, the processes won't + * go away until we have closed the fd's talking to them. + * Since we're about to exit the shell there's nothing + * to stop us closing all fd's (including the ones 0 to 9 + * that we usually leave alone). + * + * Then we wait for any processes. When we forked, + * we cleared the jobtable and started a new job just for + * any oddments like this, so if there aren't any we won't + * need to wait. The result of not waiting is that + * the multios haven't flushed the fd's properly, leading + * to obscure missing data. + * + * It would probably be cleaner to ensure that the + * parent shell handled multios, but that requires + * some architectural changes which are likely to be + * hairy. + */ + for (i = 0; i < 10; i++) + if (fdtable[i] != FDT_UNUSED) + close(i); + closem(FDT_UNUSED); + waitjobs(); _exit(lastval); + } fixfds(save); done: Index: Src/jobs.c =================================================================== RCS file: /cvsroot/zsh/zsh/Src/jobs.c,v retrieving revision 1.56 diff -u -r1.56 jobs.c --- Src/jobs.c 27 Mar 2007 15:16:57 -0000 1.56 +++ Src/jobs.c 23 May 2007 10:08:49 -0000 @@ -1296,6 +1296,15 @@ memset(jobtab, 0, jobtabsize * sizeof(struct job)); /* zero out table */ maxjob = 0; + + /* + * Although we don't have job control in subshells, we + * sometimes needs control structures for other purposes such + * as multios. Grab a job for this purpose; any will do + * since we've freed them all up (so there's no question + * of problems with the job table size here). + */ + thisjob = initjob(); } static int initnewjob(int i) Index: Test/E01options.ztst =================================================================== RCS file: /cvsroot/zsh/zsh/Test/E01options.ztst,v retrieving revision 1.17 diff -u -r1.17 E01options.ztst --- Test/E01options.ztst 1 Nov 2006 12:25:22 -0000 1.17 +++ Test/E01options.ztst 23 May 2007 10:08:49 -0000 @@ -655,6 +655,18 @@ >This is in1 >This is in2 +# This is trickier than it looks. There's a hack at the end of +# execcmd() to catch the multio processes attached to the +# subshell, which otherwise sort of get lost in the general turmoil. +# Without that, the multios aren't synchronous with the subshell +# or the main shell starting the "cat", so the output files appear +# empty. + setopt multios + ( echo hello ) >multio_out1 >multio_out2 && cat multio_out* +0:Multios attached to a subshell +>hello +>hello + # tried this with other things, but not on its own, so much. unsetopt nomatch print with nonomatch: flooble* -- Peter Stephenson <pws@csr.com> Software Engineer CSR PLC, Churchill House, Cambridge Business Park, Cowley Road Cambridge, CB4 0WZ, UK Tel: +44 (0)1223 692070 To access the latest news from CSR copy this link into a web browser: http://www.csr.com/email_sig.php To get further information regarding CSR, please visit our Investor Relations page at http://ir.csr.com/csr/about/overview ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Subshell with multios causes hang 2007-05-23 10:12 ` Peter Stephenson @ 2007-05-23 11:25 ` John Buddery 2007-05-23 11:50 ` Peter Stephenson 2007-05-23 15:00 ` Bart Schaefer 1 sibling, 1 reply; 6+ messages in thread From: John Buddery @ 2007-05-23 11:25 UTC (permalink / raw) To: Peter Stephenson; +Cc: Zsh-Workers On Wed, 2007-05-23 at 11:12 +0100, Peter Stephenson wrote: > On Tue, 22 May 2007 18:29:25 +0100 > Peter Stephenson <pws@csr.com> wrote: > > On Tue, 22 May 2007 12:21:43 +0100 > > John Buddery <jvb@cyberscience.com> wrote: > > > Essentially I run the equivalent of: > > > > > > ( echo hello ) >| /tmp/out >| /tmp/out2 > > > > > > and in an interactive shell (or any with job control) this hangs. > >... > > Wossgoingon? > >... > > I'll carry on looking at this when I get a chance, but for now I'm > > confused enough to go to the beer festival. > > Success, I think. The answer started to come to me in the queue. John > sent me another email; the basic problem seems to be the obscure > interaction between forking, job control and multios in this case. > Excellent, works for me - thanks very much! If you're interested while you're dealing with multios, there was one other fault I noticed when digging around: with an input multios, there is a race condition after the fork in closemn() between the addproc() and the child process exiting. If the child multios exits (having sent it's output to the buffered pipe) before the addproc(), the shell hangs later when it tries to wait for the already-exited pid. This can be reproduced fairly reliably with the example you posted last year: % echo This is the file >file % fn() { cat; } % fn <$(echo file file) This is the file This is the file (hang at this point) One way to fix it would be to block the child signal until the child pid was registered (patch below). Admittedly it's a bit obscure, though - not that my last example wasn't... Thanks, John Index: Src/exec.c =================================================================== RCS file: /cvsroot/zsh/zsh/Src/exec.c,v retrieving revision 1.111 diff -u -r1.111 exec.c --- Src/exec.c 8 May 2007 10:02:59 -0000 1.111 +++ Src/exec.c 23 May 2007 11:09:53 -0000 @@ -1549,20 +1549,24 @@ pid_t pid; struct timeval bgtime; + child_block(); if ((pid = zfork(&bgtime))) { for (i = 0; i < mn->ct; i++) zclose(mn->fds[i]); zclose(mn->pipe); if (pid == -1) { mfds[fd] = NULL; + child_unblock(); return; } mn->ct = 1; mn->fds[0] = fd; addproc(pid, NULL, 1, &bgtime); + child_unblock(); return; } /* pid == 0 */ + child_unblock(); closeallelse(mn); if (mn->rflag) { /* tee process */ ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Subshell with multios causes hang 2007-05-23 11:25 ` John Buddery @ 2007-05-23 11:50 ` Peter Stephenson 0 siblings, 0 replies; 6+ messages in thread From: Peter Stephenson @ 2007-05-23 11:50 UTC (permalink / raw) To: Zsh-Workers On Wed, 23 May 2007 12:25:13 +0100 John Buddery <jvb@cyberscience.com> wrote: > % echo This is the file >file > % fn() { cat; } > % fn <$(echo file file) > This is the file > This is the file > (hang at this point) > > One way to fix it would be to block the child signal until the child > pid was registered (patch below). That seems entirely reasonable and unproblematic. I've committed it (with a comment and the case above in another test). Sorry about the wrapping in the previous post---I've tracked down claws-mail's autowrap feature and turned it off (though these things have a habit of coming back from the dead...) -- Peter Stephenson <pws@csr.com> Software Engineer CSR PLC, Churchill House, Cambridge Business Park, Cowley Road Cambridge, CB4 0WZ, UK Tel: +44 (0)1223 692070 To access the latest news from CSR copy this link into a web browser: http://www.csr.com/email_sig.php To get further information regarding CSR, please visit our Investor Relations page at http://ir.csr.com/csr/about/overview ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Subshell with multios causes hang 2007-05-23 10:12 ` Peter Stephenson 2007-05-23 11:25 ` John Buddery @ 2007-05-23 15:00 ` Bart Schaefer 1 sibling, 0 replies; 6+ messages in thread From: Bart Schaefer @ 2007-05-23 15:00 UTC (permalink / raw) To: Zsh-Workers On May 23, 11:12am, Peter Stephenson wrote: } } So what I've done is: This all sounds sane to me. I also agree with John's child blocking tweak for the race condition. } I don't think it's straightforward to set up the multios before we } fork, since globbing is allowed in multios and that's done after } forking. I don't think it's workable to push this up a level in any case. You can always construct an example where "up a level" is still a subshell and the problem returns. To try to have the topmost shell "lift" all the globs and multios out of the subshells would introduce sequencing problems with complex commands that expect to be able to create a file and then glob its name or redirect from it. ^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2007-05-23 15:01 UTC | newest] Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2007-05-22 11:21 Subshell with multios causes hang John Buddery 2007-05-22 17:29 ` Peter Stephenson 2007-05-23 10:12 ` Peter Stephenson 2007-05-23 11:25 ` John Buddery 2007-05-23 11:50 ` Peter Stephenson 2007-05-23 15:00 ` Bart Schaefer
Code repositories for project(s) associated with this public inbox https://git.vuxu.org/mirror/zsh/ This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).