* How to misplace an entire pipeline @ 2011-08-06 3:31 Bart Schaefer 2011-08-07 17:50 ` Peter Stephenson 0 siblings, 1 reply; 16+ messages in thread From: Bart Schaefer @ 2011-08-06 3:31 UTC (permalink / raw) To: zsh-workers Suspend a pipeline whose tail is a builtin that has already exited, and the whole job gets lost, orphaning the left hand side: torch% print $ZSH_VERSION $ZSH_PATCHLEVEL 4.3.12-dev-1 1.5412 torch% sleep 10 | true torch% jobs torch% ps ax | grep sleep 32474 pts/2 T 0:00 sleep 10 32480 pts/2 S+ 0:00 grep sleep torch% Worse, suspend a pipeline whose tail is a builtin that blocks on I/O: torch% sleep 10 | sleep 20 | read The builtin does not get suspended, so the shell is stuck; fortunately in this case one can interrupt the builtin, again leaving the left side in limbo: torch% ps ax | grep sleep 32493 pts/2 T 0:00 sleep 10 32494 pts/2 T 0:00 sleep 20 32501 pts/2 S+ 0:00 grep sleep This used to work, sort of: torch% print $ZSH_VERSION $ZSH_PATCHLEVEL 4.2.0 torch% sleep 10 | sleep 20 | true zsh: suspended sleep 10 | sleep 20 torch% jobs [1] + suspended sleep 10 | sleep 20 torch% Note that the builtin is gone but the rest of the pipeline remains a job. -- ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: How to misplace an entire pipeline 2011-08-06 3:31 How to misplace an entire pipeline Bart Schaefer @ 2011-08-07 17:50 ` Peter Stephenson 2011-08-07 21:43 ` Bart Schaefer 0 siblings, 1 reply; 16+ messages in thread From: Peter Stephenson @ 2011-08-07 17:50 UTC (permalink / raw) To: zsh-workers On Fri, 05 Aug 2011 20:31:09 -0700 Bart Schaefer <schaefer@brasslantern.com> wrote: > Suspend a pipeline whose tail is a builtin that has already exited, and > the whole job gets lost, orphaning the left hand side: This sounds like the extra-special code that takes account of the fact that the right hand side of a pipline is run in the current job but allows you by weird tricks to treat it as a separate job anyway. The suspension is supposed to cause a fork, so you forfeit the ability to have the result within the current shell in return for being able to put the job into the background. What it's supposed to do if the part executing in the current shell is already finished I have no idea; presumably then it has no need to fork, though whether there's a special case for the special case is another question. This is code I don't go near; I'm not even convinced in this case that Sven ever completely understood it and I remember trying it years ago and being puzzled by the results. Search for the quotation from the Nibelungenlied. -- Peter Stephenson <p.w.stephenson@ntlworld.com> Web page now at http://homepage.ntlworld.com/p.w.stephenson/ ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: How to misplace an entire pipeline 2011-08-07 17:50 ` Peter Stephenson @ 2011-08-07 21:43 ` Bart Schaefer 2011-08-08 4:05 ` Bart Schaefer 0 siblings, 1 reply; 16+ messages in thread From: Bart Schaefer @ 2011-08-07 21:43 UTC (permalink / raw) To: zsh-workers On Aug 7, 6:50pm, Peter Stephenson wrote: } } This is code I don't go near Obviously somebody went near it at some point not as long ago as Sven, because it behaved differently in 4.2.x. Zsh does still know about the pipeline, because if you explicitly wait for the subjobs by PID it does block rather than saying "not a child of this shell". findproc() is able to see them (so it's not related to the recent go-round with that.) (By the way, arguably "wait" ought to continue the job if it's stopped, which it does if you wait for it by job number but does not if you wait for it by PID.) Starting from torch% sleep 10 | sleep 20 | true Then hit ^Z, then torch% ps ax | grep slee 27868 pts/2 T 0:00 sleep 10 27869 pts/2 T 0:00 sleep 20 27873 pts/2 S+ 0:00 grep slee torch% wait 27868 Here we are in bin_fg right after the findproc(): (gdb) p *j $4 = {gleader = 27868, other = 0, stat = 1139, pwd = 0x0, procs = 0x9f83070, auxprocs = 0x0, filelist = 0x0, stty_in_env = 0, ty = 0x0} (gdb) p *p $5 = {next = 0x9f83128, pid = 27868, text = "sleep 10", '\0' <repeats 71 times>, status = 5247, ti = {ru_utime = { tv_sec = 0, tv_usec = 0}, ru_stime = {tv_sec = 0, tv_usec = 1999}, ru_maxrss = 0, ru_ixrss = 0, ru_idrss = 0, ru_isrss = 0, ru_minflt = 242, ru_majflt = 0, ru_nswap = 0, ru_inblock = 0, ru_oublock = 0, ru_msgsnd = 0, ru_msgrcv = 0, ru_nsignals = 0, ru_nvcsw = 3, ru_nivcsw = 0}, bgtime = {tv_sec = 1312751280, tv_usec = 828721}, endtime = {tv_sec = 1312751282, tv_usec = 474423}} If I wait for the "sleep 20" instead: (gdb) p *j $2 = {gleader = 27868, other = 0, stat = 1139, pwd = 0x0, procs = 0x9f83070, auxprocs = 0x0, filelist = 0x0, stty_in_env = 0, ty = 0x0} (gdb) p *p $3 = {next = 0x0, pid = 27869, text = "sleep 20", '\0' <repeats 71 times>, status = 5247, ti = {ru_utime = {tv_sec = 0, tv_usec = 0}, ru_stime = { tv_sec = 0, tv_usec = 999}, ru_maxrss = 0, ru_ixrss = 0, ru_idrss = 0, ru_isrss = 0, ru_minflt = 242, ru_majflt = 0, ru_nswap = 0, ru_inblock = 0, ru_oublock = 0, ru_msgsnd = 0, ru_msgrcv = 0, ru_nsignals = 0, ru_nvcsw = 3, ru_nivcsw = 1}, bgtime = { tv_sec = 1312751280, tv_usec = 829868}, endtime = {tv_sec = 1312751282, tv_usec = 474440}} Notice that at some point "sleep 10" was made the process group leader for the whole pipeline. Now, here's an interesting tidbit: torch% jobs %?foo jobs: job not found: ?foo torch% jobs %?sleep jobs: %?sleep: no such job Note the difference? The latter message means that getjob() found the pipeline, but either it's _not_ STAT_INUSE or it _is_ STAT_NOPRINT. So I think what we have here is a simple failure to communicate. Unfortunately I can't chase this any further this afternoon. ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: How to misplace an entire pipeline 2011-08-07 21:43 ` Bart Schaefer @ 2011-08-08 4:05 ` Bart Schaefer 2011-08-08 18:27 ` Peter Stephenson 0 siblings, 1 reply; 16+ messages in thread From: Bart Schaefer @ 2011-08-08 4:05 UTC (permalink / raw) To: zsh-workers Two patches inline with other discussion below. On Aug 7, 2:43pm, Bart Schaefer wrote: } } (By the way, arguably "wait" ought to continue the job if it's stopped, } which it does if you wait for it by job number but does not if you wait } for it by PID.) Patch for that follows; I'm not sure the makerunning() is correct: --- ../zsh-forge/current/Src/jobs.c 2011-06-14 19:54:57.000000000 -0700 +++ Src/jobs.c 2011-08-07 18:47:58.000000000 -0700 @@ -1933,12 +1933,19 @@ Process p; if (findproc(pid, &j, &p, 0)) { - /* - * returns 0 for normal exit, else signal+128 - * in which case we should return that status. - */ - retval = waitforpid(pid, 1); - if (!retval) + if (j->stat & STAT_STOPPED) { + retval = (killjb(j, SIGCONT) != 0); + if (retval == 0) + makerunning(j); + } + if (retval == 0) { + /* + * returns 0 for normal exit, else signal+128 + * in which case we should return that status. + */ + retval = waitforpid(pid, 1); + } + if (retval == 0) retval = lastval2; } else if (isset(POSIXJOBS) && pid == lastpid && lastpid_status >= 0L) { } Now, here's an interesting tidbit: } } torch% jobs %?foo } jobs: job not found: ?foo } torch% jobs %?sleep } jobs: %?sleep: no such job } } Note the difference? The latter message means that getjob() found the } pipeline, but either it's _not_ STAT_INUSE or it _is_ STAT_NOPRINT. So } I think what we have here is a simple failure to communicate. The following is clearly not a complete fix and maybe is even wrong if a different problem is fixed elsewhere, but this at least allows the suspended pipeline to be manipulated with jobs/fg/bg/wait. --- ../zsh-forge/current/Src/exec.c 2011-07-27 01:13:48.000000000 -0700 +++ Src/exec.c 2011-08-07 19:07:59.000000000 -0700 @@ -2845,7 +2845,9 @@ /* This is a current shell procedure that didn't need to fork. * * This includes current shell procedures that are being exec'ed, * * as well as null execs. */ - jobtab[thisjob].stat |= STAT_CURSH|STAT_NOPRINT; + jobtab[thisjob].stat |= STAT_CURSH; + if (!jobtab[thisjob].procs) + jobtab[thisjob].stat |= STAT_NOPRINT; } else { /* This is an exec (real or fake) for an external command. * * Note that any form of exec means that the subshell is fake * Even with this patch, "... | read" followed by ^Z results in "read" left waiting for a pipeline that has been stopped. What seems to be going on here is that a job table entry was created for the pipeline and the forked-off left side made the group leader, but then that job table entry is overloaded to represent the current shell builtin that is also running as the right-hand-side. When the terminal driver generates a TSTP it hits the forked-off left side (via the group leader). This comes through zhandler() into update_job() which does notice that the job is STAT_CURSH, but in this case list_pipe == 0 because "read" is an ordinary builtin, not a loop or other compound construct; so STAT_CURSH is ignored, and the group leader is attached to the tty (I believe that's a no-op as it already is attached). The other side-effect of this snippet: if (jn->stat & STAT_CURSH) inforeground = 1; else if (job == thisjob) { lastval = val; inforeground = 2; } is that inforeground != 2, so at the end of update_job() when checking to see if the current shell should pretend to have been signaled, the test fails; but that doesn't matter for TSTP because only INT and QUIT are handled specially. When "read" is the tail of the pipe, the above all happens behind the scenes and then the I/O system call gets restarted, which is how the shell ends up stuck. I'm not sure how to escape from that, except maybe to have zhandler() kill the shell with a different signal from which the system call will not recover. When something like "true" is the tail of the pipe, we return into execpline at line 1500 (from waitjobs()), where list_pipe_job is set but list_pipe and list_pipe_child are not. If all three were nonzero, a dummy shell would be forked off as PWS described to act as the suspended job, but instead execpline() simply returns because the last job in the pipeline has exited. The only obvious thing I can think to do here is to note in zhandler() that we have STAT_CURSH but not list_pipe, and therefore SIGCONT the left-hand-side immediately and return as if no signal had occurred (possibly printing a warning about not being able to suspend the job, which is what happens elsewhere if pipe() or fork() fails). However, that could lead to a serious busy-loop if somehow TTIN or TTOU was the signal instead of TSTP. ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: How to misplace an entire pipeline 2011-08-08 4:05 ` Bart Schaefer @ 2011-08-08 18:27 ` Peter Stephenson 2011-08-09 6:10 ` Bart Schaefer 0 siblings, 1 reply; 16+ messages in thread From: Peter Stephenson @ 2011-08-08 18:27 UTC (permalink / raw) To: zsh-workers On Sun, 07 Aug 2011 21:05:07 -0700 Bart Schaefer <schaefer@brasslantern.com> wrote: > The following is clearly not a complete fix and maybe is even wrong if > a different problem is fixed elsewhere, but this at least allows the > suspended pipeline to be manipulated with jobs/fg/bg/wait. > > --- ../zsh-forge/current/Src/exec.c 2011-07-27 01:13:48.000000000 -0700 > +++ Src/exec.c 2011-08-07 19:07:59.000000000 -0700 > @@ -2845,7 +2845,9 @@ > /* This is a current shell procedure that didn't need to fork. * > * This includes current shell procedures that are being exec'ed, * > * as well as null execs. */ > - jobtab[thisjob].stat |= STAT_CURSH|STAT_NOPRINT; > + jobtab[thisjob].stat |= STAT_CURSH; > + if (!jobtab[thisjob].procs) > + jobtab[thisjob].stat |= STAT_NOPRINT; > } else { > /* This is an exec (real or fake) for an external command. * > * Note that any form of exec means that the subshell is fake * Looks fairly plausible, anyway. > When "read" is the tail of the pipe, the above all happens behind the > scenes and then the I/O system call gets restarted, which is how the > shell ends up stuck. I'm not sure how to escape from that, except > maybe to have zhandler() kill the shell with a different signal from > which the system call will not recover. I suppose so. I can hardly believe this ever worked. > When something like "true" is the tail of the pipe, we return into > execpline at line 1500 (from waitjobs()), where list_pipe_job is set > but list_pipe and list_pipe_child are not. If all three were nonzero, > a dummy shell would be forked off as PWS described to act as the > suspended job, but instead execpline() simply returns because the > last job in the pipeline has exited. > > The only obvious thing I can think to do here is to note in zhandler() > that we have STAT_CURSH but not list_pipe, and therefore SIGCONT the > left-hand-side immediately and return as if no signal had occurred > (possibly printing a warning about not being able to suspend the job, > which is what happens elsewhere if pipe() or fork() fails). However, > that could lead to a serious busy-loop if somehow TTIN or TTOU was > the signal instead of TSTP. But we can test if it's TSTP (or STOP)? There are so many special cases here --- there's one for each state the current shell might be in during the process, each of which would be handled straighforwardly if the processing was in another shell --- that even if it sometimes worked before it's not that surprising if it's sensitive to knock-on effects. I suppose it would be nice to be able to test the things that seem to be working, at least, which would need some care and probably a bit of extra test framework. -- Peter Stephenson <p.w.stephenson@ntlworld.com> Web page now at http://homepage.ntlworld.com/p.w.stephenson/ ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: How to misplace an entire pipeline 2011-08-08 18:27 ` Peter Stephenson @ 2011-08-09 6:10 ` Bart Schaefer 2011-08-09 20:19 ` Peter Stephenson 0 siblings, 1 reply; 16+ messages in thread From: Bart Schaefer @ 2011-08-09 6:10 UTC (permalink / raw) To: zsh-workers On Aug 8, 7:27pm, Peter Stephenson wrote: } Subject: Re: How to misplace an entire pipeline } } On Sun, 07 Aug 2011 21:05:07 -0700 } Bart Schaefer <schaefer@brasslantern.com> wrote: } > - jobtab[thisjob].stat |= STAT_CURSH|STAT_NOPRINT; } > + jobtab[thisjob].stat |= STAT_CURSH; } > + if (!jobtab[thisjob].procs) } > + jobtab[thisjob].stat |= STAT_NOPRINT; } } Looks fairly plausible, anyway. Should that be "if (hasprocs(thisjob))" instead, do you think? } > When "read" is the tail of the pipe, the above all happens behind the } > scenes and then the I/O system call gets restarted, which is how the } > shell ends up stuck. I'm not sure how to escape from that, except } > maybe to have zhandler() kill the shell with a different signal from } > which the system call will not recover. } } I suppose so. I can hardly believe this ever worked. It didn't, exactly. This particular one behaves almost the same in 4.3.9 as it does in recent dev releases. The main difference is that in 4.3.9 the ^Z echoes to the terminal; in 4.3.12 there's no feedback at all. I haven't figured out how/why that changed. } > The only obvious thing I can think to do here is to note in zhandler() } > that we have STAT_CURSH but not list_pipe, and therefore SIGCONT the } > left-hand-side immediately and return as if no signal had occurred } > (possibly printing a warning about not being able to suspend the job, } > which is what happens elsewhere if pipe() or fork() fails). However, } > that could lead to a serious busy-loop if somehow TTIN or TTOU was } > the signal instead of TSTP. } } But we can test if it's TSTP (or STOP)? Yes, when WSTOPSIG() is defined. The parent shell never gets the original signal, it only gets the SIGCHLD and can then examine the return value from wait3() or whatever system call was available. SIGSTOP can't be caught or blocked so in that case it's likely the parent shell has also stopped. I'm not sure it'd be good behavior for the shell to auto-resume a process that was hit with STOP, even if it could. So, some more about what's going on ... Near the end of execpline2() there's this snippet: /* if another execpline() is invoked because the command is * * a list it must know that we're already in a pipeline */ cmdpush(CS_PIPE); list_pipe = 1; execpline2(state, *state->pc++, how, pipes[0], output, last1); list_pipe = old_list_pipe; cmdpop(); Somewhere in the call stack beyond this execpline2(), execbuiltin() is called. If that returns immediately we climb out of execpline2() and clear list_pipe and wait for the pipeline at execpline() line 1500. However, in the case of "read", execbuiltin() blocks on the I/O, so list_pipe remains true, and when the TSTP is received, update_job() proceeds as if execpline() is going to fork. So the following hack does the right thing in the case of piping to true where execbuiltin() has returned, but does not do the right thing in the case of read. What needs to be tested in place of (!list_pipe) to determine that the tail of the current pipeline is a simple shell builtin? In fact even that may not be enough, maybe this needs to know if the current job is a simple shell builtin -- it might be blocked on a "while read; do ..." or on a "wait" in the middle of a loop, etc. --- ../zsh-forge/current/Src/signals.c 2011-08-06 11:13:23.000000000 -0700 +++ Src/signals.c 2011-08-08 22:28:11.000000000 -0700 @@ -490,14 +490,20 @@ * update it. */ if (findproc(pid, &jn, &pn, 0)) { + if (!list_pipe && (jn->stat & STAT_CURSH) && + WIFSTOPPED(status) && WSTOPSIG(status) == SIGTSTP) { + killjb(jn, SIGCONT); + zwarn("job can't be suspended"); + } else { #if defined(HAVE_WAIT3) && defined(HAVE_GETRUSAGE) - struct timezone dummy_tz; - gettimeofday(&pn->endtime, &dummy_tz); - pn->status = status; - pn->ti = ru; + struct timezone dummy_tz; + gettimeofday(&pn->endtime, &dummy_tz); + pn->status = status; + pn->ti = ru; #else - update_process(pn, status); + update_process(pn, status); #endif + } update_job(jn); } else if (findproc(pid, &jn, &pn, 1)) { pn->status = status; ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: How to misplace an entire pipeline 2011-08-09 6:10 ` Bart Schaefer @ 2011-08-09 20:19 ` Peter Stephenson 2011-08-13 18:52 ` Bart Schaefer 0 siblings, 1 reply; 16+ messages in thread From: Peter Stephenson @ 2011-08-09 20:19 UTC (permalink / raw) To: zsh-workers On Mon, 08 Aug 2011 23:10:32 -0700 Bart Schaefer <schaefer@brasslantern.com> wrote: > On Aug 8, 7:27pm, Peter Stephenson wrote: > } Subject: Re: How to misplace an entire pipeline > } > } On Sun, 07 Aug 2011 21:05:07 -0700 > } Bart Schaefer <schaefer@brasslantern.com> wrote: > } > - jobtab[thisjob].stat |= STAT_CURSH|STAT_NOPRINT; > } > + jobtab[thisjob].stat |= STAT_CURSH; > } > + if (!jobtab[thisjob].procs) > } > + jobtab[thisjob].stat |= STAT_NOPRINT; > } > } Looks fairly plausible, anyway. > > Should that be "if (hasprocs(thisjob))" instead, do you think? To a first approximation, the user isn't directly interested in the aux procs, but there might still be some knock-on effect. So until we find otherwise, I would guess not. > So the following hack does the right thing in the case of piping to > true where execbuiltin() has returned, but does not do the right thing > in the case of read. What needs to be tested in place of (!list_pipe) > to determine that the tail of the current pipeline is a simple shell > builtin? I don't think we've remembered that fact, but it's available (as is_builtin) in execcmd(), so could be stored before the builtin is called. What I don't know about is where to put that information (is it as simple as using thisjob?), and what might invalidate it. > In fact even that may not be enough, maybe this needs to know if the > current job is a simple shell builtin -- it might be blocked on a > "while read; do ..." or on a "wait" in the middle of a loop, etc. I suppose it depends on what cases the list_pipe logic would be triggered. -- Peter Stephenson <p.w.stephenson@ntlworld.com> Web page now at http://homepage.ntlworld.com/p.w.stephenson/ ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: How to misplace an entire pipeline 2011-08-09 20:19 ` Peter Stephenson @ 2011-08-13 18:52 ` Bart Schaefer 2011-08-13 20:15 ` Bart Schaefer 2011-09-12 13:51 ` Alexey I. Froloff 0 siblings, 2 replies; 16+ messages in thread From: Bart Schaefer @ 2011-08-13 18:52 UTC (permalink / raw) To: zsh-workers On Aug 9, 9:19pm, Peter Stephenson wrote: } Subject: Re: How to misplace an entire pipeline } } > } > + jobtab[thisjob].stat |= STAT_CURSH; } > } > + if (!jobtab[thisjob].procs) } > } > + jobtab[thisjob].stat |= STAT_NOPRINT; } > } > Should that be "if (hasprocs(thisjob))" instead, do you think? } } To a first approximation, the user isn't directly interested in the aux } procs, but there might still be some knock-on effect. So until we find } otherwise, I would guess not. Assuming that the aux procs are not subject to tty process group signals, it should be harmless to ignore them. I believe the patch below would make it possible to back out the above, because when the pipeline can't be stopped it's impossible to examine the job table anyway. However, I think it's not a problem to leave it, and there may be a few cases where it affects $jobtexts et al. in a useful way. } > What needs to be tested in place of (!list_pipe) to determine that } > the tail of the current pipeline is a simple shell builtin? } } I don't think we've remembered that fact, but it's available (as } is_builtin) in execcmd(), so could be stored before the builtin is } called. It appears sufficient to add this as another STAT_* bitflag in the job structure. It might actually make sense to make list_pipe into a bitflag as well, and mark every job in the table for which it was true at the time the job began, but I'm not prepared for that dive. } > In fact even that may not be enough, maybe this needs to know if the } > current job is a simple shell builtin -- it might be blocked on a } > "while read; do ..." or on a "wait" in the middle of a loop, etc. } } I suppose it depends on what cases the list_pipe logic would be } triggered. I fooled around with a few cases and I think I caught this one too. At this point list_pipe is true but the job receiving the stop signal is not in the process list of the current job (thisjob), so it's necessary to check whether the current job is a builtin. Unless you can think of a case where thisjob would not correctly identify the foreground job at the time the signal is handled? I'm still a little nervous about coughing up that zwarn() in the signal handler but I can't find a better place for it. One drawback here is that if the tail of the pipeline is a loop that's rapidly alternating between a builtin and an external command, ^Z might stop the loop sometimes and issue the warning other times, seemingly at random. diff -u ../zsh-forge/current/Src/exec.c ./Src/exec.c --- ../zsh-forge/current/Src/exec.c 2011-08-09 20:01:14.000000000 -0700 +++ ./Src/exec.c 2011-08-13 11:00:36.000000000 -0700 @@ -2848,6 +2848,8 @@ jobtab[thisjob].stat |= STAT_CURSH; if (!jobtab[thisjob].procs) jobtab[thisjob].stat |= STAT_NOPRINT; + if (is_builtin) + jobtab[thisjob].stat |= STAT_BUILTIN; } else { /* This is an exec (real or fake) for an external command. * * Note that any form of exec means that the subshell is fake * diff -u ../zsh-forge/current/Src/signals.c ./Src/signals.c --- ../zsh-forge/current/Src/signals.c 2011-08-06 11:13:23.000000000 -0700 +++ ./Src/signals.c 2011-08-11 09:55:26.000000000 -0700 @@ -490,14 +490,21 @@ * update it. */ if (findproc(pid, &jn, &pn, 0)) { + if (((jn->stat & STAT_BUILTIN) || + (list_pipe && (jobtab[thisjob].stat & STAT_BUILTIN))) && + WIFSTOPPED(status) && WSTOPSIG(status) == SIGTSTP) { + killjb(jn, SIGCONT); + zwarn("job can't be suspended"); + } else { #if defined(HAVE_WAIT3) && defined(HAVE_GETRUSAGE) - struct timezone dummy_tz; - gettimeofday(&pn->endtime, &dummy_tz); - pn->status = status; - pn->ti = ru; + struct timezone dummy_tz; + gettimeofday(&pn->endtime, &dummy_tz); + pn->status = status; + pn->ti = ru; #else - update_process(pn, status); + update_process(pn, status); #endif + } update_job(jn); } else if (findproc(pid, &jn, &pn, 1)) { pn->status = status; diff -u ../zsh-forge/current/Src/zsh.h ./Src/zsh.h --- ../zsh-forge/current/Src/zsh.h 2011-05-13 21:08:04.000000000 -0700 +++ ./Src/zsh.h 2011-08-10 09:10:35.000000000 -0700 @@ -907,6 +907,8 @@ #define STAT_ATTACH (0x1000) /* delay reattaching shell to tty */ #define STAT_SUBLEADER (0x2000) /* is super-job, but leader is sub-shell */ +#define STAT_BUILTIN (0x4000) /* job at tail of pipeline is a builtin */ + #define SP_RUNNING -1 /* fake status for jobs currently running */ struct timeinfo { ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: How to misplace an entire pipeline 2011-08-13 18:52 ` Bart Schaefer @ 2011-08-13 20:15 ` Bart Schaefer 2011-09-12 13:51 ` Alexey I. Froloff 1 sibling, 0 replies; 16+ messages in thread From: Bart Schaefer @ 2011-08-13 20:15 UTC (permalink / raw) To: zsh-workers Here's a not-new bug (occurs in, e.g., 4.2.0) that I found while trying to test the preceding patch for side-effects. torch% sleep 200 | sleep 300 | while :; do read '?GO:' < /dev/tty; done (rapid infinite loop printing "GO:" repeatedly) ^Z zsh: suspended sleep 200 | sleep 300 | zsh: running while :; do; read '?GO:' < /dev/tty; done torch% Note that it says the "while" loop is running, even though control has returned to the top-level prompt. If you bring this job back into the foreground, it now properly blocks on the "read": torch% fg [1] + continued sleep 200 | sleep 300 | while :; do; read '?GO:' < /dev/tty; done GO: Again, my patch didn't change this behavior, it's been this way for at least several revisions. It works somewhat better with "read -q". In 4.2.0, at the prompt printed by read, ^Z is silently ignored. Here is what happens with ^C instead: torch% sleep 200 | sleep 300 | while :; do read -q '?GO:' ; done GO: torch% jobs [1] running sleep 200 | sleep 300 torch% fg fg: no current job torch% %1 [1] + running sleep 200 | sleep 300 (These are now in the foreground.) With my patch from 29677, 4.3.12-dev-1 still silently ignores the ^Z, but now: torch% sleep 200 | sleep 300 | while :; do read -q '?GO:' ; done GO:% torch% [1] interrupt sleep 200 | sleep 300 torch% Which seems to make more sense. ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: How to misplace an entire pipeline 2011-08-13 18:52 ` Bart Schaefer 2011-08-13 20:15 ` Bart Schaefer @ 2011-09-12 13:51 ` Alexey I. Froloff 2011-09-12 16:03 ` Bart Schaefer 1 sibling, 1 reply; 16+ messages in thread From: Alexey I. Froloff @ 2011-09-12 13:51 UTC (permalink / raw) To: zsh-workers [-- Attachment #1: Type: text/plain, Size: 1572 bytes --] On Sat, Aug 13, 2011 at 11:52:08AM -0700, Bart Schaefer wrote: > if (findproc(pid, &jn, &pn, 0)) { > + if (((jn->stat & STAT_BUILTIN) || > + (list_pipe && (jobtab[thisjob].stat & STAT_BUILTIN))) && > + WIFSTOPPED(status) && WSTOPSIG(status) == SIGTSTP) { > + killjb(jn, SIGCONT); > + zwarn("job can't be suspended"); > + } else { Sometimes, zsh segfaults here while doing B03print.ztst in POSIX locale with thisjob equals to -1. Not 100% reproducable, though. $ LC_ALL=C zsh +Z -f ./ztst.zsh ./B03print.ztst ./B03print.ztst: starting. [1] 20076 segmentation fault (core dumped) LC_ALL=C zsh +Z -f ./ztst.zsh ./B03print.ztst ... Core was generated by `zsh +Z -f ./ztst.zsh ./B03print.ztst'. Program terminated with signal 11, Segmentation fault. #0 0x00000000004745b1 in wait_for_processes () at signals.c:494 494 (list_pipe && (jobtab[thisjob].stat & STAT_BUILTIN))) && (gdb) list 489 * Find the process and job containing this pid and 490 * update it. 491 */ 492 if (findproc(pid, &jn, &pn, 0)) { 493 if (((jn->stat & STAT_BUILTIN) || 494 (list_pipe && (jobtab[thisjob].stat & STAT_BUILTIN))) && 495 WIFSTOPPED(status) && WSTOPSIG(status) == SIGTSTP) { 496 killjb(jn, SIGCONT); 497 zwarn("job can't be suspended"); 498 } else { (gdb) p thisjob $1 = -1 -- Regards, -- Sir Raorn. --- http://thousandsofhate.blogspot.com/ [-- Attachment #2: Digital signature --] [-- Type: application/pgp-signature, Size: 836 bytes --] ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: How to misplace an entire pipeline 2011-09-12 13:51 ` Alexey I. Froloff @ 2011-09-12 16:03 ` Bart Schaefer 2011-09-12 16:18 ` Bart Schaefer 2011-09-12 16:27 ` Alexey I. Froloff 0 siblings, 2 replies; 16+ messages in thread From: Bart Schaefer @ 2011-09-12 16:03 UTC (permalink / raw) To: zsh-workers On Sep 12, 5:51pm, Alexey I. Froloff wrote: } } Sometimes, zsh segfaults here while doing B03print.ztst in POSIX } locale with thisjob equals to -1. Not 100% reproducable, though. This is why I wondered in the original patch message: : At this point list_pipe is true but the job receiving the stop signal : is not in the process list of the current job (thisjob), so it's : necessary to check whether the current job is a builtin. Unless you : can think of a case where thisjob would not correctly identify the : foreground job at the time the signal is handled? If I understand correctly, thisjob == -1 means there is no foreground job. I'm scratching my head over how list_pipe can be true when there is no job in the foreground. Could you please run ZTST_verbose=2 make TESTNUM=B03 check to verify which test is causing the segmentation fault? I suspect it's the very last test with "print -lO ... | while read ..." but I'd like to be sure. (OTOH it may not be possible to reproduce the fault with verbose output, but it's worth a try.) I further suspect the signal is arriving during foo=$line[i] or some other part of the inner loop where no actual command is being run, but list_pipe remains true because of the outer loop. ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: How to misplace an entire pipeline 2011-09-12 16:03 ` Bart Schaefer @ 2011-09-12 16:18 ` Bart Schaefer 2011-09-12 16:35 ` Peter Stephenson 2011-09-12 16:27 ` Alexey I. Froloff 1 sibling, 1 reply; 16+ messages in thread From: Bart Schaefer @ 2011-09-12 16:18 UTC (permalink / raw) To: zsh-workers On Sep 12, 9:03am, Bart Schaefer wrote: } } If I understand correctly, thisjob == -1 means there is no foreground } job. I'm scratching my head over how list_pipe can be true when there } is no job in the foreground. In particular I can't decide whether the fix is to do this: 494c494,496 < (list_pipe && (jobtab[thisjob].stat & STAT_BUILTIN))) && --- > (list_pipe && > (thisjob == -1 || > (jobtab[thisjob].stat & STAT_BUILTIN)))) && Or this: 494c494,496 < (list_pipe && (jobtab[thisjob].stat & STAT_BUILTIN))) && --- > (list_pipe && > (thisjob != -1 && > (jobtab[thisjob].stat & STAT_BUILTIN)))) && The first way treats no-foreground-job as if a shell builtin were in the foreground. The second way treats no-foreground-job as if an external command were in the foreground. I tend to think the former is the way to go here -- if there's no job in the foreground, then the parent shell must be in the foreground, and hence nothing is prepared to restart the signalled job if it gets stopped. However, I'm not that confident that I know all the cases where thisjob == -1 may be true. Any help here? If not we're going with plan (A). ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: How to misplace an entire pipeline 2011-09-12 16:18 ` Bart Schaefer @ 2011-09-12 16:35 ` Peter Stephenson 2011-09-15 14:57 ` Alexey I. Froloff 0 siblings, 1 reply; 16+ messages in thread From: Peter Stephenson @ 2011-09-12 16:35 UTC (permalink / raw) To: zsh-workers On Mon, 12 Sep 2011 09:18:55 -0700 Bart Schaefer <schaefer@brasslantern.com> wrote: > On Sep 12, 9:03am, Bart Schaefer wrote: > } If I understand correctly, thisjob == -1 means there is no foreground > } job. I'm scratching my head over how list_pipe can be true when there > } is no job in the foreground. If we can't decide this, there's no obvious way of choosing between the alternative fixes, so you should probably just pick one. -- Peter Stephenson <pws@csr.com> Software Engineer Tel: +44 (0)1223 692070 Cambridge Silicon Radio Limited Churchill House, Cambridge Business Park, Cowley Road, Cambridge, CB4 0WZ, UK Member of the CSR plc group of companies. CSR plc registered in England and Wales, registered number 4187346, registered office Churchill House, Cambridge Business Park, Cowley Road, Cambridge, CB4 0WZ, United Kingdom More information can be found at www.csr.com. Follow CSR on Twitter at http://twitter.com/CSR_PLC and read our blog at www.csr.com/blog ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: How to misplace an entire pipeline 2011-09-12 16:35 ` Peter Stephenson @ 2011-09-15 14:57 ` Alexey I. Froloff 2011-09-16 16:16 ` Bart Schaefer 0 siblings, 1 reply; 16+ messages in thread From: Alexey I. Froloff @ 2011-09-15 14:57 UTC (permalink / raw) To: zsh-workers [-- Attachment #1: Type: text/plain, Size: 366 bytes --] On Mon, Sep 12, 2011 at 05:35:27PM +0100, Peter Stephenson wrote: > If we can't decide this, there's no obvious way of choosing between the > alternative fixes, so you should probably just pick one. So, have you decided what to do with this? If coin toss can help, I just got heads. -- Regards, -- Sir Raorn. --- http://thousandsofhate.blogspot.com/ [-- Attachment #2: Digital signature --] [-- Type: application/pgp-signature, Size: 836 bytes --] ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: How to misplace an entire pipeline 2011-09-15 14:57 ` Alexey I. Froloff @ 2011-09-16 16:16 ` Bart Schaefer 0 siblings, 0 replies; 16+ messages in thread From: Bart Schaefer @ 2011-09-16 16:16 UTC (permalink / raw) To: Zsh hackers list I've been traveling for the last several days (still am, in fact; writing this on a train through New England) but will commit "the first way" when I return home next week, unless someone else wants to grab the diff and do it for me. ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: How to misplace an entire pipeline 2011-09-12 16:03 ` Bart Schaefer 2011-09-12 16:18 ` Bart Schaefer @ 2011-09-12 16:27 ` Alexey I. Froloff 1 sibling, 0 replies; 16+ messages in thread From: Alexey I. Froloff @ 2011-09-12 16:27 UTC (permalink / raw) To: zsh-workers [-- Attachment #1: Type: text/plain, Size: 1865 bytes --] On Mon, Sep 12, 2011 at 09:03:53AM -0700, Bart Schaefer wrote: > to verify which test is causing the segmentation fault? I suspect it's > the very last test with "print -lO ... | while read ..." but I'd like > to be sure. (OTOH it may not be possible to reproduce the fault with > verbose output, but it's worth a try.) Test successful. ZTST_test: looking for new test ZTST_test: examining line: ZTST_test: examining line: print -lO $'a' $'a\0' $'a\0b' $'a\0b\0' $'a\0b\0a' $'a\0b\0b' $'a\0c' | ZTST_getchunk: read code chunk: print -lO $'a' $'a\0' $'a\0b' $'a\0b\0' $'a\0b\0a' $'a\0b\0b' $'a\0c' | while read -r line; do for (( i = 1; i <= ${#line}; i++ )); do foo=$line[i] printf "%02x" $(( #foo )) done print done ZTST_test: examining line: >610063 ZTST_getredir: read redir for '>': 610063 6100620062 6100620061 61006200 610062 6100 61 ZTST_test: examining line: Running test: sorting with embedded nulls ZTST_test: expecting status: 0 Input: /usr/src/tmp/zsh.ztst.in.2764, output: /usr/src/tmp/zsh.ztst.out.2764, error: /usr/src/tmp/zsh.ztst.terr.2764 ************************************** 0 successful test scripts, 1 failure, 0 skipped ************************************** make[1]: *** [check] Error 1 make[1]: Leaving directory `/usr/src/RPM/BUILD/zsh-4.3.12/Test' make: *** [check] Error 2 > I further suspect the signal is arriving during foo=$line[i] or some > other part of the inner loop where no actual command is being run, but > list_pipe remains true because of the outer loop. I think there's a race condition somewhere. I've managed to reproduce this error on an average loaded system with decent swap usage. Fail chance is about 15-20%% on my system and >80% on build server. -- Regards, -- Sir Raorn. --- http://thousandsofhate.blogspot.com/ [-- Attachment #2: Digital signature --] [-- Type: application/pgp-signature, Size: 836 bytes --] ^ permalink raw reply [flat|nested] 16+ messages in thread
end of thread, other threads:[~2011-09-16 16:16 UTC | newest] Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2011-08-06 3:31 How to misplace an entire pipeline Bart Schaefer 2011-08-07 17:50 ` Peter Stephenson 2011-08-07 21:43 ` Bart Schaefer 2011-08-08 4:05 ` Bart Schaefer 2011-08-08 18:27 ` Peter Stephenson 2011-08-09 6:10 ` Bart Schaefer 2011-08-09 20:19 ` Peter Stephenson 2011-08-13 18:52 ` Bart Schaefer 2011-08-13 20:15 ` Bart Schaefer 2011-09-12 13:51 ` Alexey I. Froloff 2011-09-12 16:03 ` Bart Schaefer 2011-09-12 16:18 ` Bart Schaefer 2011-09-12 16:35 ` Peter Stephenson 2011-09-15 14:57 ` Alexey I. Froloff 2011-09-16 16:16 ` Bart Schaefer 2011-09-12 16:27 ` Alexey I. Froloff
Code repositories for project(s) associated with this public inbox https://git.vuxu.org/mirror/zsh/ This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).