$pipestatus broken?

zsh-workers
 help / color / mirror / code / Atom feed

* $pipestatus broken?
@ 2011-12-10 12:24 Frank Terbeck
  2011-12-10 12:48 ` Frank Terbeck
  0 siblings, 1 reply; 12+ messages in thread
From: Frank Terbeck @ 2011-12-10 12:24 UTC (permalink / raw)
  To: zsh-workers

Hey list,

I've been playing with $pipestatus to solve the subversion1.7 issues in
vcs_info... Because it turns out, it can be a whole bunch of error
messages.

So the idea was not to match output but to look at subversion's exit
status. Since "svn info" is called in a pipe, I was using $pipestatus.
The result was mixed - literally. Sometimes, $pipestatus would contain
"0 0" and sometimes it would contain just a single "1".

Now I was extracting the offending code into a test script and then
later replaced the call to "svn info" by a call to fortune(1). The
result is the same.

Here's my test script:
[snip]
local -A svninfo
fortune |& while IFS=: read a b; do
    svninfo[${a// /_}]="${b## #}"
done
ps=( "${pipestatus[@]}" )
rc=${ps[1]}
print "${ps[@]}"
[snap]

And here is a test-run:
[snip]
% repeat 10 zsh ./test
1
0 0
1
1
0 0
0 0
0 0
1
0 0
1
[snap]

Am I using `$pipestatus' the wrong way or is this a bug?

Regards, Frank

-- 
In protocol design, perfection has been reached not when there is
nothing left to add, but when there is nothing left to take away.
                                                  -- RFC 1925

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: $pipestatus broken?
  2011-12-10 12:24 $pipestatus broken? Frank Terbeck
@ 2011-12-10 12:48 ` Frank Terbeck
  2011-12-10 14:58   ` Bart Schaefer
  0 siblings, 1 reply; 12+ messages in thread
From: Frank Terbeck @ 2011-12-10 12:48 UTC (permalink / raw)
  To: zsh-workers

Frank Terbeck wrote:
[...]
> Here's my test script:

You can actually dumb this down to:

[snip]
: | while read a; do
    :
done
print "${pipestatus[@]}"
[snap]

It (pipestatus just containing a single "1") seems to happen less
frequently with this simpler loop, but still in the range of 5% of the
invocations on my machine.

If I use the simplest pipe I can think of:
[snip]
: | :
print "${pipestatus[@]}"
[snap]

I didn't get a misstep in 10000 test runs.

I'm running the latest zsh snapshot on a 2.6.32 linux box on a intel
core2duo laptop (that's an amd64 architecture), if it matters.

Regards, Frank

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: $pipestatus broken?
  2011-12-10 12:48 ` Frank Terbeck
@ 2011-12-10 14:58   ` Bart Schaefer
  2011-12-11 14:37     ` Frank Terbeck
  2011-12-23 10:49     ` Frank Terbeck
  0 siblings, 2 replies; 12+ messages in thread
From: Bart Schaefer @ 2011-12-10 14:58 UTC (permalink / raw)
  To: zsh-workers

On Dec 10,  1:48pm, Frank Terbeck wrote:
} Subject: Re: $pipestatus broken?
}
} You can actually dumb this down to:
} 
} [snip]
} : | while read a; do
}     :
} done
} print "${pipestatus[@]}"
} [snap]
} 
} It (pipestatus just containing a single "1") seems to happen less
} frequently with this simpler loop, but still in the range of 5% of the
} invocations on my machine.

With that loop, I get a single "1" 100% of the time unless I attach to
the process with GDB, in which case it becomes random.

I traced this down to update_job().  At this line:

Breakpoint 5, update_job (jn=0x9415894) at ../../zsh-4.0/Src/jobs.c:504
504	    if (job == thisjob && (jn->stat & STAT_DONE)) {

There are two entries in the job table; job = 1 and thisjob = 2, so we
skip updating $pipestatus.  I don't know where the "1" value in the
first element of pipestatus comes from, though.

In this case update_job() is being called from wait_for_processes() via
zhandler() while the shell is blocked in bin_read().

Interestingly, in the case I trapped with GDB, we're waiting for PID
6193:

492		if (findproc(pid, &jn, &pn, 0)) {

(gdb) p pid
$16 = 6193

But that job is not anywhere in jobtab -- the closest is jobtab[1]:

(gdb) p jobtab[1]
$19 = {gleader = 6192, other = 0, stat = 1097, pwd = 0x0, procs = 0x9428028, 
  auxprocs = 0x0, filelist = 0x0, stty_in_env = 0, ty = 0x0}

The procs pointer there points to a one-element linked list containing
only PID 6192.  So where did 6193 come from, and why is it not in the
job table?

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: $pipestatus broken?
  2011-12-10 14:58   ` Bart Schaefer
@ 2011-12-11 14:37     ` Frank Terbeck
  2011-12-23 10:49     ` Frank Terbeck
  1 sibling, 0 replies; 12+ messages in thread
From: Frank Terbeck @ 2011-12-11 14:37 UTC (permalink / raw)
  To: zsh-workers

Bart Schaefer wrote:
[...]
> With that loop, I get a single "1" 100% of the time unless I attach to
> the process with GDB, in which case it becomes random.

I know I won't be much help in solving this, because I don't know how
any of the job handling code works or even what needs to be done for any
sort of sane job handling in the first place...

But I tried to see if I could pinpoint the revision, that introduced the
breakage. And it appears, that the first release that featured
`$pipestatus' was 3.1.7. And it's broken in that release already.

The first tag that includes the feature is `zsh-3.1.6-dev-21'. I tried
to build it, but I can't get the autotools (I tried various versions) to
produce a valid `configure' script. :-/

`$pipestatus' was introduced in workers-10358:

  <http://www.zsh.org/mla/workers/2000/msg01183.html>

Regards, Frank

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: $pipestatus broken?
  2011-12-10 14:58   ` Bart Schaefer
  2011-12-11 14:37     ` Frank Terbeck
@ 2011-12-23 10:49     ` Frank Terbeck
  2011-12-23 21:31       ` Bart Schaefer
  2011-12-24 17:37       ` Bart Schaefer
  1 sibling, 2 replies; 12+ messages in thread
From: Frank Terbeck @ 2011-12-23 10:49 UTC (permalink / raw)
  To: zsh-workers

Bart Schaefer wrote:
[...]
> With that loop, I get a single "1" 100% of the time unless I attach to
> the process with GDB, in which case it becomes random.

It's correct that this should output "0 0", though, right? With the right
hand side of the pipe being a loop, the last return code of the loop is
return code of the loop, correct?

If that's correct, the following adds a new test, that checks if the
problem happens on a given system (actually, if the problem appears at
least once in 2048 test runs - which has been triggered every time on my
test system so far). I didn't see anything about tests that are *expected*
to fail in `B01cd.ztst', so this makes the test suite fail at least once.
Therefore, I don't know if adding this is a good idea or not.

Regards, Frank

diff --git a/Test/A04redirect.ztst b/Test/A04redirect.ztst
index 3911b06..d349f33 100644
--- a/Test/A04redirect.ztst
+++ b/Test/A04redirect.ztst
@@ -441,3 +441,13 @@
 0:failed assignment on non-posix-special, POSIX_BUILTINS
 >output
 ?zsh: read-only variable: foo
+
+  repeat 2048; do (: | while read a; do
+                           :
+                       done;
+                       print "${pipestatus[@]}")
+  done | sort -u
+0:Check whether `$pipestatus[]' behaves.
+>0 0
+F:This test checks for a bug in `$pipestatus[]' handling.  If it breaks then
+F:the bug is still there or it reappeared. See workers-29973 for details.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: $pipestatus broken?
  2011-12-23 10:49     ` Frank Terbeck
@ 2011-12-23 21:31       ` Bart Schaefer
  2011-12-23 22:11         ` Frank Terbeck
  2011-12-24 17:37       ` Bart Schaefer
  1 sibling, 1 reply; 12+ messages in thread
From: Bart Schaefer @ 2011-12-23 21:31 UTC (permalink / raw)
  To: zsh-workers

On Dec 23, 11:49am, Frank Terbeck wrote:
} Subject: Re: $pipestatus broken?
}
} Bart Schaefer wrote:
} [...]
} > With that loop, I get a single "1" 100% of the time unless I attach to
} > the process with GDB, in which case it becomes random.
} 
} It's correct that this should output "0 0", though, right?

Yes, I believe so.

I *think* the problem is that the right-hand-side is a builtin and thus
does not have an external process.  It's therefore possible for the
left side to exit and be reaped before the right side is done, so the
shell forgets that it is in a pipeline at all.

I get correct behavior if the right-side loop is in a subshell.  In a
curious twist, if the loop is in { braces } , then $pipstatus is a
single 0 instead of a single 1.  This may be a clue to where/how the
pipestatus internals are being improperly reset, but I have yet to
figure it out.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: $pipestatus broken?
  2011-12-23 21:31       ` Bart Schaefer
@ 2011-12-23 22:11         ` Frank Terbeck
  2011-12-24  9:32           ` Bart Schaefer
  0 siblings, 1 reply; 12+ messages in thread
From: Frank Terbeck @ 2011-12-23 22:11 UTC (permalink / raw)
  To: zsh-workers

Bart Schaefer wrote:
> On Dec 23, 11:49am, Frank Terbeck wrote:
> } Bart Schaefer wrote:
> } > With that loop, I get a single "1" 100% of the time unless I attach to
> } > the process with GDB, in which case it becomes random.
> } 
> } It's correct that this should output "0 0", though, right?
>
> Yes, I believe so.
>
> I *think* the problem is that the right-hand-side is a builtin and thus
> does not have an external process.  It's therefore possible for the
> left side to exit and be reaped before the right side is done, so the
> shell forgets that it is in a pipeline at all.

I think that's part of it. It appears to have to be a loop on the
right-hand-side, too, though. The following never fails for me:

% (repeat 100000; do : | :; print "${pipestatus[@]}"; done) | sort -u
0 0

...unless, of course, the loop just needs to be there to delay the RHS,
for the exit/reap to happen as you suggested.

Regards, Frank


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: $pipestatus broken?
  2011-12-23 22:11         ` Frank Terbeck
@ 2011-12-24  9:32           ` Bart Schaefer
  2011-12-24  9:59             ` Frank Terbeck
  0 siblings, 1 reply; 12+ messages in thread
From: Bart Schaefer @ 2011-12-24  9:32 UTC (permalink / raw)
  To: zsh-workers

On Dec 23, 11:11pm, Frank Terbeck wrote:
}
} ...unless, of course, the loop just needs to be there to delay the RHS,
} for the exit/reap to happen as you suggested.

Hmm, this is getting a bit hairy.

In the simple case { echo foo | read -E }, there is a single job table
entry for "read" which is the group leader for the pipeline.

In the loop case { echo foo | repeat 1; read -E } there is a job table
entry for the loop which is the group leader, but a new entry is
created for "read -E".  execpline() remembers the previous thisjob as
the local "pj" and restores thisjob = pj at line 1619, but by that
time it is too late -- waitjobs() has set thisjob = -1 for just long
enough for zhandler() to call update_job(), which fails to update the
pipestats because thisjob = -1 tells it there is no current job.

The following seems to fix it, by telling waitjobs() what the previous
job number was so it can be reset immediately.  There may still be a
race condition that requires fiddling with signal blocks to make sure
thisjob is correct at the time the zhandler() catches the signal, but
if so this should at least allow the block/unblock to be localized.

Index: Src/exec.c
===================================================================
--- Src/exec.c	20 Dec 2011 17:13:38 -0000	1.43
+++ Src/exec.c	24 Dec 2011 09:22:57 -0000
@@ -1516,7 +1516,7 @@
 		}
 		if (!(jn->stat & STAT_LOCKED)) {
 		    updated = hasprocs(thisjob);
-		    waitjobs();
+		    waitjobs(pj);
 		    child_block();
 		} else
 		    updated = 0;
@@ -3339,7 +3339,7 @@
 		close(i);
 	closem(FDT_UNUSED);
 	if (thisjob != -1)
-	    waitjobs();
+	    waitjobs(-1);
 	_exit(lastval);
     }
     fixfds(save);
Index: Src/jobs.c
===================================================================
--- Src/jobs.c	20 Dec 2011 17:13:38 -0000	1.31
+++ Src/jobs.c	24 Dec 2011 09:23:34 -0000
@@ -1363,7 +1363,7 @@

 /**/
 void
-waitjobs(void)
+waitjobs(int prevjob)
 {
     Job jn = jobtab + thisjob;
     DPUTS(thisjob == -1, "No valid job in waitjobs.");
@@ -1375,7 +1375,7 @@
 	pipestats[0] = lastval;
 	numpipestats = 1;
     }
-    thisjob = -1;
+    thisjob = prevjob;
 }

 /* clear job table when entering subshells */
@@ -2072,7 +2072,7 @@
 		 * when we've just been told to wait for another
 		 * job (and done it)?
 		 */
-		waitjobs();
+		waitjobs(-1);
 		retval = lastval2;
 	    } else if (ofunc == BIN_DISOWN)
 	        deletejob(jobtab + job, 1);

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: $pipestatus broken?
  2011-12-24  9:32           ` Bart Schaefer
@ 2011-12-24  9:59             ` Frank Terbeck
  2011-12-24 18:23               ` Bart Schaefer
  0 siblings, 1 reply; 12+ messages in thread
From: Frank Terbeck @ 2011-12-24  9:59 UTC (permalink / raw)
  To: zsh-workers

Bart Schaefer wrote:
> On Dec 23, 11:11pm, Frank Terbeck wrote:
> } ...unless, of course, the loop just needs to be there to delay the RHS,
> } for the exit/reap to happen as you suggested.
>
> Hmm, this is getting a bit hairy.
>
> In the simple case { echo foo | read -E }, there is a single job table
> entry for "read" which is the group leader for the pipeline.
>
> In the loop case { echo foo | repeat 1; read -E } there is a job table
> entry for the loop which is the group leader, but a new entry is
> created for "read -E".  execpline() remembers the previous thisjob as
> the local "pj" and restores thisjob = pj at line 1619, but by that
> time it is too late -- waitjobs() has set thisjob = -1 for just long
> enough for zhandler() to call update_job(), which fails to update the
> pipestats because thisjob = -1 tells it there is no current job.
>
> The following seems to fix it, by telling waitjobs() what the previous
> job number was so it can be reset immediately.  There may still be a
> race condition that requires fiddling with signal blocks to make sure
> thisjob is correct at the time the zhandler() catches the signal, but
> if so this should at least allow the block/unblock to be localized.

Hm. I'm having a hard time following what's going on...

With this change, the test I posted in workers-30047 changes a bit.
Before, there were only lines that either looked like "1" or "0 0". Now
I'm getting "0 1", too.

"1" and "0 1" seem to happen way less now - both cases are in the 1%
area each. But maybe I'm just a little luckier, with respect to timing.

Regards, Frank


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: $pipestatus broken?
  2011-12-23 10:49     ` Frank Terbeck
  2011-12-23 21:31       ` Bart Schaefer
@ 2011-12-24 17:37       ` Bart Schaefer
  1 sibling, 0 replies; 12+ messages in thread
From: Bart Schaefer @ 2011-12-24 17:37 UTC (permalink / raw)
  To: zsh-workers

On Dec 23, 11:49am, Frank Terbeck wrote:
}
} If that's correct, the following adds a new test, that checks if the
} problem happens on a given system (actually, if the problem appears at
} least once in 2048 test runs - which has been triggered every time on my
} test system so far).

I think this test belongs in A05 or A07 rather than A04, but it probably
doesn't matter.  However, if it's really going to loop for that long,
it should either print a warning or call ZTST_hashmark occasionally
during the repeat loop.  (Every iteration, is probably excessive.)

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: $pipestatus broken?
  2011-12-24  9:59             ` Frank Terbeck
@ 2011-12-24 18:23               ` Bart Schaefer
  2011-12-24 18:46                 ` Bart Schaefer
  0 siblings, 1 reply; 12+ messages in thread
From: Bart Schaefer @ 2011-12-24 18:23 UTC (permalink / raw)
  To: zsh-workers

On Dec 24, 10:59am, Frank Terbeck wrote:
} Subject: Re: $pipestatus broken?
}
} Bart Schaefer wrote:
} >
} > In the loop case { echo foo | repeat 1; read -E } there is a job table
} > entry for the loop which is the group leader, but a new entry is
} > created for "read -E".  execpline() remembers the previous thisjob as
} > the local "pj" and restores thisjob = pj at line 1619, but by that
} > time it is too late -- waitjobs() has set thisjob = -1 for just long
} > enough for zhandler() to call update_job(), which fails to update the
} > pipestats because thisjob = -1 tells it there is no current job.
} >
} > The following seems to fix it, by telling waitjobs() what the previous
} > job number was so it can be reset immediately.  There may still be a
} > race condition that requires fiddling with signal blocks to make sure
} > thisjob is correct at the time the zhandler() catches the signal, but
} > if so this should at least allow the block/unblock to be localized.
} 
} Hm. I'm having a hard time following what's going on...

A bit more explanation, then; let's use your Test/A04 example:

    : | while read a; do :; done

In execpline(), zsh wants to keep the right side ("while ...") in the
current shell.  So it creates a job entry jobtab[1] for the pipeline,
forks to run ":" on the left side which becomes jobtab[1]->procs, and
enters execwhile() for the right side.  At this point thisjob = 1.

Now it needs to run "read a", so execpline temporarily creates a new
entry jobtab[2], saves pj = thisjob, sets thisjob = 2 and enters
execbuiltin() [whether it's a builtin isn't important to the bug].
When the builtin completes, execpline restores thisjob = pj to make
the loop the current job again.

EXCEPT ... at various times including during execbuiltin(), the child
forked off to run ":" may exit and hit the parent with SIGCHLD.  This
invokes zhandler() which reaps the process and calls update_job() to
change status in the job table, including $pipestatus.  update_job()
compares the job that just exited (a process linked to jobtab[1]) with
the current foreground process (which thisjob says is jobtab[2]) and
concludes that a background job has exited.  Therefore it skips the
update of $pipestatus and instead resets it as if there were no pipe.

When the shell then gets around to waiting for jobtab[1] at the end
of the loop, it has lost the reaped left-hand-side and behaves as if
there is only one job in the pipeline.

What we have is a case where the shell is juggling two "current" jobs
(the loop itself, and the command executed from within the loop) and
it loses track of one of them at a crucial instant.

} With this change, the test I posted in workers-30047 changes a bit.
} Before, there were only lines that either looked like "1" or "0 0". Now
} I'm getting "0 1", too.

Yes, when I actually put your test into my patched sandbox and run
"make check" I also get all three results.  Obviously waitjobs() is
not the only place where the SIGCHLD can sneak in, it's just the only
one I was able to catch with the debugger.  So that patch is not
sufficient (and also probably not necessary if we resolve the race).

I don't know exactly where the "0 1" is coming from -- or rather, it
must be coming from this in update_job() but I'm not sure why:

        if ((jn->stat & STAT_CURSH) && i < MAX_PIPESTATS)
            pipestats[i++] = lastval;

In this case I'm *guessing* lastval = 1 because we're catching the
signal after thisjob = pj but before the actual wait for jobtab[1],
so lastval reflects that "read a" has failed.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: $pipestatus broken?
  2011-12-24 18:23               ` Bart Schaefer
@ 2011-12-24 18:46                 ` Bart Schaefer
  0 siblings, 0 replies; 12+ messages in thread
From: Bart Schaefer @ 2011-12-24 18:46 UTC (permalink / raw)
  To: zsh-workers

On Dec 24, 10:23am, Bart Schaefer wrote:
}
}         if ((jn->stat & STAT_CURSH) && i < MAX_PIPESTATS)
}             pipestats[i++] = lastval;
} 
} In this case I'm *guessing* lastval = 1 because we're catching the
} signal after thisjob = pj but before the actual wait for jobtab[1],
} so lastval reflects that "read a" has failed.

I can confirm by replacing "read a" with "(return 3)" in the test that
we are in fact picking up lastval from the condition rather than from
the loop exit, but in the absence of my patch the above snippet is not
the place where this happens.

That's all the time I have for this until after the 25th, I'm afraid.
Happy year-end-holdiday-stuff to you all.

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2011-12-24 18:46 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-12-10 12:24 $pipestatus broken? Frank Terbeck
2011-12-10 12:48 ` Frank Terbeck
2011-12-10 14:58   ` Bart Schaefer
2011-12-11 14:37     ` Frank Terbeck
2011-12-23 10:49     ` Frank Terbeck
2011-12-23 21:31       ` Bart Schaefer
2011-12-23 22:11         ` Frank Terbeck
2011-12-24  9:32           ` Bart Schaefer
2011-12-24  9:59             ` Frank Terbeck
2011-12-24 18:23               ` Bart Schaefer
2011-12-24 18:46                 ` Bart Schaefer
2011-12-24 17:37       ` Bart Schaefer

Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/zsh/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).