zsh-workers
* Drat, Test/A05 still hanging sometimes
@ 2014-10-06 20:42 Bart Schaefer
  2014-10-07 14:07 ` Peter Stephenson
  0 siblings, 1 reply; 8+ messages in thread
From: Bart Schaefer @ 2014-10-06 20:42 UTC (permalink / raw)
  To: zsh-workers

The interesting thing is where it is stuck.

The "runtests.zsh" thread is blocked on sigsuspend() in the "for file" loop,
waiting for the $ZTST_exe thread to finish.

The $ZTST_exe thread is trying to exit -- in my case it is stuck on a mutex
with no zsh code left in the gdb-able call stack backtrace.

Strangely, though, if I kill that thread, the runtests thread also dies,
and the entire "make" aborts.  The "for file" loop is written in a way that
I would have expected runtests to simply continue on to the next pass.

I'm baffled.



* Re: Drat, Test/A05 still hanging sometimes
  2014-10-06 20:42 Drat, Test/A05 still hanging sometimes Bart Schaefer
@ 2014-10-07 14:07 ` Peter Stephenson
  2014-10-09  7:52   ` Axel Beckert
  0 siblings, 1 reply; 8+ messages in thread
From: Peter Stephenson @ 2014-10-07 14:07 UTC (permalink / raw)
  To: zsh-workers

On Mon, 06 Oct 2014 13:42:55 -0700
Bart Schaefer <schaefer@brasslantern.com> wrote:
> The interesting thing is where it is stuck.
> 
> The "runtests.zsh" thread is blocked on sigsuspend() in the "for file" loop,
> waiting for the $ZTST_exe thread to finish.
> 
> The $ZTST_exe thread is trying to exit -- in my case it is stuck on a mutex
> with no zsh code left in the gdb-able call stack backtrace.
> 
> Strangely, though, if I kill that thread, the runtests thread also dies,
> and the entire "make" aborts.  The "for file" loop is written in a way that
> I would have expected runtests to simply continue on to the next pass.
> 
> I'm baffled.

As this isn't new, and isn't easily tractable at the moment, I'm not
planning on holding up 5.0.7.

I suppose it's likely to be another missing queue_signals(), the
question is where.

pws



* Re: Drat, Test/A05 still hanging sometimes
  2014-10-07 14:07 ` Peter Stephenson
@ 2014-10-09  7:52   ` Axel Beckert
  2014-10-12 18:47     ` Bart Schaefer
  0 siblings, 1 reply; 8+ messages in thread
From: Axel Beckert @ 2014-10-09  7:52 UTC (permalink / raw)
  To: zsh-workers

Hi,

On Tue, Oct 07, 2014 at 03:07:35PM +0100, Peter Stephenson wrote:
> On Mon, 06 Oct 2014 13:42:55 -0700
> Bart Schaefer <schaefer@brasslantern.com> wrote:
> > The interesting thing is where it is stuck.

With the 5.0.7 upload, again one of the Debian builds (so far, two
still outstanding) hung for more than 150 minutes, this time in
A01grammar.ztst:

https://buildd.debian.org/status/fetch.php?pkg=zsh&arch=mipsel&ver=5.0.7-1&stamp=1412817576

Relevant lines:

../../Test/A01grammar.ztst: starting.
Running test: Basic pipeline handling
Test successful.
Running test: Exit status of pipeline with builtins (true)
Test successful.
Running test: Exit status of pipeline with builtins (false)
Test successful.
Running test: Executing command that evaluates to empty resets status
Test successful.
Running test: Starting background command resets status
Test successful.
Running test: Sourcing empty file resets status
Test successful.
Running test: Basic coprocess handling
make: *** [build-arch] Terminated
make[1]: *** [test] Terminated
make[2]: *** [check] Terminated
debian/rules:54: recipe for target 'build-arch' failed
Makefile:265: recipe for target 'test' failed
Makefile:189: recipe for target 'check' failed
Build killed with signal TERM after 150 minutes of inactivity

I assume it's too far away from A05 (in terms of output size) for missing
buffer flushing to explain the incomplete output.

HTH.

> As this isn't new, and isn't easily tractable at the moment, I'm not
> planning on holding up 5.0.7.

I think that was the right decision. Sorry for not being able to
report sooner this time, but I was quite busy the past few days.

> I suppose it's likely to be another missing queue_signals(), the
> question is where.

(I'd obviously be happy if someone finds that one, but I don't even have
an idea where to start looking.)

		Kind regards, Axel
-- 
/~\  Plain Text Ribbon Campaign                   | Axel Beckert
\ /  Say No to HTML in E-Mail and News            | abe@deuxchevaux.org  (Mail)
 X   See http://www.nonhtmlmail.org/campaign.html | abe@noone.org (Mail+Jabber)
/ \  I love long mails: http://email.is-not-s.ms/ | http://noone.org/abe/ (Web)



* Re: Drat, Test/A05 still hanging sometimes
  2014-10-09  7:52   ` Axel Beckert
@ 2014-10-12 18:47     ` Bart Schaefer
  2014-10-13  3:03       ` Axel Beckert
  0 siblings, 1 reply; 8+ messages in thread
From: Bart Schaefer @ 2014-10-12 18:47 UTC (permalink / raw)
  To: zsh-workers

On Oct 9,  9:52am, Axel Beckert wrote:
}
} Running test: Basic coprocess handling
} make: *** [build-arch] Terminated
} make[1]: *** [test] Terminated
} make[2]: *** [check] Terminated
} debian/rules:54: recipe for target 'build-arch' failed
} Makefile:265: recipe for target 'test' failed
} Makefile:189: recipe for target 'check' failed
} Build killed with signal TERM after 150 minutes of inactivity
} 
} I assume it's too far away from A05 (in terms of output size) for missing
} buffer flushing to explain the incomplete output.

It is indeed too far away but:  The test in A05 that gets stuck also
uses a coprocess, so the likely place to look is a race condition in
the SIGCHLD reaping of coprocesses.

Tests in each of A01, A04, and A05 use "coproc".  How does this match up
with the hung builds you have encountered?



* Re: Drat, Test/A05 still hanging sometimes
  2014-10-12 18:47     ` Bart Schaefer
@ 2014-10-13  3:03       ` Axel Beckert
  2014-10-13 17:18         ` Bart Schaefer
  0 siblings, 1 reply; 8+ messages in thread
From: Axel Beckert @ 2014-10-13  3:03 UTC (permalink / raw)
  To: zsh-workers

Hi Bart,

On Sun, Oct 12, 2014 at 11:47:39AM -0700, Bart Schaefer wrote:
> On Oct 9,  9:52am, Axel Beckert wrote:
> }
> } Running test: Basic coprocess handling
> } make: *** [build-arch] Terminated
> } make[1]: *** [test] Terminated
> } make[2]: *** [check] Terminated
> } debian/rules:54: recipe for target 'build-arch' failed
> } Makefile:265: recipe for target 'test' failed
> } Makefile:189: recipe for target 'check' failed
> } Build killed with signal TERM after 150 minutes of inactivity
> } 
> } I assume it's too far away from A05 (in terms of output size) for missing
> } buffer flushing to explain the incomplete output.
> 
> It is indeed too far away but:  The test in A05 that gets stuck also
> uses a coprocess, so the likely place to look is a race condition in
> the SIGCHLD reaping of coprocesses.
> 
> Tests in each of A01, A04, and A05 use "coproc".  How does this match up
> with the hung builds you have encountered?

Quite well -- as far as I can see there has been only one exception so far:
once it also hung inside X02zlevi.ztst. All others were in one of the
three tests you mentioned, with A05 being the most frequent one (and
IIRC also the one you experienced twice or so).

		Kind regards, Axel



* Re: Drat, Test/A05 still hanging sometimes
  2014-10-13  3:03       ` Axel Beckert
@ 2014-10-13 17:18         ` Bart Schaefer
  2014-10-14  9:34           ` Axel Beckert
  0 siblings, 1 reply; 8+ messages in thread
From: Bart Schaefer @ 2014-10-13 17:18 UTC (permalink / raw)
  To: zsh-workers

On Oct 13,  5:03am, Axel Beckert wrote:
} 
} On Sun, Oct 12, 2014 at 11:47:39AM -0700, Bart Schaefer wrote:
} > Tests in each of A01, A04, and A05 use "coproc".  How does this match up
} > with the hung builds you have encountered?
} 
} Quite well -- as far as I can see there has been only one exception so far:
} once it also hung inside X02zlevi.ztst. All others were in one of the
} three tests you mentioned, with A05 being the most frequent one (and
} IIRC also the one you experienced twice or so).

Give the following a try?  With the "sleep" in there, I am unable to
make the A05 test hang.  Without it, I get one hang in each 20 repeats
of the test, pretty reliably.

Although why putting the sleep at that particular place has the right
side-effect, I do not know.

diff --git a/Test/A05execution.ztst b/Test/A05execution.ztst
index ca97f4f..0b40a73 100644
--- a/Test/A05execution.ztst
+++ b/Test/A05execution.ztst
@@ -208,6 +208,7 @@ F:This similar test was triggering a reproducible failure with pipestatus.
   print -u $ZTST_fd 'This test takes 5 seconds to fail...'
   { printf "%d\n" {1..20000} } | ( read -e )
   hang(){ printf "%d\n" {2..20000} | cat }; hang | ( read -e )
+  sleep 1 ;: avoid coproc exit race condition
   print -p done
   read -et 6 -p
 0:Bug regression: piping a shell construct to an external process may hang



* Re: Drat, Test/A05 still hanging sometimes
  2014-10-13 17:18         ` Bart Schaefer
@ 2014-10-14  9:34           ` Axel Beckert
  2014-10-27  0:40             ` Bart Schaefer
  0 siblings, 1 reply; 8+ messages in thread
From: Axel Beckert @ 2014-10-14  9:34 UTC (permalink / raw)
  To: zsh-workers

Hi Bart,

On Mon, Oct 13, 2014 at 10:18:03AM -0700, Bart Schaefer wrote:
> Give the following a try?  With the "sleep" in there, I am unable to
> make the A05 test hang.  Without it, I get one hang in each 20 repeats
> of the test, pretty reliably.

Thanks. I've made an upload to Debian Experimental last night with a
bunch of cherry-picked post-5.0.7 patches plus this one.

> Although why putting the sleep at that particular place has the right
> side-effect, I do not know.
> 
> diff --git a/Test/A05execution.ztst b/Test/A05execution.ztst
> index ca97f4f..0b40a73 100644
> --- a/Test/A05execution.ztst
> +++ b/Test/A05execution.ztst
> @@ -208,6 +208,7 @@ F:This similar test was triggering a reproducible failure with pipestatus.
>    print -u $ZTST_fd 'This test takes 5 seconds to fail...'
>    { printf "%d\n" {1..20000} } | ( read -e )
>    hang(){ printf "%d\n" {2..20000} | cat }; hang | ( read -e )
> +  sleep 1 ;: avoid coproc exit race condition
>    print -p done
>    read -et 6 -p
>  0:Bug regression: piping a shell construct to an external process may hang

One build failure so far (four slower/busier architectures still
outstanding), but it happened on the architecture where hangs have been
the most frequent so far (kfreebsd-amd64). It was inside A05, though not
at the above location: it hung at the test starting at line 179 (unless
the output was not properly synced):

https://buildd.debian.org/status/fetch.php?pkg=zsh&arch=kfreebsd-amd64&ver=5.0.7-2&stamp=1413276687

		Kind regards, Axel



* Re: Drat, Test/A05 still hanging sometimes
  2014-10-14  9:34           ` Axel Beckert
@ 2014-10-27  0:40             ` Bart Schaefer
  0 siblings, 0 replies; 8+ messages in thread
From: Bart Schaefer @ 2014-10-27  0:40 UTC (permalink / raw)
  To: zsh-workers

On Oct 14, 11:34am, Axel Beckert wrote:
}
} One build failure so far (four slower/busier architectures still
} outstanding), but it happened on the architecture where hangs have been
} the most frequent so far (kfreebsd-amd64). It was inside A05, though not
} at the above location: it hung at the test starting at line 179

At that point there should be exactly one child process (the sleep
that is being killed), so there shouldn't be any confusion between
waiting for a foreground job and waiting for that background job.

So it must somehow be the case that the SIGCHLD has been received
(post-kill) but the jobtab state of the sleep process did not get
properly updated; and now zsh is in an infinite (but not busy, as
it stops each pass in sigsuspend()) loop looking for a new SIGCHLD
followed by a state change of that process, which is never going to
happen.

I'll be curious to see if the patch from 33531 changes this at all.



Thread overview: 8+ messages
2014-10-06 20:42 Drat, Test/A05 still hanging sometimes Bart Schaefer
2014-10-07 14:07 ` Peter Stephenson
2014-10-09  7:52   ` Axel Beckert
2014-10-12 18:47     ` Bart Schaefer
2014-10-13  3:03       ` Axel Beckert
2014-10-13 17:18         ` Bart Schaefer
2014-10-14  9:34           ` Axel Beckert
2014-10-27  0:40             ` Bart Schaefer