From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 6920 invoked by alias); 29 Sep 2014 01:30:11 -0000 Mailing-List: contact zsh-workers-help@zsh.org; run by ezmlm Precedence: bulk X-No-Archive: yes List-Id: Zsh Workers List List-Post: List-Help: X-Seq: 33269 Received: (qmail 4973 invoked from network); 29 Sep 2014 01:30:05 -0000 X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on f.primenet.com.au X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=ham version=3.3.2 Date: Mon, 29 Sep 2014 03:20:33 +0200 From: Axel Beckert To: zsh-workers@zsh.org Cc: ivodd@debian.org, 760061@bugs.debian.org Subject: Re: "5 seconds to fail" Message-ID: <20140929012033.GL5405@sym.noone.org> Mail-Followup-To: zsh-workers@zsh.org, ivodd@debian.org, 760061@bugs.debian.org References: <20140824183909.75d04bf0@pws-pc.ntlworld.com> <20140827162834.GQ7356@sym.noone.org> <140830162545.ZM27219@torch.brasslantern.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <140830162545.ZM27219@torch.brasslantern.com> X-Operating-System: Linux 3.2.0-4-amd64 X-Machine: sym2 x86_64 X-Editor: GNU Emacs 23.4.1 Face: iVBORw0KGgoAAAANSUhEUgAAADAAAAAwBAAAAAC3mUtaAAAABGdBTUEAALGPC/xhBQAAADh0RVh0U29mdHdhcmUAWFYgVmVyc2lvbiAzLjEwYSAgUmV2OiAxMi8yOS85NCAoUE5HIHBhdGNoIDEuMindFS5JAAACGElEQVQ4jXXQMU8UYRDG8f8shNjdDH4AbpfGDjAWlKiJiZ0ajL1aGCvsNCbGaCGG1koLaztaTYz6ATy+gOyehYmF3MxVxgg3FnDsHcTpJr/M+8w7Rf6nCsaVTTDqxbg9hoOXmw83H71+Eyfg4E1d7/Z2fG9rGkZbTQiu+K+3U/C+76lmkvAhJuDndnoAiftou4V84okAGclop4U/jYACZDTxrYWP0gkxVfAm/W//GLZpxIzwIN0Hn8dw0B+IWkZmQmRsj2HfhwokEklHfNCCiQCRgAR7YyhQVRVTCKCzP4Y5zBBE0t0zY3Q8oQaBqqAMlVEcgVQd9706zGirAFium8HXumlMIeMwqQCInju+2+uB6MRENupdpMt8pRlHZyuAW0F+Mb6XSIVqtxjD+iVmVqqystLEzFTGT92YqRaXpNT5eTVjeJhbALPnrTxLUZUKZsgxcNm64hAOYisT/xhF+oKTGU5RegtC3Rt6eEDi/QnIevdTx9Md2EMmYBRmCQR1026FCGQQJJExsRUqgkMGaWSbwYLnoO4T6VgpbQbdELPMBAHWWrhYrcxXnYgAsatPWygkFCBD4K62MAsOTqA6szYRPpsu6e6Y8mPiVrBMNuGIMrgwBUu4p2DgG1Ownu6hpuTv7hScefHAzAC/yRRw5U5pALMbJ4AUALvHSZhxgHPXTsHcdWD1GadAHr9avP+c0wCr7263Df8ASLwXWHWs+KIAAAAHdElNRQfYBQEBODPr Organization: DeuxChevaux.org -- The =?iso-8859-1?Q?Citr?= =?iso-8859-1?B?b+tu?= 2CV Database User-Agent: Mutt/1.5.21 (2010-09-15) Hi again, On Sat, Aug 30, 2014 at 04:25:45PM -0700, Bart Schaefer wrote: > On Aug 27, 6:28pm, Axel Beckert wrote: > } On Sun, Aug 24, 2014 at 06:39:09PM +0100, Peter Stephenson wrote: > } > I think the only remaining open matter (excluding longer term issues > } > that aren't going to be dealt with immediately) is test failures in > } > some automated tests. > } > } We again got a few build-failures on the build daemons, but this time > } on different architectures than with 5.0.5-dev-2, namely amd64 (Linux) > } and s390x while kfreebsd-amd64 worked fine this time: > } > } https://buildd.debian.org/status/package.php?p=zsh&suite=experimental > } > } On amd64 (aka x86_64) the testsuite was hanging and killed after 150 > } minutes: > } https://buildd.debian.org/status/fetch.php?pkg=zsh&arch=amd64&ver=5.0.5-dev-3-1&stamp=1409099297 > } > } On s390x, gcc was killed after 150 minutes of inactivity: > } https://buildd.debian.org/status/fetch.php?pkg=zsh&arch=s390x&ver=5.0.5-dev-3-1&stamp=1409118602 > > I was able by accident to reproduce this in a foreground build on CentOS. Thanks for the confirmation that this is likely not debian-specific. I though still don't understand why this happens way more often on the buildds than outside of the buildds. Since we ran into the issue again with the upload one week ago, and we have to do another upload again before the upcoming freeze for Debian's next stable release, I did another run of building the zsh package again and again (without connected terminal by using "ssh localhost" without "-t" to get closer to the buildd environment) until I run into the issue. And this time I had some luck after something like 70 to 100 successful package builds in one row, but interestingly, with different results as Bart: > Can't say for sure it's exactly the same thing, but here's the stack trace > I got: > > (gdb) where > #0 0x0086e7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2 > #1 0x009611ce in __lll_mutex_lock_wait () from /lib/tls/libc.so.6 > #2 0x008efc0b in _L_mutex_lock_4191 () from /lib/tls/libc.so.6 > #3 0x08010000 in ?? () > #4 0x00000000 in ?? () > > That's literally all of it, no zsh source files at all, so I suspect > the job has already exited and is a zombie stuck on that mutex. ~ # ps auxwwwf | fgrep -B10 -A2 Src/zsh abe 18988 0.0 0.1 41920 9216 ? SNs 01:53 0:00 \_ /usr/bin/perl /usr/bin/debuild -uc -us -B -j5 abe 19029 0.0 0.0 8964 1388 ? SN 01:53 0:00 \_ tee ../zsh_5.0.6-2_amd64.build abe 19030 0.0 0.1 41788 11232 ? SN 01:53 0:00 \_ /usr/bin/perl /usr/bin/dpkg-buildpackage -rfakeroot -D -us -uc -i -j3 -B -j5 abe 13060 0.0 0.0 22632 3220 ? SN 01:54 0:00 \_ /bin/bash /usr/bin/fakeroot debian/rules binary-arch abe 13075 0.0 0.0 13444 2424 ? SN 01:54 0:00 \_ /usr/bin/make -f debian/rules binary-arch abe 22865 0.0 0.0 8476 808 ? SN 01:55 0:00 \_ /bin/sh -c HOME="/home/abe/zsh/zsh/obj-static/testhome" dh_auto_test -B obj-static --parallel || true abe 22866 0.0 0.1 47996 13312 ? SN 01:55 0:00 \_ /usr/bin/perl -w /usr/bin/dh_auto_test -B obj-static --parallel abe 22875 0.0 0.0 13444 2464 ? SN 01:55 0:00 \_ make -j5 test abe 22876 0.0 0.0 8476 828 ? SN 01:55 0:00 \_ /bin/sh -c cd Test ; make check abe 22877 0.0 0.0 13444 2284 ? SN 01:55 0:00 \_ make check abe 22879 0.0 0.0 8476 924 ? SN 01:55 0:00 \_ /bin/sh -c if ZTST_testlist="`for f in ../../Test/*.ztst; \ do echo $f; done`" \ ZTST_srcdir="../../Test" \ ZTST_exe=../Src/zsh \ ../Src/zsh +Z -f ../../Test/runtests.zsh; then \ stat=0; \ else \ stat=1; \ fi; \ sleep 1; \ rm -rf Modules .zcompdump; \ exit $stat abe 22881 0.0 0.0 7428 2120 ? SN 01:55 0:00 \_ ../Src/zsh +Z -f ../../Test/runtests.zsh abe 23664 0.0 0.0 8084 2812 ? SN 01:55 0:00 \_ ../Src/zsh +Z -f ../../Test/ztst.zsh ../../Test/A05execution.ztst This means the issue occurred while running the test suite against the static build of the test suite. I'll tomorrow check the other failed builds if that was the case there, too. If so, I'll disable the test suite for the static build completely and hope that's the fix for now. (We currently ignore the result for the static build anyways since C02cond.ztst always fails there. See some other mail from me several months if not a year ago.) So what's different to Bart's case are the backtraces, latest children first: ~ # gdb -p 23664 GNU gdb (Debian 7.7.1+dfsg-3) 7.7.1 [...] Attaching to process 23664 Reading symbols from /home/abe/zsh/zsh/obj-static/Src/zsh...done. 0x000000000054f0eb in __lll_lock_wait_private () (gdb) bt #0 0x000000000054f0eb in __lll_lock_wait_private () #1 0x000000000050d551 in _L_lock_3682 () #2 0x0000000000508b1b in _int_free () #3 0x00000000004349cd in freejob (jn=jn@entry=0x131e430, deleting=deleting@entry=1) at ../../Src/jobs.c:1240 #4 0x0000000000433578 in deletejob (jn=jn@entry=0x131e430, disowning=disowning@entry=0) at ../../Src/jobs.c:1295 #5 0x00000000004339ba in printjob (jn=jn@entry=0x131e430, lng=0, synch=synch@entry=0) at ../../Src/jobs.c:1146 #6 0x00000000004348d0 in update_job (jn=0x131e430) at ../../Src/jobs.c:565 #7 0x00000000004636e1 in wait_for_processes () at ../../Src/signals.c:515 #8 0x0000000000463df5 in zhandler (sig=17) at ../../Src/signals.c:594 #9 #10 0x0000000000509038 in _int_free () #11 0x0000000000417f92 in execshfunc (shf=0x1374c10, args=0x7f847b7567c8) at ../../Src/exec.c:4434 #12 0x00000000004184b0 in execshfunc (args=0x7f847b7567c8, shf=0x1374c10) at ../../Src/exec.c:4310 #13 execfuncdef (state=0x7fff6eb724f0, do_exec=) at ../../Src/exec.c:4319 #14 0x0000000000415cdd in execsimple (state=state@entry=0x7fff6eb724f0) at ../../Src/exec.c:1120 #15 0x000000000041cfb0 in execlist (state=state@entry=0x7fff6eb724f0, dont_change_job=dont_change_job@entry=1, exiting=exiting@entry=0) at ../../Src/exec.c:1259 #16 0x000000000041d540 in execode (p=p@entry=0x1339300, dont_change_job=dont_change_job@entry=1, exiting=exiting@entry=0, context=context@entry=0x59b98b "shfunc") at ../../Src/exec.c:1070 #17 0x0000000000417610 in runshfunc (prog=0x1339300, wrap=0x0, name=0x7f847b75b5e0 "ZTST_execchunk") at ../../Src/exec.c:4895 #18 0x0000000000417bd6 in doshfunc (shfunc=shfunc@entry=0x13390d0, doshargs=doshargs@entry=0x7f847b75b498, noreturnval=noreturnval@entry=0) at ../../Src/exec.c:4775 #19 0x0000000000417f80 in execshfunc (shf=0x13390d0, args=0x7f847b75b498) at ../../Src/exec.c:4432 #20 0x000000000041b7d9 in execshfunc (args=, shf=) at ../../Src/exec.c:4400 #21 execcmd (state=0x7fff6eb73720, input=input@entry=0, output=20401232, output@entry=0, how=20401232, how@entry=2, last1=0) at ../../Src/exec.c:3269 #22 0x000000000041bcfe in execpline2 (state=state@entry=0x7fff6eb73720, pcode=pcode@entry=5507, how=how@entry=2, input=0, output=0, last1=last1@entry=0) at ../../Src/exec.c:1691 #23 0x000000000041c0c6 in execpline (state=state@entry=0x7fff6eb73720, slcode=, how=how@entry=2, last1=0) at ../../Src/exec.c:1478 #24 0x000000000041d271 in execlist (state=state@entry=0x7fff6eb73720, dont_change_job=dont_change_job@entry=1, exiting=exiting@entry=0) at ../../Src/exec.c:1261 #25 0x000000000043d471 in execif (state=0x7fff6eb73720, do_exec=0) at ../../Src/loop.c:525 #26 0x000000000041ac06 in execcmd (state=0x7fff6eb73720, input=input@entry=0, output=20401232, output@entry=0, how=20401232, how@entry=2, last1=0) at ../../Src/exec.c:3232 #27 0x000000000041bcfe in execpline2 (state=state@entry=0x7fff6eb73720, pcode=pcode@entry=5123, how=how@entry=2, input=0, output=0, last1=last1@entry=0) at ../../Src/exec.c:1691 #28 0x000000000041c0c6 in execpline (state=state@entry=0x7fff6eb73720, slcode=, how=how@entry=2, last1=0) at ../../Src/exec.c:1478 #29 0x000000000041d271 in execlist (state=state@entry=0x7fff6eb73720, dont_change_job=dont_change_job@entry=1, exiting=exiting@entry=0) at ../../Src/exec.c:1261 #30 0x000000000043d0bd in execwhile (state=0x7fff6eb73720, do_exec=) at ../../Src/loop.c:420 #31 0x000000000041ac06 in execcmd (state=0x7fff6eb73720, input=input@entry=0, output=20401232, output@entry=0, how=20401232, how@entry=2, last1=0) at ../../Src/exec.c:3232 #32 0x000000000041bcfe in execpline2 (state=state@entry=0x7fff6eb73720, pcode=pcode@entry=323, how=how@entry=2, input=0, output=0, last1=last1@entry=0) at ../../Src/exec.c:1691 #33 0x000000000041c0c6 in execpline (state=state@entry=0x7fff6eb73720, slcode=, how=how@entry=2, last1=0) at ../../Src/exec.c:1478 #34 0x000000000041d271 in execlist (state=state@entry=0x7fff6eb73720, dont_change_job=dont_change_job@entry=1, exiting=exiting@entry=0) at ../../Src/exec.c:1261 #35 0x000000000041d540 in execode (p=p@entry=0x1339440, dont_change_job=dont_change_job@entry=1, exiting=exiting@entry=0, context=context@entry=0x59b98b "shfunc") at ../../Src/exec.c:1070 #36 0x0000000000417610 in runshfunc (prog=0x1339440, wrap=0x0, name=0x7f847b75a910 "ZTST_test") at ../../Src/exec.c:4895 #37 0x0000000000417bd6 in doshfunc (shfunc=shfunc@entry=0x1338820, doshargs=doshargs@entry=0x7f847b75a8b0, noreturnval=noreturnval@entry=0) at ../../Src/exec.c:4775 #38 0x0000000000417f80 in execshfunc (shf=0x1338820, args=0x7f847b75a8b0) at ../../Src/exec.c:4432 #39 0x000000000041b7d9 in execshfunc (args=, shf=) at ../../Src/exec.c:4400 #40 execcmd (state=0x7fff6eb74980, input=input@entry=0, output=20401232, output@entry=0, how=20401232, how@entry=2, last1=0) at ../../Src/exec.c:3269 #41 0x000000000041bcfe in execpline2 (state=state@entry=0x7fff6eb74980, pcode=pcode@entry=32515, how=how@entry=2, input=0, output=0, last1=last1@entry=0) at ../../Src/exec.c:1691 #42 0x000000000041c0c6 in execpline (state=state@entry=0x7fff6eb74980, slcode=, how=how@entry=2, last1=0) at ../../Src/exec.c:1478 #43 0x000000000041d271 in execlist (state=state@entry=0x7fff6eb74980, dont_change_job=dont_change_job@entry=1, exiting=0) at ../../Src/exec.c:1261 #44 0x000000000043d5e7 in execcase (state=0x7fff6eb74980, do_exec=0) at ../../Src/loop.c:603 #45 0x000000000041ac06 in execcmd (state=0x7fff6eb74980, input=input@entry=0, output=20401232, output@entry=0, how=20401232, how@entry=18, last1=0) at ../../Src/exec.c:3232 #46 0x000000000041bcfe in execpline2 (state=state@entry=0x7fff6eb74980, pcode=pcode@entry=31491, how=how@entry=18, input=0, output=0, last1=last1@entry=0) at ../../Src/exec.c:1691 #47 0x000000000041c0c6 in execpline (state=state@entry=0x7fff6eb74980, slcode=, how=how@entry=18, last1=0) at ../../Src/exec.c:1478 #48 0x000000000041d271 in execlist (state=state@entry=0x7fff6eb74980, dont_change_job=dont_change_job@entry=1, exiting=exiting@entry=0) at ../../Src/exec.c:1261 #49 0x000000000043d0bd in execwhile (state=0x7fff6eb74980, do_exec=) at ../../Src/loop.c:420 #50 0x000000000041ac06 in execcmd (state=0x7fff6eb74980, input=input@entry=0, output=20401232, output@entry=0, how=20401232, how@entry=18, last1=0) at ../../Src/exec.c:3232 #51 0x000000000041bcfe in execpline2 (state=state@entry=0x7fff6eb74980, pcode=pcode@entry=31427, how=how@entry=18, input=0, output=0, last1=last1@entry=0) at ../../Src/exec.c:1691 #52 0x000000000041c0c6 in execpline (state=state@entry=0x7fff6eb74980, slcode=, how=how@entry=18, last1=0) at ../../Src/exec.c:1478 #53 0x000000000041d271 in execlist (state=state@entry=0x7fff6eb74980, dont_change_job=dont_change_job@entry=0, exiting=exiting@entry=0) at ../../Src/exec.c:1261 #54 0x000000000041d540 in execode (p=p@entry=0x7f847b75a230, dont_change_job=dont_change_job@entry=0, exiting=exiting@entry=0, context=context@entry=0x59cf89 "toplevel") at ../../Src/exec.c:1070 #55 0x000000000042e00c in loop (toplevel=toplevel@entry=1, justonce=justonce@entry=0) at ../../Src/init.c:185 #56 0x00000000004313b6 in zsh_main (argc=, argv=) at ../../Src/init.c:1625 #57 0x00000000004f1840 in __libc_start_main () #58 0x0000000000401c87 in _start () ~ # gdb -p 22881 GNU gdb (Debian 7.7.1+dfsg-3) 7.7.1 [...] Attaching to process 22881 Reading symbols from /home/abe/zsh/zsh/obj-static/Src/zsh...done. 0x00000000004f8746 in sigsuspend () (gdb) bt #0 0x00000000004f8746 in sigsuspend () #1 0x0000000000463296 in signal_suspend (sig=sig@entry=17, wait_cmd=wait_cmd@entry=0) at ../../Src/signals.c:375 #2 0x0000000000434e46 in zwaitjob (job=, wait_cmd=0) at ../../Src/jobs.c:1454 #3 0x0000000000435527 in waitjobs () at ../../Src/jobs.c:1499 #4 0x000000000041c6eb in execpline (state=state@entry=0x7fffd409efc0, slcode=, how=, how@entry=2, last1=0) at ../../Src/exec.c:1554 #5 0x000000000041d271 in execlist (state=state@entry=0x7fffd409efc0, dont_change_job=dont_change_job@entry=1, exiting=0) at ../../Src/exec.c:1261 #6 0x000000000043c581 in execfor (state=0x7fffd409efc0, do_exec=0) at ../../Src/loop.c:164 #7 0x000000000041ac06 in execcmd (state=0x7fffd409efc0, input=8, input@entry=0, output=output@entry=0, how=-1, how@entry=18, last1=368) at ../../Src/exec.c:3232 #8 0x000000000041bcfe in execpline2 (state=state@entry=0x7fffd409efc0, pcode=pcode@entry=771, how=how@entry=18, input=0, output=0, last1=last1@entry=0) at ../../Src/exec.c:1691 #9 0x000000000041c0c6 in execpline (state=state@entry=0x7fffd409efc0, slcode=, how=how@entry=18, last1=0) at ../../Src/exec.c:1478 #10 0x000000000041d271 in execlist (state=state@entry=0x7fffd409efc0, dont_change_job=dont_change_job@entry=0, exiting=exiting@entry=0) at ../../Src/exec.c:1261 #11 0x000000000041d540 in execode (p=p@entry=0x7fbe3410a948, dont_change_job=dont_change_job@entry=0, exiting=exiting@entry=0, context=context@entry=0x59cf89 "toplevel") at ../../Src/exec.c:1070 #12 0x000000000042e00c in loop (toplevel=toplevel@entry=1, justonce=justonce@entry=0) at ../../Src/init.c:185 #13 0x00000000004313b6 in zsh_main (argc=, argv=) at ../../Src/init.c:1625 #14 0x00000000004f1840 in __libc_start_main () #15 0x0000000000401c87 in _start () Hope this helps! For now the process is still there and I can still do some more investigation -- if it survives my Thinkpad going to Suspend-to-RAM when I go to work in a few hours. Kind regards, Axel -- /~\ Plain Text Ribbon Campaign | Axel Beckert \ / Say No to HTML in E-Mail and News | abe@deuxchevaux.org (Mail) X See http://www.nonhtmlmail.org/campaign.html | abe@noone.org (Mail+Jabber) / \ I love long mails: http://email.is-not-s.ms/ | http://noone.org/abe/ (Web)