X-Seq: 35984
From: Bart Schaefer
Message-Id: <150804235400.ZM9958@torch.brasslantern.com>
Date: Tue, 4 Aug 2015 23:53:59 -0700
Comments: In reply to Mathias Fredriksson "Re: Deadlock when receiving kill-signal from child process" (Aug 5, 12:52am)
References: <150803085228.ZM24837@torch.brasslantern.com> <150803135818.ZM24977@torch.brasslantern.com>
To: zsh-workers@zsh.org
Subject: Re: Deadlock when receiving kill-signal from child process

On Aug 5, 12:52am, Mathias
Fredriksson wrote:
}
} I have however managed to get a dump with strace on Gentoo

Based on this strace plus a GDB stack trace Mathias sent me off-list,
I think the problem may be here:

    zwaitjob(int job, int wait_cmd)
    {
        int q = queue_signal_level();
        Job jn = jobtab + job;

        dont_queue_signals();
        child_block();          /* unblocked during signal_suspend() */
        queue_traps(wait_cmd);
    ...
        while (!errflag && jn->stat &&
               !(jn->stat & STAT_DONE) &&
               !(interact && (jn->stat & STAT_STOPPED))) {
            signal_suspend(SIGCHLD, wait_cmd);

I suspect what's happening is that the child represented by "job" exits
during dont_queue_signals(), which is a macro that expands to a loop
calling zhandler(), which will process TRAPUSR1 (or other traps).
Somehow this results in jn->stat never being marked STAT_DONE.  Perhaps
this happens because the "thisjob" global gets temporarily changed in
the TRAP* function?

Anyway, signal_suspend(SIGCHLD, wait_cmd) is then called when there are
no children left, so we never receive another SIGCHLD to break out of
the while-loop, and even if we do come out of signal_suspend() the
while-loop goes around and we block again.

I'm not sure what to do if this is in fact the problem, because e.g.
calling child_block() before dont_queue_signals() has other problems.

However, it's also possible that a child has exited even before its job
table entry has been created.  One way to find out if that has happened
is this patch:

diff --git a/Src/signals.c b/Src/signals.c
index 3950ad1..d72c7d6 100644
--- a/Src/signals.c
+++ b/Src/signals.c
@@ -519,6 +519,7 @@ wait_for_processes(void)
 		 * will get added on to the next found process that
 		 * terminates.
 		 */
+		zwarn("no job table entry for pid %d", pid);
 		get_usage();
 	    }
 	    /*

Mathias, if you could apply that patch and try again to reproduce the
deadlock, it might tell us something.

-- 
Barton E. Schaefer