Date: Tue, 4 Dec 2007 20:30:17 +0000
From: Peter Stephenson
To: Guillaume Chazarain, zsh-workers@sunsite.dk
Subject: Re: deadlock caused by gettext usage in a signal handler
Message-Id: <20071204203017.35a29727.p.w.stephenson@ntlworld.com>
In-Reply-To: <20071130203534.1d1ea29c@inria.fr>
References: <20071130203534.1d1ea29c@inria.fr>
X-Seq: 24158

On Fri, 30 Nov 2007 20:35:34 +0100
Guillaume Chazarain wrote:
> I just had a Zsh process (using zsh-4.2.6-6.fc7) deadlock, the
> backtrace seems to show it is initializing the gettext infrastructure
> to print "Input/output error" in a signal handler.

OK, so after Guillaume's last point I've been looking at this at a more
fundamental level.

The shell tries to queue signals any time it might be doing something
that could cause problems in the signal handler. This seems reasonable;
at least, it's a long time since we had an obvious bug with this (we've
had plenty in signal handling more widely). Looking into why what was
going on here wasn't safe, I noticed...

...
> #12 0x0000000000465540 in zhandler (sig=17) at signals.c:521
> #13
> #14 0x0000003c3f030afa in *__GI___sigsuspend (set=0x7fff630adc60)
>     at ../sysdeps/unix/sysv/linux/sigsuspend.c:63
...
> #23 0x000000000040ddb6 in zexit (val=1, from_where=0) at builtin.c:4187
> #24 0x0000000000465637 in zhandler (sig=-4) at signals.c:540
> #25
> #26 0x0000003c3f097642 in __libc_fork ()
>     at ../nptl/sysdeps/unix/sysv/linux/fork.c:127

The shell is at a point that is supposedly not critical (it is actually
forking) when it receives a signal. I don't know what -4 is supposed to
be, but possibly it's SIGINT with some extra flags (only SIGHUP, SIGINT
and SIGALRM call zexit()). It then tries to exit, running the exit
scripts. The problem happens while it is handling a SIGCHLD (signal 17
in frame #12) from something it's running.

I still don't understand why that's hairy here, however. The first
zhandler() call (frame #24) has basically finished what it's doing and
handed over to zexit() to exit the shell. That leaves me wondering
whether forking might be the problem; do we need to queue signals around
it? It's not obvious why that would be.
There remains my simple plan B of running strerror() once, immediately
after setting the locale, to do any one-off initialization, but I'm
starting to think the issue is more widespread.

-- 
Peter Stephenson
Web page now at http://homepage.ntlworld.com/p.w.stephenson/