zsh-workers
 help / color / mirror / code / Atom feed
* deadlock caused by gettext usage in a signal handler
@ 2007-11-30 19:35 Guillaume Chazarain
  2007-12-03 22:43 ` Peter Stephenson
  2007-12-04 20:30 ` Peter Stephenson
  0 siblings, 2 replies; 20+ messages in thread
From: Guillaume Chazarain @ 2007-11-30 19:35 UTC (permalink / raw)
  To: zsh-workers

Hello,

I just had a Zsh process (using zsh-4.2.6-6.fc7) deadlock, the
backtrace seems to show it is initializing the gettext infrastructure
to print "Input/output error" in a signal handler.

As a solution, avoiding gettext() in signal handlers seems harder than
forcing gettext() initialization before its first usage, but the cost
may be prohibitive.

(gdb) bt
#0  0x0000003c3f0dcf78 in __lll_mutex_lock_wait () from /lib64/libc.so.6
#1  0x0000003c3f074ced in _L_lock_15106 () from /lib64/libc.so.6
#2  0x0000003c3f073d7b in *__GI___libc_realloc (oldmem=0x3c3f34c960, 
    bytes=2048) at malloc.c:3697
#3  0x0000003c3f02d2bc in read_alias_file (fname=<value optimized out>, 
    fname_len=<value optimized out>) at localealias.c:309
#4  0x0000003c3f02d58e in _nl_expand_alias (name=0x7fff630ad470 "fr_FR.UTF-8")
    at localealias.c:189
#5  0x0000003c3f02bdf7 in _nl_find_domain (
    dirname=0x3c3f11efb0 "/usr/share/locale", 
    locale=0x7fff630ad470 "fr_FR.UTF-8", 
    domainname=0x7fff630ad490 "LC_MESSAGES/libc.mo", domainbinding=0x0)
    at finddomain.c:120
#6  0x0000003c3f02b64f in __dcigettext (domainname=0x3c3f11b86b "libc", 
    msgid1=0x3c3f11ba3a "Input/output error", msgid2=0x0, plural=0, n=0, 
    category=5) at dcigettext.c:627
#7  0x0000003c3f07713c in *__GI___strerror_r (errnum=5, buf=0x0, buflen=0)
    at _strerror.c:65
#8  0x0000003c3f076f7e in strerror (errnum=1060424032) at strerror.c:33
#9  0x0000000000474507 in zerrmsg (fmt=<value optimized out>, str=0x0, num=5)
    at utils.c:172
#10 0x0000000000474aab in zerr (fmt=0x3c3f34c960 "\002", str=0x0, num=2064)
    at utils.c:60
#11 0x000000000043cad8 in update_job (jn=0x892140) at jobs.c:361
#12 0x0000000000465540 in zhandler (sig=17) at signals.c:521
#13 <signal handler called>
#14 0x0000003c3f030afa in *__GI___sigsuspend (set=0x7fff630adc60)
    at ../sysdeps/unix/sysv/linux/sigsuspend.c:63
#15 0x0000000000465734 in signal_suspend (sig=17, sig2=0) at signals.c:367
#16 0x000000000043c74c in zwaitjob (job=<value optimized out>, sig=0)
    at jobs.c:1145
#17 0x000000000043c8f8 in waitjobs () at jobs.c:1180
#18 0x0000000000425de2 in execpline (state=<value optimized out>, 
    slcode=<value optimized out>, how=18, last1=0) at exec.c:1146
#19 0x0000000000426a70 in execlist (state=0x7fff630adfb0, dont_change_job=0, 
    exiting=0) at exec.c:877
#20 0x0000000000426c85 in execode (p=0x2aaaae93b338, dont_change_job=0, 
    exiting=0) at exec.c:777
#21 0x000000000043738f in loop (toplevel=0, justonce=0) at init.c:167
#22 0x000000000043794c in source (s=0x4780af "/etc/zlogout") at init.c:1045
#23 0x000000000040ddb6 in zexit (val=1, from_where=0) at builtin.c:4187
#24 0x0000000000465637 in zhandler (sig=-4) at signals.c:540
#25 <signal handler called>
#26 0x0000003c3f097642 in __libc_fork ()
    at ../nptl/sysdeps/unix/sysv/linux/fork.c:127
#27 0x0000000000421e85 in zfork (tv=0x7fff630aea90) at exec.c:231
#28 0x00000000004239b1 in execcmd (state=0x7fff630afc80, input=0, output=0, 
    how=18, last1=2) at exec.c:2127
#29 0x0000000000425557 in execpline2 (state=0x7fff630afc80, 
    pcode=<value optimized out>, how=18, input=0, output=0, last1=0)
    at exec.c:1285
#30 0x00000000004259ea in execpline (state=0x7fff630afc80, 
    slcode=<value optimized out>, how=18, last1=0) at exec.c:1071
#31 0x0000000000426a70 in execlist (state=0x7fff630afc80, dont_change_job=0, 
    exiting=0) at exec.c:877
#32 0x0000000000426c85 in execode (p=0x2aaaae93b1a0, dont_change_job=0, 
    exiting=0) at exec.c:777
#33 0x000000000043738f in loop (toplevel=1, justonce=0) at init.c:167
#34 0x0000000000438051 in zsh_main (argc=<value optimized out>, 
    argv=0x7fff630afdf8) at init.c:1280
#35 0x0000003c3f01dab4 in __libc_start_main (main=0x40cca0 <main>, argc=1, 
    ubp_av=0x7fff630afdf8, init=<value optimized out>, 
    fini=<value optimized out>, rtld_fini=<value optimized out>, 
    stack_end=0x7fff630afde8) at libc-start.c:222
#36 0x000000000040cc09 in _start ()

Thanks.

-- 
Guillaume


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: deadlock caused by gettext usage in a signal handler
  2007-11-30 19:35 deadlock caused by gettext usage in a signal handler Guillaume Chazarain
@ 2007-12-03 22:43 ` Peter Stephenson
  2007-12-04  1:39   ` Paul Ackersviller
  2007-12-04 18:02   ` Guillaume Chazarain
  2007-12-04 20:30 ` Peter Stephenson
  1 sibling, 2 replies; 20+ messages in thread
From: Peter Stephenson @ 2007-12-03 22:43 UTC (permalink / raw)
  To: zsh-workers; +Cc: Guillaume Chazarain

On Fri, 30 Nov 2007 20:35:34 +0100
Guillaume Chazarain <guichaz@yahoo.fr> wrote:
> I just had a Zsh process (using zsh-4.2.6-6.fc7) deadlock, the
> backtrace seems to show it is initializing the gettext infrastructure
> to print "Input/output error" in a signal handler.
> 
> As a solution, avoiding gettext() in signal handlers seems harder than
> forcing gettext() initialization before its first usage, but the cost
> may be prohibitive.

I'm not sure how we should handle this.  zsh doesn't ever use gettext()
directly and the interface it is using (strerror()) doesn't document
what needs to be initialised and when (nor even that it calls
gettext()).  I suppose a call to strerror() right at the top of the main
function might do the trick, but there's no guarantee and that seems
rather a hack.  Alternatively, we could use strerror_r() with, say, 80
characters off the stack, but there's no guarantee there aren't other
calls with the same problem.  So I can see possible things to do but
nothing that looks reliable.

I was going to say using strerror_r() was at least hack free, but it
turns out it isn't... there are two incompatible versions whose
prototypes differ only in return value, so it's difficult to test for
them.  So I've tried to handle both.  Yuk.  Suggestions welcome.

To get diff output for a test failure while I was writing this I needed
to use "diff -a" in ztst.zsh.  I'm not sure if the option is universal,
however.

Index: configure.ac
===================================================================
RCS file: /cvsroot/zsh/zsh/configure.ac,v
retrieving revision 1.82
diff -u -r1.82 configure.ac
--- configure.ac	24 Nov 2007 01:56:49 -0000	1.82
+++ configure.ac	3 Dec 2007 22:29:29 -0000
@@ -1157,7 +1157,7 @@
 	       getlogin getpwent getpwnam getpwuid getgrgid getgrnam \
 	       initgroups nis_list \
 	       setuid seteuid setreuid setresuid setsid \
-	       memcpy memmove strstr strerror \
+	       memcpy memmove strstr strerror strerror_r \
 	       getrlimit getrusage \
 	       setlocale \
 	       uname \
Index: Src/utils.c
===================================================================
RCS file: /cvsroot/zsh/zsh/Src/utils.c,v
retrieving revision 1.171
diff -u -r1.171 utils.c
--- Src/utils.c	20 Nov 2007 09:55:10 -0000	1.171
+++ Src/utils.c	3 Dec 2007 22:29:31 -0000
@@ -255,6 +255,12 @@
 {
     const char *str;
     int num;
+#ifdef HAVE_STRERROR_R
+#define ERRBUFSIZE (80)
+    int olderrno;
+    char errbuf[ERRBUFSIZE];
+#endif
+    char *errmsg;
 
     if ((unset(SHINSTDIN) || locallevel) && lineno)
 	fprintf(file, "%ld: ", (long)lineno);
@@ -304,12 +310,39 @@
 		    errflag = 1;
 		    return;
 		}
+#ifdef HAVE_STRERROR_R
+		/*
+		 * There are two incompatible strerror_r()s floating round.
+		 * The GNU extension refuses to copy the message into the
+		 * buffer if it can return a constant string.  To suppress it
+		 * we need to define _XOPEN_SOURCE to 600.  I don't dare do
+		 * this because we're already depending on _GNU_SOURCE.  So
+		 * try to handle both by looking for errno being set (for the
+		 * standard version failing) or errbuf being left untouched
+		 * (for the GNU version).  One presumes that if strerror_r()
+		 * didn't copy anything to errbuf, then it's safe to
+		 * call strerror() to get the string.
+		 *
+		 * This is a mess, but it's about a decade and half
+		 * too late to shirk from messes in the source.
+		 */
+		olderrno = errno;
+		errno = 0;
+		errbuf[0] = '\0';
+		strerror_r(num, errbuf, ERRBUFSIZE);
+		if (errno || errbuf[0] == '\0')
+		    errmsg = strerror(num);
+		else
+		    errmsg = errbuf;
+		errno = olderrno;
+#else
+		errmsg = strerror(num);
+#endif
 		/* If the message is not about I/O problems, it looks better *
 		 * if we uncapitalize the first letter of the message        */
 		if (num == EIO)
-		    fputs(strerror(num), file);
+		    fputs(errmsg, file);
 		else {
-		    char *errmsg = strerror(num);
 		    fputc(tulower(errmsg[0]), file);
 		    fputs(errmsg + 1, file);
 		}


-- 
Peter Stephenson <p.w.stephenson@ntlworld.com>
Web page now at http://homepage.ntlworld.com/p.w.stephenson/


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: deadlock caused by gettext usage in a signal handler
  2007-12-03 22:43 ` Peter Stephenson
@ 2007-12-04  1:39   ` Paul Ackersviller
  2007-12-04 18:02   ` Guillaume Chazarain
  1 sibling, 0 replies; 20+ messages in thread
From: Paul Ackersviller @ 2007-12-04  1:39 UTC (permalink / raw)
  To: zsh-workers

On Mon, Dec 03, 2007 at 10:43:24PM +0000, Peter Stephenson wrote:
> On Fri, 30 Nov 2007 20:35:34 +0100
> Guillaume Chazarain <guichaz@yahoo.fr> wrote:
> > I just had a Zsh process (using zsh-4.2.6-6.fc7) deadlock, the
> > backtrace seems to show it is initializing the gettext infrastructure
> > to print "Input/output error" in a signal handler.

...most of patch snipped for conciseness...

> -		    fputs(strerror(num), file);
> +		    fputs(errmsg, file);
>  		else {
> -		    char *errmsg = strerror(num);
>  		    fputc(tulower(errmsg[0]), file);
>  		    fputs(errmsg + 1, file);

As this was reported on 4.2.6, I figured you'd want it on the 4.2
branch as well.  However the entire patch doesn't merge cleanly,
due to zerrmsg having its signature changed in 4.3 (with a FILE
pointer as the first parameter).  It seemed obvious enough to change
`file' in the above hunk to `stderr', so I've committed that to the
zsh-4_2-patches branch.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: deadlock caused by gettext usage in a signal handler
  2007-12-03 22:43 ` Peter Stephenson
  2007-12-04  1:39   ` Paul Ackersviller
@ 2007-12-04 18:02   ` Guillaume Chazarain
  2007-12-04 18:24     ` Peter Stephenson
  1 sibling, 1 reply; 20+ messages in thread
From: Guillaume Chazarain @ 2007-12-04 18:02 UTC (permalink / raw)
  To: zsh-workers

Le Mon, 3 Dec 2007 22:43:24 +0000,
Peter Stephenson <p.w.stephenson@ntlworld.com> a écrit :

> zsh doesn't ever use gettext() directly

I overlooked that, great that Zsh does not use strsignal and perror ;-)
Zsh seems to use dlerror though, hopefully not in a signal handler.

> Alternatively, we could use strerror_r()

strerror_r also calls into gettext, so I don't see how this solves the
problem. I have been caught by the gettext initialization as it's a
bigger malloc consumer than a simple gettext call but according to a
comment by Paul Eggert in:

http://cvs.savannah.gnu.org/viewvc/diffutils/lib/c-stack.c?root=diffutils&view=markup

Do not translate them in the signal handler, since gettext is not
async-signal-safe.

You can see that strerror_r calls gettext in:

http://sourceware.org/cgi-bin/cvsweb.cgi/libc/string/_strerror.c?rev=1.1.2.2&content-type=text/x-cvsweb-markup&cvsroot=glibc

return (char *) _(_sys_errlist_internal[errnum]);

> To get diff output for a test failure while I was writing this I needed

Does that mean you managed to make a reproducible test case?

> to use "diff -a" in ztst.zsh.  I'm not sure if the option is universal,
> however.

Confused, I can't see any diff invocation in your patch. From some
googling, POSIX does not specify it
(http://www.opengroup.org/onlinepubs/009695399/utilities/diff.html),
but the BSDs provide it anyway
(http://www.freebsd.org/cgi/man.cgi?query=diff&apropos=0&sektion=0&manpath=FreeBSD+6.0-RELEASE&format=html),
some use GNU diff.

It seems to me a possible solution would be to rewrite strerror() in Zsh
to initially build a static table of translated error messages as
well as the translation of "Unknown error ". Don't know how much this
would delay the Zsh startup though.

Thanks.

-- 
Guillaume


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: deadlock caused by gettext usage in a signal handler
  2007-12-04 18:02   ` Guillaume Chazarain
@ 2007-12-04 18:24     ` Peter Stephenson
  2007-12-04 18:43       ` Guillaume Chazarain
  0 siblings, 1 reply; 20+ messages in thread
From: Peter Stephenson @ 2007-12-04 18:24 UTC (permalink / raw)
  To: Guillaume Chazarain, zsh-workers

Guillaume Chazarain wrote:
> Zsh seems to use dlerror though, hopefully not in a signal handler.

No, the shell is quite cautious what it executes in signal handlers;
system errors are pretty much inevitable, but there's no module handling
there so no call to dlerror.

> strerror_r also calls into gettext, so I don't see how this solves the
> problem.

strerror_r() is the way of ensuring thread safety in printing the
message by providing a buffer into which data is written.  The GNU
library manual entry does actually claim it's thread-safe; that
means that any hidden back-end functions it uses are thread-safe too.

To put it another way, I don't really care how it solves the problem; if
it doesn't it's broken.

There is a possible problem that strerror_r() might need more memory
than we pass it if it's doing random additional things, I suppose, but
the manual doesn't give any guidance.

> > To get diff output for a test failure while I was writing this I needed
> 
> Does that mean you managed to make a reproducible test case?

No, I was testing I hadn't broken anything.

> > to use "diff -a" in ztst.zsh.  I'm not sure if the option is universal,
> > however.
> 
> Confused, I can't see any diff invocation in your patch.

There isn't; I was referring to an existing test that failed while I was
writing the patch: diff assumed the difference in output was binary and
didn't show it.

> It seems to me a possible solution would be to rewrite strerror() in Zsh
> to initially build a static table of translated error messages as
> well as the translation of "Unknown error ". Don't know how much this
> would delay the Zsh startup though.

If strerror_r() really doesn't do what it's supposed to then it needs
fixing.

-- 
Peter Stephenson <pws@csr.com>                  Software Engineer
CSR PLC, Churchill House, Cambridge Business Park, Cowley Road
Cambridge, CB4 0WZ, UK                          Tel: +44 (0)1223 692070


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: deadlock caused by gettext usage in a signal handler
  2007-12-04 18:24     ` Peter Stephenson
@ 2007-12-04 18:43       ` Guillaume Chazarain
  2007-12-04 19:43         ` Peter Stephenson
  0 siblings, 1 reply; 20+ messages in thread
From: Guillaume Chazarain @ 2007-12-04 18:43 UTC (permalink / raw)
  To: zsh-workers

Le Tue, 04 Dec 2007 18:24:50 +0000,
Peter Stephenson <pws@csr.com> a écrit :

> Guillaume Chazarain wrote:
> > Zsh seems to use dlerror though, hopefully not in a signal handler.
> 
> No, the shell is quite cautious what it executes in signal handlers;
> system errors are pretty much inevitable, but there's no module handling
> there so no call to dlerror.

OK, great.

> 
> > strerror_r also calls into gettext, so I don't see how this solves the
> > problem.
> 
> strerror_r() is the way of ensuring thread safety in printing the
> message by providing a buffer into which data is written.  The GNU
> library manual entry does actually claim it's thread-safe; that
> means that any hidden back-end functions it uses are thread-safe too.

Thread-safety does not imply async-signal-safety. It means locks and
thread local storage, but these do nothing for signals. It would be
great if sticking an _r would make functions callable in signal
handlers ;-)

There are more details in this blog post:
http://girtby.net/archives/2006/12/18/the-other-kind-of-reentrant

Thanks.

-- 
Guillaume


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: deadlock caused by gettext usage in a signal handler
  2007-12-04 18:43       ` Guillaume Chazarain
@ 2007-12-04 19:43         ` Peter Stephenson
  2007-12-04 19:49           ` Guillaume Chazarain
  0 siblings, 1 reply; 20+ messages in thread
From: Peter Stephenson @ 2007-12-04 19:43 UTC (permalink / raw)
  To: Guillaume Chazarain, Zsh Hackers' List

On Tue, 4 Dec 2007 19:43:14 +0100
Guillaume Chazarain <guichaz@yahoo.fr> wrote:
> > > strerror_r also calls into gettext, so I don't see how this solves the
> > > problem.
> > 
> > strerror_r() is the way of ensuring thread safety in printing the
> > message by providing a buffer into which data is written.  The GNU
> > library manual entry does actually claim it's thread-safe; that
> > means that any hidden back-end functions it uses are thread-safe too.
> 
> Thread-safety does not imply async-signal-safety. It means locks and
> thread local storage, but these do nothing for signals. It would be
> great if sticking an _r would make functions callable in signal
> handlers ;-)

Indeed, it's as you say.  So the patch doesn't really help, since that's
the only case we have to worry about.  I don't see an easy way out at
the moment.

-- 
Peter Stephenson <p.w.stephenson@ntlworld.com>
Web page now at http://homepage.ntlworld.com/p.w.stephenson/


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: deadlock caused by gettext usage in a signal handler
  2007-12-04 19:43         ` Peter Stephenson
@ 2007-12-04 19:49           ` Guillaume Chazarain
  0 siblings, 0 replies; 20+ messages in thread
From: Guillaume Chazarain @ 2007-12-04 19:49 UTC (permalink / raw)
  To: Zsh Hackers' List

Le Tue, 4 Dec 2007 19:43:09 +0000,
Peter Stephenson <p.w.stephenson@ntlworld.com> a écrit :

> I don't see an easy way out at the moment.

Not even the one I proposed in
http://article.gmane.org/gmane.comp.shells.zsh.devel/13888?

rewrite strerror() in Zsh to initially build a static table of
translated error messages as well as the translation of
"Unknown error ".

It's not trivial and speedy, but it should be safe I think.

Thanks.

-- 
Guillaume


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: deadlock caused by gettext usage in a signal handler
  2007-11-30 19:35 deadlock caused by gettext usage in a signal handler Guillaume Chazarain
  2007-12-03 22:43 ` Peter Stephenson
@ 2007-12-04 20:30 ` Peter Stephenson
  2007-12-04 23:09   ` Guillaume Chazarain
  2007-12-06 23:02   ` Guillaume Chazarain
  1 sibling, 2 replies; 20+ messages in thread
From: Peter Stephenson @ 2007-12-04 20:30 UTC (permalink / raw)
  To: Guillaume Chazarain, zsh-workers

On Fri, 30 Nov 2007 20:35:34 +0100
Guillaume Chazarain <guichaz@yahoo.fr> wrote:
> I just had a Zsh process (using zsh-4.2.6-6.fc7) deadlock, the
> backtrace seems to show it is initializing the gettext infrastructure
> to print "Input/output error" in a signal handler.

OK, so after Guillaume's last point I've been looking at this at a more
fundamental level.

The shell tries to queue signals any time it might be doing something
that could causes problems in the signal handler.  This seems to be
reasonable; at least it's a long time since we had an obvious bug with
this (we've had plenty in signals more widely).

Looking to see why what was going on here wasn't safe I noticed...

...
> #12 0x0000000000465540 in zhandler (sig=17) at signals.c:521
> #13 <signal handler called>
> #14 0x0000003c3f030afa in *__GI___sigsuspend (set=0x7fff630adc60)
>     at ../sysdeps/unix/sysv/linux/sigsuspend.c:63
...
> #23 0x000000000040ddb6 in zexit (val=1, from_where=0) at builtin.c:4187
> #24 0x0000000000465637 in zhandler (sig=-4) at signals.c:540
> #25 <signal handler called>
> #26 0x0000003c3f097642 in __libc_fork ()
>     at ../nptl/sysdeps/unix/sysv/linux/fork.c:127

The shell is running at a supposedly not critical point (actually
forking) when it gets a signal.  I don't know what -4 is supposed to be,
but possibly it's SIGINT with some extra flags (only SIGHUP, SIGINT and
SIGALRM call zexit()).  Then it tries to exit, running the exit
scripts.  The problem happens when it's handling a SIGCHLD from
something it's running.

I still don't understand why that's hairy here, however.  The first
zhandler() has basically finished what it's doing and handed over to
zexit() to exit the shell.  That leaves me wondering if forking might be
the problem; do we need to queue signals around there?  It's not obvious
why that would be.

There remains my simple plan B of running strerror() once immediately
after setting the locale to do any one-off initialization, but I'm
starting to think the issue is more widespread.

-- 
Peter Stephenson <p.w.stephenson@ntlworld.com>
Web page now at http://homepage.ntlworld.com/p.w.stephenson/


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: deadlock caused by gettext usage in a signal handler
  2007-12-04 20:30 ` Peter Stephenson
@ 2007-12-04 23:09   ` Guillaume Chazarain
  2007-12-06 23:02   ` Guillaume Chazarain
  1 sibling, 0 replies; 20+ messages in thread
From: Guillaume Chazarain @ 2007-12-04 23:09 UTC (permalink / raw)
  To: Zsh Hackers' List

Le Tue, 4 Dec 2007 20:30:17 +0000,
Peter Stephenson <p.w.stephenson@ntlworld.com> a écrit :

> but I'm starting to think the issue is more widespread.

Definitely :-(

Here is a stack trace without gettext in it:

0x0000003c3f0dcf78 in __lll_mutex_lock_wait () from /lib64/libc.so.6
#0  0x0000003c3f0dcf78 in __lll_mutex_lock_wait () from /lib64/libc.so.6
#1  0x0000003c3f074cb7 in _L_lock_14740 () from /lib64/libc.so.6
#2  0x0000003c3f073b11 in *__GI___libc_free (mem=0x3c3f34c960) at malloc.c:3620
#3  0x0000000000431f02 in savehistfile (
    fn=0x8a1690 "/user/gchazara/home/.zsh_history", err=1, 
    writeflags=<value optimized out>) at hist.c:2091
#4  0x000000000040dd6d in zexit (val=1, from_where=<value optimized out>)
    at builtin.c:4181
#5  0x0000000000465637 in zhandler (sig=-512) at signals.c:540
#6  <signal handler called>
#7  0x0000003c3f07006a in _int_free (av=0x3c3f34c960, mem=0x8b02c0)
    at malloc.c:4662
#8  0x0000003c3f073b1c in *__GI___libc_free (mem=0xb65d60) at malloc.c:3622
#9  0x0000000000438965 in ingetc () at input.c:261
#10 0x000000000043338d in ihgetc () at hist.c:241
#11 0x000000000044122b in gettok () at lex.c:631
#12 0x0000000000441a38 in yylex () at lex.c:347
#13 0x000000000045cef7 in parse_event () at parse.c:449
#14 0x0000000000437348 in loop (toplevel=1, justonce=0) at init.c:128
#15 0x0000000000438051 in zsh_main (argc=<value optimized out>, 
    argv=0x7ffff4980ef8) at init.c:1280
#16 0x0000003c3f01dab4 in __libc_start_main (main=0x40cca0 <main>, argc=1, 
    ubp_av=0x7ffff4980ef8, init=<value optimized out>, 
    fini=<value optimized out>, rtld_fini=<value optimized out>, 
    stack_end=0x7ffff4980ee8) at libc-start.c:222
#17 0x000000000040cc09 in _start ()

A deadlock with free() this time.

Here is more information about how to reproduce these deadlocks:

Download a snapshot of gsh:
http://repo.or.cz/w/gsh.git?a=snapshot;h=9a90d28f9ad0e64c45202522b26ff6c13f10f795;sf=tgz

Run this loop:

while : ; do ./gsh.py --command=: localhost localhost localhost; done

This assumes that running 'ssh localhost' will lead you to a Zsh
instance. This should accumulate deadlocked Zsh processes that you can
see with: pstree -a -h -p -u $USER|grep '^zsh,'|cut -d, -f2

I don't think it's the first time I saw hung Zsh processes, but this
commit in my gsh software made the problem appears more easily:

http://repo.or.cz/w/gsh.git?a=commitdiff;h=9a90d28f9ad0e64c45202522b26ff6c13f10f795

It replaces builtin echo invocations with actual /bin/echo forkings.

Hope that helps.

-- 
Guillaume


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: deadlock caused by gettext usage in a signal handler
  2007-12-04 20:30 ` Peter Stephenson
  2007-12-04 23:09   ` Guillaume Chazarain
@ 2007-12-06 23:02   ` Guillaume Chazarain
  2007-12-07 10:35     ` Peter Stephenson
  1 sibling, 1 reply; 20+ messages in thread
From: Guillaume Chazarain @ 2007-12-06 23:02 UTC (permalink / raw)
  To: Zsh Hackers' List

Le Tue, 4 Dec 2007 20:30:17 +0000,
Peter Stephenson <p.w.stephenson@ntlworld.com> a écrit :

> That leaves me wondering if forking might be
> the problem; do we need to queue signals around there?  It's not obvious
> why that would be.

It appears that glibc's fork is playing with locks:

http://sourceware.org/cgi-bin/cvsweb.cgi/libc/nptl/sysdeps/unix/sysv/linux/fork.c?rev=1.11.2.3&content-type=text/x-cvsweb-markup&cvsroot=glibc

This patch solved the problem for me:

diff --git a/Src/exec.c b/Src/exec.c
index cf79ea8..6f16b9e 100644
--- a/Src/exec.c
+++ b/Src/exec.c
@@ -229,6 +229,7 @@ zfork(struct timeval *tv)
 {
     pid_t pid;
     struct timezone dummy_tz;
+    sigset_t signals;
 
     /*
      * Is anybody willing to explain this test?
@@ -239,7 +240,10 @@ zfork(struct timeval *tv)
     }
     if (tv)
 	gettimeofday(tv, &dummy_tz);
+    sigfillset(&signals);
+    signals = signal_block(signals);
     pid = fork();
+    signal_setmask(signals);
     if (pid == -1) {
 	zerr("fork failed: %e", errno);
 	return -1;

Thanks.

-- 
Guillaume


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: deadlock caused by gettext usage in a signal handler
  2007-12-06 23:02   ` Guillaume Chazarain
@ 2007-12-07 10:35     ` Peter Stephenson
  2007-12-07 10:46       ` Guillaume Chazarain
  0 siblings, 1 reply; 20+ messages in thread
From: Peter Stephenson @ 2007-12-07 10:35 UTC (permalink / raw)
  To: Guillaume Chazarain, Zsh Hackers' List

On Fri, 7 Dec 2007 00:02:07 +0100
Guillaume Chazarain <guichaz@yahoo.fr> wrote:
> It appears that glibc's fork is playing with locks:
>
> This patch solved the problem for me:

Thanks for looking at that.  The traditional zsh way is to queue signals
around an operation, but fork() is atomic to the calling code and blocking
should be quicker, so your patch is fine and I've committed it.

-- 
Peter Stephenson <pws@csr.com>                  Software Engineer
CSR PLC, Churchill House, Cambridge Business Park, Cowley Road
Cambridge, CB4 0WZ, UK                          Tel: +44 (0)1223 692070


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: deadlock caused by gettext usage in a signal handler
  2007-12-07 10:35     ` Peter Stephenson
@ 2007-12-07 10:46       ` Guillaume Chazarain
  2007-12-07 11:21         ` Bart Schaefer
  2007-12-07 11:29         ` Peter Stephenson
  0 siblings, 2 replies; 20+ messages in thread
From: Guillaume Chazarain @ 2007-12-07 10:46 UTC (permalink / raw)
  To: Zsh Hackers' List

Le Fri, 7 Dec 2007 10:35:33 +0000,
Peter Stephenson <pws@csr.com> a écrit :

> The traditional zsh way is to queue signals
> around an operation

Oh, I didn't know queue_signals().

> but fork() is atomic to the calling code and blocking
> should be quicker

Ok, but I don't mind changing it to queue_signals().

> so your patch is fine and I've committed it.

Thanks, how about reverting the strerror_r one (also in zsh-4.2)?

-- 
Guillaume


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: deadlock caused by gettext usage in a signal handler
  2007-12-07 10:46       ` Guillaume Chazarain
@ 2007-12-07 11:21         ` Bart Schaefer
  2007-12-07 11:27           ` Peter Stephenson
  2007-12-07 11:29         ` Peter Stephenson
  1 sibling, 1 reply; 20+ messages in thread
From: Bart Schaefer @ 2007-12-07 11:21 UTC (permalink / raw)
  To: Guillaume Chazarain, Zsh Hackers' List

On Dec 7, 11:46am, Guillaume Chazarain wrote:
}
} Ok, but I don't mind changing it to queue_signals().

I think that might be better anyway, because I'm not sure that
sigfillset() et al. are as portable as the code that queue_signals()
encapsulates.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: deadlock caused by gettext usage in a signal handler
  2007-12-07 11:21         ` Bart Schaefer
@ 2007-12-07 11:27           ` Peter Stephenson
  2007-12-07 11:57             ` Guillaume Chazarain
  0 siblings, 1 reply; 20+ messages in thread
From: Peter Stephenson @ 2007-12-07 11:27 UTC (permalink / raw)
  To: Guillaume Chazarain, Zsh Hackers' List

Bart Schaefer wrote:
> On Dec 7, 11:46am, Guillaume Chazarain wrote:
> }
> } Ok, but I don't mind changing it to queue_signals().
> 
> I think that might be better anyway, because I'm not sure that
> sigfillset() et al. are as portable as the code that queue_signals()
> encapsulates.

That's already worked around... we use the same code Guillaume just
added in zhandler(), unconditionally.  It's fixed up by definitions in
the header.

However, it's trivial to make it consistent with other places...

Index: Src/exec.c
===================================================================
RCS file: /cvsroot/zsh/zsh/Src/exec.c,v
retrieving revision 1.125
diff -u -r1.125 exec.c
--- Src/exec.c	7 Dec 2007 10:33:58 -0000	1.125
+++ Src/exec.c	7 Dec 2007 11:25:54 -0000
@@ -229,7 +229,6 @@
 {
     pid_t pid;
     struct timezone dummy_tz;
-    sigset_t signals;
 
     /*
      * Is anybody willing to explain this test?
@@ -240,10 +239,9 @@
     }
     if (tv)
 	gettimeofday(tv, &dummy_tz);
-    sigfillset(&signals);
-    signals = signal_block(signals);
+    queue_signals();
     pid = fork();
-    signal_setmask(signals);
+    unqueue_signals();
     if (pid == -1) {
 	zerr("fork failed: %e", errno);
 	return -1;


-- 
Peter Stephenson <pws@csr.com>                  Software Engineer
CSR PLC, Churchill House, Cambridge Business Park, Cowley Road
Cambridge, CB4 0WZ, UK                          Tel: +44 (0)1223 692070


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: deadlock caused by gettext usage in a signal handler
  2007-12-07 10:46       ` Guillaume Chazarain
  2007-12-07 11:21         ` Bart Schaefer
@ 2007-12-07 11:29         ` Peter Stephenson
  1 sibling, 0 replies; 20+ messages in thread
From: Peter Stephenson @ 2007-12-07 11:29 UTC (permalink / raw)
  To: Zsh Hackers' List, Guillaume Chazarain

Guillaume Chazarain wrote:
> Thanks, how about reverting the strerror_r one (also in zsh-4.2)?

That's probably neater, given the hack to get it to work.  However, it's
marginally neater to keep the errmsg variable.  It makes it look like
I've done something.

Index: configure.ac
===================================================================
RCS file: /cvsroot/zsh/zsh/configure.ac,v
retrieving revision 1.85
diff -u -r1.85 configure.ac
--- configure.ac	7 Dec 2007 02:53:17 -0000	1.85
+++ configure.ac	7 Dec 2007 11:25:52 -0000
@@ -1157,7 +1157,7 @@
 	       getlogin getpwent getpwnam getpwuid getgrgid getgrnam \
 	       initgroups nis_list \
 	       setuid seteuid setreuid setresuid setsid \
-	       memcpy memmove strstr strerror strerror_r \
+	       memcpy memmove strstr strerror \
 	       getrlimit getrusage \
 	       setlocale \
 	       uname \
Index: Src/utils.c
===================================================================
RCS file: /cvsroot/zsh/zsh/Src/utils.c,v
retrieving revision 1.172
diff -u -r1.172 utils.c
--- Src/utils.c	3 Dec 2007 22:46:11 -0000	1.172
+++ Src/utils.c	7 Dec 2007 11:26:01 -0000
@@ -310,34 +310,7 @@
 		    errflag = 1;
 		    return;
 		}
-#ifdef HAVE_STRERROR_R
-		/*
-		 * There are two incompatible strerror_r()s floating round.
-		 * The GNU extension refuses to copy the message into the
-		 * buffer if it can return a constant string.  To suppress it
-		 * we need to define _XOPEN_SOURCE to 600.  I don't dare do
-		 * this because we're already depending on _GNU_SOURCE.  So
-		 * try to handle both by looking for errno being set (for the
-		 * standard version failing) or errbuf being left untouched
-		 * (for the GNU version).  One presumes that if strerror_r()
-		 * didn't copy anything to errbuf, then it's safe to
-		 * call strerror() to get the string.
-		 *
-		 * This is a mess, but it's about a decade and half
-		 * too late to shirk from messes in the source.
-		 */
-		olderrno = errno;
-		errno = 0;
-		errbuf[0] = '\0';
-		strerror_r(num, errbuf, ERRBUFSIZE);
-		if (errno || errbuf[0] == '\0')
-		    errmsg = strerror(num);
-		else
-		    errmsg = errbuf;
-		errno = olderrno;
-#else
 		errmsg = strerror(num);
-#endif
 		/* If the message is not about I/O problems, it looks better *
 		 * if we uncapitalize the first letter of the message        */
 		if (num == EIO)


-- 
Peter Stephenson <pws@csr.com>                  Software Engineer
CSR PLC, Churchill House, Cambridge Business Park, Cowley Road
Cambridge, CB4 0WZ, UK                          Tel: +44 (0)1223 692070


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: deadlock caused by gettext usage in a signal handler
  2007-12-07 11:27           ` Peter Stephenson
@ 2007-12-07 11:57             ` Guillaume Chazarain
  0 siblings, 0 replies; 20+ messages in thread
From: Guillaume Chazarain @ 2007-12-07 11:57 UTC (permalink / raw)
  To: Zsh Hackers' List

Le Fri, 07 Dec 2007 11:27:12 +0000,
Peter Stephenson <pws@csr.com> a écrit :

> +    queue_signals();
>      pid = fork();
> -    signal_setmask(signals);
> +    unqueue_signals();

OK, tested it, works fine.

The following comment in signals.h made a lot of sense to me though ;-)
* it is probably overkill for zsh to do this

-- 
Guillaume


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: deadlock caused by gettext usage in a signal handler
  2007-12-10  2:04 ` Bart Schaefer
@ 2007-12-10  3:17   ` Paul Ackersviller
  0 siblings, 0 replies; 20+ messages in thread
From: Paul Ackersviller @ 2007-12-10  3:17 UTC (permalink / raw)
  To: zsh-workers

On Sun, Dec 09, 2007 at 06:04:22PM -0800, Bart Schaefer wrote:
> On Dec 10, 12:11am, Paul Ackersviller wrote:
> }
> } I've done 24180 on the 4.2 branch, and also added 24169, 24170,
> } 24174, 24187, and 24188.
> 
> I think you could do 24150 as well.

Okay, done.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: deadlock caused by gettext usage in a signal handler
  2007-12-10  0:11 Paul Ackersviller
@ 2007-12-10  2:04 ` Bart Schaefer
  2007-12-10  3:17   ` Paul Ackersviller
  0 siblings, 1 reply; 20+ messages in thread
From: Bart Schaefer @ 2007-12-10  2:04 UTC (permalink / raw)
  To: zsh-workers

On Dec 10, 12:11am, Paul Ackersviller wrote:
}
} I've done 24180 on the 4.2 branch, and also added 24169, 24170,
} 24174, 24187, and 24188.

I think you could do 24150 as well.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: deadlock caused by gettext usage in a signal handler
@ 2007-12-10  0:11 Paul Ackersviller
  2007-12-10  2:04 ` Bart Schaefer
  0 siblings, 1 reply; 20+ messages in thread
From: Paul Ackersviller @ 2007-12-10  0:11 UTC (permalink / raw)
  To: zsh-workers; +Cc: Guillaume Chazarain

Peter Stephenson wrote:
> Guillaume Chazarain wrote:
> > Thanks, how about reverting the strerror_r one (also in zsh-4.2)?
> 
> That's probably neater, given the hack to get it to work.  However, it's
> marginally neater to keep the errmsg variable.  It makes it look like
> I've done something.

I've done 24180 on the 4.2 branch, and also added 24169, 24170,
24174, 24187, and 24188.


^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2007-12-10  3:17 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-11-30 19:35 deadlock caused by gettext usage in a signal handler Guillaume Chazarain
2007-12-03 22:43 ` Peter Stephenson
2007-12-04  1:39   ` Paul Ackersviller
2007-12-04 18:02   ` Guillaume Chazarain
2007-12-04 18:24     ` Peter Stephenson
2007-12-04 18:43       ` Guillaume Chazarain
2007-12-04 19:43         ` Peter Stephenson
2007-12-04 19:49           ` Guillaume Chazarain
2007-12-04 20:30 ` Peter Stephenson
2007-12-04 23:09   ` Guillaume Chazarain
2007-12-06 23:02   ` Guillaume Chazarain
2007-12-07 10:35     ` Peter Stephenson
2007-12-07 10:46       ` Guillaume Chazarain
2007-12-07 11:21         ` Bart Schaefer
2007-12-07 11:27           ` Peter Stephenson
2007-12-07 11:57             ` Guillaume Chazarain
2007-12-07 11:29         ` Peter Stephenson
2007-12-10  0:11 Paul Ackersviller
2007-12-10  2:04 ` Bart Schaefer
2007-12-10  3:17   ` Paul Ackersviller

Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/zsh/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).