From: "Bart Schaefer" <schaefer@candle.brasslantern.com>
To: zsh-workers@sunsite.auc.dk
Subject: Final (?) info on signals/crashes when suspending "mutt" function
Date: Sun, 27 Jun 1999 07:03:17 +0000
Message-ID: <990627070317.ZM7630@candle.brasslantern.com>

Jump to the end for the big news that may finally get this fixed.  I've
been writing this message incrementally between debugging passes, so you
might as well get the whole play-by-play.

Recall that Jos Backus reported that suspending the function

    mutt () {
	command mutt "$@"
	echotc rs
    }

causes zsh to behave badly.  Sven has sent several patches but none of them
have completely fixed the problem.  Attempting to debug this, I've been
running gdb on zsh.  I reproduced the problem but so far I'm only able to
break at the point at which the SIGSTOP is received, so I'm not sure who
is sending that signal -- however, the parent zsh received first SIGSTOP
and *then* SIGTSTP when I hit ^Z, which is very suspicious.

However, because I was in gdb (attached to a PID from another xterm) I was
able to make zsh continue after each signal (so zsh's xterm never got hung).
Continuing through the second (TSTP) signal, I ended up with this:

zagzig% mutt () {
function>       command mutt "$@"
function>       echotc rs
function> }
zagzig% mutt
zsh: suspended (signal)  mutt
zagzig% pstree $$
zsh-+-mutt
    `-pstree
zagzig% fg
[1]  - trace trap (core dumped)  mutt

Simultaneously in the gdb terminal, the parent zsh got a SIGSEGV because it
tried to strcmp() a bad job table entry.  Here's the stack trace:

(gdb) where
#0  strcmp (p1=0x0, p2=0x80bfe70 "/usr/src/local/zsh/zsh-3.0.6-pre")
    at ../sysdeps/generic/strcmp.c:36
#1  0x804ba8b in bin_fg (name=0x80c25d8 "fg", argv=0x80c2770, 
    ops=0xbffff1a8 "", func=2) at builtin.c:629
#2  0x804a8c3 in execbuiltin (args=0x80c2710, bn=0x80b0ea0) at builtin.c:186
#3  0x805d7d3 in execcmd (cmd=0x80c26f0, input=0, output=0, how=2, last1=2)
    at exec.c:1779
#4  0x805af5e in execpline2 (pline=0x80c2740, how=2, input=0, output=0, 
    last1=0) at exec.c:912
#5  0x805a5b0 in execpline (l=0x80c26d8, how=2, last1=0) at exec.c:739
#6  0x805a183 in execlist (list=0x80c2750, dont_change_job=0, exiting=0)
    at exec.c:612
#7  0x806bee0 in loop (toplevel=1, justonce=0) at init.c:143
#8  0x806bbe4 in main (argc=2, argv=0xbffff6ec) at init.c:75
(gdb) up
#1  0x804ba8b in bin_fg (name=0x80c25d8 "fg", argv=0x80c2770, 
    ops=0xbffff1a8 "", func=2) at builtin.c:629
629			if (strcmp(jobtab[job].pwd, pwd)) {
(gdb) p job
$1 = 1
(gdb) p jobtab[1]
$3 = {gleader = 0, other = 0, stat = 0, pwd = 0x0, procs = 0x0, 
  filelist = 0x0, stty_in_env = 0, ty = 0x0}
(gdb) p jobtab[0]
$4 = {gleader = 0, other = 0, stat = 0, pwd = 0x0, procs = 0x0, 
  filelist = 0x0, stty_in_env = 0, ty = 0x0}
(gdb) p curjob
$5 = 2

Somewhere zsh has completely lost track of two (?) jobs, and failed to reset
curjob to -1.
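
The crash itself is just strcmp() being handed the NULL pwd of an empty
job table slot.  A defensive comparison would avoid the SIGSEGV, though
it would only hide the stale curjob rather than fix it.  A minimal
sketch, assuming a hypothetical helper job_pwd_differs() that is not
part of the zsh source:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not from the zsh source.  Returns 1 when the
 * directory recorded for a job differs from the current directory.
 * A NULL on either side is treated as "nothing to compare" instead of
 * being passed to strcmp(), which is what crashed in frame #0. */
int job_pwd_differs(const char *job_pwd, const char *cur_pwd)
{
    if (job_pwd == NULL || cur_pwd == NULL)
        return 0;
    return strcmp(job_pwd, cur_pwd) != 0;
}
```

With the values from the trace, job_pwd_differs(jobtab[1].pwd, pwd)
would return 0 rather than fault, but "fg" would still be operating on
an empty job.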

Now, oddly, if I change the function to be:

    mutt() {
	cd /tmp
	command mutt "$@"
	echotc rs
    }

I still get the SIGSTOP followed by the SIGTSTP, but now zsh is able to
correctly "fg" the job:

zagzig% mutt () {
        cd /tmp
        command mutt "$@"
        echotc rs
}
zagzig% mutt
zsh: suspended (signal)  mutt
(pwd now: /tmp)
zagzig% cd -
/usr/src/local/zsh/zsh-3.0.6-pre
zagzig% fg
[1]  - continued  mutt
zsh: suspended (signal)  mutt
zagzig% fg
[1]  - continued  mutt

The extra builtin has caused something different to happen.  Following
the second "fg" I quit mutt with "q" -- and now zsh is hung, blocked in
sigsuspend() called from waitjob(); but that may be a side effect of gdb.

The strange thing is, I can't tell where the heck that SIGSTOP is coming
from.  I've even tried putting in debug print statements around places
where zsh performs a kill() or killpg(), and I don't get any output!  Is
some other process (mutt itself?) sending a SIGSTOP to the process group?

YES!  That's IT!  MUTT is calling kill(0, SIGSTOP) and blowing its parent
zsh out of the water!  Confirmed by changing "command" to "strace" in the
function above.  Mutt expects to be the process group leader, but is not.

So that pretty much tears it.  There is no way short of forking a "watcher"
subshell for EVERY external process to handle both:
(1) badly-behaved programs whose exit status does not reveal that they died
    from a signal, and
(2) badly-behaved programs that send uncatchable signals to their entire
    process group even when they are not the group leader.

The failure in case (1) is far less catastrophic than case (2), so I think
the right solution is to back off to the behavior from patch 6707 (that is,
scrap 6819 and most of 6824, but 6848 and 6850 are orthogonal and good).

I don't know, however, if that's directly related to the bogus curjob value
and "fg" crash noted above.  Probably so, but ...

-- 
Bart Schaefer                                 Brass Lantern Enterprises
http://www.well.com/user/barts              http://www.brasslantern.com

