zsh-workers
* zsh hangs with the message "zsh: can't set tty pgrp: not owner".
@ 1999-06-27  6:43 Tanaka Akira
  1999-06-27  8:41 ` Final (?) info on signals/crashes when suspending "mutt" function Bart Schaefer
  0 siblings, 1 reply; 8+ messages in thread
From: Tanaka Akira @ 1999-06-27  6:43 UTC
  To: zsh-workers

zsh-3.1.5-pws-24 hangs with the message "zsh: can't set tty pgrp: not
owner" when it is started from the Bourne shell.

Last login: Sun Jun 27 15:20:06 from localhost
Sun Microsystems Inc.   SunOS 5.7       Generic October 1998
$ /app/zsh-3.1.5-pws-24/bin/zsh
zsh: can't set tty pgrp: not owner
^C^C^C

# akr_sh is a test user.

truss reports the following:

stat64("/dev/pts/36", 0xFFBEFAE0)               = 0
open("/dev/pts/36", O_RDWR|O_NOCTTY)            = 3
fcntl(3, F_DUPFD, 0x0000000A)                   = 10
close(3)                                        = 0
ioctl(10, TCGETS, 0x00089F78)                   = 0
getpid()                                        = 25149 [25148]
ioctl(10, TIOCGSID, 0xFFBEFC4C)                 = 0
getsid(0)                                       = 25101
ioctl(10, TIOCSPGRP, 0xFFBEFCF8)                Err#1 EPERM
kill(25149, SIG#0)                              = 0
zsh: can't set tty pgrp: not owner
write(2, " z s h :   c a n ' t   s".., 35)      = 35
setpgid(0, 0)                                   = 0
getpgrp()                                       = 25149
ioctl(10, TIOCGSID, 0xFFBEFC44)                 = 0
getsid(0)                                       = 25101
ioctl(10, TIOCGPGRP, 0xFFBEFCAC)                = 0
alarm(0)                                        = 0
sigaction(SIGALRM, 0xFFBEFC30, 0xFFBEFCE0)      = 0
sigfillset(0xFF1B8998)                          = 0
sigprocmask(SIG_BLOCK, 0xFFBEFCD0, 0xFFBEFCC0)  = 0
alarm(1)                                        = 0
sigsuspend(0xFFBEFCB0)          (sleeping...)

# Hm. Why did TIOCSPGRP fail?

truss also shows that zsh hangs in the following loop:

    Received signal #14, SIGALRM, in sigsuspend() [caught]
sigsuspend(0xFFBEFCB0)                          Err#4 EINTR
setcontext(0xFFBEF998)
alarm(0)                                        = 0
sigprocmask(SIG_UNBLOCK, 0xFFBEFCD0, 0x00000000) = 0
sigaction(SIGALRM, 0xFFBEFC30, 0x00000000)      = 0
getpgrp()                                       = 25149
ioctl(10, TIOCGSID, 0xFFBEFC44)                 = 0
getsid(0)                                       = 25101
ioctl(10, TIOCGPGRP, 0xFFBEFCAC)                = 0
kill(-25149, SIGTTIN)                           = 0
    Received signal #26, SIGTTIN [ignored]
      siginfo: SIGTTIN pid=25149 uid=30000
getpgrp()                                       = 25149
ioctl(10, TIOCGSID, 0xFFBEFC44)                 = 0
getsid(0)                                       = 25101
ioctl(10, TIOCGPGRP, 0xFFBEFCAC)                = 0
alarm(0)                                        = 0
sigaction(SIGALRM, 0xFFBEFC30, 0xFFBEFCE0)      = 0
sigprocmask(SIG_BLOCK, 0xFFBEFCD0, 0xFFBEFCC0)  = 0
alarm(1)                                        = 0
sigsuspend(0xFFBEFCB0)          (sleeping...)

zsh-3.1.5-pws-23 has no problem in the same situation.
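One reading of the first trace, based on the POSIX description of `tcsetpgrp()` (EPERM when the requested group does not match the process group of any process in the caller's session): zsh asks for process group 25149, its own PID, before the `setpgid(0, 0)` that creates that group. The later `setpgid()` then moves zsh out of the foreground group, so `TIOCGPGRP` never matches and the `kill(-25149, SIGTTIN)` retry loop spins because SIGTTIN is ignored. The sketch below shows the conventional ordering for an interactive shell, modelled on the job-control example in the GNU C Library manual. It is not zsh's code: `take_terminal()` is a hypothetical name, and giving up when SIGTTIN is ignored is an added assumption.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: make an interactive shell the leader of its own
   process group and give that group the terminal on fd.
   Returns 0 on success, -1 on failure. */
int take_terminal(int fd)
{
    struct sigaction old;
    pid_t fg;

    if (!isatty(fd))
        return -1;                      /* not a terminal: nothing to take */

    /* 1. Wait until our current group is the foreground group; SIGTTIN
          stops us until whoever started us puts us in front. */
    while ((fg = tcgetpgrp(fd)) != getpgrp()) {
        if (fg == -1)
            return -1;
        if (sigaction(SIGTTIN, NULL, &old) == -1)
            return -1;
        if (old.sa_handler == SIG_IGN) {
            errno = EPERM;              /* ignored SIGTTIN would spin forever */
            return -1;
        }
        kill(-getpgrp(), SIGTTIN);
    }

    /* 2. tcsetpgrp() from a background group raises SIGTTOU unless it is
          ignored or blocked, so ignore it (interactive shells keep it so). */
    signal(SIGTTOU, SIG_IGN);

    /* 3. Create our own process group first ... */
    if (getpgrp() != getpid() && setpgid(0, 0) == -1)
        return -1;

    /* 4. ... and only then hand the terminal to it.  Asking for a group
          that does not exist yet is what can produce EPERM. */
    if (tcsetpgrp(fd, getpid()) == -1)
        return -1;
    return 0;
}
```

Called early in shell start-up, for example as `take_terminal(STDIN_FILENO)`, a failure return lets the shell fall back to running without job control instead of looping.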
-- 
Tanaka Akira



* Final (?) info on signals/crashes when suspending "mutt" function
@ 1999-06-27  8:41 ` Bart Schaefer
  1999-06-27  8:47   ` zsh hangs with the message "zsh: can't set tty pgrp: not owner" Bart Schaefer
  1999-06-27 13:21   ` Final (?) info on signals/crashes when suspending "mutt" function Peter Stephenson
  0 siblings, 2 replies; 8+ messages in thread
From: Bart Schaefer @ 1999-06-27  8:41 UTC
  To: zsh-workers

[I sent this once before but it seems to have vanished.  Sorry if it shows
up twice.]

Jump to the end for the big news that may finally get this fixed.  I've
been writing this message incrementally between debugging passes, so you
might as well get the whole play-by-play.

Recall that Jos Backus reported that suspending the function

    mutt () {
	command mutt "$@"
	echotc rs
    }

caused zsh to behave badly.  Sven has sent several patches, but none of them
have completely fixed the problem.  Attempting to debug this, I've been
running gdb on zsh.  I reproduced the problem but so far I'm only able to
break at the point at which the SIGSTOP is received, so I'm not sure who
is sending that signal -- however, the parent zsh received first SIGSTOP
and *then* SIGTSTP when I hit ^Z, which is very suspicious.

However, because I was in gdb (attached to a PID from another xterm) I was
able to make zsh continue after each signal (so zsh's xterm never got hung).
Continuing through the second (TSTP) signal, I ended up with this:

zagzig% mutt () {
function>       command mutt "$@"
function>       echotc rs
function> }
zagzig% mutt
zsh: suspended (signal)  mutt
zagzig% pstree $$
zsh-+-mutt
    `-pstree
zagzig% fg
[1]  - trace trap (core dumped)  mutt

Simultaneously in the gdb terminal, the parent zsh got a SIGSEGV because it
tried to strcmp() a bad job table entry.  Here's the stack trace:

(gdb) where
#0  strcmp (p1=0x0, p2=0x80bfe70 "/usr/src/local/zsh/zsh-3.0.6-pre")
    at ../sysdeps/generic/strcmp.c:36
#1  0x804ba8b in bin_fg (name=0x80c25d8 "fg", argv=0x80c2770, 
    ops=0xbffff1a8 "", func=2) at builtin.c:629
#2  0x804a8c3 in execbuiltin (args=0x80c2710, bn=0x80b0ea0) at builtin.c:186
#3  0x805d7d3 in execcmd (cmd=0x80c26f0, input=0, output=0, how=2, last1=2)
    at exec.c:1779
#4  0x805af5e in execpline2 (pline=0x80c2740, how=2, input=0, output=0, 
    last1=0) at exec.c:912
#5  0x805a5b0 in execpline (l=0x80c26d8, how=2, last1=0) at exec.c:739
#6  0x805a183 in execlist (list=0x80c2750, dont_change_job=0, exiting=0)
    at exec.c:612
#7  0x806bee0 in loop (toplevel=1, justonce=0) at init.c:143
#8  0x806bbe4 in main (argc=2, argv=0xbffff6ec) at init.c:75
(gdb) up
#1  0x804ba8b in bin_fg (name=0x80c25d8 "fg", argv=0x80c2770, 
    ops=0xbffff1a8 "", func=2) at builtin.c:629
629			if (strcmp(jobtab[job].pwd, pwd)) {
(gdb) p job
$1 = 1
(gdb) p jobtab[1]
$3 = {gleader = 0, other = 0, stat = 0, pwd = 0x0, procs = 0x0, 
  filelist = 0x0, stty_in_env = 0, ty = 0x0}
(gdb) p jobtab[0]
$4 = {gleader = 0, other = 0, stat = 0, pwd = 0x0, procs = 0x0, 
  filelist = 0x0, stty_in_env = 0, ty = 0x0}
(gdb) p curjob
$5 = 2

Somewhere zsh has completely lost track of two (?) jobs, and failed to reset
curjob to -1.
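The segfault itself comes from builtin.c:629 passing the NULL `pwd` of a cleared job slot to `strcmp()`. A guard along the following lines, written against a hypothetical cut-down job entry (the field names follow the gdb dump, the `STAT_INUSE` value is made up), would turn the crash into a reportable error. It does not explain how the slots were cleared while `curjob` was left pointing into the table.

```c
#include <stddef.h>
#include <string.h>

/* Made-up flag value and cut-down job entry for this sketch only. */
#define STAT_INUSE 0x0001

struct job {
    int gleader;
    int stat;
    const char *pwd;
};

/* Returns 1 if the job was started in a directory other than pwd, 0 if
   it is the same directory, and -1 if the table slot is not usable. */
int job_pwd_differs(const struct job *jn, const char *pwd)
{
    /* A cleared slot (stat == 0, pwd == NULL) is what crashed bin_fg(). */
    if (!jn || !(jn->stat & STAT_INUSE) || !jn->pwd || !pwd)
        return -1;
    return strcmp(jn->pwd, pwd) != 0;
}
```

A caller seeing -1 could print an error such as "no such job" rather than dereferencing the NULL pointer.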

Now, oddly, if I change the function to be:

    mutt() {
	cd /tmp
	command mutt "$@"
	echotc rs
    }

I still get the SIGSTOP followed by the SIGTSTP, but now zsh is able to
correctly "fg" the job:

zagzig% mutt () {
        cd /tmp
        command mutt "$@"
        echotc rs
}
zagzig% mutt
zsh: suspended (signal)  mutt
(pwd now: /tmp)
zagzig% cd -
/usr/src/local/zsh/zsh-3.0.6-pre
zagzig% fg
[1]  - continued  mutt
zsh: suspended (signal)  mutt
zagzig% fg
[1]  - continued  mutt

The extra builtin has caused something different to happen.  Following
the second "fg" I quit mutt with "q" -- and now zsh is hung, blocked in
sigsuspend() called from waitjob(); but that may be a side effect of gdb.

The strange thing is, I can't tell where the heck that SIGSTOP is coming
from.  I've even tried putting in debug print statements around places
where zsh performs a kill() or killpg(), and I don't get any output!  Is
some other process (mutt itself?) sending a SIGSTOP to the process group?

YES!  That's IT!  MUTT is calling kill(0, SIGSTOP) and blowing its parent
zsh out of the water!  Confirmed by changing "command" to "strace" in the
function above.  Mutt expects to be the process group leader, but is not.
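The effect described here can be reproduced without mutt or a terminal. In the hypothetical test program below, SIGUSR1 stands in for SIGSTOP so nothing actually freezes: a forked child calls `kill(0, SIGUSR1)`, which POSIX defines as signalling every process in the sender's process group. If the child has stayed in the parent's group, the parent is hit too; if the child first moves into its own group with `setpgid(0, 0)`, only the child is. The outer fork into a fresh process group exists only so the demo can never signal whoever calls it. `parent_hit()` and `shell_role()` are names made up for this sketch.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static volatile sig_atomic_t got_usr1 = 0;

static void on_usr1(int sig)
{
    (void)sig;
    got_usr1 = 1;
}

/* Plays the parent shell: forks a "mutt" that signals its whole process
   group with kill(0, ...), and returns 1 if the shell was hit as well. */
static int shell_role(int child_own_pgrp)
{
    struct sigaction sa;
    pid_t pid;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_usr1;            /* inherited by the child too */
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) == -1)
        return 2;

    pid = fork();
    if (pid == -1)
        return 2;
    if (pid == 0) {
        if (child_own_pgrp)
            setpgid(0, 0);              /* what a job-control shell arranges */
        kill(0, SIGUSR1);               /* signal my whole process group */
        _exit(0);
    }
    while (waitpid(pid, NULL, 0) == -1 && errno == EINTR)
        ;
    return got_usr1;
}

/* Runs shell_role() in a fresh process group so the demo cannot signal
   its caller.  Returns 1 if the "shell" was hit, 0 if not, -1 on error. */
int parent_hit(int child_own_pgrp)
{
    int status;
    pid_t pid = fork();

    if (pid == -1)
        return -1;
    if (pid == 0) {
        if (setpgid(0, 0) == -1)
            _exit(2);
        _exit(shell_role(child_own_pgrp));
    }
    if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status) == 2 ? -1 : WEXITSTATUS(status);
}
```

On a conforming system one would expect `parent_hit(0)` to return 1 and `parent_hit(1)` to return 0.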

So that pretty much tears it.  There is no way short of forking a "watcher"
subshell for EVERY external process to handle both:
(1) badly-behaved programs whose exit status does not reveal that they died
    from a signal, and
(2) badly-behaved programs that send uncatchable signals to their entire
    process group even when they are not the group leader.
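Case (1) can be illustrated the same way. In the sketch below a child either catches SIGTERM and calls `_exit(1)`, hiding the signal, or restores the default action and re-raises it. The parent decodes the wait status into a shell-style value, using 0200 | signal number for a death by signal; that encoding is assumed here to mirror the `lastval2 & 0200` and `lastval2 & ~0200` tests in the exec.c hunk of Sven's patch. All function names are hypothetical, and the pipe only ensures SIGTERM is not sent before the child's handler is installed.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Shell-style status: the exit code, or 0200 | signal if it was killed. */
int shell_status(int wstatus)
{
    if (WIFSIGNALED(wstatus))
        return 0200 | WTERMSIG(wstatus);
    if (WIFEXITED(wstatus))
        return WEXITSTATUS(wstatus);
    return -1;                          /* stopped/continued: not handled */
}

/* Badly behaved: clean up and exit normally, hiding the signal. */
static void hide_signal(int sig)
{
    (void)sig;
    _exit(1);
}

/* Well behaved: restore the default action and die from the signal.
   SIGTERM stays blocked inside the handler, so the re-raised signal is
   delivered (and kills us) as soon as the handler returns. */
static void reraise_signal(int sig)
{
    signal(sig, SIG_DFL);
    raise(sig);
}

/* Fork a child with the chosen SIGTERM handler, send it SIGTERM, and
   return the shell-style status the parent sees (-1 on error). */
int status_after_sigterm(int well_behaved)
{
    struct sigaction sa;
    sigset_t set;
    int fds[2], wstatus;
    char ready;
    pid_t pid;

    if (pipe(fds) == -1)
        return -1;
    pid = fork();
    if (pid == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        memset(&sa, 0, sizeof sa);
        sa.sa_handler = well_behaved ? reraise_signal : hide_signal;
        sigemptyset(&sa.sa_mask);
        sigaction(SIGTERM, &sa, NULL);
        sigemptyset(&set);
        sigaddset(&set, SIGTERM);
        sigprocmask(SIG_UNBLOCK, &set, NULL);
        close(fds[0]);
        if (write(fds[1], "r", 1) != 1)     /* handler is in place */
            _exit(2);
        for (;;)
            pause();
    }
    close(fds[1]);
    if (read(fds[0], &ready, 1) != 1) {
        close(fds[0]);
        kill(pid, SIGKILL);
        waitpid(pid, NULL, 0);
        return -1;
    }
    close(fds[0]);
    kill(pid, SIGTERM);
    if (waitpid(pid, &wstatus, 0) == -1)
        return -1;
    return shell_status(wstatus);
}
```

With the badly behaved child the parent sees an ordinary exit status of 1 and cannot tell that SIGTERM was involved; with the well-behaved child it sees 0200 | SIGTERM.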

The failure in case (1) is far less catastrophic than case (2), so I think
the right solution is to back off to the behavior from patch 6707 (that is,
scrap 6819 and most of 6824, but 6848 and 6850 are orthogonal and good).

I don't know, however, if that's directly related to the bogus curjob value
and "fg" crash noted above.  Probably so, but ...

-- 
Bart Schaefer                                 Brass Lantern Enterprises
http://www.well.com/user/barts              http://www.brasslantern.com



* Re: zsh hangs with the message "zsh: can't set tty pgrp: not owner".
  1999-06-27  8:41 ` Final (?) info on signals/crashes when suspending "mutt" function Bart Schaefer
@ 1999-06-27  8:47   ` Bart Schaefer
  1999-06-27 13:21   ` Final (?) info on signals/crashes when suspending "mutt" function Peter Stephenson
  1 sibling, 0 replies; 8+ messages in thread
From: Bart Schaefer @ 1999-06-27  8:47 UTC
  To: zsh-workers

On Jun 27,  3:43pm, Tanaka Akira wrote:
} Subject: zsh hangs with the message "zsh: can't set tty pgrp: not owner".
}
} zsh-3.1.5-pws-24 hangs with the message "zsh: can't set tty pgrp: not
} owner" when it is started by Bourne shell.

On Jun 27,  8:41am, Bart Schaefer wrote:
} Subject: Final (?) info on signals/crashes when suspending "mutt" function
}
} scrap 6819 and most of 6824, but 6848 and 6850 are orthogonal and good).

OK, so I take it back; 6850 is not good either.

Also it looks as though bits of 6819 should stay, just not the part about
zsh remaining the pgrp leader.

-- 
Bart Schaefer                                 Brass Lantern Enterprises
http://www.well.com/user/barts              http://www.brasslantern.com



* Re: Final (?) info on signals/crashes when suspending "mutt" function
  1999-06-27  8:41 ` Final (?) info on signals/crashes when suspending "mutt" function Bart Schaefer
  1999-06-27  8:47   ` zsh hangs with the message "zsh: can't set tty pgrp: not owner" Bart Schaefer
@ 1999-06-27 13:21   ` Peter Stephenson
  1999-06-27 16:45     ` Bart Schaefer
  1 sibling, 1 reply; 8+ messages in thread
From: Peter Stephenson @ 1999-06-27 13:21 UTC
  To: zsh-workers

"Bart Schaefer" wrote:
> The failure in case (1) is far less catastrophic than case (2), so I think
> the right solution is to back off to the behavior from patch 6707 (that is,
> scrap 6819 and most of 6824, but 6848 and 6850 are orthogonal and good).

[except that 6850 isn't, because the shell hangs when called from sh.]
Somebody who's been looking at this will have to produce counter-patches
for pws-24, I'm not going to attempt this myself on a wing and a prayer.

If we can
  1) handle shell structures with well-behaved external programmes
     (i.e. not sh, ksh, zcat)
  2) suspend and interrupt functions running well-behaved external
     programmes (or none)
  3) not hang because of failed attempts to set the pgrp
then I think we should declare a truce before 3.1.6.

-- 
Peter Stephenson <pws@ibmth.df.unipi.it>       Tel: +39 050 844536
WWW:  http://www.ifh.de/~pws/
Dipartimento di Fisica, Via Buonarroti 2, 56127 Pisa, Italy



* Re: Final (?) info on signals/crashes when suspending "mutt" function
  1999-06-27 13:21   ` Final (?) info on signals/crashes when suspending "mutt" function Peter Stephenson
@ 1999-06-27 16:45     ` Bart Schaefer
  0 siblings, 0 replies; 8+ messages in thread
From: Bart Schaefer @ 1999-06-27 16:45 UTC
  To: Peter Stephenson, zsh-workers

On Jun 27,  3:21pm, Peter Stephenson wrote:
} Subject: Re: Final (?) info on signals/crashes when suspending "mutt" func
}
} "Bart Schaefer" wrote:
} > The failure in case (1) is far less catastrophic than case (2), so I think
} > the right solution is to back off to the behavior from patch 6707
} 
} Somebody who's been looking at this will have to produce counter-patches
} for pws-24, I'm not going to attempt this myself on a wing and a prayer.

I spent several hours last night attempting it in 3.0.6 and have not been
having much luck.  I got it back to the point where suspending the mutt
function doesn't kill the parent, but then it appears (from strace)
that bin_fg() is calling attachtty() with the parent shell's process ID
rather than that of the stopped job, which would mean that someplace that
I haven't found yet, the `gleader' of the job entry is set incorrectly.
(The effect of this is that mutt appears to come into the foreground, but
then gets a SIGTTIN and stops again as soon as it tries to read input.)
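For comparison, resuming a stopped job in the foreground normally means handing the terminal to the job's process group and then continuing that group with SIGCONT. The sketch below is a hypothetical `fg_job()` that takes the job's group leader (what zsh stores in `gleader`) directly; its refusal to give the terminal to the shell's own group is an added check aimed at the symptom described above, not something taken from zsh.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: bring the stopped job whose process group leader
   is gleader into the foreground on the terminal fd, then resume it.
   Returns 0 on success, -1 with errno set on failure. */
int fg_job(int fd, pid_t gleader)
{
    /* Handing the terminal to the shell's own group would leave the job
       in the background, so its next read from the tty raises SIGTTIN. */
    if (gleader <= 0 || gleader == getpgrp()) {
        errno = EINVAL;
        return -1;
    }
    /* Give the terminal to the job's group before waking it up. */
    if (tcsetpgrp(fd, gleader) == -1)
        return -1;
    /* A negative pid signals the whole process group. */
    if (kill(-gleader, SIGCONT) == -1)
        return -1;
    return 0;
}
```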

I fear I'm going to have to punt this to Sven.  I had really hoped to put
out a test 3.0.6 this weekend, but now it may have to wait a couple weeks.

-- 
Bart Schaefer                                 Brass Lantern Enterprises
http://www.well.com/user/barts              http://www.brasslantern.com



* RE: Final (?) info on signals/crashes when suspending "mutt" function
  1999-06-28  7:04 Sven Wischnowsky
@ 1999-06-28  8:14 ` Andrej Borsenkow
  0 siblings, 0 replies; 8+ messages in thread
From: Andrej Borsenkow @ 1999-06-28  8:14 UTC
  To: Sven Wischnowsky, zsh-workers

> I also build a patch over the weekend that tried to address Andrej's
> problems, it's appended below.

It does not help (sigh). Unfortunately, I'm beginning to suspect an OS bug. The
visible problem is that Zsh does not get SIGCHLD when a child stops. Unless Zsh
somehow blocks SIGCHLD (but I fail to see why it would do so when started as a
first-level shell but not otherwise), this looks like SIGCHLD is not being sent
in this case. Here is why I suspect it can happen:

man 2 signal

     If signal() or sigset() is used to set SIGCHLD's disposition to a sig-
     nal handler, SIGCHLD will not be sent when the calling process' chil-
     dren are stopped or continued.

At least xterm (and probably dtterm and getty/login) use signal() to play
with SIGCHLD before exec'ing the shell. So I can imagine some non-trivial bug
in which a subsequent sigaction() does not reset SA_NOCLDSTOP. Unfortunately,
I could not reproduce it in an obvious way.
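Whether or not an OS bug is involved, a shell can make its intent explicit: install the SIGCHLD handler with `sigaction()`, choose SA_NOCLDSTOP deliberately rather than inheriting it, and unblock SIGCHLD. The sketch below (helper names hypothetical) counts how many SIGCHLDs have arrived by the time `waitpid()` with WUNTRACED reports a stopped child, with and without SA_NOCLDSTOP. It only checks the documented behaviour on the machine it runs on; it neither reproduces nor rules out the bug suspected above.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static volatile sig_atomic_t chld_count = 0;

static void on_chld(int sig)
{
    (void)sig;
    chld_count++;
}

/* Install the SIGCHLD handler with sigaction(), stating the SA_NOCLDSTOP
   choice explicitly, and make sure SIGCHLD is not blocked. */
static int install_chld(int report_stops)
{
    struct sigaction sa;
    sigset_t set;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_chld;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = report_stops ? 0 : SA_NOCLDSTOP;
    if (sigaction(SIGCHLD, &sa, NULL) == -1)
        return -1;
    sigemptyset(&set);
    sigaddset(&set, SIGCHLD);
    return sigprocmask(SIG_UNBLOCK, &set, NULL);
}

/* Returns the number of SIGCHLDs received by the time waitpid() reports
   a child as stopped, or -1 on error. */
int sigchld_on_stop(int report_stops)
{
    pid_t pid;
    int wstatus, seen;

    if (install_chld(report_stops) == -1)
        return -1;
    chld_count = 0;
    pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {
        raise(SIGSTOP);                 /* stop, like a suspended job */
        _exit(0);
    }
    while (waitpid(pid, &wstatus, WUNTRACED) == -1)
        if (errno != EINTR)
            return -1;
    seen = chld_count;
    kill(pid, SIGKILL);                 /* clean up the stopped child */
    while (waitpid(pid, NULL, 0) == -1 && errno == EINTR)
        ;
    return WIFSTOPPED(wstatus) ? seen : -1;
}
```

On a system that follows POSIX, `sigchld_on_stop(1)` should report at least one SIGCHLD and `sigchld_on_stop(0)` none.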


/andrej




* Re: Final (?) info on signals/crashes when suspending "mutt" function
@ 1999-06-28  7:04 Sven Wischnowsky
  1999-06-28  8:14 ` Andrej Borsenkow
  0 siblings, 1 reply; 8+ messages in thread
From: Sven Wischnowsky @ 1999-06-28  7:04 UTC
  To: zsh-workers


Bart Schaefer wrote:

> ...
>
> YES!  That's IT!  MUTT is calling kill(0, SIGSTOP) and blowing its parent
> zsh out of the water!  Confirmed by changing "command" to "strace" in the
> function above.  Mutt expects to be the process group leader, but is not.

Oh, wonderful...

> So that pretty much tears it.  There is no way short of forking a "watcher"
> subshell for EVERY external process to handle both:
> (1) badly-behaved programs whose exit status does not reveal that they died
>     from a signal, and
> (2) badly-behaved programs that send uncatchable signals to their entire
>     process group even when they are not the group leader.
> 
> The failure in case (1) is far less catastrophic than case (2), so I think
> the right solution is to back off to the behavior from patch 6707 (that is,
> scrap 6819 and most of 6824, but 6848 and 6850 are orthogonal and good).

Yes, including the things you said in the follow up, i.e. some bits of 
6819 are still valid and 6850 will go anyway.
I also built a patch over the weekend that tried to address Andrej's
problems; it's appended below. I'll be *very* busy this week, but I'll
try to send a patch for it this week. I'm sorry, folks, but who'd have
expected such behaviour...

Bye
 Sven

P.S.: Note: parts of this patch may be unneeded when we go back, but
      I'll make the patch relative to pws-24 with this patch. Ok?

diff -u oos/exec.c Src/exec.c
--- oos/exec.c	Mon Jun 28 08:37:57 1999
+++ Src/exec.c	Mon Jun 28 08:50:56 1999
@@ -828,7 +828,7 @@
     int ipipe[2], opipe[2];
     int pj, newjob;
     int old_simple_pline = simple_pline;
-    static int lastwj;
+    static int lastwj, lpforked;
 
     if (!l->left)
 	return lastval = (l->flags & PFLAG_NOT) != 0;
@@ -865,7 +865,7 @@
 	nowait = 0;
 	simple_pline = (l->left->type == END);
     }
-    lastwj = 0;
+    lastwj = lpforked = 0;
     execpline2(l->left, how, opipe[0], ipipe[1], last1);
     pline_level--;
     if (how & Z_ASYNC) {
@@ -935,8 +935,8 @@
 		    jn->stat & STAT_DONE &&
 		    lastval2 & 0200)
 		    killpg(mypgrp, lastval2 & ~0200);
-		if ((list_pipe || last1 || pline_level) &&
-		    !list_pipe_child && 
+		if (!list_pipe_child && !lpforked && !subsh &&
+		    (list_pipe || last1 || pline_level) &&
 		    ((jn->stat & STAT_STOPPED) ||
 		     (list_pipe_job && pline_level &&
 		      (jobtab[list_pipe_job].stat & STAT_STOPPED)))) {
@@ -959,6 +959,7 @@
 		    else if (pid) {
 			char dummy;
 
+			lpforked = 1;
 			list_pipe_pid = pid;
 			nowait = errflag = 1;
 			breaks = loops;
@@ -999,9 +1000,9 @@
 
 	    if (list_pipe && (lastval & 0200) && pj >= 0 &&
 		(!(jn->stat & STAT_INUSE) || (jn->stat & STAT_DONE))) {
+		deletejob(jn);
 		jn = jobtab + pj;
-		jn->stat |= STAT_NOPRINT;
-		killjb(jobtab + pj, lastval & ~0200);
+		killjb(jn, lastval & ~0200);
 	    }
 	    if (list_pipe_child || ((list_pipe || pline_level) &&
 				    (jn->stat & STAT_DONE)))
diff -u oos/jobs.c Src/jobs.c
--- oos/jobs.c	Mon Jun 28 08:37:58 1999
+++ Src/jobs.c	Mon Jun 28 08:50:57 1999
@@ -799,7 +799,8 @@
 			}
 		    if (!p) {
 			jn->stat &= ~STAT_SUPERJOB;
-			if (WIFEXITED(jn->procs->status))
+			if (WIFEXITED(jn->procs->status) &&
+			    !(jn->stat & STAT_CURSH))
 			    jn->gleader = mypgrp;
 			/* This deleted the job too early if the parent
 			   shell waited for a command in a list that will

--
Sven Wischnowsky                         wischnow@informatik.hu-berlin.de




Thread overview: 8+ messages
1999-06-27  6:43 zsh hangs with the message "zsh: can't set tty pgrp: not owner" Tanaka Akira
1999-06-27  8:41 ` Final (?) info on signals/crashes when suspending "mutt" function Bart Schaefer
1999-06-27  8:47   ` zsh hangs with the message "zsh: can't set tty pgrp: not owner" Bart Schaefer
1999-06-27 13:21   ` Final (?) info on signals/crashes when suspending "mutt" function Peter Stephenson
1999-06-27 16:45     ` Bart Schaefer
1999-06-27  7:03 Bart Schaefer
1999-06-28  7:04 Sven Wischnowsky
1999-06-28  8:14 ` Andrej Borsenkow

Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/zsh/
