zsh-workers
* zsh hangs with the message "zsh: can't set tty pgrp: not owner".
@ 1999-06-27  6:43 Tanaka Akira
  1999-06-27  8:41 ` Final (?) info on signals/crashes when suspending "mutt" function Bart Schaefer
  0 siblings, 1 reply; 8+ messages in thread
From: Tanaka Akira @ 1999-06-27  6:43 UTC (permalink / raw)
  To: zsh-workers

zsh-3.1.5-pws-24 hangs with the message "zsh: can't set tty pgrp: not
owner" when it is started by Bourne shell.

Last login: Sun Jun 27 15:20:06 from localhost
Sun Microsystems Inc.   SunOS 5.7       Generic October 1998
$ /app/zsh-3.1.5-pws-24/bin/zsh
zsh: can't set tty pgrp: not owner
^C^C^C

# akr_sh is a test user.

truss reports the following.

stat64("/dev/pts/36", 0xFFBEFAE0)               = 0
open("/dev/pts/36", O_RDWR|O_NOCTTY)            = 3
fcntl(3, F_DUPFD, 0x0000000A)                   = 10
close(3)                                        = 0
ioctl(10, TCGETS, 0x00089F78)                   = 0
getpid()                                        = 25149 [25148]
ioctl(10, TIOCGSID, 0xFFBEFC4C)                 = 0
getsid(0)                                       = 25101
ioctl(10, TIOCSPGRP, 0xFFBEFCF8)                Err#1 EPERM
kill(25149, SIG#0)                              = 0
zsh: can't set tty pgrp: not owner
write(2, " z s h :   c a n ' t   s".., 35)      = 35
setpgid(0, 0)                                   = 0
getpgrp()                                       = 25149
ioctl(10, TIOCGSID, 0xFFBEFC44)                 = 0
getsid(0)                                       = 25101
ioctl(10, TIOCGPGRP, 0xFFBEFCAC)                = 0
alarm(0)                                        = 0
sigaction(SIGALRM, 0xFFBEFC30, 0xFFBEFCE0)      = 0
sigfillset(0xFF1B8998)                          = 0
sigprocmask(SIG_BLOCK, 0xFFBEFCD0, 0xFFBEFCC0)  = 0
alarm(1)                                        = 0
sigsuspend(0xFFBEFCB0)          (sleeping...)

# Hm. Why does TIOCSPGRP fail?

truss also reports that zsh hangs in the following loop.

    Received signal #14, SIGALRM, in sigsuspend() [caught]
sigsuspend(0xFFBEFCB0)                          Err#4 EINTR
setcontext(0xFFBEF998)
alarm(0)                                        = 0
sigprocmask(SIG_UNBLOCK, 0xFFBEFCD0, 0x00000000) = 0
sigaction(SIGALRM, 0xFFBEFC30, 0x00000000)      = 0
getpgrp()                                       = 25149
ioctl(10, TIOCGSID, 0xFFBEFC44)                 = 0
getsid(0)                                       = 25101
ioctl(10, TIOCGPGRP, 0xFFBEFCAC)                = 0
kill(-25149, SIGTTIN)                           = 0
    Received signal #26, SIGTTIN [ignored]
      siginfo: SIGTTIN pid=25149 uid=30000
getpgrp()                                       = 25149
ioctl(10, TIOCGSID, 0xFFBEFC44)                 = 0
getsid(0)                                       = 25101
ioctl(10, TIOCGPGRP, 0xFFBEFCAC)                = 0
alarm(0)                                        = 0
sigaction(SIGALRM, 0xFFBEFC30, 0xFFBEFCE0)      = 0
sigprocmask(SIG_BLOCK, 0xFFBEFCD0, 0xFFBEFCC0)  = 0
alarm(1)                                        = 0
sigsuspend(0xFFBEFCB0)          (sleeping...)
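
The cycle in the trace (read the terminal's foreground group, compare it
with our own, send SIGTTIN to our group, sleep one second, retry) looks
like the usual job-control startup loop.  Here it cannot terminate:
SIGTTIN is ignored, and the tty's group never becomes ours because
TIOCSPGRP failed.  A minimal sketch in C, assuming the POSIX wrappers
tcgetpgrp() and getpgrp() in place of the raw ioctls.  The name
wait_for_foreground and the retry cap are hypothetical and are not taken
from the zsh source:

```c
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper, not zsh code.
 * Returns  0 once our process group owns the terminal,
 *         -1 if the terminal cannot be queried,
 *          1 if we are still in the background after max_tries.
 */
int wait_for_foreground(int tty_fd, int max_tries)
{
    for (int i = 0; i < max_tries; i++) {
        pid_t fg = tcgetpgrp(tty_fd);   /* like ioctl(fd, TIOCGPGRP, ...) */
        if (fg == (pid_t)-1)
            return -1;
        if (fg == getpgrp())
            return 0;
        /* Ask to be stopped until we are foregrounded.  If SIGTTIN is
         * ignored, as in the trace, this has no effect. */
        kill(-getpgrp(), SIGTTIN);
        sleep(1);
    }
    return 1;
}
```

Capping the number of retries is only one way to avoid spinning forever;
whether zsh should give up, or proceed without job control, is a separate
question.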

zsh-3.1.5-pws-23 has no problem in the same situation.
-- 
Tanaka Akira


* Final (?) info on signals/crashes when suspending "mutt" function
@ 1999-06-27  7:03 Bart Schaefer
  0 siblings, 0 replies; 8+ messages in thread
From: Bart Schaefer @ 1999-06-27  7:03 UTC (permalink / raw)
  To: zsh-workers

Jump to the end for the big news that may finally get this fixed.  I've
been writing this message incrementally between debugging passes, so you
might as well get the whole play-by-play.

Recall that Jos Backus reported that suspending the function

    mutt () {
	command mutt "$@"
	echotc rs
    }

causes zsh to behave badly.  Sven has sent several patches but none of them
have completely fixed the problem.  Attempting to debug this, I've been
running gdb on zsh.  I reproduced the problem but so far I'm only able to
break at the point at which the SIGSTOP is received, so I'm not sure who
is sending that signal -- however, the parent zsh received first SIGSTOP
and *then* SIGTSTP when I hit ^Z, which is very suspicious.

However, because I was in gdb (attached to a PID from another xterm) I was
able to make zsh continue after each signal (so zsh's xterm never got hung).
Continuing through the second (TSTP) signal, I ended up with this:

zagzig% mutt () {
function>       command mutt "$@"
function>       echotc rs
function> }
zagzig% mutt
zsh: suspended (signal)  mutt
zagzig% pstree $$
zsh-+-mutt
    `-pstree
zagzig% fg
[1]  - trace trap (core dumped)  mutt

Simultaneously in the gdb terminal, the parent zsh got a SIGSEGV because it
tried to strcmp() a bad job table entry.  Here's the stack trace:

(gdb) where
#0  strcmp (p1=0x0, p2=0x80bfe70 "/usr/src/local/zsh/zsh-3.0.6-pre")
    at ../sysdeps/generic/strcmp.c:36
#1  0x804ba8b in bin_fg (name=0x80c25d8 "fg", argv=0x80c2770, 
    ops=0xbffff1a8 "", func=2) at builtin.c:629
#2  0x804a8c3 in execbuiltin (args=0x80c2710, bn=0x80b0ea0) at builtin.c:186
#3  0x805d7d3 in execcmd (cmd=0x80c26f0, input=0, output=0, how=2, last1=2)
    at exec.c:1779
#4  0x805af5e in execpline2 (pline=0x80c2740, how=2, input=0, output=0, 
    last1=0) at exec.c:912
#5  0x805a5b0 in execpline (l=0x80c26d8, how=2, last1=0) at exec.c:739
#6  0x805a183 in execlist (list=0x80c2750, dont_change_job=0, exiting=0)
    at exec.c:612
#7  0x806bee0 in loop (toplevel=1, justonce=0) at init.c:143
#8  0x806bbe4 in main (argc=2, argv=0xbffff6ec) at init.c:75
(gdb) up
#1  0x804ba8b in bin_fg (name=0x80c25d8 "fg", argv=0x80c2770, 
    ops=0xbffff1a8 "", func=2) at builtin.c:629
629			if (strcmp(jobtab[job].pwd, pwd)) {
(gdb) p job
$1 = 1
(gdb) p jobtab[1]
$3 = {gleader = 0, other = 0, stat = 0, pwd = 0x0, procs = 0x0, 
  filelist = 0x0, stty_in_env = 0, ty = 0x0}
(gdb) p jobtab[0]
$4 = {gleader = 0, other = 0, stat = 0, pwd = 0x0, procs = 0x0, 
  filelist = 0x0, stty_in_env = 0, ty = 0x0}
(gdb) p curjob
$5 = 2

Somewhere zsh has completely lost track of two (?) jobs, and failed to reset
curjob to -1.
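
The immediate SIGSEGV comes from passing the NULL pwd of an empty job
slot to strcmp().  A defensive comparison could look like the sketch
below.  The helper name job_pwd_differs is hypothetical, and a guard of
this kind would only hide the stale curjob value, not fix it:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not zsh code: report whether a job's saved
 * directory differs from the current one, treating a missing entry
 * as "no change" instead of dereferencing NULL. */
int job_pwd_differs(const char *job_pwd, const char *cur_pwd)
{
    if (job_pwd == NULL || cur_pwd == NULL)
        return 0;               /* empty job slot: nothing to compare */
    return strcmp(job_pwd, cur_pwd) != 0;
}
```

With a check like this, "fg" on the zeroed jobtab[1] shown above would
skip the "(pwd now: ...)" logic rather than crash.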

Now, oddly, if I change the function to be:

    mutt() {
	cd /tmp
	command mutt "$@"
	echotc rs
    }

I still get the SIGSTOP followed by the SIGTSTP, but now zsh is able to
correctly "fg" the job:

zagzig% mutt () {
        cd /tmp
        command mutt "$@"
        echotc rs
}
zagzig% mutt
zsh: suspended (signal)  mutt
(pwd now: /tmp)
zagzig% cd -
/usr/src/local/zsh/zsh-3.0.6-pre
zagzig% fg
[1]  - continued  mutt
zsh: suspended (signal)  mutt
zagzig% fg
[1]  - continued  mutt

The extra builtin has caused something different to happen.  Following
the second "fg" I quit mutt with "q" -- and now zsh is hung, blocked in
sigsuspend() called from waitjob(); but that may be a side effect of gdb.

The strange thing is, I can't tell where the heck that SIGSTOP is coming
from.  I've even tried putting in debug print statements around places
where zsh performs a kill() or killpg(), and I don't get any output!  Is
some other process (mutt itself?) sending a SIGSTOP to the process group?

YES!  That's IT!  MUTT is calling kill(0, SIGSTOP) and blowing its parent
zsh out of the water!  Confirmed by changing "command" to "strace" in the
function above.  Mutt expects to be the process group leader, but is not.

So that pretty much tears it.  There is no way short of forking a "watcher"
subshell for EVERY external process to handle both:
(1) badly-behaved programs whose exit status does not reveal that they died
    from a signal, and
(2) badly-behaved programs that send uncatchable signals to their entire
    process group even when they are not the group leader.

The failure in case (1) is far less catastrophic than case (2), so I think
the right solution is to back off to the behavior from patch 6707 (that is,
scrap 6819 and most of 6824, but 6848 and 6850 are orthogonal and good).
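
Case (1) comes down to what waitpid() reports.  The sketch below assumes
the common shell convention of encoding death by signal as 0200 | signal,
the same bit that the `lastval & 0200` tests in exec.c examine.  The
helpers shell_status and run_child are hypothetical, and zsh's own
bookkeeping is more involved:

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Map a wait status to a shell-style value: the exit code for a normal
 * exit, or 0200 | signal number for a death by signal. */
int shell_status(int wait_status)
{
    if (WIFSIGNALED(wait_status))
        return 0200 | WTERMSIG(wait_status);
    if (WIFEXITED(wait_status))
        return WEXITSTATUS(wait_status);
    return -1;                  /* stopped or continued: not handled here */
}

/* Hypothetical demo: fork a child that either is killed by sig or
 * exits normally with exit_code, and return its shell-style status. */
int run_child(int die_by_signal, int sig, int exit_code)
{
    int status;
    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {
        if (die_by_signal)
            kill(getpid(), sig);
        _exit(exit_code);
    }
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return shell_status(status);
}
```

A program that catches the signal, cleans up and then calls exit() takes
the WIFEXITED branch, so the parent sees only an ordinary exit code and
cannot tell that a signal was involved.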

I don't know, however, if that's directly related to the bogus curjob value
and "fg" crash noted above.  Probably so, but ...

-- 
Bart Schaefer                                 Brass Lantern Enterprises
http://www.well.com/user/barts              http://www.brasslantern.com


* Re: Final (?) info on signals/crashes when suspending "mutt" function
@ 1999-06-28  7:04 Sven Wischnowsky
  1999-06-28  8:14 ` Andrej Borsenkow
  0 siblings, 1 reply; 8+ messages in thread
From: Sven Wischnowsky @ 1999-06-28  7:04 UTC (permalink / raw)
  To: zsh-workers


Bart Schaefer wrote:

> ...
>
> YES!  That's IT!  MUTT is calling kill(0, SIGSTOP) and blowing its parent
> zsh out of the water!  Confirmed by changing "command" to "strace" in the
> function above.  Mutt expects to be the process group leader, but is not.

Oh, wonderful...

> So that pretty much tears it.  There is no way short of forking a "watcher"
> subshell for EVERY external process to handle both:
> (1) badly-behaved programs whose exit status does not reveal that they died
>     from a signal, and
> (2) badly-behaved programs that send uncatchable signals to their entire
>     process group even when they are not the group leader.
> 
> The failure in case (1) is far less catastrophic than case (2), so I think
> the right solution is to back off to the behavior from patch 6707 (that is,
> scrap 6819 and most of 6824, but 6848 and 6850 are orthogonal and good).

Yes, including the things you said in the follow-up, i.e. some bits of 
6819 are still valid and 6850 will go anyway.
I also built a patch over the weekend that tried to address Andrej's
problems, it's appended below. I'll be *very* busy this week, but I'll 
try to send a patch for it this week. I'm sorry folks, but who'd have
expected such behaviour...

Bye
 Sven

P.S.: Note: parts of this patch may be unneeded when we go back, but
      I'll make the patch relative to pws-24 with this patch. Ok?

diff -u oos/exec.c Src/exec.c
--- oos/exec.c	Mon Jun 28 08:37:57 1999
+++ Src/exec.c	Mon Jun 28 08:50:56 1999
@@ -828,7 +828,7 @@
     int ipipe[2], opipe[2];
     int pj, newjob;
     int old_simple_pline = simple_pline;
-    static int lastwj;
+    static int lastwj, lpforked;
 
     if (!l->left)
 	return lastval = (l->flags & PFLAG_NOT) != 0;
@@ -865,7 +865,7 @@
 	nowait = 0;
 	simple_pline = (l->left->type == END);
     }
-    lastwj = 0;
+    lastwj = lpforked = 0;
     execpline2(l->left, how, opipe[0], ipipe[1], last1);
     pline_level--;
     if (how & Z_ASYNC) {
@@ -935,8 +935,8 @@
 		    jn->stat & STAT_DONE &&
 		    lastval2 & 0200)
 		    killpg(mypgrp, lastval2 & ~0200);
-		if ((list_pipe || last1 || pline_level) &&
-		    !list_pipe_child && 
+		if (!list_pipe_child && !lpforked && !subsh &&
+		    (list_pipe || last1 || pline_level) &&
 		    ((jn->stat & STAT_STOPPED) ||
 		     (list_pipe_job && pline_level &&
 		      (jobtab[list_pipe_job].stat & STAT_STOPPED)))) {
@@ -959,6 +959,7 @@
 		    else if (pid) {
 			char dummy;
 
+			lpforked = 1;
 			list_pipe_pid = pid;
 			nowait = errflag = 1;
 			breaks = loops;
@@ -999,9 +1000,9 @@
 
 	    if (list_pipe && (lastval & 0200) && pj >= 0 &&
 		(!(jn->stat & STAT_INUSE) || (jn->stat & STAT_DONE))) {
+		deletejob(jn);
 		jn = jobtab + pj;
-		jn->stat |= STAT_NOPRINT;
-		killjb(jobtab + pj, lastval & ~0200);
+		killjb(jn, lastval & ~0200);
 	    }
 	    if (list_pipe_child || ((list_pipe || pline_level) &&
 				    (jn->stat & STAT_DONE)))
diff -u oos/jobs.c Src/jobs.c
--- oos/jobs.c	Mon Jun 28 08:37:58 1999
+++ Src/jobs.c	Mon Jun 28 08:50:57 1999
@@ -799,7 +799,8 @@
 			}
 		    if (!p) {
 			jn->stat &= ~STAT_SUPERJOB;
-			if (WIFEXITED(jn->procs->status))
+			if (WIFEXITED(jn->procs->status) &&
+			    !(jn->stat & STAT_CURSH))
 			    jn->gleader = mypgrp;
 			/* This deleted the job too early if the parent
 			   shell waited for a command in a list that will

--
Sven Wischnowsky                         wischnow@informatik.hu-berlin.de



end of thread, other threads:[~1999-06-28  8:14 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
1999-06-27  6:43 zsh hangs with the message "zsh: can't set tty pgrp: not owner" Tanaka Akira
1999-06-27  8:41 ` Final (?) info on signals/crashes when suspending "mutt" function Bart Schaefer
1999-06-27  8:47   ` zsh hangs with the message "zsh: can't set tty pgrp: not owner" Bart Schaefer
1999-06-27 13:21   ` Final (?) info on signals/crashes when suspending "mutt" function Peter Stephenson
1999-06-27 16:45     ` Bart Schaefer
1999-06-27  7:03 Bart Schaefer
1999-06-28  7:04 Sven Wischnowsky
1999-06-28  8:14 ` Andrej Borsenkow
