zsh-workers
* zsh hangs (3.0.6-pre-2, 3.1.5-pws-16)
From: Tatsuo Furukawa @ 1999-04-29  5:28 UTC
  To: zsh-workers


Hello, zsh developers.  I really appreciate your including the "absolute
cursor move" patch.

However, I have found another problem.

As you may know, zsh-3.0.X does not set the COLUMNS and LINES environment
variables (zsh-3.1.X does).  I am using zsh-3.0.X and want them set, so
I put "eval $(resize)" in a TRAPWINCH() function.  This is based on the
"archive/latest/4447" message.

This mostly works, but sometimes zsh hangs.  I have finally found a way
to demonstrate it.  Here it is:


1. Put the following in .zshrc:

RPROMPT="(%l)"

function TRAPWINCH() {
        eval $(resize)
        echo;
        echo "resized";
        echo
} 


2. Start a new terminal:

    $ xterm &

3. rlogin to localhost:

    $ rlogin localhost

4. Resize the terminal with the mouse.

    The "resized" message is displayed, and RPROMPT is drawn in the
    right place.  (But why is the "resized" message displayed twice?)

5. Exit:

    $ exit

    You will return to the original shell.

6. Run the following command:

    $  eval `resize`

7. zsh hangs. (T_T)


I tested the following configurations:

    zsh:    3.0.6-pre-2
    OS:     HP-UX 10.20

    zsh:    3.0.6-pre-2
    OS:     Linux/Slackware 3.1

    zsh:    3.1.5-pws-16
    OS:     Linux/Slackware 3.1
    
zsh hangs in all cases.


This problem is too difficult for me to fix.  Sorry that this is only a
report, with no patch.

-- 
Tatsuo Furukawa  (frkwtto@osk3.3web.ne.jp)



* Re: zsh hangs (3.0.6-pre-2, 3.1.5-pws-16)
From: Bart Schaefer @ 1999-04-29 16:01 UTC
  To: Tatsuo Furukawa, zsh-workers

On Apr 29,  2:28pm, Tatsuo Furukawa wrote:
} Subject: zsh hangs (3.0.6-pre-2, 3.1.5-pws-16)
}
} 1. Put the following in .zshrc:
} 
} RPROMPT="(%l)"
} 
} function TRAPWINCH() {
}         eval $(resize)
}         echo;
}         echo "resized";
}         echo
} } 
} 
} 4. Resize the terminal with the mouse.
} 
}     The "resized" message is displayed, and RPROMPT is drawn in the
}     right place.  (But why is the "resized" message displayed twice?)

The message is displayed twice because the "resize" command itself causes
a SIGWINCH to be sent.  So you get one when xterm finishes remapping, and
another when resize runs.  I'm not entirely sure why this doesn't cause
an infinite loop; perhaps resize only sends a SIGWINCH when the values it
reads back from the terminal do not match what's in the environment.

The hang appears to be a race condition in exec.c: getoutput().  Zsh is
blocked forever in sigsuspend() waiting for the SIGCHLD that will tell
it `resize` has exited.  Probably that signal arrived while zsh was
handling the SIGWINCH sent by "resize" and zsh either improperly handled
it then or dropped it on the floor.

-- 
Bart Schaefer                                 Brass Lantern Enterprises
http://www.well.com/user/barts              http://www.brasslantern.com



* PATCH: Re: zsh hangs (3.0.6-pre-2, 3.1.5-pws-16)
From: Bart Schaefer @ 1999-05-04 16:48 UTC
  To: Tatsuo Furukawa, zsh-workers

On Apr 29,  9:01am, Bart Schaefer wrote:
} Subject: Re: zsh hangs (3.0.6-pre-2, 3.1.5-pws-16)
}
} The hang appears to be a race condition in exec.c: getoutput().  Zsh is
} blocked forever in sigsuspend() waiting for the SIGCHLD that will tell
} it `resize` has exited.  Probably that signal arrived while zsh was
} handling the SIGWINCH sent by "resize" and zsh either improperly handled
} it then or dropped it on the floor.

This is indeed a race condition; I can't reproduce it under strace, but
the output I do get makes me suspicious of what's going on.  Here's a
fragment starting with the first WINCH and ending with the second:

--- SIGWINCH (Window size changed) ---
sigprocmask(SIG_BLOCK, ~[], [WINCH])    = 0
sigprocmask(SIG_SETMASK, [WINCH], ~[KILL STOP]) = 0
ioctl(10, TIOCGWINSZ, {ws_row=28, ws_col=86, ws_xpixel=535, ws_ypixel=368}) = 0
ioctl(10, TIOCSPGRP, [19992])           = 0
ioctl(10, SNDCTL_TMR_STOP, {B9600 opost isig -icanon -echo ...}) = 0
geteuid()                               = 674
write(10, "\r\33[m\17\33[27m\33[24m\33[Jzag"..., 26) = 26
write(10, "\33[K\33[67C(p6)\r\33[8C", 17) = 17
sigprocmask(SIG_BLOCK, [CHLD], [WINCH]) = 0
time(NULL)                              = 925778145
pipe([4, 5])                            = 0
fcntl(4, F_DUPFD, 10)                   = 11
close(4)                                = 0
fcntl(5, F_DUPFD, 10)                   = 12
close(5)                                = 0
sigprocmask(SIG_BLOCK, [CHLD], [CHLD WINCH]) = 0
fork()                                  = 19998
close(12)                               = 0
fcntl(11, F_GETFL)                      = 0 (flags O_RDONLY)
fstat(11, {st_mode=S_IFIFO|0600, st_size=0, ...}) = 0
mmap(0, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x40008000
lseek(11, 0, SEEK_CUR)                  = -1 ESPIPE (Illegal seek)
read(11, "TERMCAP=\'xterm|vs100|xterms|xte"..., 4096) = 990
read(11, "", 4096)                      = 0
close(11)                               = 0
munmap(0x40008000, 4096)                = 0
sigsuspend(~[HUP CHLD] <unfinished ...>
--- SIGCHLD (Child exited) ---
<... sigsuspend resumed> )              = -1 EINTR (Interrupted system call)
sigprocmask(SIG_BLOCK, ~[], ~[HUP KILL STOP]) = 0
sigprocmask(SIG_SETMASK, ~[HUP KILL STOP], ~[KILL STOP]) = 0
wait4(-1, [WIFEXITED(s) && WEXITSTATUS(s) == 0], WNOHANG|WUNTRACED, NULL) = 19998
times({tms_utime=3, tms_stime=6, tms_cutime=5, tms_cstime=3}) = 22268196
wait4(-1, 0xbfffefb4, WNOHANG|WUNTRACED, NULL) = -1 ECHILD (No child processes)
sigreturn()                             = ? (mask now [CHLD WINCH])
brk(0x80b4000)                          = 0x80b4000
sigprocmask(SIG_BLOCK, [CHLD], [CHLD WINCH]) = 0
sigprocmask(SIG_BLOCK, [CHLD], [CHLD WINCH]) = 0
sigprocmask(SIG_UNBLOCK, [CHLD], [CHLD WINCH]) = 0
sigprocmask(SIG_UNBLOCK, [CHLD], [WINCH]) = 0
sigprocmask(SIG_BLOCK, [CHLD], [WINCH]) = 0
ioctl(10, TIOCGWINSZ, {ws_row=28, ws_col=86, ws_xpixel=516, ws_ypixel=364}) = 0
ioctl(10, TIOCSWINSZ, {ws_row=28, ws_col=86, ws_xpixel=516, ws_ypixel=364}) = 0
ioctl(10, TIOCSPGRP, [19992])           = 0
ioctl(10, SNDCTL_TMR_STOP, {B9600 opost isig -icanon -echo ...}) = 0
geteuid()                               = 674
write(10, "\r\33[m\17\33[27m\33[24m\33[Jzag"..., 26) = 26
write(10, "\33[K\33[73C(p6)\r\33[8C", 17) = 17
sigprocmask(SIG_BLOCK, [CHLD], [CHLD WINCH]) = 0
sigprocmask(SIG_UNBLOCK, [CHLD], [CHLD WINCH]) = 0
sigprocmask(SIG_UNBLOCK, [CHLD], [WINCH]) = 0
sigprocmask(SIG_BLOCK, [CHLD], [WINCH]) = 0
ioctl(10, TIOCGWINSZ, {ws_row=28, ws_col=86, ws_xpixel=516, ws_ypixel=364}) = 0
ioctl(10, TIOCSWINSZ, {ws_row=28, ws_col=86, ws_xpixel=516, ws_ypixel=364}) = 0
ioctl(10, TIOCSPGRP, [19992])           = 0
ioctl(10, SNDCTL_TMR_STOP, {B9600 opost isig -icanon -echo ...}) = 0
geteuid()                               = 674
write(10, "\r\33[m\17\33[27m\33[24m\33[Jzag"..., 26) = 26
write(10, "\33[K\33[73C(p6)\r\33[8C", 17) = 17
sigprocmask(SIG_BLOCK, [CHLD], [CHLD WINCH]) = 0
sigprocmask(SIG_UNBLOCK, [CHLD], [CHLD WINCH]) = 0
sigprocmask(SIG_UNBLOCK, [CHLD], [WINCH]) = 0
sigprocmask(SIG_BLOCK, [CHLD], [WINCH]) = 0
sigprocmask(SIG_BLOCK, [CHLD], [CHLD WINCH]) = 0
sigprocmask(SIG_UNBLOCK, [CHLD], [CHLD WINCH]) = 0
sigprocmask(SIG_UNBLOCK, [CHLD], [WINCH]) = 0
sigprocmask(SIG_BLOCK, [CHLD], [WINCH]) = 0
sigprocmask(SIG_UNBLOCK, [CHLD], [CHLD WINCH]) = 0
sigprocmask(SIG_UNBLOCK, [CHLD], [WINCH]) = 0
sigprocmask(SIG_BLOCK, [CHLD], [WINCH]) = 0
write(1, "\n", 1)                       = 1
sigprocmask(SIG_BLOCK, [CHLD], [CHLD WINCH]) = 0
sigprocmask(SIG_UNBLOCK, [CHLD], [CHLD WINCH]) = 0
sigprocmask(SIG_UNBLOCK, [CHLD], [WINCH]) = 0
sigprocmask(SIG_BLOCK, [CHLD], [WINCH]) = 0
write(1, "resized\n", 8)                = 8
sigprocmask(SIG_BLOCK, [CHLD], [CHLD WINCH]) = 0
sigprocmask(SIG_UNBLOCK, [CHLD], [CHLD WINCH]) = 0
sigprocmask(SIG_UNBLOCK, [CHLD], [WINCH]) = 0
sigprocmask(SIG_BLOCK, [CHLD], [WINCH]) = 0
write(1, "\n", 1)                       = 1
sigprocmask(SIG_BLOCK, [CHLD], [CHLD WINCH]) = 0
sigprocmask(SIG_UNBLOCK, [CHLD], [CHLD WINCH]) = 0
sigprocmask(SIG_UNBLOCK, [CHLD], [WINCH]) = 0
sigreturn()                             = ? (mask now [])
--- SIGWINCH (Window size changed) ---

Note that the second WINCH isn't delivered until the first handler is done,
which is as it should be.  However, also note that during execution of the
handler, there are several instances where CHLD is blocked twice and then
unblocked twice.  Of course, the second block and the second unblock are
no-ops -- which is very bad in the case of the second unblock, because it
means that a SIGCHLD could be received before whatever code first blocked
the signal is ready to handle it.

I think that's what's happening -- getoutput() has called child_block()
and will call child_unblock() right before child_suspend(); but some other
code [probably in execpline()] has already called child_unblock(), so the
SIGCHLD gets handled before child_suspend() is called.  This results in
deadlock, because there are no children left.
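
The no-op second block and second unblock can be seen in isolation with
a small C sketch.  This is not zsh source: demo_child_block() and
demo_child_unblock() are hypothetical stand-ins for child_block() and
child_unblock(), assumed here to be thin wrappers around sigprocmask().
It only shows that the signal mask keeps no nesting count, so the inner
unblock leaves the outer code unprotected.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for zsh's child_block()/child_unblock();
 * the real functions are not reproduced here. */
void demo_child_block(void)
{
    sigset_t set;
    sigemptyset(&set);
    sigaddset(&set, SIGCHLD);
    sigprocmask(SIG_BLOCK, &set, NULL);
}

void demo_child_unblock(void)
{
    sigset_t set;
    sigemptyset(&set);
    sigaddset(&set, SIGCHLD);
    sigprocmask(SIG_UNBLOCK, &set, NULL);
}

/* True if SIGCHLD is in the current signal mask. */
bool chld_is_blocked(void)
{
    sigset_t cur;
    sigemptyset(&cur);
    sigprocmask(SIG_BLOCK, NULL, &cur);  /* NULL set: only query the mask */
    return sigismember(&cur, SIGCHLD) == 1;
}

/* Outer code blocks, inner code blocks and unblocks.  Returns whether
 * SIGCHLD is still blocked once control is back in the outer code. */
bool outer_still_protected(void)
{
    demo_child_block();            /* outer: wants SIGCHLD held until it waits */
    assert(chld_is_blocked());
    demo_child_block();            /* inner: no-op, already blocked */
    demo_child_unblock();          /* inner: really unblocks, no nesting count */
    bool still_blocked = chld_is_blocked();
    demo_child_unblock();          /* outer: no-op, already unblocked */
    return still_blocked;
}
```

Called from a single-threaded program, outer_still_protected() returns
false: after the inner pair, SIGCHLD can be delivered even though the
outer code has not yet reached its own unblock.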

"Premature" CHLD-unblocking doesn't cause a lockup at other times because
the code in waitjob() checks whether any children exist before calling
child_suspend().  Unfortunately, the process forked in getoutput() is not
added to the job table, so it's not directly possible to check whether it's
already been waited for.  The processes forked by getpipe() and getproc()
are similarly not job-tabled, but zsh never explicitly waits for them.

However, there is code in jobs.c to handle waiting for a process that is
not in the job table.  So I think the fix is the following; it doesn't
appear to break anything in simple tests.  The only behavioral change is
that zsh becomes interruptible with SIGINT (^C) during the wait for the
$(command), but that shouldn't matter because the output from the command
has already been consumed by readoutput().

Index: Src/exec.c
===================================================================
RCS file: /extra/cvsroot/zsh/zsh-3.0/Src/exec.c,v
retrieving revision 1.1.1.5.2.3
diff -u -r1.1.1.5.2.3 exec.c
--- exec.c	1999/04/28 05:21:34	1.1.1.5.2.3
+++ exec.c	1999/05/04 16:34:15
@@ -2071,7 +2071,7 @@
 	zclose(pipes[1]);
 	retval = readoutput(pipes[0], qt);
 	fdtable[pipes[0]] = 0;
-	child_suspend(0);		/* unblocks */
+	waitforpid(pid);		/* unblocks */
 	lastval = cmdoutval;
 	return retval;
     }
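
For readers who want the general idea behind the change above, here is
a minimal C sketch of "wait for a condition, not for a signal".  It is
an illustration under stated assumptions, not zsh's waitforpid(): the
names run_and_wait(), on_chld(), target_pid and target_done are
hypothetical, the exit status 7 is arbitrary, and the premature unblock
is simulated with waitid(..., WNOWAIT) so that the handler has normally
already reaped the child before the wait loop starts.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static pid_t target_pid;                    /* child we are waiting for */
static volatile sig_atomic_t target_done;   /* set once it has been reaped */
static volatile sig_atomic_t target_status;

/* SIGCHLD handler: reap only the child this sketch forked. */
static void on_chld(int sig)
{
    int saved_errno = errno, status;
    (void)sig;
    if (waitpid(target_pid, &status, WNOHANG) == target_pid) {
        target_status = status;
        target_done = 1;
    }
    errno = saved_errno;
}

/* Fork a child that exits with status 7 and wait for it.  If
 * premature_unblock is nonzero, simulate other code unblocking SIGCHLD
 * early.  Returns the child's exit status, or -1 on failure. */
int run_and_wait(int premature_unblock)
{
    struct sigaction sa, old_sa;
    sigset_t chld, old_mask, wait_mask;

    sigemptyset(&chld);
    sigaddset(&chld, SIGCHLD);
    sigprocmask(SIG_BLOCK, &chld, &old_mask);

    sa.sa_handler = on_chld;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    sigaction(SIGCHLD, &sa, &old_sa);

    target_done = 0;
    target_pid = fork();
    if (target_pid == 0)
        _exit(7);
    if (target_pid < 0) {
        sigaction(SIGCHLD, &old_sa, NULL);
        sigprocmask(SIG_SETMASK, &old_mask, NULL);
        return -1;
    }

    if (premature_unblock) {
        siginfo_t info;
        /* Wait until the child has exited, without reaping it. */
        while (waitid(P_PID, (id_t)target_pid, &info, WEXITED | WNOWAIT) < 0
               && errno == EINTR)
            ;
        sigprocmask(SIG_UNBLOCK, &chld, NULL);  /* pending SIGCHLD handled here */
        sigprocmask(SIG_BLOCK, &chld, NULL);
    }

    /* Check the condition before every suspend.  A single unconditional
     * sigsuspend() here could sleep forever if the child was already reaped. */
    wait_mask = old_mask;
    sigdelset(&wait_mask, SIGCHLD);
    while (!target_done)
        sigsuspend(&wait_mask);

    pid_t again = waitpid(target_pid, NULL, WNOHANG);
    assert(again == -1);          /* already reaped: no child left to wait for */
    (void)again;

    sigaction(SIGCHLD, &old_sa, NULL);
    sigprocmask(SIG_SETMASK, &old_mask, NULL);
    int status = target_status;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Both run_and_wait(0) and run_and_wait(1) should return 7.  The loop
condition is what keeps the second case from hanging; whether the real
code tests a flag, a job-table entry or the existence of the pid is a
separate design choice that this sketch does not try to settle.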

-- 
Bart Schaefer                                 Brass Lantern Enterprises
http://www.well.com/user/barts              http://www.brasslantern.com



* Re: PATCH: Re: zsh hangs (3.0.6-pre-2, 3.1.5-pws-16)
From: Bart Schaefer @ 1999-05-04 17:12 UTC
  To: zsh-workers

On May 4,  9:48am, Bart Schaefer wrote:
} Subject: PATCH: Re: zsh hangs (3.0.6-pre-2, 3.1.5-pws-16)
}
} I think that's what's happening -- getoutput() has called child_block()
} and will call child_unblock() right before child_suspend()

That's not precisely accurate, BTW.  child_suspend() is what calls [the
equivalent of] child_unblock() internally, right before sigsuspend() [or
the equivalent].  The conceptual flow is the same.
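
The "unblock, then suspend, as one step" behaviour of sigsuspend() can
be sketched in C as follows.  This is only an illustration, not zsh
code: it uses SIGUSR1 as a stand-in for SIGCHLD so that no child
process is needed, and suspend_delivers_pending() and on_usr1() are
hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stddef.h>

static volatile sig_atomic_t got_usr1;

static void on_usr1(int sig)
{
    (void)sig;
    got_usr1 = 1;
}

/* Returns 1 if a pending signal was handled inside sigsuspend() and the
 * caller's mask (SIGUSR1 blocked) was back in place afterwards. */
int suspend_delivers_pending(void)
{
    struct sigaction sa, old_sa;
    sigset_t usr1, old_mask, wait_mask, now;

    sigemptyset(&usr1);
    sigaddset(&usr1, SIGUSR1);
    sigprocmask(SIG_BLOCK, &usr1, &old_mask);

    sa.sa_handler = on_usr1;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    sigaction(SIGUSR1, &sa, &old_sa);

    got_usr1 = 0;
    raise(SIGUSR1);                    /* stays pending: SIGUSR1 is blocked */
    assert(got_usr1 == 0);

    wait_mask = old_mask;
    sigdelset(&wait_mask, SIGUSR1);
    int rc = sigsuspend(&wait_mask);   /* swap mask and wait, atomically */
    int err = errno;

    sigemptyset(&now);
    sigprocmask(SIG_BLOCK, NULL, &now);             /* query current mask */
    int reblocked = sigismember(&now, SIGUSR1) == 1;

    sigaction(SIGUSR1, &old_sa, NULL);
    sigprocmask(SIG_SETMASK, &old_mask, NULL);
    return got_usr1 == 1 && rc == -1 && err == EINTR && reblocked;
}
```

The -1/EINTR return matches the "sigsuspend resumed" line in the strace
output earlier in the thread.  The guarantee holds only if the signal is
still blocked when sigsuspend() is entered; if something unblocked it
earlier, the handler may already have run and nothing is left to wake
the caller.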

-- 
Bart Schaefer                                 Brass Lantern Enterprises
http://www.well.com/user/barts              http://www.brasslantern.com



* Re: PATCH: Re: zsh hangs (3.0.6-pre-2, 3.1.5-pws-16)
From: Tatsuo Furukawa @ 1999-05-11 14:41 UTC
  To: schaefer; +Cc: zsh-workers


Hello, Bart,

I applied the patch you posted in 6213 to fix the zsh hang, and tested
it.  The problem is too difficult for me to understand, but the patched
zsh works well: it doesn't hang any more!
(I tested 3.1.5-pws-17 on HP-UX 9.07.)

So, I hope that this patch will be incorporated! :-)

Bart> Index: Src/exec.c
Bart> ===================================================================
Bart> RCS file: /extra/cvsroot/zsh/zsh-3.0/Src/exec.c,v
Bart> retrieving revision 1.1.1.5.2.3
Bart> diff -u -r1.1.1.5.2.3 exec.c
Bart> --- exec.c	1999/04/28 05:21:34	1.1.1.5.2.3
Bart> +++ exec.c	1999/05/04 16:34:15
Bart> @@ -2071,7 +2071,7 @@
Bart>  	zclose(pipes[1]);
Bart>  	retval = readoutput(pipes[0], qt);
Bart>  	fdtable[pipes[0]] = 0;
Bart> -	child_suspend(0);		/* unblocks */
Bart> +	waitforpid(pid);		/* unblocks */
Bart>  	lastval = cmdoutval;
Bart>  	return retval;
Bart>      }

-- 
Tatsuo Furukawa (frkwtto@osk3.3web.ne.jp)

