From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from loas.clark.net ([168.143.0.10]) by hawkwind.utcs.utoronto.ca with SMTP id <24704>; Thu, 5 Feb 1998 16:59:04 -0500
Received: from shell.clark.net (root@shell [168.143.0.8])
	by loas.clark.net (8.8.8/8.8.8) with ESMTP id KAA13081;
	Thu, 5 Feb 1998 10:25:58 -0500 (EST)
Received: from shell.clark.net (culliton@localhost [127.0.0.1]) by shell.clark.net (8.8.8/8.8.8) with ESMTP id KAA02997; Thu, 5 Feb 1998 10:25:43 -0500 (EST)
Message-Id: <199802051525.KAA02997@shell.clark.net>
To:	smarry@pantransit.smar.reptiles.org
cc:	broman@nosc.mil, culliton@clark.net, debian-devel@lists.debian.org,
	rc@hawkwind.utcs.toronto.edu, culliton@clark.net
Subject: Re: "rc" shell maintainer? 
In-reply-to: Your message of "05 Feb 1998 05:01:30 GMT."
             <19980205050130.12464.qmail@pantransit.smar.reptiles.org> 
Date:	Thu, 5 Feb 1998 10:25:42 -0500
From:	Tom Culliton <culliton@clark.net>

On 05 Feb 1998 05:01:30 GMT, smarry@pantransit.smar.reptiles.org wrote:
> Hello, this is Marc Moorcroft.  I joined the list shortly after
> running into two problems with rc-1.5b2 under Linux.
> 
> One definite bug (reported to Tim Goodwin) will cause rc to spin calling
> wait(2) if you do:
> 
> {ls & wait} | cat
> 
> There is another problem with signal handling that is a little more
> complicated.  When I upgraded to the 2.0.33 kernel, rc began hanging
> occasionally when I interrupted programs, and when I finally got irritated
> enough to check it thoroughly, I found that it failed trip.rc at:
> 
> kill -2 $pid
> 
> The relevant code is rc_wait() in wait.c:
> 
> static pid_t rc_wait(int *stat) {
> 	int r;
> 	interrupt_happened = FALSE;
> 	if (!setjmp(slowbuf.j)) {
> 		slow = TRUE;
> 		if (!interrupt_happened)
> 			r = wait(stat);
> 		else
> 			r = -1;
> 	} else
> 		r = -1;
> 	slow = FALSE;
> 	return r;
> }
> 
> It appears that some of the time, Linux will return from the wait(2)
> for the 'kill' process before the signal gets delivered.  On Linux
> installations where signal(2) has the System V behaviour (system calls
> are interrupted for signals that are caught via signal(2)) rc longjmps
> out of the signal handler (a rather alarming practice in itself) to the
> top of the enclosing code in rc_wait().  The sequence of events appears
> to be:
> 
> 	The signal is sent,
> 
> 	the process exits,
> 
> 	wait(2) returns successfully, and
> 
> 	before the longjmp gadgetry can be turned off (slow = FALSE),
> 	the signal handler IMMEDIATELY runs,
> 
> 	longjmps back to the top of the setjmp block,
> 
> and the PID that wait(2) returned is lost.  rc loops forever calling
> wait(2) with no children, waiting for the lost PID to turn up.

Shouldn't the "interrupt_happened" flag prevent this?

> I've talked to others who have had different problems on other Linux
> installations, where caught signals do not interrupt system calls, as
> in BSD.  This appears to be due to a difference of opinion between the
> libc and glibc people about how signals should behave, but I haven't
> investigated it myself.

It sounds like you're running RedHat 5.0 or some other distribution
which uses glibc.  I haven't taken that step, partly on the "beware of
version X.0 of anything" principle and partly because I'm waiting until
I get a new machine.  This sounds like another good reason to wait; I've
heard lots of complaints about problems with glibc.

The ultimate solution may be to move to the world of sigaction/sigblock
where those calls are available.  The signal handling in rc is one of
the hairiest aspects of the code due to portability issues and race
conditions.  Byron spent a lot of time (the change logs are full of
it) fixing race conditions and signal handling before he passed on the
torch.  (BTW - Anyone heard from him recently?)
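
For illustration only, here is a rough sketch of what a wait routine
might look like in that world.  It assumes POSIX sigaction, sigprocmask
(the POSIX counterpart of BSD's sigblock) and sigsuspend are available.
The names rc_wait_sketch, install_handlers, on_sigint and on_sigchld are
made up for this example and are not taken from rc's source; only
interrupt_happened echoes the flag in the quoted code.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Set by the SIGINT handler; the handler does nothing else (no longjmp). */
static volatile sig_atomic_t interrupt_happened = 0;

static void on_sigint(int sig) { (void)sig; interrupt_happened = 1; }

/* No-op: a caught SIGCHLD only needs to wake sigsuspend(). */
static void on_sigchld(int sig) { (void)sig; }

/* Hypothetical setup: install both handlers without SA_RESTART. */
static int install_handlers(void) {
	struct sigaction sa;
	memset(&sa, 0, sizeof sa);
	sigemptyset(&sa.sa_mask);
	sa.sa_flags = 0;
	sa.sa_handler = on_sigint;
	if (sigaction(SIGINT, &sa, NULL) == -1)
		return -1;
	sa.sa_handler = on_sigchld;
	return sigaction(SIGCHLD, &sa, NULL);
}

/*
 * Hypothetical stand-in for rc_wait().  Returns a child's pid, or -1
 * with errno set to EINTR (interrupted) or ECHILD (no children).
 */
static pid_t rc_wait_sketch(int *stat) {
	sigset_t block, old;
	pid_t r;

	assert(stat != NULL);
	sigemptyset(&block);
	sigaddset(&block, SIGINT);
	sigaddset(&block, SIGCHLD);
	/* Hold both signals so they can only arrive inside sigsuspend(). */
	sigprocmask(SIG_BLOCK, &block, &old);
	interrupt_happened = 0;
	for (;;) {
		r = waitpid(-1, stat, WNOHANG);
		if (r != 0)
			break;		/* reaped a child, or an error such as ECHILD */
		sigsuspend(&old);	/* atomically unblock and sleep until a signal */
		if (interrupt_happened) {
			r = -1;
			errno = EINTR;
			break;
		}
	}
	sigprocmask(SIG_SETMASK, &old, NULL);
	return r;
}
```

The point of the design is that the handler only sets a flag, so a PID
that waitpid has already returned cannot be thrown away by a jump out of
the handler.  If the interrupt wins, the child simply stays around to be
collected by the next call.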

Tom