From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from loas.clark.net ([168.143.0.10])
    by hawkwind.utcs.utoronto.ca with SMTP id <24704>;
    Thu, 5 Feb 1998 16:59:04 -0500
Received: from shell.clark.net (root@shell [168.143.0.8])
    by loas.clark.net (8.8.8/8.8.8) with ESMTP id KAA13081;
    Thu, 5 Feb 1998 10:25:58 -0500 (EST)
Received: from shell.clark.net (culliton@localhost [127.0.0.1])
    by shell.clark.net (8.8.8/8.8.8) with ESMTP id KAA02997;
    Thu, 5 Feb 1998 10:25:43 -0500 (EST)
Message-Id: <199802051525.KAA02997@shell.clark.net>
To: smarry@pantransit.smar.reptiles.org
cc: broman@nosc.mil, culliton@clark.net, debian-devel@lists.debian.org,
    rc@hawkwind.utcs.toronto.edu, culliton@clark.net
Subject: Re: "rc" shell maintainer?
In-reply-to: Your message of "05 Feb 1998 05:01:30 GMT."
    <19980205050130.12464.qmail@pantransit.smar.reptiles.org>
Date: Thu, 5 Feb 1998 10:25:42 -0500
From: Tom Culliton

On 05 Feb 1998 05:01:30 GMT, smarry@pantransit.smar.reptiles.org wrote:
> Hello, this is Marc Moorcroft. I joined the list shortly after
> running into two problems with rc-1.5b2 under Linux.
>
> One definite bug (reported to Tim Goodwin) will cause rc to spin calling
> wait(2) if you do:
>
>     {ls & wait} | cat
>
> There is another problem with signal handling that is a little more
> complicated. When I upgraded to the 2.0.33 kernel, rc began hanging
> occasionally when I interrupted programs, and when I finally got irritated
> enough to check it thoroughly, I found that it failed trip.rc at:
>
>     kill -2 $pid
>
> The relevant code is rc_wait() in wait.c:
>
> static pid_t rc_wait(int *stat) {
>     int r;
>     interrupt_happened = FALSE;
>     if (!setjmp(slowbuf.j)) {
>         slow = TRUE;
>         if (!interrupt_happened)
>             r = wait(stat);
>         else
>             r = -1;
>     } else
>         r = -1;
>     slow = FALSE;
>     return r;
> }
>
> It appears that some of the time, Linux will return from the wait(2)
> for the 'kill' process before the signal gets delivered.
> On Linux installations where signal(2) has the System V behaviour
> (system calls are interrupted for signals that are caught via
> signal(2)) rc longjmps out of the signal handler (a rather alarming
> practice in itself) to the top of the enclosing code in rc_wait().
> The sequence of events appears to be:
>
>     The signal is sent,
>
>     the process exits,
>
>     wait(2) returns successfully, and
>
>     before the longjmp gadgetry can be turned off (slow = FALSE),
>     the signal handler IMMEDIATELY runs,
>
>     longjmps back to the top of the setjmp block,
>
> and the PID that wait(2) returned is lost. rc loops forever calling
> wait(2) with no children, waiting for the lost PID to turn up.

Shouldn't the "interrupt_happened" flag prevent this?

> I've talked to others who have had different problems on other Linux
> installations, where caught signals do not interrupt system calls, as
> in BSD. This appears to be due to a difference of opinion between the
> libc and glibc people about how signals should behave, but I haven't
> investigated it myself.

It sounds like you're running RedHat 5.0 or some other distribution
which uses glibc. I haven't taken that step, partly on the "beware of
version X.0 of anything" principle and partly because I'm waiting
until I get a new machine. Having heard lots of complaints about
problems with glibc, this sounds like another good reason to wait.

The ultimate solution may be to move to the world of
sigaction/sigblock where those calls are available. The signal
handling in rc is one of the hairiest aspects of the code due to
portability issues and race conditions. Byron spent a lot of time
(the change logs are full of it) fixing race conditions and signal
handling before he passed on the torch. (BTW - Anyone heard from him
recently?)

Tom
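
P.S. For the sake of discussion, here is a rough sketch of what the
sigaction route might look like. It is not rc code and has not been
tried against rc. The names rc_wait_sketch(), install_handlers(),
on_sigint() and on_sigchld() are made up for illustration, and it uses
POSIX sigprocmask()/sigsuspend() where BSD would use sigblock(). It
also assumes the caller does not already have SIGINT or SIGCHLD
blocked. The idea is that the handler only sets a flag (no longjmp),
and the child is reaped with the signals blocked, so a PID that
waitpid() has returned cannot be thrown away by a late signal.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Set by the SIGINT handler; the handler does nothing else. */
static volatile sig_atomic_t interrupt_happened = 0;

static void on_sigint(int sig)  { (void)sig; interrupt_happened = 1; }

/* No-op: SIGCHLD needs a handler so that it wakes sigsuspend(). */
static void on_sigchld(int sig) { (void)sig; }

/* Hypothetical setup: install both handlers without SA_RESTART. */
int install_handlers(void) {
    struct sigaction sa;
    sa.sa_flags = 0;
    sigemptyset(&sa.sa_mask);
    sa.sa_handler = on_sigint;
    if (sigaction(SIGINT, &sa, NULL) == -1)
        return -1;
    sa.sa_handler = on_sigchld;
    return sigaction(SIGCHLD, &sa, NULL);
}

/*
 * Hypothetical stand-in for rc_wait(). Returns the PID of a reaped
 * child, or -1 with errno set to EINTR (SIGINT arrived while waiting)
 * or ECHILD (no children).
 */
pid_t rc_wait_sketch(int *stat) {
    sigset_t block, old;
    pid_t r;
    int saved;

    sigemptyset(&block);
    sigaddset(&block, SIGINT);
    sigaddset(&block, SIGCHLD);
    sigprocmask(SIG_BLOCK, &block, &old);
    interrupt_happened = 0;

    for (;;) {
        /* Signals are blocked here, so the result cannot be lost. */
        r = waitpid(-1, stat, WNOHANG);
        if (r != 0)
            break;              /* reaped a child, or a real error */
        /* Atomically restore the old mask and sleep until a signal. */
        sigsuspend(&old);
        if (interrupt_happened) {
            r = -1;
            errno = EINTR;
            break;
        }
    }

    saved = errno;              /* keep errno across the mask restore */
    sigprocmask(SIG_SETMASK, &old, NULL);
    errno = saved;
    return r;
}
```

A caller would run install_handlers() once at startup and then call
rc_wait_sketch(&status) after fork(). If the child has already exited
when SIGINT arrives, the PID is still returned and the interrupt shows
up in the flag afterwards rather than replacing the result.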