I've finally found time to look at this issue again, and have found a way to replicate the issue at will. To replicate, create a service directory containing an empty file supervise/control in place of the fifo. Create a symlink so that runsvdir spawns a new runsv process. runsv will then behave as shown in the strace plus commentary shown below. The unexpected behaviour comes from a combination of two factors - supervise/control is a file and not a fifo, and linux poll() sets POLLIN for a regular file at EOF: http://www.greenend.org.uk/rjk/2001/06/poll.html IMO this is an error condition which runsv should handle more gracefully. The question is - how should the error be handled or corrected? One way would be for runsv to replace any existing supervise/control at startup unless it is a fifo. On Sat, 12 Jul 2008, Charlie Brady wrote: > On Sat, 12 Jul 2008, Dražen Kačar wrote: > >> Charlie Brady wrote: >> > > Here's a sample strace: >> > > >> > > rt_sigprocmask(SIG_UNBLOCK, [TERM], NULL, 8) = 0 >> > > rt_sigprocmask(SIG_UNBLOCK, [CHLD], NULL, 8) = 0 >> > > poll([{fd=3, events=POLLIN}, {fd=9, events=POLLIN, revents=POLLIN}, >> > > {fd=11, events=POLLIN}], 3, 1000020) = 1 >> >> You only have POLLIN in revents for fd 9. Also, the return value is 1, >> meaning that only one file descriptor has readable data. >> >> > > rt_sigprocmask(SIG_BLOCK, [TERM], NULL, 8) = 0 >> > > rt_sigprocmask(SIG_BLOCK, [CHLD], NULL, 8) = 0 >> > > read(3, 0xbfa256c3, 1) = -1 EAGAIN (Resource >> > > temporarily >> > > unavailable) >> > > waitpid(-1, 0xbfa256bc, WNOHANG) = 0 >> > > read(9, "", 1) = 0 >> > > read(11, 0xbfa256c3, 1) = -1 EAGAIN (Resource >> > > temporarily >> > > unavailable) >> > > gettimeofday({1215792640, 591768}, NULL) = 0 >> > ... >> >> > So poll i saying that data is available, and read is saying that it >> > isn't. >> > Is anyone else confused? >> >> poll() is saying that the data is available on fd 9 and read() on fd 9 >> does not return an error. The other two file descriptors should have not >> been read. > > Thanks. I didn't read the strace at all carefully, did I? > > I see in iopause() that poll() is called without checking the return status. > > fd 3 is runsv's selfpipe. fd 9 is the control fifo and fd 11 is the log > control fifo. Poll tells us that fd 9 had data to read, but then read > returned 0. > > I'd better check (on Monday) to see whether I somehow have multiple runsv > processes in the same directory... > > Here's the relevant part of runsv.c: > > ... > sig_unblock(sig_term); > sig_unblock(sig_child); > iopause(x, 2 +haslog, &deadline, &now); > sig_block(sig_term); > sig_block(sig_child); > > while (read(selfpipe[0], &ch, 1) == 1) > ; > for (;;) { > int child; > int wstat; > > child =wait_nohang(&wstat); > if (!child) break; > if ((child == -1) && (errno != error_intr)) break; > if (child == svd[0].pid) { > svd[0].pid =0; > pidchanged =1; > svd[0].wstat =wstat; > svd[0].ctrl &=~C_TERM; > if (svd[0].state != S_FINISH) > if ((fd =open_read("finish")) != -1) { > close(fd); > svd[0].state =S_FINISH; > update_status(&svd[0]); > continue; > } > svd[0].state =S_DOWN; > taia_uint(&deadline, 1); > taia_add(&deadline, &svd[0].start, &deadline); > taia_now(&svd[0].start); > update_status(&svd[0]); > if (taia_less(&svd[0].start, &deadline)) sleep(1); > } > if (haslog) { > if (child == svd[1].pid) { > svd[1].pid =0; > pidchanged =1; > svd[1].state =S_DOWN; > svd[1].ctrl &=~C_TERM; > taia_uint(&deadline, 1); > taia_add(&deadline, &svd[1].start, &deadline); > taia_now(&svd[1].start); > update_status(&svd[1]); > if (taia_less(&svd[1].start, &deadline)) sleep(1); > } > } > } > if (read(svd[0].fdcontrol, &ch, 1) == 1) ctrl(&svd[0], ch); > if (haslog) > if (read(svd[1].fdcontrol, &ch, 1) == 1) ctrl(&svd[1], ch); > > ... >