From: Charlie Brady <charlieb-supervision@budge.apana.org.au>
To: supervision@list.skarnet.org
Subject: runsv spinning 100% CPU (was Re: runsv and EAGAIN)
Date: Tue, 20 Jan 2009 18:21:10 -0500 (EST)
Message-ID: <Pine.LNX.4.64.0901201804020.6370@e-smith.charlieb.ott.istop.com>
In-Reply-To: <Pine.LNX.4.64.0807122230580.21979@e-smith.charlieb.ott.istop.com>
I've finally found time to look at this issue again, and have found a way
to reproduce it at will.
To replicate, create a service directory containing an empty file
supervise/control in place of the fifo. Create a symlink so that runsvdir
spawns a new runsv process. runsv will then behave as shown in the strace
and commentary below.
The unexpected behaviour comes from a combination of two factors:
supervise/control is a regular file and not a fifo, and Linux poll() sets
POLLIN for a regular file at EOF:
http://www.greenend.org.uk/rjk/2001/06/poll.html
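The poll() behaviour can be checked in isolation with a small test program. This is only a sketch: the function name and the scratch file under /tmp are made up for illustration and have nothing to do with runsv itself. It polls an empty regular file with a zero timeout and then reads one byte from it.

```c
#include <poll.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical demo, not part of runsv.
 * Poll an empty regular file once, then read from it.
 * Returns 1 if poll() flagged POLLIN while read() returned 0 (EOF),
 * 0 if that did not happen, -1 if the scratch file could not be created. */
int pollin_at_eof(void)
{
    char path[] = "/tmp/control-XXXXXX";   /* assumed scratch location */
    int fd = mkstemp(path);                /* creates an empty regular file */
    if (fd == -1) return -1;
    unlink(path);                          /* file disappears once fd is closed */

    struct pollfd x = { .fd = fd, .events = POLLIN, .revents = 0 };
    int ready = poll(&x, 1, 0);            /* zero timeout: never blocks */

    char ch;
    ssize_t n = read(fd, &ch, 1);          /* empty file, so this is EOF */
    close(fd);

    printf("poll=%d POLLIN=%d read=%zd\n",
           ready, (x.revents & POLLIN) != 0, n);
    return ready == 1 && (x.revents & POLLIN) != 0 && n == 0;
}
```

If the function returns 1, the descriptor was reported readable even though there was nothing to read, which is the same combination seen on fd 9 in the strace.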
IMO this is an error condition which runsv should handle more gracefully.
The question is - how should the error be handled or corrected? One way
would be for runsv to replace any existing supervise/control at startup
unless it is a fifo.
On Sat, 12 Jul 2008, Charlie Brady wrote:
> On Sat, 12 Jul 2008, Dražen Kačar wrote:
>
>> Charlie Brady wrote:
>> > > Here's a sample strace:
>> > >
>> > > rt_sigprocmask(SIG_UNBLOCK, [TERM], NULL, 8) = 0
>> > > rt_sigprocmask(SIG_UNBLOCK, [CHLD], NULL, 8) = 0
>> > > poll([{fd=3, events=POLLIN}, {fd=9, events=POLLIN, revents=POLLIN},
>> > > {fd=11, events=POLLIN}], 3, 1000020) = 1
>>
>> You only have POLLIN in revents for fd 9. Also, the return value is 1,
>> meaning that only one file descriptor has readable data.
>>
>> > > rt_sigprocmask(SIG_BLOCK, [TERM], NULL, 8) = 0
>> > > rt_sigprocmask(SIG_BLOCK, [CHLD], NULL, 8) = 0
>> > > read(3, 0xbfa256c3, 1) = -1 EAGAIN (Resource temporarily unavailable)
>> > > waitpid(-1, 0xbfa256bc, WNOHANG) = 0
>> > > read(9, "", 1) = 0
>> > > read(11, 0xbfa256c3, 1) = -1 EAGAIN (Resource temporarily unavailable)
>> > > gettimeofday({1215792640, 591768}, NULL) = 0
>> > ...
>>
>> > So poll is saying that data is available, and read is saying that it
>> > isn't.
>> > Is anyone else confused?
>>
>> poll() is saying that the data is available on fd 9 and read() on fd 9
>> does not return an error. The other two file descriptors should not have
>> been read.
>
> Thanks. I didn't read the strace at all carefully, did I?
>
> I see in iopause() that poll() is called without checking the return status.
>
> fd 3 is runsv's selfpipe. fd 9 is the control fifo and fd 11 is the log
> control fifo. Poll tells us that fd 9 had data to read, but then read
> returned 0.
>
> I'd better check (on Monday) to see whether I somehow have multiple runsv
> processes in the same directory...
>
> Here's the relevant part of runsv.c:
>
> ...
> sig_unblock(sig_term);
> sig_unblock(sig_child);
> iopause(x, 2 +haslog, &deadline, &now);
> sig_block(sig_term);
> sig_block(sig_child);
>
> while (read(selfpipe[0], &ch, 1) == 1)
>   ;
> for (;;) {
>   int child;
>   int wstat;
>
>   child =wait_nohang(&wstat);
>   if (!child) break;
>   if ((child == -1) && (errno != error_intr)) break;
>   if (child == svd[0].pid) {
>     svd[0].pid =0;
>     pidchanged =1;
>     svd[0].wstat =wstat;
>     svd[0].ctrl &=~C_TERM;
>     if (svd[0].state != S_FINISH)
>       if ((fd =open_read("finish")) != -1) {
>         close(fd);
>         svd[0].state =S_FINISH;
>         update_status(&svd[0]);
>         continue;
>       }
>     svd[0].state =S_DOWN;
>     taia_uint(&deadline, 1);
>     taia_add(&deadline, &svd[0].start, &deadline);
>     taia_now(&svd[0].start);
>     update_status(&svd[0]);
>     if (taia_less(&svd[0].start, &deadline)) sleep(1);
>   }
>   if (haslog) {
>     if (child == svd[1].pid) {
>       svd[1].pid =0;
>       pidchanged =1;
>       svd[1].state =S_DOWN;
>       svd[1].ctrl &=~C_TERM;
>       taia_uint(&deadline, 1);
>       taia_add(&deadline, &svd[1].start, &deadline);
>       taia_now(&svd[1].start);
>       update_status(&svd[1]);
>       if (taia_less(&svd[1].start, &deadline)) sleep(1);
>     }
>   }
> }
> if (read(svd[0].fdcontrol, &ch, 1) == 1) ctrl(&svd[0], ch);
> if (haslog)
>   if (read(svd[1].fdcontrol, &ch, 1) == 1) ctrl(&svd[1], ch);
>
> ...
>
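In the quoted loop every descriptor is read after iopause() whatever revents says, and a read() that returns 0 is treated the same as one that returns -1. One possible way to make the EOF case visible is sketched below. read_control() and the enum are hypothetical names, not runsv code, and what the caller should do on CTL_EOF (recreate the fifo, log and exit, ...) is left open.

```c
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical result codes for one attempt to read a control byte. */
enum ctl_result { CTL_NONE, CTL_BYTE, CTL_EOF, CTL_ERROR };

/* Hypothetical helper, not taken from runsv.
 * Reads one byte only if poll() flagged the descriptor as readable,
 * and reports EOF separately from "nothing to read right now". */
enum ctl_result read_control(const struct pollfd *x, char *ch)
{
    if (!(x->revents & POLLIN)) return CTL_NONE;   /* not flagged: skip read */

    ssize_t n = read(x->fd, ch, 1);
    if (n == 1) return CTL_BYTE;                   /* a control character */
    if (n == 0) return CTL_EOF;                    /* readable but empty */
    if (errno == EAGAIN || errno == EINTR) return CTL_NONE;
    return CTL_ERROR;
}
```

With a regular file in place of the fifo this returns CTL_EOF on every pass, so the caller at least has something to act on instead of going straight back into poll().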
Thread overview (7+ messages):
2008-07-11 20:46 runsv and EAGAIN Charlie Brady
2008-07-12 1:27 ` Charlie Brady
2008-07-12 16:07 ` Dražen Kačar
2008-07-13 2:46 ` Charlie Brady
2009-01-20 23:21 ` Charlie Brady [this message]
2009-02-10 12:41 ` runsv spinning 100% CPU (was Re: runsv and EAGAIN) Gerrit Pape
2009-02-18 0:09 ` Laurent Bercot