supervision - discussion about system services, daemon supervision, init, runlevel management, and tools such as s6 and runit
From: Charlie Brady <charlieb-supervision@budge.apana.org.au>
To: supervision@list.skarnet.org
Subject: runsv spinning 100% CPU (was Re: runsv and EAGAIN)
Date: Tue, 20 Jan 2009 18:21:10 -0500 (EST)
Message-ID: <Pine.LNX.4.64.0901201804020.6370@e-smith.charlieb.ott.istop.com>
In-Reply-To: <Pine.LNX.4.64.0807122230580.21979@e-smith.charlieb.ott.istop.com>


I've finally found time to look at this issue again, and have found a way
to reproduce it at will.


To reproduce it, create a service directory containing an empty regular
file supervise/control in place of the fifo. Create a symlink to it so that
runsvdir spawns a new runsv process. runsv will then behave as shown in the
strace and commentary quoted below.

The unexpected behaviour comes from a combination of two factors:
supervise/control is a regular file and not a fifo, and Linux poll() sets
POLLIN for a regular file at EOF:

http://www.greenend.org.uk/rjk/2001/06/poll.html
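
The poll() behaviour described above can be checked in isolation. The
following is only a minimal sketch, assuming a writable /tmp; the helper
name poll_empty_file is made up for illustration and is not part of runit.
It polls an empty regular file for POLLIN and then reads one byte from it.

```c
#define _POSIX_C_SOURCE 200809L
#include <poll.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper, not part of runit: poll an empty regular file
 * for POLLIN, then try to read one byte from it.
 * Returns 0 on success, -1 if the temporary file cannot be created.
 * On success *nready is poll()'s return value, *revents the events it
 * reported, and *nread the result of the one-byte read(). */
int poll_empty_file(int *nready, short *revents, ssize_t *nread)
{
  char path[] = "/tmp/poll-eof-XXXXXX";
  char ch;
  struct pollfd x;
  int fd = mkstemp(path);    /* creates an empty regular file */

  if (fd == -1) return -1;
  unlink(path);              /* the open descriptor keeps the file alive */

  x.fd = fd;
  x.events = POLLIN;
  x.revents = 0;
  *nready = poll(&x, 1, 0);  /* timeout 0: return immediately */
  *revents = x.revents;
  *nread = read(fd, &ch, 1); /* at EOF this returns 0, not -1 */
  close(fd);
  return 0;
}
```

On Linux I would expect poll() to return 1 with POLLIN set and read() to
return 0, which is the pattern seen on fd 9 in the strace. A loop that
polls again after such a read never blocks.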

IMO this is an error condition which runsv should handle more gracefully. 
The question is - how should the error be handled or corrected? One way 
would be for runsv to replace any existing supervise/control at startup 
unless it is a fifo.
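
A rough sketch of that idea follows. This is not the code runsv uses to
create its fifos; ensure_fifo and demo_replace_regular_file are
hypothetical names, and the 0600 mode in the demo is an assumption. It uses
lstat(), so a symlink at that path would be replaced as well.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper, not taken from runsv.c: make sure `path` is a
 * fifo, replacing whatever else is found there.
 * Returns 0 on success, -1 on error with errno set. */
int ensure_fifo(const char *path, mode_t mode)
{
  struct stat st;

  if (lstat(path, &st) == 0) {
    if (S_ISFIFO(st.st_mode)) return 0;  /* already a fifo: keep it */
    if (unlink(path) == -1) return -1;   /* e.g. fails for a directory */
  }
  else if (errno != ENOENT) return -1;   /* unexpected lstat() failure */

  return mkfifo(path, mode);
}

/* Demo of the case from this thread: an empty regular file sits where
 * the fifo should be. Returns 1 if the path ends up as a fifo. */
int demo_replace_regular_file(void)
{
  char path[] = "/tmp/control-XXXXXX";
  struct stat st;
  int ok;
  int fd = mkstemp(path);                /* empty regular file */

  if (fd == -1) return 0;
  close(fd);
  ok = ensure_fifo(path, 0600) == 0
    && lstat(path, &st) == 0
    && S_ISFIFO(st.st_mode);
  unlink(path);
  return ok;
}
```

Whether silently replacing the file is the right policy, as opposed to
logging the problem and exiting, is still the open question.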

On Sat, 12 Jul 2008, Charlie Brady wrote:

> On Sat, 12 Jul 2008, Dražen Kačar wrote:
>
>>  Charlie Brady wrote:
>> > >  Here's a sample strace:
>> > > 
>> > >  rt_sigprocmask(SIG_UNBLOCK, [TERM], NULL, 8) = 0
>> > >  rt_sigprocmask(SIG_UNBLOCK, [CHLD], NULL, 8) = 0
>> > >  poll([{fd=3, events=POLLIN}, {fd=9, events=POLLIN, revents=POLLIN},
>> > >  {fd=11, events=POLLIN}], 3, 1000020) = 1
>>
>>  You only have POLLIN in revents for fd 9. Also, the return value is 1,
>>  meaning that only one file descriptor has readable data.
>> 
>> > >  rt_sigprocmask(SIG_BLOCK, [TERM], NULL, 8) = 0
>> > >  rt_sigprocmask(SIG_BLOCK, [CHLD], NULL, 8) = 0
>> > >  read(3, 0xbfa256c3, 1)                  = -1 EAGAIN (Resource temporarily unavailable)
>> > >  waitpid(-1, 0xbfa256bc, WNOHANG)        = 0
>> > >  read(9, "", 1)                          = 0
>> > >  read(11, 0xbfa256c3, 1)                 = -1 EAGAIN (Resource temporarily unavailable)
>> > >  gettimeofday({1215792640, 591768}, NULL) = 0
>> >  ...
>> 
>> >  So poll is saying that data is available, and read is saying that it
>> >  isn't.
>> >  Is anyone else confused?
>>
>>  poll() is saying that the data is available on fd 9 and read() on fd 9
>>  does not return an error. The other two file descriptors should not
>>  have been read.
>
> Thanks. I didn't read the strace at all carefully, did I?
>
> I see in iopause() that poll() is called without checking the return status.
>
> fd 3 is runsv's selfpipe. fd 9 is the control fifo and fd 11 is the log 
> control fifo. Poll tells us that fd 9 had data to read, but then read 
> returned 0.
>
> I'd better check (on Monday) to see whether I somehow have multiple runsv 
> processes in the same directory...
>
> Here's the relevant part of runsv.c:
>
> ...
>     sig_unblock(sig_term);
>     sig_unblock(sig_child);
>     iopause(x, 2 +haslog, &deadline, &now);
>     sig_block(sig_term);
>     sig_block(sig_child);
>
>     while (read(selfpipe[0], &ch, 1) == 1)
>       ;
>     for (;;) {
>       int child;
>       int wstat;
>
>       child =wait_nohang(&wstat);
>       if (!child) break;
>       if ((child == -1) && (errno != error_intr)) break;
>       if (child == svd[0].pid) {
>         svd[0].pid =0;
>         pidchanged =1;
>         svd[0].wstat =wstat;
>         svd[0].ctrl &=~C_TERM;
>         if (svd[0].state != S_FINISH)
>           if ((fd =open_read("finish")) != -1) {
>             close(fd);
>             svd[0].state =S_FINISH;
>             update_status(&svd[0]);
>             continue;
>           }
>         svd[0].state =S_DOWN;
>         taia_uint(&deadline, 1);
>         taia_add(&deadline, &svd[0].start, &deadline);
>         taia_now(&svd[0].start);
>         update_status(&svd[0]);
>         if (taia_less(&svd[0].start, &deadline)) sleep(1);
>       }
>       if (haslog) {
>         if (child == svd[1].pid) {
>           svd[1].pid =0;
>           pidchanged =1;
>           svd[1].state =S_DOWN;
>           svd[1].ctrl &=~C_TERM;
>           taia_uint(&deadline, 1);
>           taia_add(&deadline, &svd[1].start, &deadline);
>           taia_now(&svd[1].start);
>           update_status(&svd[1]);
>           if (taia_less(&svd[1].start, &deadline)) sleep(1);
>         }
>       }
>     }
>     if (read(svd[0].fdcontrol, &ch, 1) == 1) ctrl(&svd[0], ch);
>     if (haslog)
>       if (read(svd[1].fdcontrol, &ch, 1) == 1) ctrl(&svd[1], ch);
>
> ...
>
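
The point made in the quoted discussion, that only descriptors flagged in
revents need to be read and that read() returning 0 is EOF rather than an
error, could be expressed along these lines. This is only a sketch;
read_if_ready and the RD_* names are hypothetical and do not appear in
runsv.c.

```c
#define _POSIX_C_SOURCE 200809L
#include <poll.h>
#include <unistd.h>

enum { RD_SKIPPED, RD_BYTE, RD_EOF, RD_ERROR };

/* Hypothetical helper: read one byte from x->fd, but only if the
 * preceding poll() flagged the descriptor as readable. */
int read_if_ready(const struct pollfd *x, char *ch)
{
  ssize_t r;

  if (!(x->revents & POLLIN)) return RD_SKIPPED; /* poll() did not flag it */
  r = read(x->fd, ch, 1);
  if (r == 1) return RD_BYTE;  /* got a control character */
  if (r == 0) return RD_EOF;   /* flagged readable but at EOF */
  return RD_ERROR;             /* read() failed, errno is set */
}
```

A caller that sees RD_EOF on the control descriptor on every pass could
treat that as a sign that supervise/control is not a fifo.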

Thread overview: 7+ messages
2008-07-11 20:46 runsv and EAGAIN Charlie Brady
2008-07-12  1:27 ` Charlie Brady
2008-07-12 16:07   ` Dražen Kačar
2008-07-13  2:46     ` Charlie Brady
2009-01-20 23:21       ` Charlie Brady [this message]
2009-02-10 12:41         ` runsv spinning 100% CPU (was Re: runsv and EAGAIN) Gerrit Pape
2009-02-18  0:09           ` Laurent Bercot