supervision - discussion about system services, daemon supervision, init, runlevel management, and tools such as s6 and runit
 help / color / mirror / Atom feed
* runsv and EAGAIN
@ 2008-07-11 20:46 Charlie Brady
  2008-07-12  1:27 ` Charlie Brady
  0 siblings, 1 reply; 7+ messages in thread
From: Charlie Brady @ 2008-07-11 20:46 UTC (permalink / raw)
  To: supervision


I'm just putting together a new sytem using CentOS 5.2 (i686 - 
2.6.18-92.el5PAE), and have runsv chewing up lots of CPU:

...
  1202 root      25   0   108   28   20 R 10.2  0.0   3:13.97 runsv
  1300 root      25   0   112   36   20 R 10.2  0.0   3:17.01 runsv
  1306 root      25   0   108   28   20 R 10.2  0.0   3:18.52 runsv
  1322 root      25   0   112   36   20 R 10.2  0.0   3:16.97 runsv
  1203 root      25   0   108   32   20 R  9.9  0.0   3:19.21 runsv
  1228 root      25   0   112   36   20 R  9.9  0.0   3:21.39 runsv
  1271 root      25   0   112   32   20 R  9.9  0.0   3:24.37 runsv
  1200 root      25   0   112   32   20 R  9.6  0.0   3:23.17 runsv
...

Here's a sample strace:

rt_sigprocmask(SIG_UNBLOCK, [TERM], NULL, 8) = 0
rt_sigprocmask(SIG_UNBLOCK, [CHLD], NULL, 8) = 0
poll([{fd=3, events=POLLIN}, {fd=9, events=POLLIN, revents=POLLIN}, 
{fd=11, events=POLLIN}], 3, 1000020) = 1
rt_sigprocmask(SIG_BLOCK, [TERM], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], NULL, 8) = 0
read(3, 0xbfa256c3, 1)                  = -1 EAGAIN (Resource temporarily 
unavailable)
waitpid(-1, 0xbfa256bc, WNOHANG)        = 0
read(9, "", 1)                          = 0
read(11, 0xbfa256c3, 1)                 = -1 EAGAIN (Resource temporarily 
unavailable)
gettimeofday({1215792640, 591768}, NULL) = 0
rt_sigprocmask(SIG_UNBLOCK, [TERM], NULL, 8) = 0
rt_sigprocmask(SIG_UNBLOCK, [CHLD], NULL, 8) = 0
poll([{fd=3, events=POLLIN}, {fd=9, events=POLLIN, revents=POLLIN}, 
{fd=11, events=POLLIN}], 3, 1000020) = 1
rt_sigprocmask(SIG_BLOCK, [TERM], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], NULL, 8) = 0
read(3, 0xbfa256c3, 1)                  = -1 EAGAIN (Resource temporarily 
unavailable)
waitpid(-1, 0xbfa256bc, WNOHANG)        = 0
read(9, "", 1)                          = 0
read(11, 0xbfa256c3, 1)                 = -1 EAGAIN (Resource temporarily 
unavailable)
gettimeofday({1215792640, 593313}, NULL) = 0
...

I think this might be a case of a kernel being "this dumb" - iopause.c:

...
  	/* XXX: some kernels apparently need x[0] even if len is 0 */
  	/* XXX: how to handle EAGAIN? are kernels really this dumb? */
  	/* XXX: how to handle EINVAL? when exactly can this happen? */
...


...
ERRORS
        The poll() function shall fail if:

        EAGAIN The allocation of internal data structures failed but a 
subsequent request may succeed.

...

The system isn't obviously out of memory:

[root@amc-upgrade:]# free
              total       used       free     shared    buffers     cached
Mem:        255608     213396      42212          0      29240     109132
-/+ buffers/cache:      75024     180584
Swap:       265064          0     265064
~
[root@amc-upgrade:]#

[root@amc-upgrade:]# cat /proc/meminfo
MemTotal:       255608 kB
MemFree:         47580 kB
Buffers:         29276 kB
Cached:         109176 kB
SwapCached:          0 kB
Active:         112132 kB
Inactive:        75084 kB
HighTotal:           0 kB
HighFree:            0 kB
LowTotal:       255608 kB
LowFree:         47580 kB
SwapTotal:      265064 kB
SwapFree:       265064 kB
Dirty:            2176 kB
Writeback:           0 kB
AnonPages:       48780 kB
Mapped:           9236 kB
Slab:            15600 kB
PageTables:       2372 kB
NFS_Unstable:        0 kB
Bounce:              0 kB
CommitLimit:    392868 kB
Committed_AS:   206484 kB
VmallocTotal:   772088 kB
VmallocUsed:      2284 kB
VmallocChunk:   769652 kB
HugePages_Total:     0
HugePages_Free:      0
HugePages_Rsvd:      0
Hugepagesize:     2048 kB
~
[root@amc-upgrade:]#




^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: runsv and EAGAIN
  2008-07-11 20:46 runsv and EAGAIN Charlie Brady
@ 2008-07-12  1:27 ` Charlie Brady
  2008-07-12 16:07   ` Dražen Kačar
  0 siblings, 1 reply; 7+ messages in thread
From: Charlie Brady @ 2008-07-12  1:27 UTC (permalink / raw)
  To: supervision


On Fri, 11 Jul 2008, Charlie Brady wrote:

> Here's a sample strace:
>
> rt_sigprocmask(SIG_UNBLOCK, [TERM], NULL, 8) = 0
> rt_sigprocmask(SIG_UNBLOCK, [CHLD], NULL, 8) = 0
> poll([{fd=3, events=POLLIN}, {fd=9, events=POLLIN, revents=POLLIN}, {fd=11, 
> events=POLLIN}], 3, 1000020) = 1
> rt_sigprocmask(SIG_BLOCK, [TERM], NULL, 8) = 0
> rt_sigprocmask(SIG_BLOCK, [CHLD], NULL, 8) = 0
> read(3, 0xbfa256c3, 1)                  = -1 EAGAIN (Resource temporarily 
> unavailable)
> waitpid(-1, 0xbfa256bc, WNOHANG)        = 0
> read(9, "", 1)                          = 0
> read(11, 0xbfa256c3, 1)                 = -1 EAGAIN (Resource temporarily 
> unavailable)
> gettimeofday({1215792640, 591768}, NULL) = 0
...
> ...
> ERRORS
>        The poll() function shall fail if:
>
>       EAGAIN The allocation of internal data structures failed but a 
> subsequent request may succeed.
>
> ...

Sorry - this, from "man 3 read" is the relevant manpage section:

...
ERRORS
        EAGAIN Non-blocking I/O has been selected using O_NONBLOCK and no
        data was  immediately available for reading.
...

So poll i saying that data is available, and read is saying that it isn't. 
Is anyone else confused?


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: runsv and EAGAIN
  2008-07-12  1:27 ` Charlie Brady
@ 2008-07-12 16:07   ` Dražen Kačar
  2008-07-13  2:46     ` Charlie Brady
  0 siblings, 1 reply; 7+ messages in thread
From: Dražen Kačar @ 2008-07-12 16:07 UTC (permalink / raw)
  To: supervision

Charlie Brady wrote:
> >Here's a sample strace:
> >
> >rt_sigprocmask(SIG_UNBLOCK, [TERM], NULL, 8) = 0
> >rt_sigprocmask(SIG_UNBLOCK, [CHLD], NULL, 8) = 0
> >poll([{fd=3, events=POLLIN}, {fd=9, events=POLLIN, revents=POLLIN}, 
> >{fd=11, events=POLLIN}], 3, 1000020) = 1

You only have POLLIN in revents for fd 9. Also, the return value is 1,
meaning that only one file descriptor has readable data.

> >rt_sigprocmask(SIG_BLOCK, [TERM], NULL, 8) = 0
> >rt_sigprocmask(SIG_BLOCK, [CHLD], NULL, 8) = 0
> >read(3, 0xbfa256c3, 1)                  = -1 EAGAIN (Resource temporarily 
> >unavailable)
> >waitpid(-1, 0xbfa256bc, WNOHANG)        = 0
> >read(9, "", 1)                          = 0
> >read(11, 0xbfa256c3, 1)                 = -1 EAGAIN (Resource temporarily 
> >unavailable)
> >gettimeofday({1215792640, 591768}, NULL) = 0
> ...

> So poll i saying that data is available, and read is saying that it isn't. 
> Is anyone else confused?

poll() is saying that the data is available on fd 9 and read() on fd 9
does not return an error. The other two file descriptors should have not
been read.

-- 
 .-.   .-.    Yes, I am an agent of Satan, but my duties are largely
(_  \ /  _)   ceremonial.
     |
     |        dave@fly.srk.fer.hr


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: runsv and EAGAIN
  2008-07-12 16:07   ` Dražen Kačar
@ 2008-07-13  2:46     ` Charlie Brady
  2009-01-20 23:21       ` runsv spinning 100% CPU (was Re: runsv and EAGAIN) Charlie Brady
  0 siblings, 1 reply; 7+ messages in thread
From: Charlie Brady @ 2008-07-13  2:46 UTC (permalink / raw)
  To: Dražen Kačar; +Cc: supervision

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: TEXT/PLAIN; charset=X-UNKNOWN; format=flowed, Size: 3209 bytes --]


On Sat, 12 Jul 2008, Dra¾en Kaèar wrote:

> Charlie Brady wrote:
>>> Here's a sample strace:
>>>
>>> rt_sigprocmask(SIG_UNBLOCK, [TERM], NULL, 8) = 0
>>> rt_sigprocmask(SIG_UNBLOCK, [CHLD], NULL, 8) = 0
>>> poll([{fd=3, events=POLLIN}, {fd=9, events=POLLIN, revents=POLLIN},
>>> {fd=11, events=POLLIN}], 3, 1000020) = 1
>
> You only have POLLIN in revents for fd 9. Also, the return value is 1,
> meaning that only one file descriptor has readable data.
>
>>> rt_sigprocmask(SIG_BLOCK, [TERM], NULL, 8) = 0
>>> rt_sigprocmask(SIG_BLOCK, [CHLD], NULL, 8) = 0
>>> read(3, 0xbfa256c3, 1)                  = -1 EAGAIN (Resource temporarily
>>> unavailable)
>>> waitpid(-1, 0xbfa256bc, WNOHANG)        = 0
>>> read(9, "", 1)                          = 0
>>> read(11, 0xbfa256c3, 1)                 = -1 EAGAIN (Resource temporarily
>>> unavailable)
>>> gettimeofday({1215792640, 591768}, NULL) = 0
>> ...
>
>> So poll i saying that data is available, and read is saying that it isn't.
>> Is anyone else confused?
>
> poll() is saying that the data is available on fd 9 and read() on fd 9
> does not return an error. The other two file descriptors should have not
> been read.

Thanks. I didn't read the strace at all carefully, did I?

I see in iopause() that poll() is called without checking the return 
status.

fd 3 is runsv's selfpipe. fd 9 is the control fifo and fd 11 is the log 
control fifo. Poll tells us that fd 9 had data to read, but then read 
returned 0.

I'd better check (on Monday) to see whether I somehow have multiple runsv 
processes in the same directory...

Here's the relevant part of runsv.c:

...
     sig_unblock(sig_term);
     sig_unblock(sig_child);
     iopause(x, 2 +haslog, &deadline, &now);
     sig_block(sig_term);
     sig_block(sig_child);

     while (read(selfpipe[0], &ch, 1) == 1)
       ;
     for (;;) {
       int child;
       int wstat;

       child =wait_nohang(&wstat);
       if (!child) break;
       if ((child == -1) && (errno != error_intr)) break;
       if (child == svd[0].pid) {
         svd[0].pid =0;
         pidchanged =1;
         svd[0].wstat =wstat;
         svd[0].ctrl &=~C_TERM;
         if (svd[0].state != S_FINISH)
           if ((fd =open_read("finish")) != -1) {
             close(fd);
             svd[0].state =S_FINISH;
             update_status(&svd[0]);
             continue;
           }
         svd[0].state =S_DOWN;
         taia_uint(&deadline, 1);
         taia_add(&deadline, &svd[0].start, &deadline);
         taia_now(&svd[0].start);
         update_status(&svd[0]);
         if (taia_less(&svd[0].start, &deadline)) sleep(1);
       }
       if (haslog) {
         if (child == svd[1].pid) {
           svd[1].pid =0;
           pidchanged =1;
           svd[1].state =S_DOWN;
           svd[1].ctrl &=~C_TERM;
           taia_uint(&deadline, 1);
           taia_add(&deadline, &svd[1].start, &deadline);
           taia_now(&svd[1].start);
           update_status(&svd[1]);
           if (taia_less(&svd[1].start, &deadline)) sleep(1);
         }
       }
     }
     if (read(svd[0].fdcontrol, &ch, 1) == 1) ctrl(&svd[0], ch);
     if (haslog)
       if (read(svd[1].fdcontrol, &ch, 1) == 1) ctrl(&svd[1], ch);

...

^ permalink raw reply	[flat|nested] 7+ messages in thread

* runsv spinning 100% CPU (was Re: runsv and EAGAIN)
  2008-07-13  2:46     ` Charlie Brady
@ 2009-01-20 23:21       ` Charlie Brady
  2009-02-10 12:41         ` Gerrit Pape
  0 siblings, 1 reply; 7+ messages in thread
From: Charlie Brady @ 2009-01-20 23:21 UTC (permalink / raw)
  To: supervision

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: TEXT/PLAIN; charset=X-UNKNOWN; format=flowed, Size: 4264 bytes --]


I've finally found time to look at this issue again, and have found a way 
to replicate the issue at will.


To replicate, create a service directory containing an empty file 
supervise/control in place of the fifo. Create a symlink so that runsvdir 
spawns a new runsv process. runsv will then behave as shown in the strace 
plus commentary shown below.

The unexpected behaviour comes from a combination of two factors - 
supervise/control is a file and not a fifo, and linux poll() sets POLLIN 
for a regular file at EOF:

http://www.greenend.org.uk/rjk/2001/06/poll.html

IMO this is an error condition which runsv should handle more gracefully. 
The question is - how should the error be handled or corrected? One way 
would be for runsv to replace any existing supervise/control at startup 
unless it is a fifo.

On Sat, 12 Jul 2008, Charlie Brady wrote:

> On Sat, 12 Jul 2008, Dra¾en Kaèar wrote:
>
>>  Charlie Brady wrote:
>> > >  Here's a sample strace:
>> > > 
>> > >  rt_sigprocmask(SIG_UNBLOCK, [TERM], NULL, 8) = 0
>> > >  rt_sigprocmask(SIG_UNBLOCK, [CHLD], NULL, 8) = 0
>> > >  poll([{fd=3, events=POLLIN}, {fd=9, events=POLLIN, revents=POLLIN},
>> > >  {fd=11, events=POLLIN}], 3, 1000020) = 1
>>
>>  You only have POLLIN in revents for fd 9. Also, the return value is 1,
>>  meaning that only one file descriptor has readable data.
>> 
>> > >  rt_sigprocmask(SIG_BLOCK, [TERM], NULL, 8) = 0
>> > >  rt_sigprocmask(SIG_BLOCK, [CHLD], NULL, 8) = 0
>> > >  read(3, 0xbfa256c3, 1)                  = -1 EAGAIN (Resource 
>> > >  temporarily
>> > >  unavailable)
>> > >  waitpid(-1, 0xbfa256bc, WNOHANG)        = 0
>> > >  read(9, "", 1)                          = 0
>> > >  read(11, 0xbfa256c3, 1)                 = -1 EAGAIN (Resource 
>> > >  temporarily
>> > >  unavailable)
>> > >  gettimeofday({1215792640, 591768}, NULL) = 0
>> >  ...
>> 
>> >  So poll i saying that data is available, and read is saying that it 
>> >  isn't.
>> >  Is anyone else confused?
>>
>>  poll() is saying that the data is available on fd 9 and read() on fd 9
>>  does not return an error. The other two file descriptors should have not
>>  been read.
>
> Thanks. I didn't read the strace at all carefully, did I?
>
> I see in iopause() that poll() is called without checking the return status.
>
> fd 3 is runsv's selfpipe. fd 9 is the control fifo and fd 11 is the log 
> control fifo. Poll tells us that fd 9 had data to read, but then read 
> returned 0.
>
> I'd better check (on Monday) to see whether I somehow have multiple runsv 
> processes in the same directory...
>
> Here's the relevant part of runsv.c:
>
> ...
>     sig_unblock(sig_term);
>     sig_unblock(sig_child);
>     iopause(x, 2 +haslog, &deadline, &now);
>     sig_block(sig_term);
>     sig_block(sig_child);
>
>    while (read(selfpipe[0], &ch, 1) == 1)
>       ;
>     for (;;) {
>       int child;
>       int wstat;
>
>       child =wait_nohang(&wstat);
>       if (!child) break;
>       if ((child == -1) && (errno != error_intr)) break;
>       if (child == svd[0].pid) {
>         svd[0].pid =0;
>         pidchanged =1;
>         svd[0].wstat =wstat;
>         svd[0].ctrl &=~C_TERM;
>         if (svd[0].state != S_FINISH)
>           if ((fd =open_read("finish")) != -1) {
>             close(fd);
>             svd[0].state =S_FINISH;
>             update_status(&svd[0]);
>             continue;
>           }
>         svd[0].state =S_DOWN;
>         taia_uint(&deadline, 1);
>         taia_add(&deadline, &svd[0].start, &deadline);
>         taia_now(&svd[0].start);
>         update_status(&svd[0]);
>         if (taia_less(&svd[0].start, &deadline)) sleep(1);
>       }
>       if (haslog) {
>         if (child == svd[1].pid) {
>           svd[1].pid =0;
>           pidchanged =1;
>           svd[1].state =S_DOWN;
>           svd[1].ctrl &=~C_TERM;
>           taia_uint(&deadline, 1);
>           taia_add(&deadline, &svd[1].start, &deadline);
>           taia_now(&svd[1].start);
>           update_status(&svd[1]);
>           if (taia_less(&svd[1].start, &deadline)) sleep(1);
>         }
>       }
>     }
>     if (read(svd[0].fdcontrol, &ch, 1) == 1) ctrl(&svd[0], ch);
>     if (haslog)
>       if (read(svd[1].fdcontrol, &ch, 1) == 1) ctrl(&svd[1], ch);
>
> ...
>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: runsv spinning 100% CPU (was Re: runsv and EAGAIN)
  2009-01-20 23:21       ` runsv spinning 100% CPU (was Re: runsv and EAGAIN) Charlie Brady
@ 2009-02-10 12:41         ` Gerrit Pape
  2009-02-18  0:09           ` Laurent Bercot
  0 siblings, 1 reply; 7+ messages in thread
From: Gerrit Pape @ 2009-02-10 12:41 UTC (permalink / raw)
  To: supervision

Charlie Brady <charlieb-supervision <at> budge.apana.org.au> writes:
> I've finally found time to look at this issue again, and have found a way 
> to replicate the issue at will.
> 
> To replicate, create a service directory containing an empty file 
> supervise/control in place of the fifo. Create a symlink so that runsvdir 
> spawns a new runsv process. runsv will then behave as shown in the strace 
> plus commentary shown below.
> 
> The unexpected behaviour comes from a combination of two factors - 
> supervise/control is a file and not a fifo, and linux poll() sets POLLIN 
> for a regular file at EOF:
> 
> http://www.greenend.org.uk/rjk/2001/06/poll.html

Yes, I see.

> IMO this is an error condition which runsv should handle more gracefully. 
> The question is - how should the error be handled or corrected? One way 
> would be for runsv to replace any existing supervise/control at startup 
> unless it is a fifo.

Another option would be to have runsv print out an error message like
'supervise/control exists but is not a fifo.' and exit with 111.  This
may be better than forcefully removing a file.

Regards, Gerrit.




^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: runsv spinning 100% CPU (was Re: runsv and EAGAIN)
  2009-02-10 12:41         ` Gerrit Pape
@ 2009-02-18  0:09           ` Laurent Bercot
  0 siblings, 0 replies; 7+ messages in thread
From: Laurent Bercot @ 2009-02-18  0:09 UTC (permalink / raw)
  To: supervision

> Another option would be to have runsv print out an error message like
> 'supervise/control exists but is not a fifo.' and exit with 111.  This
> may be better than forcefully removing a file.

 I think this is the wiser choice. When in doubt, exit with an error
message and let the administrator handle it.
 However, I'd exit 100, not 111, because the problem is structural,
not temporary: it's not that some resource is unavailable, it's that
runsv is used incorrectly (i.e. supervise/ has been tampered with).

-- 
 Laurent


^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2009-02-18  0:09 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-07-11 20:46 runsv and EAGAIN Charlie Brady
2008-07-12  1:27 ` Charlie Brady
2008-07-12 16:07   ` Dražen Kačar
2008-07-13  2:46     ` Charlie Brady
2009-01-20 23:21       ` runsv spinning 100% CPU (was Re: runsv and EAGAIN) Charlie Brady
2009-02-10 12:41         ` Gerrit Pape
2009-02-18  0:09           ` Laurent Bercot

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).