supervision - discussion about system services, daemon supervision, init, runlevel management, and tools such as s6 and runit
From: Laurent Bercot <ska-supervision@skarnet.org>
To: supervision@list.skarnet.org
Subject: [LONG] Re: runit not collecting zombies
Date: Tue, 15 Feb 2011 14:12:18 +0100
Message-ID: <20110215131218.GA18284@skarnet.org>
In-Reply-To: <20070918081441.20488.qmail@1a6f0ddc0befcc.315fe32.mid.smarden.org>


 Four years later, I'm coming back to this thread, because something is
still bothering me.

 Quick summary: Radek Podgorny and Alex Efros both had an issue where
zombie processes would accumulate and *not* be reaped by runit as they
should have been. A long discussion ensued; it appeared that the problem
was caused by the following situation:

 * a process A forks a child, B ;
 * B dies, and a SIGCHLD is sent to A ;
 * A does not wait() for B and dies ;
 * so zombie B is reparented to 1, but no SIGCHLD is sent to 1 ;
 * zombie B remains there until runit's reaper is triggered, which can
be much, much later.
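
 For readers who want to reproduce the sequence above, here is a minimal
sketch in Python. The function name and the delays are illustrative and do
not come from the thread; the caller only plays the role of A's parent, and
where B ends up afterwards depends on what process 1 (or a subreaper) is on
the machine.

```python
import os
import time


def leave_orphaned_zombie(child_lifetime=0.2, parent_lifetime=0.6):
    """Fork A; A forks B; B dies first; A exits without wait()ing for B.

    Returns (pid_of_A, pid_of_B). The names A and B follow the list above;
    the delays are arbitrary values chosen for illustration.
    """
    r, w = os.pipe()
    a = os.fork()
    if a == 0:                          # process A
        os.close(r)
        b = os.fork()
        if b == 0:                      # process B
            os.close(w)
            time.sleep(child_lifetime)
            os._exit(0)                 # B dies; SIGCHLD goes to A
        os.write(w, str(b).encode())    # tell the caller B's pid
        os.close(w)
        time.sleep(parent_lifetime)     # A pays no attention to the SIGCHLD
        os._exit(0)                     # A dies without wait(): B is orphaned
    os.close(w)
    b = int(os.read(r, 32))
    os.close(r)
    return a, b
```

 After the caller has waited for A, B is a zombie that belongs to whoever
inherited it, normally process 1.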

 Gerrit Pape concluded:

> runit tries to over-optimise, and only wakes up to reap zombies if it
> knows there are some, at least one.  Due to the fact that the mother
> process, which re-parented itself to pid 1, on the one hand receives a
> SIGCHLD, but on the other hand doesn't care about that, exits and leaves
> the dead child alone, the child gets re-parented to runit, but without
> any notification.
> 
> The situation would have been cleaned up on your systems once any child
> process gets re-parented to process 1 before it terminates, and then
> exits, causing runit to get a SIGCHLD; which apparently didn't happen.
> It's what the kill -CONT 1 I suggested fakes.  That seems to explain why
> this problem didn't show up for years.
> 
> I prepare a new version of runit that looks for and reaps zombies not
> only if it knows that there are some, but also after a 14 seconds
> timeout, there seems to be no way around that.


 And that is what bothers me. Something is not right.
 Unix should be able to function without polling at all.
 I'm building Linux environments for embedded platforms, on which
energy consumption is an important thing. If such a basic thing as
process 1 has to do polling, I'm forfeiting my job right now.

 runit ran perfectly without polling for lots of people except Radek and
Alex. Until Gerrit had to add a polling mechanism just for them. What do
other init systems do ?


 I straced sysvinit:
Process 1 attached - interrupt to quit
select(11, [10], NULL, NULL, {2, 902034}) = 0 (Timeout)
time(NULL)                              = 1297769034
stat64("/dev/initctl", {st_mode=S_IFIFO|0600, st_size=0, ...}) = 0
fstat64(10, {st_mode=S_IFIFO|0600, st_size=0, ...}) = 0
stat64("/dev/initctl", {st_mode=S_IFIFO|0600, st_size=0, ...}) = 0
select(11, [10], NULL, NULL, {5, 0})    = 0 (Timeout)
time(NULL)                              = 1297769039
stat64("/dev/initctl", {st_mode=S_IFIFO|0600, st_size=0, ...}) = 0
fstat64(10, {st_mode=S_IFIFO|0600, st_size=0, ...}) = 0
stat64("/dev/initctl", {st_mode=S_IFIFO|0600, st_size=0, ...}) = 0
select(11, [10], NULL, NULL, {5, 0})    = 0 (Timeout)
time(NULL)                              = 1297769044
stat64("/dev/initctl", {st_mode=S_IFIFO|0600, st_size=0, ...}) = 0
fstat64(10, {st_mode=S_IFIFO|0600, st_size=0, ...}) = 0
stat64("/dev/initctl", {st_mode=S_IFIFO|0600, st_size=0, ...}) = 0
select(11, [10], NULL, NULL, {5, 0}^C <unfinished ...>
Process 1 detached

 No luck here. sysvinit wakes up every 5 seconds. Don't ask me why:
it does not even reap children when it wakes up. Its only goal seems
to be to make sure that /dev/initctl is still there by stat()ing
it three times. Lol. sysvinit sucks - nothing new here.


 I straced Upstart:
Process 1 attached - interrupt to quit
select(11, [3 5 6 7 9 10], [], [7 9 10], NULL^C <unfinished ...>
Process 1 detached
 
 Aha. Upstart waits on notifications forever. It does not poll at all.
No, I'm definitely not going to install Upstart on embedded systems :)
but it's a good indication that it is possible to only reap children
when being triggered; it is *not necessary*, at least on Linux, to
have a timed reaping loop.

 So, where does the problem come from ?
 Do reparented zombies *really* cause no trigger ?

 I ran the following command while stracing my own process 1 (s4-svscan,
which does not poll) on a Linux 2.6.36.1 kernel:
$ execlineb -c "background { sleep 1 } s4-sleep 2"

 This little execline script will fork; the child will exec "sleep 1",
which will exit after 1 second. The parent will exec "s4-sleep 2", which will
sleep 2 seconds *without being interrupted by signals* and then exit
*without waiting for its dead child*. (I used my own version of "sleep"
just to make sure it slept for the full duration and did not wait().)

 So, when the child dies, a SIGCHLD will be sent to the parent, which
is totally oblivious to it. One second later, the parent will die, and
its zombie child will then be inherited by process 1. What happens then ?

Process 1 attached - interrupt to quit
restart_syscall(<... resuming interrupted call ...>) = 1
gettimeofday({1297770104, 310225}, NULL) = 0
read(5, "\21\0\0\0\0\0\0\0\1\0\0\0:1\0\0\350\3\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 128) = 128
read(5, 0xbfa51d0c, 128)                = -1 EAGAIN (Resource temporarily unavailable)
wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG, NULL) = 12602
wait4(-1, 0xbfa51e10, WNOHANG, NULL)    = 0
poll([{fd=5, events=POLLIN|POLLHUP}, {fd=4, events=POLLIN|POLLHUP}], 2, -1^C <unfinished ...>
Process 1 detached

 fd 5 is actually obtained via signalfd() when available, and listens to
signals such as SIGCHLD. (When signalfd() is not available, a selfpipe is
used instead.)
 The trace is crystal clear: when the parent dies and the zombie child
is reparented to 1, *process 1 does get notified with a SIGCHLD* even if
the former parent has already been notified before (and has done nothing).
Here, the signal shows up as the signalfd becoming readable, but it's still
a signal.
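
 The shape of that trace (block in poll() with no timeout, drain the
notification fd, then call wait4() with WNOHANG until it returns 0) can be
sketched as follows. This is not s4-svscan's code: it is a Python
illustration of the selfpipe variant, using signal.set_wakeup_fd() because
the Python standard library has no signalfd() binding that I would rely on.
All function names are made up for the example.

```python
import os
import select
import signal


def make_sigchld_pipe():
    """Selfpipe: each handled signal makes the returned fd readable."""
    r, w = os.pipe()
    os.set_blocking(r, False)
    os.set_blocking(w, False)
    # A handler must be installed, or nothing is written to the wakeup fd.
    signal.signal(signal.SIGCHLD, lambda signum, frame: None)
    signal.set_wakeup_fd(w)
    return r


def reap_all():
    """Call waitpid(-1, WNOHANG) until nothing is left; return reaped pids."""
    reaped = []
    while True:
        try:
            pid, _status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:       # no children at all
            break
        if pid == 0:                    # children exist, none is dead
            break
        reaped.append(pid)
    return reaped


def wait_and_reap(sig_fd, timeout_ms=None):
    """Sleep in poll() until notified, then drain the pipe and reap.

    timeout_ms=None means "wait forever", i.e. no periodic wakeup.
    """
    poller = select.poll()
    poller.register(sig_fd, select.POLLIN)
    if not poller.poll(timeout_ms):
        return []                       # timed out, nothing happened
    try:
        while os.read(sig_fd, 128):     # drain every pending notification
            pass
    except BlockingIOError:
        pass
    return reap_all()
```

 A supervisor built this way would call wait_and_reap(fd) in a loop with
no timeout; it consumes no CPU between child deaths.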

 This is perfectly normal, expected, sane behaviour from Linux 2.6.36.1;
and it confirms my expectation that process 1 SHOULD NOT have
a timed reaping loop.

 Upstart does the right thing (as far as waiting for notifications is
concerned, I mean). runit did the right thing before the change.

 The problem Radek and Alex had was most likely caused by a kernel bug:
in some cases, when a zombie is reparented to process 1, process 1 does
not get notified with a SIGCHLD, as it should.

 I don't have the time or resources to explore this further; but the
modus operandi is simple.

 - Make sure you can strace your process 1. If you cannot, patch it
so it writes something (to the system log, or to its own stderr, which
should point to the console) every time it receives a SIGCHLD. Upstart and
sysvinit are spaghetti monsters, but runit is trivial to patch.

 - Run the following script: sh -c "sleep 1 & exec sleep 2"
provided your sleep binary does not do anything fancy with signals.
Or replace the "sleep 2" with something that you know does not
catch signals, does not wait() and lasts more than one second.

 - Check what process 1 says after 2 seconds. If it received a SIGCHLD,
your kernel works. If it did not, you have found a kernel bug.
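
 If touching process 1 is not an option, the same check can be
approximated from an ordinary process. The sketch below is an assumption
on my side, not something from the thread: it uses
prctl(PR_SET_CHILD_SUBREAPER), which only exists on Linux 3.4 and later
(so not on the 2.6.36.1 kernel above), to make the test process inherit
the orphan the way process 1 would. The function name, the intermediate
process P (standing in for the shell) and the delays are all illustrative.

```python
import ctypes
import os
import signal
import time

PR_SET_CHILD_SUBREAPER = 36     # <linux/prctl.h>, Linux >= 3.4 (assumption)


def reparented_zombie_notifies(window=2.0):
    """Return True if a SIGCHLD for the orphaned zombie reaches us."""
    libc = ctypes.CDLL(None, use_errno=True)
    one, zero = ctypes.c_ulong(1), ctypes.c_ulong(0)
    if libc.prctl(PR_SET_CHILD_SUBREAPER, one, zero, zero, zero) != 0:
        raise OSError(ctypes.get_errno(), "prctl(PR_SET_CHILD_SUBREAPER)")
    # Block SIGCHLD so it can be collected synchronously with sigtimedwait().
    old_mask = signal.pthread_sigmask(signal.SIG_BLOCK, {signal.SIGCHLD})
    r, w = os.pipe()
    p = os.fork()
    if p == 0:                          # P: stands in for the shell
        os.close(r)
        a = os.fork()
        if a == 0:                      # A
            b = os.fork()
            if b == 0:                  # B: dies first
                os.close(w)
                time.sleep(0.2)
                os._exit(0)
            os.write(w, str(b).encode())
            os.close(w)
            time.sleep(0.6)             # B is a zombie by now
            os._exit(0)                 # A never wait()s: B is reparented
        os.close(w)
        os.waitpid(a, 0)                # P reaps A ...
        time.sleep(window + 1.0)        # ... and outlives the window
        os._exit(0)
    os.close(w)
    b = int(os.read(r, 32))
    os.close(r)
    deadline = time.monotonic() + window
    notified = False
    while not notified:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        info = signal.sigtimedwait({signal.SIGCHLD}, remaining)
        notified = info is not None and info.si_pid == b
    # Clean up: stop P, reap it, and reap B if we did inherit it.
    os.kill(p, signal.SIGKILL)
    os.waitpid(p, 0)
    try:
        os.waitpid(b, os.WNOHANG)
    except ChildProcessError:
        pass
    signal.pthread_sigmask(signal.SIG_SETMASK, old_mask)
    return notified
```

 A True result means the new parent was notified when it inherited the
zombie; False within the window would point to the kind of bug described
above. A subreaper is not process 1, so treat this as a hint only.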

 runit's polling mechanism is a workaround for this bug, not the
solution to some Unix problem. Gerrit, please make it optional, so
functional systems can disable polling entirely.

-- 
 Laurent

