supervision - discussion about system services, daemon supervision, init, runlevel management, and tools such as s6 and runit
From: Laurent Bercot <ska-supervision@skarnet.org>
To: supervision@list.skarnet.org
Subject: Re: [announce] perp-2.03: persistent process supervision
Date: Mon, 14 Mar 2011 17:47:41 +0100
Message-ID: <20110314164741.GA7248@skarnet.org>
In-Reply-To: <20110314153425.34ed16dc@b0llix.net>

> First, perpd(8) will not die (TM).

 Of course it will not - not in normal circumstances.
 Neither will svscan, or runsvdir, or s6-svscan.
 I trust your programming ability in that matter as much as mine - this
is not a concern at all.

 The concern is that you don't always have a say. There's this playful
thing called the Linux OOM killer. I hear the heuristics have been fixed
in recent kernel releases, but for a long time, the OOM killer had the
amusing habit of shooting processes at random, and very much failing to
locate the process that is actually responsible for the memory outage.
 There are still a whole lot of broken OOM killers out there.

 Of course, this is not a normal condition, and under careful administration
it never happens. But the point is, when you are designing a supervision
tool, you should assume that you can get a random SIGKILL (Headshot. Do
not pass Go. Do not call your cleanup routines.) at any time.
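 A minimal sketch of why SIGKILL deserves that assumption, not taken from
any of the tools named in this thread. It assumes a POSIX system and a
Python 3 interpreter; the helper name cleanup_ran and the marker file are
hypothetical. A child process registers a cleanup routine, then receives
either SIGTERM (which it can catch) or SIGKILL (which it cannot):

```python
import os
import signal
import subprocess
import sys
import tempfile

# Child program: registers a cleanup routine, reports readiness, then waits.
CHILD = r"""
import atexit, signal, sys, time
marker = sys.argv[1]

def cleanup():
    with open(marker, "w") as f:
        f.write("cleaned up")

atexit.register(cleanup)
# Turn SIGTERM into a normal exit so that atexit handlers run.
signal.signal(signal.SIGTERM, lambda signum, frame: sys.exit(0))
print("ready", flush=True)
time.sleep(60)
"""


def cleanup_ran(sig):
    """Start the child, send it `sig`, and report whether its cleanup ran."""
    with tempfile.TemporaryDirectory() as tmp:
        marker = os.path.join(tmp, "marker")
        proc = subprocess.Popen(
            [sys.executable, "-c", CHILD, marker],
            stdout=subprocess.PIPE,
        )
        proc.stdout.readline()  # wait until the child's handlers are installed
        proc.send_signal(sig)
        proc.wait()
        proc.stdout.close()
        return os.path.exists(marker)


if __name__ == "__main__":
    print("cleanup after SIGTERM:", cleanup_ran(signal.SIGTERM))
    print("cleanup after SIGKILL:", cleanup_ran(signal.SIGKILL))
```

 The SIGTERM case leaves the marker file behind; the SIGKILL case does not,
because the kernel removes the process without giving it a chance to run
any of its own code. Any state a supervisor needs after such an event has
to exist somewhere other than in the memory of the process that was shot.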

 Because if a supervision tool can't recover from an OOM event and keep
vital services running until the sysadmin finishes his coffee and can
manually repair things, then what is it good for?

 That is why I asked my question. In other supervision schemes, tasks are
de-centralized, so if one process randomly dies, it generally does not have
much impact on the rest of the system. (If runsvdir dies, it's annoying,
but things keep working until the admin can come clean things up.)
 perpd, however, looks like a neural hub, centralizing a lot of info into
its memory. IOW, a SPOF, and you can be sure that the next broken system
tool will love to play Doom with it.
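 The remark about runsvdir above can be illustrated with a small sketch.
This is an assumption-laden stand-in, not how runsvdir, svscan or perpd are
implemented: the "scanner" and its child are plain Python processes, and
the function name child_survives_parent_sigkill is hypothetical. It only
shows the POSIX behaviour the decentralized design relies on: a SIGKILL
aimed at a parent does not take its children down with it.

```python
import os
import signal
import subprocess
import sys

# Stand-in "scanner": starts one child (a stand-in per-service supervisor),
# announces the child's pid on stdout, then idles.
SCANNER = r"""
import subprocess, sys, time
child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
print(child.pid, flush=True)
time.sleep(60)
"""


def child_survives_parent_sigkill():
    """SIGKILL the scanner and report whether its child is still running."""
    scanner = subprocess.Popen(
        [sys.executable, "-c", SCANNER],
        stdout=subprocess.PIPE,
    )
    child_pid = int(scanner.stdout.readline())
    scanner.kill()  # sends SIGKILL on POSIX
    scanner.wait()
    scanner.stdout.close()
    try:
        os.kill(child_pid, 0)  # signal 0 only checks that the pid exists
        alive = True
    except ProcessLookupError:
        alive = False
    if alive:
        os.kill(child_pid, signal.SIGTERM)  # tidy up the orphaned child
    return alive


if __name__ == "__main__":
    print("child still running:", child_survives_parent_sigkill())
```

 The orphaned child is reparented and keeps running. A design where each
service has its own small supervisor process benefits from this directly; a
design that keeps everything inside one process has nothing left to
survive.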

 Is your supervision chain SIGKILL-resistant?

-- 
 Laurent

