Re: hello - hanging services

supervision - discussion about system services, daemon supervision, init, runlevel management, and tools such as s6 and runit

From: Laurent Bercot <ska-supervision@skarnet.org>
To: supervision@list.skarnet.org
Subject: Re: hello - hanging services
Date: Thu, 19 Aug 2010 07:46:35 +0200
Message-ID: <20100819054635.GA14146@skarnet.org>
In-Reply-To: <20100818180205.5be254c7.jean.bruenn@ip-minds.de>

> I understand your thoughts about this, and yes, I have thought about
> this, too. But let's make it clear: this can happen with runit as it
> is now, too: a weirdly written run-script or a broken log-script might
> compromise the existing functionality of runit (if it doesn't, adding a
> new variant like a hangcheck-script wouldn't either). I mean:
> what happens currently if one of the services which you're trying to
> start hangs? I haven't tried yet, so I guess only the service which
> you're trying to start would be compromised - not the whole of runit.
> And this wouldn't be the case with my suggestion either.

 The problem is that your suggestion affects the reliability of the
service you want to check.
 If ./run hangs, well, the service hangs. ./run IS the service: of
course you need to write the script properly if you want the service
to function properly. There's nothing we can do about that. Same for
the logger, which is also a service (albeit a special one).
 If ./hangcheck hangs, then what should be the default policy? To be
congruent with a watchdog's purpose, you should restart the service.
But then, you might have a buggy ./hangcheck script and a perfectly
functional service, and restarting it for no good reason is a decrease
in service availability and reliability (and a waste of resources).
By adding ./hangcheck support, you are adding a dependency, and making
the service architecture more fragile. That's what Charlie meant
(I think).
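
 To make the policy problem concrete, here is a minimal Python sketch of
a check runner with a timeout. It is only an illustration: runit has no
./hangcheck support, and the function name, the result labels and the
timeout parameter are all assumptions made up for this example.

```python
import subprocess
import sys


def run_hangcheck(cmd, timeout_s):
    """Run a hypothetical check command and report what was actually learned.

    Not part of runit: the name and the three result labels are assumptions.
    """
    try:
        result = subprocess.run(cmd, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # The check itself hung. This says nothing about the service,
        # yet a watchdog-style policy would restart the service here.
        return "check-hung"
    return "service-ok" if result.returncode == 0 else "service-suspect"


if __name__ == "__main__":
    # A check that sleeps longer than the timeout stands in for a buggy check.
    hung_check = [sys.executable, "-c", "import time; time.sleep(5)"]
    print(run_hangcheck(hung_check, timeout_s=0.5))
```

 The "check-hung" branch is the fragile part: whatever the caller does
with it (restart or ignore), the decision rests on the check's behaviour
and not on the service's.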

 Small is beautiful for a reason: small has fewer hidden costs.
Every time you want to add a feature, look for the hidden costs.
Sometimes the feature is worth paying them. Most of the time it's not.

> You're probably right, though I don't exactly understand your
> argument, because runit restarts crashed processes (this
> shouldn't be the job of an init system - the job of an init system is
> starting processes, not making sure that they're up and running -
> that's the job of a software watchdog).

 Please read the list archives; this has been discussed at length.
What it comes down to is the duties of process 1, and process 1 *has*
to restart processes (at least one), in order to keep the system in
a usable state no matter what happens, no matter what dies. A supervision
architecture such as runit is a natural consequence of properly
implementing process 1's duties.

 You are mixing two different notions of 'up and running'.
 What runit does (and what any init system *should* do) is make sure
that the *process* corresponding to a given service has been properly
forked and exec'ed. As long as the process is there, runit is happy.
It's an external process management tool.
 What a software watchdog does is make sure that said process actually
does what is expected of it, as opposed to, e.g., hanging or busy-looping. This
is more complex, because it requires knowledge of what the service is
supposed to do, and it generally can't be done without access to the
service's source code.
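
 The first notion can be sketched in a few lines of Python. This is a toy
model of a supervision loop, not runit's actual code: the function name,
the bounded number of starts (a real supervisor loops forever) and the
restart delay are assumptions chosen to keep the example short and
testable.

```python
import subprocess
import sys
import time


def supervise(cmd, max_starts, delay_s=1.0):
    """Keep one child process running; return the exit statuses observed.

    Hypothetical sketch: a real supervisor would not stop after max_starts.
    """
    exit_codes = []
    for _ in range(max_starts):
        child = subprocess.Popen(cmd)    # fork and exec the service
        exit_codes.append(child.wait())  # block until the process dies
        time.sleep(delay_s)              # avoid a tight restart loop
    return exit_codes


if __name__ == "__main__":
    # A service that dies immediately with status 3 is simply started again.
    crashing = [sys.executable, "-c", "raise SystemExit(3)"]
    print(supervise(crashing, max_starts=3, delay_s=0.1))
```

 Note what the loop cannot see: a child that hangs or busy-loops never
returns from wait(), so from the supervisor's point of view it is simply
up. Detecting that case needs service-specific knowledge, which is the
watchdog's side of the line.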

> BUT: runit is doing exactly
> this. Runit is taking care that your service is up and running, by
> restarting it if it's crashing. By arguing that "checking whether a
> service is responding (and thus working) is not the job of runit", you
> might also argue that "restarting a crashed job is not runit's job".

 No. runit's job is process management. runit is there to ensure that
the process tree you want is always there; and that includes restarting
crashed processes if needed. But runit's job is not making sure that
every process in the process tree is doing exactly what it's supposed
to do. Again, there's a difference between "process A is up", which runit
can and should control, and "process A is behaving as expected", which
can only be controlled by some A-specific watchdog.

 And since this is Unix, two different things should be handled by two
different tools.

-- 
 Laurent
