supervision - discussion about system services, daemon supervision, init, runlevel management, and tools such as s6 and runit
From: "Nicolás de la Torre" <ndelatorre@gmail.com>
To: supervision@list.skarnet.org
Subject: Re: hello - hanging services
Date: Fri, 20 Aug 2010 09:24:01 -0300
Message-ID: <AANLkTi=tT3_c1suuxYY0Ygxj__hipqsQqm5Lb4vJO8af@mail.gmail.com>
In-Reply-To: <20100819054635.GA14146@skarnet.org>

I must be on this list by mistake; please unsubscribe me.

2010/8/19 Laurent Bercot <ska-supervision@skarnet.org>
>
> > I understand your thoughts about this, and yes, I have thought about
> > this, too. But let's be clear: this can already happen with runit as
> > it is now. A badly written run script or a broken log script might
> > compromise the existing functionality of runit (and if it doesn't,
> > adding a new variant such as a hangcheck script wouldn't either). I
> > mean: what happens currently if one of the services you're trying to
> > start hangs? I haven't tried yet, but I guess only the service you're
> > trying to start would be compromised, not the whole of runit. The same
> > would hold with my suggestion.
>
>  The problem is that your suggestion affects the reliability of the
> service you want to check.
>  If ./run hangs, well, the service hangs. ./run IS the service: of
> course you need to write the script properly if you want the service
> to function properly. There's nothing we can do about that. Same for
> the logger, which is also a service (albeit a special one).
>  If ./hangcheck hangs, then what should be the default policy? To be
> congruent with a watchdog's purpose, you should restart the service.
> But then, you might have a buggy ./hangcheck script and a perfectly
> functional service, and restarting it for no good reason is a decrease
> in service availability and reliability (and a waste of resources).
> By adding ./hangcheck support, you are adding a dependency, and making
> the service architecture more fragile. That's what Charlie meant
> (I think).
>
>  Small is beautiful for a reason: small has less hidden costs.
> Every time you want to add a feature, look for the hidden costs.
> Sometimes the feature is worth paying them. Most of the time it's not.
>
>
> > You're probably right, though I don't quite follow your argument,
> > because runit restarts crashed processes (this shouldn't be the job
> > of an init system: the job of an init system is starting processes,
> > not making sure that they're up and running; that's the job of a
> > software watchdog).
>
>  Please read the list archives; this has been discussed at length.
> What it comes down to is the duties of process 1, and process 1 *has*
> to restart processes (at least one), in order to keep the system in
> a usable state no matter what happens, no matter what dies. A supervision
> architecture such as runit is a natural consequence of properly
> implementing process 1's duties.
>
>  You are mixing two different notions of 'up and running'.
>  What runit does (and what any init system *should* do) is make sure
> that the *process* corresponding to a given service has been properly
> forked and exec'ed. As long as the process is there, runit is happy.
> It's an external process management tool.
>  What a software watchdog does is make sure that said process actually
> does what is expected of it instead of, e.g., hanging or busy-looping. This
> is more complex, because it requires knowledge of what the service is
> supposed to do, and it generally can't be done without access to the
> service's source code.
>
>
> > BUT: runit is doing exactly
> > this. Runit makes sure that your service is up and running by
> > restarting it if it crashes. By arguing that "checking whether a
> > service is responding (and thus working) is not the job of runit", you
> > might as well argue that "restarting a crashed job is not runit's job".
>
>  No. runit's job is process management. runit is there to ensure that
> the process tree you want is always there; and that includes restarting
> crashed processes if needed. But runit's job is not making sure that
> every process in the process tree is doing exactly what it's supposed
> to do. Again, there's a difference between "process A is up", which runit
> can and should control, and "process A is behaving as expected", which
> can only be controlled by some A-specific watchdog.
>
>  And since this is Unix, two different things should be handled by two
> different tools.
>
> --
>  Laurent
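The distinction drawn above, between "process A is up" and "process A is behaving as expected", can be sketched as two separate tools. This is a minimal Python illustration under stated assumptions, not runit's code (runit is written in C): `supervise` is a hypothetical stand-in for a runit-style supervisor that only fork/execs a service and restarts it when it exits, with a hypothetical `max_runs` bound so the sketch terminates. `hangcheck_says_restart` models the `./hangcheck` script proposed in this thread, which is not a runit feature; both function names are made up for illustration.

```python
import os
import subprocess
import time


def supervise(argv, max_runs, delay=0.0):
    """Process-level supervision (hypothetical sketch, runit-like in spirit).

    Fork/exec argv, wait for it to exit, then start it again. The only
    question asked is "does the process exist?". A real supervisor loops
    forever; max_runs bounds the loop so this sketch ends.
    Returns the exit code of each run (negative N means killed by signal N).
    """
    codes = []
    for _ in range(max_runs):
        pid = os.fork()
        if pid == 0:
            # Child: become the service. os._exit is only reached if exec fails.
            try:
                os.execvp(argv[0], argv)
            finally:
                os._exit(127)
        # Parent: blocks for as long as the service process lives,
        # whether it is working correctly or hung.
        _, status = os.waitpid(pid, 0)
        codes.append(os.waitstatus_to_exitcode(status))
        time.sleep(delay)  # avoid a tight respawn loop
    return codes


def hangcheck_says_restart(check_argv, timeout):
    """Behaviour-level check: the hypothetical ./hangcheck policy.

    Runs a service-specific check; a failing *or* hanging check means
    "restart". The policy cannot tell a hung service from a hung check.
    """
    try:
        result = subprocess.run(check_argv, timeout=timeout)
    except subprocess.TimeoutExpired:
        # subprocess.run has already killed and reaped the check process.
        return True
    return result.returncode != 0
```

A service that hangs forever would keep `supervise` blocked in `os.waitpid`: to the supervisor, a hung service is a running service. Only the service-specific check can notice the hang, and if that check is itself buggy (for example, it hangs while the service is fine), the policy still answers "restart", which is the hidden cost described above.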


Thread overview: 14+ messages
2010-08-17 17:08 Jean-Michel Bruenn
     [not found] ` <Pine.LNX.4.64.1008171311210.4362@e-smith.charlieb.ott.istop.com>
2010-08-17 17:24   ` Jean-Michel Bruenn
2010-08-17 17:38     ` Charlie Brady
2010-08-18 10:57       ` Laurent Bercot
2010-08-18 15:06         ` Jean-Michel Bruenn
2010-08-18 15:23           ` Charlie Brady
2010-08-18 16:02             ` Jean-Michel Bruenn
2010-08-19  5:46               ` Laurent Bercot
2010-08-20 12:24                 ` Nicolás de la Torre [this message]
2010-08-20 14:42                   ` Tobia Conforto
2010-08-20 14:59                     ` Charlie Brady
     [not found]                     ` <BB40BB3F77C4402181674BE975669A4F@HEL.local>
2010-08-20 15:11                       ` Rehan
2010-08-20 15:13                         ` Charlie Brady
2010-08-20 15:40                     ` Laurent Bercot
