Re: Service watchdog

supervision - discussion about system services, daemon supervision, init, runlevel management, and tools such as s6 and runit

From: Petr Malat <oss@malat.biz>
To: supervision@list.skarnet.org
Subject: Re: Service watchdog
Date: Thu, 21 Oct 2021 11:20:05 +0200
Message-ID: <YXEwxR39nS96Husf@ntb.petris.klfree.czf>
In-Reply-To: <em0b895a4a-cf06-4ab4-aadd-14c1201c5d08@elzian>

Hi!

> > Yes, in my usecase this would be used at the place where sd_notify()
> > is used if the service runs under systemd. Then periodically executed
> > watchdog could check the service makes progress and react if it
> > doesn't.
> 
>  If a single notification step is enough for you, i.e. the service
> goes from a "preparing" state to a "ready" state and remains ready
> until the process dies, then what you want is implemented in the s6
> process supervisor: https://skarnet.org/software/s6/notifywhenup.html
> 
>  Then you can synchronously wait for service readiness
> (s6-svwait $service) or, if you have a watchdog service, periodically
> poll for readiness (s6-svstat -r $service).
> 
>  But that's only valid if your service can only change states once
> (from "not ready" to "ready"). If you need anything more complex, s6
> won't support it intrinsically.
No, I need to monitor that the service is alive - my watchdog script
would test whether the status message is older than a defined threshold,
in which case it would kill the service (the rest would be handled in
the finish script).
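
A minimal sketch of such a check, assuming the service refreshes a
heartbeat file as its "status message". The file path, the threshold and
the function names are hypothetical; the only runit piece used is the
`sv kill` command.

```python
import os
import subprocess
import time


def heartbeat_age(path, now=None):
    """Return seconds since the heartbeat file was last modified.

    A missing file is reported as infinitely old (an assumption of this
    sketch; a real watchdog may want a grace period after startup).
    """
    now = time.time() if now is None else now
    try:
        return now - os.stat(path).st_mtime
    except FileNotFoundError:
        return float("inf")


def watchdog_check(service_dir, heartbeat_path, max_age,
                   run=subprocess.run, now=None):
    """Kill the service with `sv kill` if its heartbeat is too old.

    Returns True if a kill was requested, False otherwise. The `run`
    and `now` parameters exist only to make the function testable.
    """
    if heartbeat_age(heartbeat_path, now) <= max_age:
        return False
    # runsv notices the death and runs ./finish, then ./run again.
    run(["sv", "kill", service_dir], check=False)
    return True
```

Something like this could be called periodically from cron or from a
separate supervised service that sleeps between checks.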

>  The reason why there isn't more advanced support for this in any
> supervision suite (save systemd but even there it's pretty minimal)
> is that service states other than "not ready yet" and "ready" are
> very much service-dependent and it's impossible for a generic process
> supervisor to support enough states for every possible existing service.
> Daemons that need complex states usually come with their own
> monitoring software that handles their specific states, with integrated
> health checks etc.
> 
>  So my advice would be:
>  - if what you need is just readiness notification, switch to s6.
> It's very similar to runit and I think you'll find it has other
> benefits as well. The drawback, obviously, is that it's not in busybox
> and the required effort to switch may not be worth it.
>  - if you need anything more complex, you can stick to runit, but you
> will kinda need to write your own monitor for your daemon, because
> that's what everyone does.
> 
>  Depending on the details of the monitoring you need, the monitoring
> software can be implemented as another service (e.g. to receive
> heartbeats from your daemon), or as a polling client (e.g. to do
> periodic health checks). Both approaches are valid.
That's what I thought of as well, but keeping this completely outside
of runsv opens a race window in which the watchdog can kill a service
that has just restarted itself. This could be avoided if the check were
serialized with the other steps (run/finish execution) within runsv. So
far the futile restart of the service doesn't seem to cause me any
problems, so I'm not much bothered by it.
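
For illustration, an external watchdog can narrow (not close) that
window by remembering which PID it judged to be stuck and signalling
only if runsv still reports the same PID. This is a sketch under the
assumption that runsv keeps the current PID in supervise/pid and leaves
that file empty while the service is down; the function names are
hypothetical.

```python
import os
import signal
from pathlib import Path


def read_pid(service_dir):
    """Return the PID recorded in supervise/pid, or None if unavailable."""
    try:
        text = Path(service_dir, "supervise", "pid").read_text().strip()
    except OSError:
        return None
    return int(text) if text.isdigit() else None


def kill_if_unchanged(service_dir, pid_seen, kill=os.kill):
    """Send SIGKILL only if the service still runs under pid_seen.

    Returns True if the signal was sent. A restart between the re-read
    and the kill() call is still possible, so the race is only reduced;
    fully removing it would need the check serialized inside runsv.
    """
    current = read_pid(service_dir)
    if current is None or current != pid_seen:
        return False  # service went down or was restarted meanwhile
    kill(current, signal.SIGKILL)
    return True
```

The watchdog would call read_pid() before its health check and pass the
result to kill_if_unchanged() only when the check fails.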

>  Don't hack on runit, especially the control pipe thing. It will not
> end well.
>  (runit's control pipe feature is super dangerous, because it allows a
> service to hijack the control flow of its supervisor, which endangers
> the supervisor's safety. That's why s6 does not implement it; it
> provides similar - albeit slightly less powerful - control features
> via ways that never give the service any power over the supervisor.)
The main reason I wanted to use the service pipe for it was the
possibility of seeing the service status in the process tree, which
would be a nice benefit.

BR,
  Petr
