supervision - discussion about system services, daemon supervision, init, runlevel management, and tools such as s6 and runit
From: Steve Litt <slitt@troubleshooters.com>
To: supervision@list.skarnet.org
Subject: Re: Service watchdog
Date: Tue, 19 Oct 2021 13:46:03 -0400
Message-ID: <20211019134603.42da874f@mydesk.domain.cxm>
In-Reply-To: <YW52nxwaMYS76C+2@ntb.petris.klfree.czf>

Petr Malat said on Tue, 19 Oct 2021 09:41:19 +0200

>Yes, in my usecase this would be used at the place where sd_notify()
>is used if the service runs under systemd. Then periodically executed
>watchdog could check the service makes progress and react if it
>doesn't.
>
>The question is how to implement the watchdog then - it could be either
>a global service or another executable in service directory, which
>would be started periodically by runsv.

LOL, I'll tell you how I did it on my reminder system, and you can
decide whether or not to do it my way...

I have a reminder system written by me in Perl early this century, when
I still used Perl. It runs 5 times a day via cron, popping a window up
on the screen telling me of my appointments. Some consider it
intrusive; I like it that way (which is why I wrote it that way).

After a few years of using my reminder system, it became apparent that
sometimes it was failing silently, and I wouldn't notice the
absence of popup windows, causing me to miss appointments and the like.

So I wrote another program (by this time I'd switched to Python), run
as a runit service:

========================================
#!/bin/sh
cd /d/at/python/reminder_check
exec chpst -u slitt:slitt /d/at/python/reminder_check/reminder_check.py
========================================

The main routine of the Python program follows:

========================================
while True:
    if tooOld(LOGFILE, TOO_OLD_HOURS):
        alarm_all()
    time.sleep(SLEEP_SECONDS)
========================================

So every SLEEP_SECONDS seconds, it checks the logfile LOGFILE, which is
written by the reminder program itself, to see if it's more than
TOO_OLD_HOURS hours old, and if it is, it throws up a big old green and
purple window proclaiming the alarm system is broken.
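
For illustration only, here is a minimal sketch of what a staleness check
like tooOld() might look like. The real reminder_check.py is not shown in
this message, so the name too_old(), its parameters, and the choice to
treat a missing file as stale are all assumptions. It compares the
file's modification time against the current time:

```python
import os
import time


def too_old(path, max_age_hours, now=None):
    """Return True if `path` was last modified more than `max_age_hours` ago.

    Hypothetical helper, not the original tooOld(). `now` can be passed in
    (seconds since the epoch) to make the check easy to test.
    """
    now = time.time() if now is None else now
    try:
        mtime = os.path.getmtime(path)
    except OSError:
        # Assumption: a missing or unreadable log file means the watched
        # program is not doing its job, so report it as stale.
        return True
    return (now - mtime) > max_age_hours * 3600
```

Something like too_old(LOGFILE, TOO_OLD_HOURS) would then slot into the
loop above. The only work done on each pass is a single stat of the file.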

In my case, SLEEP_SECONDS is 3600. Yeah, it's polling instead of
interrupt driven, but I make no apology for polling once an hour.
Matter of fact, I'd make no apologies for 10 second polling, given that
if everything's OK all it's going to do is check a file date.

It seems to me the key question is how quickly do you need to be
informed of the failure of the watched daemon. If being informed a
minute later is OK, I'd say my method is fine. If being informed a
second later is OK, I'd rewrite the time check in C and then if it
flunks, system() the "on error" program. If you need subsecond warning,
my method is probably not what you want.

By the way, when I test for a daemon functioning, I typically don't use
sv status or that other program that just returns a 1 or 0, because I
don't care if the program is running: I want to know that it's
*functioning*, so I test the functionality of the running program. So
for the network, I'd do a quick one-iteration ping, for PostgreSQL I
might do a simple select statement, etc.
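
A rough sketch of that kind of functional probe, assuming the probe is an
external command whose exit status says whether the service answered. The
function name is_functioning(), the timeout, and the host and database
names in the comments are placeholders, not anything from my setup:

```python
import subprocess


def is_functioning(cmd, timeout_seconds=5):
    """Run a probe command; return True only if it exits 0 within the timeout."""
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_seconds,
        )
    except (OSError, subprocess.TimeoutExpired):
        # The probe could not be started, or it hung: treat both as failure.
        return False
    return result.returncode == 0


# Example probes (placeholder host and database names):
#   is_functioning(["ping", "-c", "1", "192.0.2.1"])
#   is_functioning(["psql", "-d", "mydb", "-c", "SELECT 1"])
```

The timeout matters: a daemon that is wedged often makes the probe hang
rather than fail, and a hung probe should count as a failure too.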

Best of luck.

SteveT

Steve Litt 
Spring 2021 featured book: Troubleshooting Techniques of the Successful
Technologist http://www.troubleshooters.com/techniques

