From: "Laurent Bercot" <ska-supervision@skarnet.org>
To: supervision <supervision@list.skarnet.org>
Subject: Re: s6 daemon restart feature enhancement suggestion
Date: Sun, 26 May 2024 19:53:39 +0000
Message-ID: <em73b9b71f-50d0-4f8f-a2e2-5fd316aa57fb@6cbd56c6.com>
In-Reply-To: <JH0PR04MB74791432C577E33B1027F049D1F72@JH0PR04MB7479.apcprd04.prod.outlook.com>

>Let me say that a $daemon i.e. wpa_supplicant or iwd providing
>$service=WiFi{wpa} has been pulled into the s6-rc compiled db and
>started in the supervision tree.
>But the system doesn't have the hardware to support that, or some
>important resource is unavailable.

So, here's my question: if the system doesn't have the hardware to
support that, why is the daemon in the database in the first place?

s6-rc, in its current incarnation, is very static when it comes to its
service database; this is by design. The point is that when you have a
compiled service database, you know what's in there, you know what it
does, and you know what services will be running when you boot your
system.

Adding dynamism goes against that design. I understand the value of
flexibility (this is why most distributions won't use s6-rc as is: they
need more flexibility in their service manager) but there's a trade-off
with reliability, and s6-rc weighs heavily on the reliability side.

If you are building a distribution aimed at supporting several kinds
of hardware, I suggest adding flexibility at the *source database*
level, and building the compiled database at system configuration time
(or, in extreme cases, at boot time, though I do not recommend that if
you can avoid it, since you lose the static bootability guarantee).

If your machine can't run wpa_supplicant, then the service manager
should not attempt to run wpa_supplicant in the first place, so the
wpa_supplicant service should not appear in the top bundle.
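
To make the "flexibility at the source database level" idea concrete, here
is a minimal sketch of a configuration-time helper that only lists
wpa_supplicant in the top bundle when a wireless interface is present. It
rests on several assumptions that are not from this message: the function
names, the /etc/s6-rc paths, the "base" bundle, the use of a contents.d
directory for the bundle, and the Linux-specific phy80211 sysfs entry as a
hardware test are all illustrative.

```shell
#!/bin/sh
# Hypothetical configuration-time helper. All names and paths are assumptions.

# has_wifi SYSFS_NET_DIR
# Succeeds if any interface directory exposes a phy80211 entry
# (assumed here to indicate a wireless device on Linux).
has_wifi() {
    for iface in "$1"/*; do
        [ -e "$iface/phy80211" ] && return 0
    done
    return 1
}

# write_top_bundle SRCDIR SYSFS_NET_DIR
# Recreates the "top" bundle in the source database, adding
# wpa_supplicant only when wireless hardware was detected.
write_top_bundle() {
    src=$1
    sysnet=$2
    rm -rf "$src/top"
    mkdir -p "$src/top/contents.d"
    echo bundle > "$src/top/type"
    : > "$src/top/contents.d/base"    # hypothetical always-needed bundle
    if has_wifi "$sysnet"; then
        : > "$src/top/contents.d/wpa_supplicant"
    fi
}

# Possible use at system configuration time (not run here):
#   write_top_bundle /etc/s6-rc/source /sys/class/net
#   s6-rc-compile /etc/s6-rc/compiled-new /etc/s6-rc/source
```

The decision is taken once, before compilation, so the compiled database
stays static and only contains services the machine can actually run.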

Lacking resources is a different issue: it's a temporary error, and
it makes sense for the service to fail (and be restarted) if it cannot
reserve the resources it needs. If you want to report permanent
failure, and stop trying to bring the service up, after a certain amount
of time, you can write a timeout-up file, or have a finish script exit
125; both options are described below.
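
As a rough illustration of the timeout-up route, a configuration step
could write the file into the service's source definition directory. This
sketch assumes the value is expressed in milliseconds, as described on the
s6-rc-compile page; the helper name, the paths and the 30-second figure
are placeholders, not recommendations.

```shell
#!/bin/sh
# set_timeout_up SRCDIR SERVICE MILLISECONDS
# Writes SRCDIR/SERVICE/timeout-up. The helper and its arguments are
# hypothetical; only the timeout-up file name comes from s6-rc.
set_timeout_up() {
    printf '%s\n' "$3" > "$1/$2/timeout-up"
}

# Possible use (not run here): allow wpa_supplicant 30 seconds to come
# up, then recompile the database.
#   set_timeout_up /etc/s6-rc/source wpa_supplicant 30000
```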

>A mechanism should be prepared, to let $daemon inform its instance of
>s6-supervise that it can't run, or can't provide $service / its
>services.

If you have the information before the machine boots, you should use
the information to prune your service database, and compile a database
that you know will work with your system.

If you don't have the information before the machine boots, then a
service failing to start is a normal temporary failure, and s6 will
keep attempting to restart the service until permanent failure is
reported for it.

You have several ways of marking a service as permanently failed:

- (only with s6-rc) you can have a timeout-up file, see
  https://skarnet.org/software/s6-rc/s6-rc-compile.html and look for
  "timeout-up"
- (generic s6) you can have a finish script that uses data that has
  been collected by s6-supervise to determine whether a permanent failure
  should be reported or not. A finish script can report permanent failure
  by exiting 125.
  For instance, using s6-permafailon, see
  https://skarnet.org/software/s6/s6-permafailon.html , allows you to
  tell s6 that if the service exits nonzero too many times in a given
  number of seconds, then it's hopeless.
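
The finish-script route can be sketched as follows. The sketch assumes
that s6-supervise passes the exit code of ./run as the first argument to
./finish (256 when ./run was killed by a signal). The function name and
the meaning given to exit code 78 are made up for the example; a real
daemon would need its own convention for "this can never work here".

```shell
#!/bin/sh
# Sketch of the decision made inside a ./finish script.

# finish_status EXITCODE
# Prints the status ./finish should exit with.
finish_status() {
    if [ "$1" -eq 78 ]; then
        echo 125    # hypothetical "hardware missing" code: permanent failure
    else
        echo 0      # anything else: let s6-supervise restart the service
    fi
}

# As the last line of a real ./finish (not run here):
#   exit "$(finish_status "$1")"
```

For a rate-based policy instead of a fixed exit code, the finish script
could be a single line along the lines of
`exec s6-permafailon 60 5 1-255 true`, assuming the argument order
(seconds, death count, status list, then a program to run otherwise)
matches the page linked above; check that page for the exact syntax.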

Does this help?

--
Laurent