supervision - discussion about system services, daemon supervision, init, runlevel management, and tools such as s6 and runit
* Re: s6 daemon restart feature enhancement suggestion
From: Laurent Bercot @ 2024-05-26 19:53 UTC
  To: supervision

>Suppose that a $daemon, e.g. wpa_supplicant or iwd, providing
>$service=WiFi{wpa} has been pulled into the s6-rc compiled db and
>started in the supervision tree.
>But the system doesn't have the hardware to support that, or some
>important resource is unavailable.

  So, here's my question: if the system doesn't have the hardware to
support that, why is the daemon in the database in the first place?

  s6-rc, in its current incarnation, is very static when it comes to its
service database; this is by design. The point is that when you have a
compiled service database, you know what's in there, you know what it
does, and you know what services will be running when you boot your
system.
  Adding dynamism goes against that design. I understand the value of
flexibility (this is why most distributions won't use s6-rc as is: they
need more flexibility in their service manager) but there's a trade-off
with reliability, and s6-rc weighs heavily on the reliability side.

  If you are building a distribution aimed at supporting several kinds
of hardware, I suggest adding flexibility at the *source database*
level, and building the compiled database at system configuration time
(or, in extreme cases, at boot time, though I do not recommend that if
you can avoid it, since you lose the static bootability guarantee).

  If your machine can't run wpa_supplicant, then the service manager
should not attempt to run wpa_supplicant in the first place, so the
wpa_supplicant service should not appear in the top bundle.
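
  One way to add flexibility at the source level is to keep every
service definition in the source tree and generate only the top bundle
when the system is configured. The sketch below is illustrative and not
part of s6-rc: the function names, the `default` bundle name, the
service names, and the `/sys/class/ieee80211` probe for WiFi hardware
are all assumptions, and the `contents.d` bundle layout assumes a
reasonably recent s6-rc (older versions use a `contents` file instead).

```shell
#!/bin/sh
# Sketch: generate the top bundle of an s6-rc source database at
# configuration time. All names and paths here are hypothetical.

# has_wifi SYSROOT: succeed if a wireless PHY is visible under SYSROOT.
# Assumption: cfg80211 devices show up in /sys/class/ieee80211.
has_wifi() {
  for phy in "$1"/sys/class/ieee80211/*; do
    [ -e "$phy" ] && return 0
  done
  return 1
}

# write_bundle SRCDIR BUNDLE SERVICE...: (re)create a bundle definition
# directory containing exactly the listed services.
write_bundle() {
  srcdir=$1
  bundle=$2
  shift 2
  rm -rf "$srcdir/$bundle"
  mkdir -p "$srcdir/$bundle/contents.d"
  echo bundle > "$srcdir/$bundle/type"
  for svc in "$@"; do
    : > "$srcdir/$bundle/contents.d/$svc"
  done
}

# configure_top_bundle SYSROOT SRCDIR: include wpa_supplicant in the
# top bundle only when the hardware for it is present.
configure_top_bundle() {
  if has_wifi "$1"; then
    write_bundle "$2" default syslogd wpa_supplicant
  else
    write_bundle "$2" default syslogd
  fi
}
```

  After something like `configure_top_bundle / /etc/s6-rc/generated`,
the database would then be built with s6-rc-compile, giving it both the
static source directory and the generated one (paths hypothetical).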

  Lacking resources is a different issue: it's a temporary error, and
it makes sense for the service to fail (and be restarted) if it cannot
reserve the resources it needs. If you want to report permanent
failure and stop trying to bring the service up after a certain amount
of time, you can write a timeout-up file or have a finish script exit
125; see below.


>A mechanism should be prepared to let $daemon inform its instance of
>s6-supervise that it can't run, or can't provide $service / its
>services.

  If you have the information before the machine boots, you should use
the information to prune your service database, and compile a database
that you know will work with your system.

  If you don't have the information before the machine boots, then a
service failing to start is a normal temporary failure, and s6 will
keep restarting the service until permanent failure is reported.

  You have several ways of marking a service as permanently failed:

  - (only with s6-rc) you can have a timeout-up file, see
  https://skarnet.org/software/s6-rc/s6-rc-compile.html and look for
"timeout-up"

  - (generic s6) you can have a finish script that uses data that has
been collected by s6-supervise to determine whether a permanent failure
should be reported or not. A finish script can report permanent failure
by exiting 125.
  For instance, using s6-permafailon, see
  https://skarnet.org/software/s6/s6-permafailon.html , allows you to
tell s6 that if the service exits nonzero too many times in a given
number of seconds, then it's hopeless.
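
  As a rough illustration of the generic s6 mechanism: a finish script
receives the exit code of the run script as its first argument (256 if
the service was killed by a signal), so it can map a chosen code to a
permanent failure. In the sketch below, the helper name
`install_finish_script` and the convention that the daemon exits 78
when its hardware is missing are assumptions made for the example, not
something s6 defines.

```shell
#!/bin/sh
# Sketch: install a finish script that reports permanent failure
# (exit 125) when the run script exited with a designated
# "this can never work" code. The helper name and the choice of
# exit code 78 are hypothetical.

# install_finish_script SERVICEDIR
install_finish_script() {
  cat > "$1/finish" <<'EOF'
#!/bin/sh
# $1 = exit code of ./run (256 if it was killed by a signal)
# $2 = signal number (meaningful only when $1 is 256)
if [ "$1" -eq 78 ]; then
  exit 125   # permanent failure: s6-supervise stops restarting
fi
exit 0       # any other death: restart as usual
EOF
  chmod +x "$1/finish"
}
```

  For the s6-rc route, the timeout-up file is simply a file in the
service's source definition directory containing a number of
milliseconds, e.g. `echo 30000 > wpa_supplicant/timeout-up` (directory
name hypothetical). For rate-based detection, the finish script can
chain through s6-permafailon instead of testing the exit code itself;
its arguments are described on the page linked above.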

  Does this help?

--
  Laurent



* Re: s6 daemon restart feature enhancement suggestion
From: adam @ 2024-05-28  5:36 UTC
  To: Laurent Bercot, supervision

Quoting Laurent Bercot (2024-05-26 12:53:39)
> If you are building a distribution aimed at supporting several kinds
> of hardware, I suggest adding flexibility at the *source database*
> level, and building the compiled database at system configuration time

If at all possible, do this.

> (or, in extreme cases, at boot time, though I do not recommend that if
> you can avoid it, since you lose the static bootability guarantee).

I believe there is also a third approach, a sort of middle ground: all
oneshots and longruns are compiled statically, but the contents of bundles are
determined at boot time.  Think of it by analogy to the Linux kernel: distros
nowadays build every single module under the sun -- no matter how obscure -- but
only a few ever get loaded.

I believe that s6-rc-bundle is the tool used to edit the bundles:

  https://www.skarnet.org/software/s6-rc/s6-rc-bundle.html

I haven't actually tried this myself, but perhaps it might help.  Changing the
contents of bundles seems like a much less violent action to take at boot time,
with fewer possible failure modes.
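
A minimal sketch of what that boot-time step could look like, equally
untested against a live system.  The bundle name `boot-default`, the service
names, and the `/sys/class/ieee80211` hardware probe are assumptions;
`S6_RC_BUNDLE` is a hypothetical override so the command can be swapped for a
stub during a dry run.  The `add` command and `-f` flag are the ones described
on the s6-rc-bundle page linked above.

```shell
#!/bin/sh
# Sketch: choose the contents of a bundle at boot, leaving every
# oneshot and longrun in the compiled database untouched.
# Bundle name, service names and the hardware probe are hypothetical.

: "${S6_RC_BUNDLE:=s6-rc-bundle}"   # override with a stub for a dry run

# define_boot_bundle SYSROOT: (re)define boot-default for this machine.
define_boot_bundle() {
  # Keep SYSROOT as $1 and accumulate the service list after it.
  set -- "$1" syslogd
  for phy in "$1"/sys/class/ieee80211/*; do
    if [ -e "$phy" ]; then
      set -- "$@" wpa_supplicant
      break
    fi
  done
  shift   # drop SYSROOT, keep only the service list
  # -f: replace the bundle if it already exists (per the s6-rc-bundle page)
  "$S6_RC_BUNDLE" -f add boot-default "$@"
}
```

The compiled database still contains wpa_supplicant either way; only the
bundle that gets started decides whether it is ever brought up.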

If you go this route, there is a fork in the road.  Assuming you're using
s6-linux-init,

- The easy route: in a perfect world you'd do this boot-time detection from
  within `rc.init`, before invoking `s6-rc start $DEFAULT_BUNDLE`.
  Unfortunately at that point you probably don't have anything mounted
  read-write (except /run), haven't started your hotplug daemon (e.g. mdevd),
  and haven't done a coldplug yet.  So the easy route might not work.

- The hard route: remember that s6-rc's locks are *not* reentrant!  You will
  have to think very carefully about what locks are held when s6-rc-bundle is
  called, and whether or not to use its `-b` flag.  Don't try to use s6-rc tools
  from within a oneshot.
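
For the easy route, the ordering constraint can be sketched as follows.  This
is only an outline under assumptions: the paths, the `boot-default` bundle
name, and the `run` dry-run wrapper are hypothetical, and it presumes the
compiled database sits somewhere writable when the bundle is edited.  Since
nothing else is running s6-rc tools at this point in `rc.init`, no lock
should be held when s6-rc-bundle runs.

```shell
#!/bin/sh
# Sketch of the ordering inside rc.init (easy route). Paths and bundle
# name are hypothetical. With DRY_RUN=1 commands are printed, not run.

run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# boot_sequence SERVICE...: the services detected as usable on this machine.
boot_sequence() {
  # 1. Set up the live state from the compiled database.
  run s6-rc-init -c /etc/s6-rc/compiled /run/service || return
  # 2. Redefine the bundle before any s6-rc transition is in progress,
  #    so the blocking -b flag is not needed here.
  run s6-rc-bundle -f add boot-default "$@" || return
  # 3. Only now bring the bundle up.
  run s6-rc start boot-default
}
```

Running `DRY_RUN=1 boot_sequence syslogd wpa_supplicant` prints the three
commands in order without touching the system, which is a cheap way to check
the detection logic before trusting it at boot.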

  - a
