supervision - discussion about system services, daemon supervision, init, runlevel management, and tools such as s6 and runit
* further claims
@ 2019-04-29 21:33 Jeff
  2019-04-30  8:56 ` Laurent Bercot
  0 siblings, 1 reply; 7+ messages in thread
From: Jeff @ 2019-04-29 21:33 UTC (permalink / raw)
  To: supervision


At
http://skarnet.org/software/s6/why.html
one can find further interesting claims:

> The runit process, not the runsvdir process,
> runs as process 1. This lengthens the supervision chain.

haven't you claimed process #1 should supervise long running
child processes ? runit fulfils exactly this requirement by
supervising the supervisor.

this simplifies both (runit-)init (it has only to compare the PIDs
of terminated child processes with exactly 1 PID) and the
supervisor runsvdir (the latter can do its usual business without
the requirement to do process #1 specific work such as reacting
to signals in a special way, running the 3 different init stages etc.
one could also rightfully point out here that these are process #1
specific tasks and not a supervisor's duties per se.).

this lengthens the supervision chain but also has the additional
advantage of a supervised supervisor. ;-)

maybe runsvdir was not made to run as process #1 and this was
just a hack its author came up with to replace (SysV) init totally.
who knows ? but it works well (except that runit-init looks at
/etc/runit/reboot etc after receiving SIGCONT, which is not a good
idea at all since it requires unnecessary read-write access to the
fs these files reside on. how about just reacting to signals, say
use SIGTERM to poweroff, SIGHUP to reboot, SIGUSR1 to halt,
SIGUSR2 to reboot or poweroff, and make signal handling scripts
like /etc/runit/ctrl-alt-del etc just send one of those signals to
process #1 when SIG(INT,WINCH) were received ?
that does not require any read-write fs access and looks much
simpler to me.)

"Artistic considerations":

> runit has only one supervisor, runsv, for both a daemon and its logger.
> The pipe is maintained by runsv. If the runsv process dies, the pipe
> disappears and logs are lost. So, runit does not offer as strong a
> guarantee as daemontools.

sure, if (s6-)svscan dies one is in deep shit as well, so what is the point
here ? runsv gets restarted by runsvdir but the pipe is gone (are pipes
really closed when the opening (parent) process exits without closing
them itself and subprocesses still use that very pipe ?)

> daemontools' svscan maintains an open pipe between a daemon and its logger,
> so even if the daemon, the logger, and both supervise processes die,
> the pipe is still the same so no logs are lost, ever, unless svscan itself dies.

but:

> perp has only one process, perpd, acting both as a "daemon and logger
> supervisor" (like runsv) and as a "service directory scanner" (like runsvdir).
> It maintains the pipes between the daemons and their respective loggers.
> If perpd dies, everything is lost.

same for (s6-)svscan here (at least for the pipes).

> however, perpd is well-written and has virtually no risk of dying.

the same holds probably for (s6-)svscan, i guess.

> Since perpd cannot be run as process 1, 
> this is a possible SPOF for a perp installation

but from a design perspective it seems as reliable as s6-svscan ?
or not since it uses a more integrated design/approach ?
this design simplifies communication since tasks are not
implemented in other tools running as its (direct) subprocesses.

so all kinds of fifos/pipes used for IPC are not necessary anymore
except one socket per perpd process for client connections
and there is no need for further communication with subprocesses
(except via signals).




* Re: further claims
  2019-04-29 21:33 further claims Jeff
@ 2019-04-30  8:56 ` Laurent Bercot
  2019-05-01 23:09   ` Guillermo
  0 siblings, 1 reply; 7+ messages in thread
From: Laurent Bercot @ 2019-04-30  8:56 UTC (permalink / raw)
  To: supervision


>haven't you claimed process #1 should supervise long running
>child processes ? runit fulfils exactly this requirement by
>supervising the supervisor.

Not exactly, no.
If something kills runsvdir, then runit immediately enters
stage 3, and reboots the system. This is an acceptable response
to the scanner dying, but is not the same thing as supervising
it. If runsvdir's death is accidental, the system goes through
an unnecessary reboot.


>this lengthens the supervision chain but also has the additional
>advantage of a supervised supervisor. ;-)

No.


>maybe runsvdir was not made to run as process #1 and this was
>just a hack its author came up with to replace (SysV) init totally.

Gerrit may correct me here, but I think that was the idea, yes.
runit predates s6 and its goal was to provide a daemontools-like
supervision suite that could also be used as an init system. No
more, no less; and I think it succeeded.


>sure, if (s6-)svscan dies one is in deep shit as well, so what is the point
>here ?

If s6-svscan dies, the pipes are still maintained in the
s6-supervise processes. You would need to kill the supervisor *and*
the scanner for the pipe to disappear, whereas with runit, the pipe
disappears and you can lose logs as soon as you kill the supervisor.
And of course, if s6-svscan runs as process 1, you cannot kill it.


>  runsv gets restarted by runsvdir but the pipe is gone (are pipes
>really closed when the opening (parent) process exits without closing
>them itself and subprocesses still use that very pipe ?)

  The problematic case is when the consumer (i.e. the logger) dies
while the producer (i.e. the service) is still outputting logs.
When that happens, you need a process to hold the reading end
of the logging pipe. If you don't have such a process, the pipe
is closed when the consumer dies, and any data that is still
in transit is lost.

  When the logging pipe is held by runsv, if runsv dies, then
this situation is possible. Of course nothing wrong happens as
long as the logger stays alive, but when the logger dies, the
service needs to die first, in order for the logging pipe to be
properly recreated without any log loss.

  When the logging pipe is held by s6-svscan and you have one
supervisor per process, then any of the supervisors or the
supervised processes may die at any time, but the logging pipe
is never broken. You'd have to go back and kill s6-svscan in
order to have a chance at ever losing logs.
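
A minimal sketch of the mechanism described above, written in Python for
brevity. It is not s6 or runit code: the function name and the "holder"
role are made up for illustration, and file descriptors stand in for
processes (closing a descriptor models the owning process dying). It only
shows that a pipe survives the logger's death as long as some other
descriptor still refers to its reading end.

```python
import os


def write_after_logger_exit(keep_holder: bool) -> str:
    """Simulate a logger dying while the service keeps writing.

    keep_holder=True models a third party (the scanner or supervisor role)
    that keeps its own copy of the pipe's reading end.
    """
    read_end, write_end = os.pipe()
    # The holder duplicates the reading end before the logger goes away.
    holder = os.dup(read_end) if keep_holder else None
    os.close(read_end)  # the logger dies: its copy of the reading end is gone
    try:
        try:
            os.write(write_end, b"log line\n")  # the service keeps logging
        except BrokenPipeError:
            # No reader is left, so the kernel rejects the write with EPIPE.
            # (CPython ignores SIGPIPE by default, hence the exception.)
            return "lost"
        # A restarted logger would inherit the holder's descriptor and
        # drain whatever was buffered in the meantime.
        return os.read(holder, 1024).decode()
    finally:
        os.close(write_end)
        if holder is not None:
            os.close(holder)
```

With keep_holder=True the buffered line can still be read after the
"logger" is gone; with keep_holder=False the write fails and the line
is lost.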


>[perpd]
>but from a design perspective it seems as reliable as s6-svscan ?
>or not since it uses a more integrated design/approach ?

I trust Wayne to have written perpd correctly. However, from a
pure design perspective, perpd is unarguably more complex, since
it has to perform the job of the scanner + N supervisors in one
process, so it's naturally more difficult to make sure there are
no bugs in it.
The state machine in s6-supervise is complex enough. I wouldn't
want to maintain N similar constructs in one unique process. It's
doable, of course, but requires more effort to write, debug, and
maintain.


>this design simplifies communication since tasks are not
>implemented in other tools running as its (direct) subprocesses.

  Yes, that is the classic trade-off of multiprocess designs.
It's mostly a question of taste. I tend to favor multiprocess designs
because the cost of having more (and more complex) communication
is usually largely outweighed by the benefits of having significantly
less code and simpler code paths.

--
  Laurent




* Re: further claims
  2019-04-30  8:56 ` Laurent Bercot
@ 2019-05-01 23:09   ` Guillermo
  2019-05-02  0:30     ` Colin Booth
  2019-05-03  2:15     ` Runit Jeff
  0 siblings, 2 replies; 7+ messages in thread
From: Guillermo @ 2019-05-01 23:09 UTC (permalink / raw)
  To: supervision

On Tue, 30 Apr 2019 at 5:55, Laurent Bercot wrote:
>
> >haven't you claimed process #1 should supervise long running
> >child processes ? runit fulfils exactly this requirement by
> >supervising the supervisor.
>
> Not exactly, no.
> If something kills runsvdir, then runit immediately enters
> stage 3, and reboots the system. This is an acceptable response
> to the scanner dying, but is not the same thing as supervising
> it. If runsvdir's death is accidental, the system goes through
> an unnecessary reboot.

If the /etc/runit/2 process exits with code 111 or gets killed by a
signal, the runit program is actually supposed to respawn it,
according to its man page. I believe this counts as supervising at
least one process, so it would put runit in the "correct init" camp :)

There is code that checks the 'wstat' value returned by a
wait_nohang(&wstat) call that reaps the /etc/runit/2 process, however,
it is executed only if wait_exitcode(wstat) != 0. On my computer,
wait_exitcode() returns 0 if its argument is the wstat of a process
killed by a signal, so runit indeed spawns /etc/runit/3 instead of
respawning /etc/runit/2 when, for example, I point a gun at runsvdir
on purpose and use a kill -int command specifying its PID. Changing
the condition to wait_crashed(wstat) || (wait_exitcode(wstat) != 0)
makes things work as intended.
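
To illustrate the behaviour described above, here is a small Python
sketch. It is not the runit source: wait_exitcode() and wait_crashed()
below are assumptions modelled on the usual djb-style macros (exit code
in the high byte of the wait status, terminating signal in the low 7
bits), and the two enters_check_* helpers are hypothetical names for the
original and the modified condition.

```python
import os
import signal


def wait_exitcode(wstat: int) -> int:
    # Assumed definition: the exit code is the high byte of the status.
    return wstat >> 8


def wait_crashed(wstat: int) -> int:
    # Assumed definition: the low 7 bits hold the terminating signal.
    return wstat & 127


def enters_check_original(wstat: int) -> bool:
    # The condition as described: only a nonzero exit code is noticed.
    return wait_exitcode(wstat) != 0


def enters_check_patched(wstat: int) -> bool:
    # The proposed condition: a death by signal is noticed as well.
    return bool(wait_crashed(wstat)) or wait_exitcode(wstat) != 0


def reap_child_killed_by(sig: int) -> int:
    """Fork a child that kills itself with sig; return its wait status."""
    pid = os.fork()
    if pid == 0:
        signal.signal(sig, signal.SIG_DFL)  # make sure the signal is fatal
        os.kill(os.getpid(), sig)
        os._exit(99)  # only reached if the signal did not kill the child
    _, wstat = os.waitpid(pid, 0)
    return wstat
```

For a child killed by SIGINT, wait_exitcode() yields 0 under these
assumptions, so the original condition is false while the patched one
is true.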

G.



* Re: further claims
  2019-05-01 23:09   ` Guillermo
@ 2019-05-02  0:30     ` Colin Booth
  2019-05-03  2:44       ` ToyBox oneit Jeff
  2019-05-03  2:15     ` Runit Jeff
  1 sibling, 1 reply; 7+ messages in thread
From: Colin Booth @ 2019-05-02  0:30 UTC (permalink / raw)
  To: supervision

On Wed, May 01, 2019 at 08:09:58PM -0300, Guillermo wrote:
> On Tue, 30 Apr 2019 at 5:55, Laurent Bercot wrote:
> >
> > >haven't you claimed process #1 should supervise long running
> > >child processes ? runit fulfils exactly this requirement by
> > >supervising the supervisor.
> >
> > Not exactly, no.
> > If something kills runsvdir, then runit immediately enters
> > stage 3, and reboots the system. This is an acceptable response
> > to the scanner dying, but is not the same thing as supervising
> > it. If runsvdir's death is accidental, the system goes through
> > an unnecessary reboot.
> 
> If the /etc/runit/2 process exits with code 111 or gets killed by a
> signal, the runit program is actually supposed to respawn it,
> according to its man page. I believe this counts as supervising at
> least one process, so it would put runit in the "correct init" camp :)
> 
> There is code that checks the 'wstat' value returned by a
> wait_nohang(&wstat) call that reaps the /etc/runit/2 process, however,
> it is executed only if wait_exitcode(wstat) != 0. On my computer,
> wait_exitcode() returns 0 if its argument is the wstat of a process
> killed by a signal, so runit indeed spawns /etc/runit/3 instead of
> respawning /etc/runit/2 when, for example, I point a gun at runsvdir
> on purpose and use a kill -int command specifying its PID. Changing
> the condition to wait_crashed(wstat) || (wait_exitcode(wstat) != 0)
> makes things work as intended.
> 
> G.

Moving the goalposts a few feet here, but the duties of a proper init
are either to supervise one or more other things, or to bring down the
system if its one thing goes away. runit does both: it'll restart 2 in
some cases (correct, properly supervising one or more things), and it'll
bring down the system in other cases (also correct).

Honestly, it might be better to define what a bad init is and then say a
proper init is one that doesn't do that thing. A bad init is one that
allows a system to enter a totally vegetable state. By this
redefinition, a good init is one that doesn't allow systems to go
vegetable, either by having something they restart, or totally freaking
out and burning down the world if the one thing they started ever
vanishes. Hell, sinit could be made proper by forking a thing and then
issuing the reboot(2) syscall any time its child vanished. Annoyingly
aggressive on the restarts, but proper.
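
A rough sketch of that "fork one thing, react when it vanishes" loop, in
Python rather than C. This is not sinit code: run_one_thing and on_vanish
are hypothetical names, and the callback stands in for whatever a real
init would do at that point (for instance the reboot(2) call mentioned
above, which is deliberately not issued here).

```python
import os
from typing import Callable, Sequence


def run_one_thing(argv: Sequence[str], on_vanish: Callable[[int], None]) -> None:
    """Start argv as the single child and call on_vanish(wstat) when it exits."""
    child = os.fork()
    if child == 0:
        try:
            os.execvp(argv[0], list(argv))
        finally:
            os._exit(127)  # only reached if exec failed
    while True:
        # Running as process 1, this would also reap reparented orphans;
        # those are simply collected and ignored.
        pid, wstat = os.wait()
        if pid == child:
            on_vanish(wstat)  # a real init would restart or reboot here
            return
```

Passing a callback that restarts the child gives the "supervise one
thing" flavour; passing one that brings the system down gives the
"burn down the world" flavour.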

-- 
Colin Booth



* Runit
  2019-05-01 23:09   ` Guillermo
  2019-05-02  0:30     ` Colin Booth
@ 2019-05-03  2:15     ` Jeff
  1 sibling, 0 replies; 7+ messages in thread
From: Jeff @ 2019-05-03  2:15 UTC (permalink / raw)
  To: supervision

>>  If something kills runsvdir, then runit immediately enters
>>  stage 3, and reboots the system. This is an acceptable response
>>  to the scanner dying, but is not the same thing as supervising
>>  it. If runsvdir's death is accidental, the system goes through
>>  an unnecessary reboot.
>
> If the /etc/runit/2 process exits with code 111 or gets killed by a
> signal, the runit program is actually supposed to respawn it,
> according to its man page. I believe this counts as supervising at
> least one process, so it would put runit in the "correct init" camp :)
>
> There is code that checks the 'wstat' value returned by a
> wait_nohang(&wstat) call that reaps the /etc/runit/2 process, however,
> it is executed only if wait_exitcode(wstat) != 0. On my computer,
> wait_exitcode() returns 0 if its argument is the wstat of a process
> killed by a signal, so runit indeed spawns /etc/runit/3 instead of
> respawning /etc/runit/2 when, for example, I point a gun at runsvdir
> on purpose and use a kill -int command specifying its PID. Changing
> the condition to wait_crashed(wstat) || (wait_exitcode(wstat) != 0)
> makes things work as intended.

that is again one of several runit problems. among them:

- see above
- no setsid(2) for child procs by default in "runsv"
- having only runsv managing the log pipe.
- runit-init requires rw fs access without the slightest need
  (setting the +x bit of the /etc/runit/(stopit,reboot) files,
  which could indeed reside on a tmpfs in /run and have
  symlinks pointing to them (that is done in Void Linux))
- problems with log files while bringing down the system.
  i never encountered that with daemontools-encore, perp(d)
  and s6.

so it is a quite dated project that clearly shows its age.
i would recommend against using it at all (except its
"chpst" and "utmpset" utilities).





* ToyBox oneit
  2019-05-02  0:30     ` Colin Booth
@ 2019-05-03  2:44       ` Jeff
  2019-05-05  2:07         ` ToyBox init Jeff
  0 siblings, 1 reply; 7+ messages in thread
From: Jeff @ 2019-05-03  2:44 UTC (permalink / raw)
  To: supervision

> By this redefinition, a good init is one that doesn't allow systems to go
> vegetable, either by having something they restart, or totally freaking
> out and burning down the world if the one thing they started ever
> vanishes.
>
> sinit could be made proper by forking a thing and then
> issuing the reboot(2) syscall any time its child vanished.
> Annoyingly aggressive on the restarts, but proper.

maybe you should have a look at the tiny "oneit" utility that is
part of/included in ToyBox ( http://landley.net/toybox/ ):

$ toybox help oneit

usage: oneit [-p] [-c /dev/tty0] command [...]

Simple init program that runs a single supplied command line with a
controlling tty (so CTRL-C can kill it).

-c     Which console device to use (/dev/console doesn't do CTRL-C, etc)
-p     Power off instead of rebooting when command exits
-r      Restart child when it exits
-3     Write 32 bit PID of each exiting reparented process to fd 3 of child
        (Blocking writes, child must read to avoid eventual deadlock.)

Spawns a single child process (because PID 1 has signals blocked)
in its own session, reaps zombies until the child exits, then
reboots the system (or powers off with -p, or restarts the child with -r).

Responds to SIGUSR1 by halting the system, SIGUSR2 by powering off,
and SIGTERM or SIGINT reboot.
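
The signal handling in that help text boils down to a fixed lookup, with
no file consulted to pick the action. A minimal Python sketch of the
mapping follows; it is not taken from the ToyBox source, and ACTIONS and
action_for are hypothetical names.

```python
import signal

# Signal-to-action table as stated in the help text above.
ACTIONS = {
    signal.SIGUSR1: "halt",
    signal.SIGUSR2: "poweroff",
    signal.SIGTERM: "reboot",
    signal.SIGINT: "reboot",
}


def action_for(signum: int, default: str = "ignore") -> str:
    """Return the shutdown action for signum, or default if unmapped."""
    return ACTIONS.get(signum, default)
```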





* ToyBox init
  2019-05-03  2:44       ` ToyBox oneit Jeff
@ 2019-05-05  2:07         ` Jeff
  0 siblings, 0 replies; 7+ messages in thread
From: Jeff @ 2019-05-05  2:07 UTC (permalink / raw)
  To: supervision

> maybe you should have a look at the tiny "oneit" utility that is
> part of/included in ToyBox ( http://landley.net/toybox/ ):

ToyBox also provides its own rewrite of BusyBox init which is (almost ?)
compatible with the latter but consists of less code.

it is licensed under the very permissive ToyBox license (almost public domain)
instead of the GPL.



