Re: Some suggestions on old-fashioned usage with s6 2.10.x

supervision - discussion about system services, daemon supervision, init, runlevel management, and tools such as s6 and runit

* Re: Some suggestions on old-fashioned usage with s6 2.10.x
       [not found] <YBKNJEuGeYag91Q1@caspervector>
@ 2021-01-28 17:21 ` Laurent Bercot
  2021-01-28 19:08   ` Roy Lanek
                     ` (2 more replies)
  0 siblings, 3 replies; 24+ messages in thread
From: Laurent Bercot @ 2021-01-28 17:21 UTC (permalink / raw)
  To: supervision

>I did not actively follow the recent evolution of s6, and have just been
>bitten badly by s6 2.10.x on my Alpine servers (where slew [1] is used
>of course) when it comes along with other updates.

  Sorry. This bears repeating: major version upgrades may break things.

  Compatibility is a good thing, that's why I try to keep major version
changes few and far between; but the other side of the coin is that
when I'm doing one, I want to make use of it and cram in all the
incompatible changes that may be needed in the foreseeable future.
  So, you have to pay attention less often, but when it happens, you do
have to pay attention. Previous major version changes may have gone
smoothly - I try to keep it as smooth as possible when there's no need
to break UX - but it's no guarantee that it will always be smooth
sailing. This time, there were very visible user changes; sorry for
the inconvenience, but I reserve the right to do this, and I try to
document the breaking changes in the release notes.

  It is, admittedly, a drawback of distributions that they make major
version upgrades very silent - so, if you have local software that
relies on an old API, and the distro updates it under your feet,
you're caught unaware. I don't have a satisfying solution to this;
maybe I should have added a post-upgrade file printing red blinking
bold text, but that doesn't address automated or afk updates.

>better if we kept the option supported for a transition period, and
>only removed it from the manual pages while urging users to get rid of
>it.  After all, in this case, silently ignoring `-s' is behaviourally
>similar to (if not perfectly compatible with) old `s6-svscan'.

  It's always a delicate balance, because "better" is not one-dimensional.
It would be better UX, yes, definitely. But also legacy code to maintain
until the next major update (which can take a while), and I tend to
assign a *very* high cost to legacy code in s6-svscan and s6-supervise,
for obvious reasons. And in my experience, few people (and you,
Casper, certainly belong to them!) actually bother changing their
scripts as long as they keep working - most only spring into action when
something breaks. A compromise I've found relatively efficient was to
add nagging warnings on deprecated option use, but 1. that's even more
code that will be removed, and 2. I hate nagware, with a passion, in
all its forms.
  There is no really good solution, and I prefer a short, sharp pain
(when things break) followed by relief (when they're fixed) to a long
dull ache (maintaining compat code). Especially when I'm not the one
experiencing the sharp pain ;)
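
  To make the trade-off concrete, here is a minimal Python sketch of the
three policies discussed above for an option dropped in a major release:
reject it, silently ignore it, or accept it with a nagging warning. It is
not s6-svscan's code; `filter_options', the DEPRECATED set and the policy
names are hypothetical, and only `-s' and /run/service come from this thread.

```python
import warnings

# Hypothetical: options that a new major version no longer understands.
DEPRECATED = {"-s"}
POLICIES = ("reject", "ignore", "warn")


def filter_options(argv, policy="reject"):
    """Return argv without deprecated options, applying the chosen policy."""
    if policy not in POLICIES:
        raise ValueError(f"unknown policy: {policy}")
    kept = []
    for arg in argv:
        if arg in DEPRECATED:
            if policy == "reject":
                # Short, sharp pain: the program refuses to start.
                raise SystemExit(f"unknown option: {arg}")
            if policy == "warn":
                # Nagware: keeps old scripts working, but costs extra code.
                warnings.warn(f"option {arg} is deprecated and ignored",
                              DeprecationWarning)
            # "ignore" and "warn" both drop the option and carry on.
            continue
        kept.append(arg)
    return kept


if __name__ == "__main__":
    print(filter_options(["-s", "/run/service"], policy="ignore"))
```

With "reject", an old boot script fails at once; with "ignore" or "warn",
it keeps working, at the price of compat code that has to be carried until
the next major release.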

>Second, `s6-svscan' now waits for its `s6-supervise' children to exit
>before exec()ing `.s6-svscan/finish'

  You seem to have found the proper way of managing this with SIG files,
but just in case: "s6-svscanctl -tb" will net you the old behaviour.

--
  Laurent


* Re: Some suggestions on old-fashioned usage with s6 2.10.x
  2021-01-28 17:21 ` Some suggestions on old-fashioned usage with s6 2.10.x Laurent Bercot
@ 2021-01-28 19:08   ` Roy Lanek
  2021-01-28 19:55   ` Casper Ti. Vector
       [not found]   ` <YBMWuUCUTVjUNinQ@caspervector>
  2 siblings, 0 replies; 24+ messages in thread
From: Roy Lanek @ 2021-01-28 19:08 UTC (permalink / raw)
  To: supervision

> major version upgrades may break things.
As plain as day.

> I prefer a short, sharp pain (when things break) followed by
> relief (when they're fixed) to a long dull ache (maintaining
> compat code).

I could not agree more. Compat code would bring nothing anyhow except
extra, likely convoluted, code, which only increases the risk of
introducing new errors, including when it is finally removed.

/Roy Lanek (Yogyakarta)
-- 
555  5 l 4 c K W 4 r 3  L1NuX  555   air tenang menghanyutkan
555  5 l 4 c K W 4 r 3  L1NuX  555   still water runs deep


* Re: Some suggestions on old-fashioned usage with s6 2.10.x
  2021-01-28 17:21 ` Some suggestions on old-fashioned usage with s6 2.10.x Laurent Bercot
  2021-01-28 19:08   ` Roy Lanek
@ 2021-01-28 19:55   ` Casper Ti. Vector
       [not found]   ` <YBMWuUCUTVjUNinQ@caspervector>
  2 siblings, 0 replies; 24+ messages in thread
From: Casper Ti. Vector @ 2021-01-28 19:55 UTC (permalink / raw)
  To: supervision

On Thu, Jan 28, 2021 at 05:21:59PM +0000, Laurent Bercot wrote:
>  There is no really good solution, and I prefer a short, sharp pain
> (when things break) followed by relief (when they're fixed) to a long
> dull ache (maintaining compat code).

I see.  I personally prefer to retain compat code if said code is so
small that it can hardly be incorrect, especially when the breakage
(like kernel panics) can be very severe.  Arguably a major stylistic
difference.

>  You seem to have found the proper way of managing this with SIG files,
> but just in case: "s6-svscanctl -tb" will net you the old behaviour.

Now I see; thanks.  I also realised that the revised `s6-svc -X'
proposal would result in wrong behaviour when there exists a `./finish'
script, because the supervisor would exit early (and prematurely).

BTW, <https://git.skarnet.org/> seems to be returning empty HTTP replies
now; both <https://skarnet.org/> and <http://git.skarnet.org/> work as
expected though.

-- 
My current OpenPGP key:
RSA4096/0x227E8CAAB7AA186C (expires: 2022.09.20)
7077 7781 B859 5166 AE07 0286 227E 8CAA B7AA 186C


[parent not found: <YBMWuUCUTVjUNinQ@caspervector>]

* Re: Some suggestions on old-fashioned usage with s6 2.10.x
       [not found]   ` <YBMWuUCUTVjUNinQ@caspervector>
@ 2021-01-29  0:07     ` Laurent Bercot
  2021-01-29  2:44       ` Casper Ti. Vector
       [not found]       ` <YBN2p2UkIiP8lMQy@caspervector>
  0 siblings, 2 replies; 24+ messages in thread
From: Laurent Bercot @ 2021-01-29  0:07 UTC (permalink / raw)
  To: Casper Ti. Vector, supervision

>BTW, <https://git.skarnet.org/> seems to be returning empty HTTP replies
>now; both <https://skarnet.org/> and <http://git.skarnet.org/> work as
>expected though.

  That is a side effect of a recent s6-networking addition, where
s6-tlsd passes the SNI server name to the application via an
environment variable. Which allows me to serve virtual hosts even with
a HTTP/1.0 server, but only under TLS. Fun experiment. :)
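
  As an illustration of the idea: HTTP/1.0 does not require a Host header,
so a server run under the TLS daemon can pick the virtual host from the
environment instead. The Python sketch below is not the skarnet.org setup.
The variable name SSL_TLS_SNI_SERVERNAME is an assumption to be checked
against the s6-tlsd documentation, and the document roots and
`pick_docroot' are hypothetical.

```python
import os

# Assumption: the TLS daemon exports the SNI server name under this
# variable. Check the s6-tlsd documentation for the exact name.
SNI_VARIABLE = "SSL_TLS_SNI_SERVERNAME"

# Hypothetical mapping from server name to document root.
DOCROOTS = {
    "skarnet.org": "/var/www/skarnet.org",
    "git.skarnet.org": "/var/www/git.skarnet.org",
}


def pick_docroot(environ=os.environ, default="/var/www/default"):
    """Choose a document root from the SNI name found in the environment."""
    # Normalise case and a possible trailing dot before the lookup.
    name = environ.get(SNI_VARIABLE, "").lower().rstrip(".")
    return DOCROOTS.get(name, default)


if __name__ == "__main__":
    print(pick_docroot({SNI_VARIABLE: "git.skarnet.org"}))
```

Without TLS there is no SNI, so the variable is absent and the sketch falls
back to the default root, which matches the "only under TLS" limitation.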

  I may change it back, but I don't think the current state is broken,
because you're not supposed to access git.skarnet.org via HTTP(S)! :P

--
  Laurent


* Re: Some suggestions on old-fashioned usage with s6 2.10.x
  2021-01-29  0:07     ` Laurent Bercot
@ 2021-01-29  2:44       ` Casper Ti. Vector
       [not found]       ` <YBN2p2UkIiP8lMQy@caspervector>
  1 sibling, 0 replies; 24+ messages in thread
From: Casper Ti. Vector @ 2021-01-29  2:44 UTC (permalink / raw)
  To: supervision

On Fri, Jan 29, 2021 at 12:07:11AM +0000, Laurent Bercot wrote:
>  I may change it back, but I don't think the current state is broken,
> because you're not supposed to access git.skarnet.org via HTTP(S)! :P

Actually I do visit the CGit web interface fairly often, using it as
a poor man's GitHub workalike :)  Perhaps I need to batch change all
<https://git.skarnet.org/> references in the UP2020 document to
<https://skarnet.org/>...

-- 
My current OpenPGP key:
RSA4096/0x227E8CAAB7AA186C (expires: 2022.09.20)
7077 7781 B859 5166 AE07 0286 227E 8CAA B7AA 186C



[parent not found: <YBN2p2UkIiP8lMQy@caspervector>]

* Re: Some suggestions on old-fashioned usage with s6 2.10.x
       [not found]       ` <YBN2p2UkIiP8lMQy@caspervector>
@ 2021-01-29  9:36         ` Laurent Bercot
  0 siblings, 0 replies; 24+ messages in thread
From: Laurent Bercot @ 2021-01-29  9:36 UTC (permalink / raw)
  To: Casper Ti. Vector, supervision

>Actually I do visit the CGit web interface fairly often

  Oh, my bad, the links in the skaware documents actually point to
https://git.skarnet.org/something. Fair enough then, I have made
git.skarnet.org an explicit alias to skarnet.org.

>  Perhaps I need to batch change all
><https://git.skarnet.org/> references in the UP2020 document to
><https://skarnet.org/>...

  No need - I'll own that one, and keep the alias explicitly working.
It's not like subdomains are a scarce resource.

--
  Laurent


* Re: Some suggestions on old-fashioned usage with s6 2.10.x
@ 2021-02-15 14:58 Laurent Bercot
  2021-02-15 14:59 ` Laurent Bercot
  0 siblings, 1 reply; 24+ messages in thread
From: Laurent Bercot @ 2021-02-15 14:58 UTC (permalink / raw)
  To: supervision

>I do not really understand their excuse here.  CLI incompatibility is
>trivially solvable by creating links (or so) for `halt' / `poweroff' /
>`reboot', and even the `shutdown' command can be a wrapper for an `atd'
>based mechanism.

  The options! The options need to be all compatible. :) And for
"shutdown", they would never implement a wrapper themselves, I would
have to do it for them - which is exactly what I did, although it's
a C program that actually implements shutdown, not a wrapper around an
atd program I can't assume will be present on the system.

  I'm not defending distros here, but it *is* true that a drop-in
replacement, in general, is a lot easier to deal with than a drop-in-
most-of-the-time-maybe-but-not-with-that-option replacement. Anyone
who has tried to replace GNU coreutils with busybox can relate.

>   In case they complain about the implementation of the
>CLI, the actual interface to `shutdownd' is not that similar to the
>`telinit' interface (at least to the one I think it is) either.

  Which is why s6-l-i also comes with a runleveld service, for people
who need the telinit interface. shutdownd is only for the actual
stages 3 and 4, not service management (which telinit is a now obsolete
forerunner of).

>If I understand it correctly, letting `s6-svscan' exec() stage 3 also
>achieves immunity to `kill -KILL -1'.  I also find this "old-fashioned"
>approach conceptually and implementationally simpler than an army of
>`s6-supervise' restarting only to be killed again

  What army? By the time the final kill happens, the service manager
has brought everything down, and shutdownd has cleaned up the scandir,
only leaving it with what *should* be restarted. You seem to think
I haven't given these basic things the two minutes of attention they
deserve.

  Conceptually, the "old-fashioned" approach may be simpler, yes.
Implementationally, I disagree that it is, and I'll give you a very
simple example to illustrate it, but it's not the only thing that
implementations must pay attention to, there are a few other quirks
that I've stumbled upon and that disappear when s6-svscan remains
pid 1 until the very end.

  You're going to kill every process. The zombies need to be reaped,
else you won't be able to unmount the filesystems. So your pid 1
needs to be able to wait for children it doesn't know it has
(foreground does not) and guarantee that it doesn't try unmounting
the filesystems before having reaped everything (a shell does not give
ordering guarantees when it gets a SIGCHLD, even though it works in
practice). So for this specific use I had to add a special case to
execline's wait command, "wait { }", that waits on *everything*, and
also make sure that wait doesn't die because it's going to run as pid 1,
even very briefly.
  And after that, you need to make sure to unmount filesystems
immediately, because if you spawn other processes, you would first have
to wait on them as well.
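
  The requirement above (wait on every child, known or not, before
unmounting anything) can be sketched in Python. This is not the execline
"wait { }" implementation, only an illustration of the loop it implies;
`spawn_short_lived' and `reap_everything' are hypothetical names.

```python
import os


def spawn_short_lived(n):
    """Fork n children that exit at once; return their pids."""
    pids = []
    for _ in range(n):
        pid = os.fork()
        if pid == 0:
            # Child: exit immediately, skipping Python cleanup handlers.
            os._exit(0)
        pids.append(pid)
    return pids


def reap_everything():
    """Block until every child of this process has been waited on."""
    reaped = []
    while True:
        try:
            # -1 means "any child", including ones this code never forked
            # itself (a pid 1 inherits orphans as its own children).
            pid, _status = os.waitpid(-1, 0)
        except ChildProcessError:
            # ECHILD: no children are left, so nothing can remain a zombie.
            return reaped
        reaped.append(pid)


if __name__ == "__main__":
    expected = spawn_short_lived(3)
    assert sorted(reap_everything()) == sorted(expected)
    # Only at this point would a shutdown sequence unmount filesystems.
```

The ordering is the point: the unmount step comes strictly after the loop
returns, and no new process is spawned in between.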

  For every process that may run as pid 1, you need extra special care.
Using an interpreter program as pid 1 means your interpreter needs to
have been designed for it. Using execline means every execline binary
that may run as pid 1 needs to be vetted for it. If your shutdown
sequence is e.g. written in Lisp, and your Lisp interpreter handles
pid 1 duties correctly, okay, that's fair, but that's *two* programs
that need to do it, when one would be enough.
  s6-svscan has already been designed for that and provides all the
guarantees you need. When s6-svscan is running as pid 1, it takes
a lot of mental burden off the shutdown sequence.

>  and a `shutdownd'
>restarting to execute the halting procedure (see some kind of "state"
>here?  Functional programmers do not hate it for nothing).

  Yes, there is one bit of state involved. I think our feeble human minds,
and a fortiori computers, can handle one bit of state.

>   I know this
>seems less recoverable than the `shutdownd' approach, but does that
>count as a reason strong enough to warrant the latter approach, if the
>halting procedure has already been distilled to its bare essentials
>and is virtually immune to all non-fatal problems (that is, excluding
>something as severe as the absence of a `reboot -f' implementation)?

  My point is that making the halting procedure virtually immune to all
non-fatal problems is *more difficult* when you tear down the
supervision tree early. I am more confident in the shutdownd approach,
because it is less fragile, more forgiving. If there's a bug in it, it
will be easy to fix.

  I understand that the barebones approach is intellectually more
satisfying - it's more minimalistic, more symmetrical, etc. But shutting
down a machine is *not* symmetrical to booting it. When you boot, you
start with nothing and need a precise sequence of instructions in order
to build up to a functional system. When you shut down, you have a fully
functional system already, that has proven to be working, and you just
need to clean up and make sure you don't stop with an incoherent state;
you don't need to deconstruct the working system you have in order to
poweroff with the minimal amount of stuff! As long as you can cleanly
unmount the filesystems, nobody cares what your process tree looks like
when the machine is going to be *down*.

  In this instance, the existence of a reliable pid 1 with well-known
behaviour is a strong guarantee that makes writing a shutdown sequence
easy enough. Voluntarily getting rid of that guarantee and making your
system more fragile because technically supervision is not *needed*
anymore may make sense from an academic perspective, and may be
aesthetically more pleasing, but from an engineering standpoint, it is
not a good idea.

>What I intend to express is that unconditionally correlating "a bunch
>of [...] scripts" to "a 'screwdriver and duct tape' feel" is a typical
>systemd fallacy.  You seemed to be confusing "scripts containing lots of
>boilerplate" with "scripts that are minimised and clear".

  The "screwdriver and duct tape" feel does not come from the fact that
those are scripts; it comes from the fact that the scripts run in a less
forgiving environment where they have to provide the necessary guarantees
themselves, as opposed to continuing to use the framework that has been
running for the whole lifetime of the system and that is still valid and
helpful, even though for once you have to interact with it and tell it
to stop supervising some services because we're shutting down - which is
the exact kind of situation the supervision API was made for.

  The distinction is similar to doing things in kernel space vs. in user
space. If I have a task to do and have a kernel running, I prefer to do
the task in user space - it's more comfortable and less error-prone, and
if someone wishes to do it in kernel space, my reaction will be "why?
this is more hackish, they're probably trying to flex their kernel
programmer muscles, good engineering says this belongs in user space".
Running naked scripts as pid 1 when you don't have to kinda gives me
the same feeling.

>According to Guillermo's observation about the behavioural similarity
>between slew's `rc.boot'/`rc.halt' and the current mechanism with
>s6-linux-init, if I understand the big picture correctly enough, the
>fundamental difference between the approaches might be the difference in
>languages (to avoid further digression, here I expressly avoid talking
>about Lisp ;) and the attendant difference in dependencies.  Speaking of
>the latter, I do not find declaring dependence on things like `rc' and
>BusyBox really a problem to any packager of systemd.  Speaking of the
>former, the "old-fashioned" approach is obviously more flexible; I have
>also said that it is probably shorter and perhaps clearer.

  The fundamental difference is that the current s6-linux-init hardcodes
a lot of things in stage 1, purposefully. Yes, it is less flexible -
though you *still* have a stage 1 hook if you really need it - but the
whole point is to make stage 1 entirely turnkey and foolproof, and only
hand off to the user when the supervision framework is in place and
they don't have to worry about basic things like not being able to log
into the system. Same reason why I prefer the shutdownd approach:
minimize and automate all the parts where the supervision tree is not
operational, so that users can always assume that nothing they do is
going to brick the system.

  It bears repeating that the main criticism I've received for the s6
ecosystem is, overwhelmingly, the *abundance* of moving parts, and the
difficulty of grasping the big picture. The current s6-linux-init helps
with this, by hiding a lot of fragile moving parts and making it
*easier* to switch to s6 as an init system without having to fully
understand the intricate details of stage 1.
  Of course, it's not necessarily perceived as a benefit by tinkerers
like you, who do not mind, or even enjoy, the extra DIY feel. I'm
sorry - but if you need that kind of flexibility in stage 1, you are
perfectly capable of building your own stage 1 without s6-linux-init.

  I also disagree that the script approach is shorter and/or clearer.
It may be clearer to people who read a script better than a doc page
(or C code), but I don't think it should matter as long as the doc is
accurate; if it's not, that's what should be fixed. And the source code
may be shorter with a scripted stage 1, for sure, but the code paths
taken by the CPU are way shorter with the C version, and make fewer
assumptions. I'm confident that the current s6-linux-init breaks in
significantly fewer situations than its previous incarnation.

--
  Laurent


* Re: Some suggestions on old-fashioned usage with s6 2.10.x
  2021-02-15 14:58 Laurent Bercot
@ 2021-02-15 14:59 ` Laurent Bercot
  0 siblings, 0 replies; 24+ messages in thread
From: Laurent Bercot @ 2021-02-15 14:59 UTC (permalink / raw)
  To: supervision

(Apologies for the broken threading, I originally sent my answer with
the incorrect From: and it was rightfully rejected.)



* Some suggestions on old-fashioned usage with s6 2.10.x
@ 2021-01-28 10:08 Casper Ti. Vector
  2021-01-28 11:09 ` Casper Ti. Vector
  2021-01-29  1:41 ` Guillermo
  0 siblings, 2 replies; 24+ messages in thread
From: Casper Ti. Vector @ 2021-01-28 10:08 UTC (permalink / raw)
  To: supervision

I did not actively follow the recent evolution of s6, and have just been
bitten badly by s6 2.10.x on my Alpine servers (where slew [1] is used
of course) when it comes along with other updates.

[1] <https://gitea.com/CasperVector/slew>

First, kernel panic on booting.  With some tentative echo(1) invocations
(with I/O redirections to /dev/console when necessary) and messing with
console resolution (so I could see the outputs before the panic), I
found the problem occurred with `s6-svscan' exiting because of the
legacy `-s' option in [2].  The fix itself is trivial, but it would be
better if we kept the option supported for a transition period, and
only removed it from the manual pages while urging users to get rid of
it.  After all, in this case, silently ignoring `-s' is behaviourally
similar to (if not perfectly compatible with) old `s6-svscan'.

[2] <https://gitea.com/CasperVector/slew/src/commit/
             fe32c2f1e3bf5cf700ff99d13eb13720353823bb/init/rc.boot>

Second, `s6-svscan' now waits for its `s6-supervise' children to exit
before exec()ing `.s6-svscan/finish', so it hangs forever (save for
magic SysRq) due to the catch-all logger on halting.  I do know that the
recommended way to shut down is to use `s6-linux-init-shutdown', but
it will be nice if the old-fashioned way (with stage 1 and stage 3 as
static scripts) is supported as well after minimal modifications to both
s6 and (for instance) slew.  I also understand that `s6-svc -X' has been
removed, and that the invocation in [3] would no longer work anyway
because [3] is exec()ed by `s6-svscan'.  However, I think the following
way is practical yet minimal: introduce an option (perhaps still `-X')
of `s6-svc', but that tells `s6-supervise' to exit normally *upon
receiving SIGTERM or SIGHUP* (this is where the behaviour differs from
the old `s6-svc -X') without waiting for the children to exit; then move
the `s6-svc -X' invocation from `rc.halt' into `rc.fin' (where `s6-rc -d
change all' is also spawned).

[3] <https://gitea.com/CasperVector/slew/src/commit/
             fe32c2f1e3bf5cf700ff99d13eb13720353823bb/init/rc.halt>

Any suggestions?

-- 
My current OpenPGP key:
RSA4096/0x227E8CAAB7AA186C (expires: 2022.09.20)
7077 7781 B859 5166 AE07 0286 227E 8CAA B7AA 186C


* Re: Some suggestions on old-fashioned usage with s6 2.10.x
  2021-01-28 10:08 Casper Ti. Vector
@ 2021-01-28 11:09 ` Casper Ti. Vector
  2021-01-28 14:05   ` Casper Ti. Vector
  2021-01-29  1:41 ` Guillermo
  1 sibling, 1 reply; 24+ messages in thread
From: Casper Ti. Vector @ 2021-01-28 11:09 UTC (permalink / raw)
  To: supervision

On Thu, Jan 28, 2021 at 06:08:36PM +0800, Casper Ti. Vector wrote:
> then move the `s6-svc -X' invocation from `rc.halt' into `rc.fin'

The `s6-svc -a' invocation in `rc.halt' needs to be moved accordingly.
Moreover, `.s6-svscan/finish' (linked to `rc.halt') will still need its
$1 set to `reboot', `halt' or `poweroff' by `s6-svscan' on exec().

-- 
My current OpenPGP key:
RSA4096/0x227E8CAAB7AA186C (expires: 2022.09.20)
7077 7781 B859 5166 AE07 0286 227E 8CAA B7AA 186C



* Re: Some suggestions on old-fashioned usage with s6 2.10.x
  2021-01-28 11:09 ` Casper Ti. Vector
@ 2021-01-28 14:05   ` Casper Ti. Vector
  0 siblings, 0 replies; 24+ messages in thread
From: Casper Ti. Vector @ 2021-01-28 14:05 UTC (permalink / raw)
  To: supervision

On Thu, Jan 28, 2021 at 07:09:08PM +0800, Casper Ti. Vector wrote:
> Moreover, `.s6-svscan/finish' (linked to `rc.halt') will still need its
> $1 set to `reboot', `halt' or `poweroff' by `s6-svscan' on exec().

I did not realise that the great simplification to the command line
options of `s6-svscanctl' would not have been possible if s6-svscan(ctl)
needed to, for example, know about halt, poweroff and reboot.  Here I retract
the quoted statement; instead, I will rework the mechanism around
`.s6-svscan/SIG*' in slew, and yet attempt to make the behaviour
mostly backwards compatible.

-- 
My current OpenPGP key:
RSA4096/0x227E8CAAB7AA186C (expires: 2022.09.20)
7077 7781 B859 5166 AE07 0286 227E 8CAA B7AA 186C


* Re: Some suggestions on old-fashioned usage with s6 2.10.x
  2021-01-28 10:08 Casper Ti. Vector
  2021-01-28 11:09 ` Casper Ti. Vector
@ 2021-01-29  1:41 ` Guillermo
  2021-01-29  3:06   ` Casper Ti. Vector
       [not found]   ` <YBN7zfp/MmbcHOCF@caspervector>
  1 sibling, 2 replies; 24+ messages in thread
From: Guillermo @ 2021-01-29  1:41 UTC (permalink / raw)
  To: Supervision

On Thu, 28 Jan 2021 at 7:08, Casper Ti. Vector wrote:
>
> I did not actively follow the recent evolution of s6, and have just been
> bitten badly by s6 2.10.x on my Alpine servers (where slew [1] is used
> of course) when it comes along with other updates.
>
> [1] <https://gitea.com/CasperVector/slew>
> [...]
> it will be nice if the old-fashioned way (with stage 1 and stage 3 as
> static scripts) is supported as well [...]

Out of curiosity, do you have a reason for wanting to keep the
"old-fashioned way"? Is it a goal of your project to depend on s6 and
s6-rc, but not current s6-linux-init? It seems to me that doing so
would be easier. It even looks like you could use the current
/etc/slew/init/rc.{init,fin} scripts (perhaps with minor adjustments)
as s6-linux-init's rc.init and rc.shutdown for slew, respectively.

G.


* Re: Some suggestions on old-fashioned usage with s6 2.10.x
  2021-01-29  1:41 ` Guillermo
@ 2021-01-29  3:06   ` Casper Ti. Vector
  2021-01-29 17:27     ` Guillermo
       [not found]   ` <YBN7zfp/MmbcHOCF@caspervector>
  1 sibling, 1 reply; 24+ messages in thread
From: Casper Ti. Vector @ 2021-01-29  3:06 UTC (permalink / raw)
  To: supervision

On Thu, Jan 28, 2021 at 10:41:24PM -0300, Guillermo wrote:
> Out of curiosity, do you have a reason for wanting to keep the
> "old-fashioned way"? Is it a goal of your project to depend on s6 and
> s6-rc, but not current s6-linux-init? It seems to me that doing so
> would be easier. It even looks like you could use the current
> /etc/slew/init/rc.{init,fin} scripts (perhaps with minor adjustments)
> as s6-linux-init's rc.init and rc.shutdown for slew, respectively.

Not using s6-linux-init has never been an explicit goal, but using
static scripts was a natural choice when s6-linux-init only provided
`s6-linux-init-maker', which produced scripts that were not that
flexible.

Currently I do not understand the `s6-linux-init-shutdown(d)' way
well, so the old-fashioned way is retained at least for now, given its
simplicity in implementation and seemingly better flexibility.  Frankly
it is my intuition that the new way costs more than the old way, but
does not provide that much in return.  (Feel free to prove me wrong.)

-- 
My current OpenPGP key:
RSA4096/0x227E8CAAB7AA186C (expires: 2022.09.20)
7077 7781 B859 5166 AE07 0286 227E 8CAA B7AA 186C


* Re: Some suggestions on old-fashioned usage with s6 2.10.x
  2021-01-29  3:06   ` Casper Ti. Vector
@ 2021-01-29 17:27     ` Guillermo
  2021-01-29 17:39       ` Guillermo
  0 siblings, 1 reply; 24+ messages in thread
From: Guillermo @ 2021-01-29 17:27 UTC (permalink / raw)
  To: Supervision

On Fri, 29 Jan 2021 at 0:07, Casper Ti. Vector wrote:
>
> Not using s6-linux-init has never been an explicit goal, [...]
>
> Currently I do not understand the `s6-linux-init-shutdown(d)' way
> well, so the old-fashioned way is retained at least for now, [...]

Forgive me if I'm misunderstanding aspects of the architecture, but
judging from a quick look at the Git repository, I *think* that you
could depend on s6-linux-init (the package):

* You could replace /etc/slew/init/rc.boot with s6-linux-init (the
program), or a wrapper script around it. s6-linux-init does more or
less the same as the current script, except running load_clock.rc
(which you could do in /etc/slew/init/rc.init).
* You could move the current /etc/slew/run to
/etc/s6-linux-init/current/run-image when installing slew. I'm not
sure what skalibs' hiercopy() would do if the latter is just a symlink
to the former.
* You could replace /etc/slew/init/rc.halt with a service directory
for s6-linux-init-shutdownd in /etc/slew/run/service, just like you
do now for the catch-all logger.
s6-linux-init-shutdownd does more or less the same as the current
script, except running save_log.rc, save_clock.rc and calling swapoff.
You can move that to /etc/slew/init/rc.fin, or
/etc/s6-linux-init/current/rc.shutdown.final if you must do some of it
after killing all processes and unmounting filesystems.
* You could symlink /etc/s6-linux-init/current/rc.{init,shutdown} to
/etc/slew/init/rc.{init,fin}, perhaps with some minimal
modifications.
* You could replace calls to s6-svscanctl in
/etc/slew/run/service/.s6-svscan/SIG* with calls to s6-linux-init-hpr
if you want to be able to do e.g. 'busybox poweroff'.
* You could replace /etc/slew/run/service/.s6-svscan/finish with
something simpler, instead of being a symlink to
/etc/slew/init/rc.halt.
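
The symlink step in the list above could be scripted roughly as follows.
This is a minimal Python sketch, not part of slew or s6-linux-init: the
paths are the ones named in this message, while `link_rc_scripts' and the
`root' prefix (used so that the sketch can be tried in a scratch directory
rather than on a live /etc) are assumptions.

```python
from pathlib import Path

# Link location (relative to the chosen root) -> target named in the message.
LINKS = {
    "etc/s6-linux-init/current/rc.init": "/etc/slew/init/rc.init",
    "etc/s6-linux-init/current/rc.shutdown": "/etc/slew/init/rc.fin",
}


def link_rc_scripts(root):
    """Create the rc.init and rc.shutdown symlinks under root."""
    created = []
    for rel, target in LINKS.items():
        link = Path(root) / rel
        link.parent.mkdir(parents=True, exist_ok=True)
        # Replace an existing file or symlink; directories are left alone
        # and would make symlink_to() fail loudly.
        if link.is_symlink() or link.is_file():
            link.unlink()
        link.symlink_to(target)
        created.append(link)
    return created


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as scratch:
        for path in link_rc_scripts(scratch):
            print(path, "->", path.readlink() if hasattr(path, "readlink") else "")
```

The targets are absolute on purpose, so that the links resolve correctly
once the tree is installed at the real root.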

G.


* Re: Some suggestions on old-fashioned usage with s6 2.10.x
  2021-01-29 17:27     ` Guillermo
@ 2021-01-29 17:39       ` Guillermo
  0 siblings, 0 replies; 24+ messages in thread
From: Guillermo @ 2021-01-29 17:39 UTC (permalink / raw)
  To: Supervision

On Fri, 29 Jan 2021 at 14:27, Guillermo wrote:
> [...]

Huh. Lots of apostrophes that shouldn't be there, and that I just
couldn't see without a fixed width font...

G.


[parent not found: <YBN7zfp/MmbcHOCF@caspervector>]

* Re: Some suggestions on old-fashioned usage with s6 2.10.x
       [not found]   ` <YBN7zfp/MmbcHOCF@caspervector>
@ 2021-01-29  9:57     ` Laurent Bercot
  2021-01-29 14:33       ` Casper Ti. Vector
       [not found]       ` <YBQcwHN1L/N2dedx@caspervector>
  0 siblings, 2 replies; 24+ messages in thread
From: Laurent Bercot @ 2021-01-29  9:57 UTC (permalink / raw)
  To: Casper Ti. Vector, supervision

>Currently I do not understand the `s6-linux-init-shutdown(d)' way
>well, so the old-fashioned way is retained at least for now, given its
>simplicity in implementation and seemingly better flexibility.  Frankly
>it is my intuition that the new way costs more than the old way, but
>does not provide that much in return.  (Feel free to prove me wrong.)

  It may cost more *to you*, but there is real and significant value
in following existing interfaces that people are familiar with. Being
able to just use "reboot" instead of the, uh, slightly less intuitive
"s6-svscanctl -6 /run/service" to reboot your machine, is one fewer
obstacle on the way to mainstream s6 adoption.

  Additionally, and maybe more to your liking, there are also technical
benefits to never killing s6-svscan. Being able to assume that a
supervision tree will be operational at all times, including during
shutdown (and even in stage 4!), is really comfortable, it cuts down
on a lot of specialcasing, it makes shutdown procedures recoverable,
integration into various configurations easier (I'm thinking
containers with or without a catch-all logger, for instance), and
all-in-all has just less of a "screwdriver and duct tape" feel than
a bunch of execline (or rc ;)) scripts.

--
  Laurent


* Re: Some suggestions on old-fashioned usage with s6 2.10.x
  2021-01-29  9:57     ` Laurent Bercot
@ 2021-01-29 14:33       ` Casper Ti. Vector
       [not found]       ` <YBQcwHN1L/N2dedx@caspervector>
  1 sibling, 0 replies; 24+ messages in thread
From: Casper Ti. Vector @ 2021-01-29 14:33 UTC (permalink / raw)
  To: supervision

On Fri, Jan 29, 2021 at 09:57:43AM +0000, Laurent Bercot wrote:
>  It may cost more *to you*, but there is real and significant value
> in following existing interfaces that people are familiar with. Being
> able to just use "reboot" instead of the, uh, slightly less intuitive
> "s6-svscanctl -6 /run/service" to reboot your machine, is one fewer
> obstacle on the way to mainstream s6 adoption.

But even `s6-reboot' from older s6-linux-init, or `busybox reboot'
with slew can already do that...

>  Additionally, and maybe more to your liking, there are also technical
> benefits to never killing s6-svscan. Being able to assume that a
> supervision tree will be operational at all times, including during
> shutdown (and even in stage 4!), is really comfortable, it cuts down
> on a lot of specialcasing, it makes shutdown procedures recoverable,
> integration into various configurations easier (I'm thinking
> containers with or without a catch-all logger, for instance), and

There is some non-trivial trade-off: in short, the existence of the
supervision tree after stage 2 is by itself a kind of "special case"
(eg. search for "careful handling" in [1]).  I am also thinking about
an application scenario, where a supervision tree with a new s6 version
replaces the active tree with an old version.  This is somewhat silly:
it can be a little useful in case of major version bump, but is probably
better solved by complete reboot to completely get rid of all old things
(s6 or not, updated together) in the memory.

[1] <https://skarnet.org/software/s6/s6-svscan-1.html>.

> all-in-all has just less of a "screwdriver and duct tape" feel than
> a bunch of execline (or rc ;)) scripts.

I am very sorry, but I do feel a strong smell of systemd mindset here :(

-- 
My current OpenPGP key:
RSA4096/0x227E8CAAB7AA186C (expires: 2022.09.20)
7077 7781 B859 5166 AE07 0286 227E 8CAA B7AA 186C


[parent not found: <YBQcwHN1L/N2dedx@caspervector>]

* Re: Some suggestions on old-fashioned usage with s6 2.10.x
       [not found]       ` <YBQcwHN1L/N2dedx@caspervector>
@ 2021-01-29 15:48         ` Laurent Bercot
  2021-01-31  7:49           ` stage2 as a service [was: Some suggestions on old-fashioned usage with s6 2.10.x] s.karrmann
                             ` (2 more replies)
  0 siblings, 3 replies; 24+ messages in thread
From: Laurent Bercot @ 2021-01-29 15:48 UTC (permalink / raw)
  To: Casper Ti. Vector, supervision

>But even `s6-reboot' from older s6-linux-init, or `busybox reboot'
>with slew can already do that...

  Yes. And as your sharp mind undoubtedly noticed, those commands are
not the same as "reboot".

  Which means burden on users.

  Yes, I also thought it was a small burden at first, but it's not.
It means that all sysvinit-compatible automation does not work, so
there is some porting work to do. And the gap between "a little work"
and "zero work" is HUGE. It's much bigger than the gap between
"a little work" and "a lot of work".

  Bear in mind that my eventual goal for s6 is distro adoption. And
distro maintainers will find any and every excuse to reject it.
Having a "shutdown" command that works exactly like sysvinit's
shutdown is essential, because it deals with a major objection, which
is incompatibility and user-unfriendliness.

>There is some non-trivial trade-off: in short, the existence of the
>supervision tree after stage 2 is by itself a kind of "special case"
>(eg. search for "careful handling" in [1]).

  I feel like you misinterpreted my meaning.
  The *absence* of a supervision tree after stage 2 is precisely what
requires careful handling, and runit only works because Linux has
that peculiarity that kill -9 -1 does not kill the emitter!
  Having a supervision tree in stage 3 actually *helps* with the
late shutdown procedure: shutdownd dies right after the kill (which
would make it usable even on a system without the Linux specialcase)
and is restarted by the supervisor for stage 4.

>   I am also thinking about
>an application scenario, where a supervision tree with a new s6 version
>replaces the active tree with an old version.  This is somewhat silly:
>it can be a little useful in case of major version bump, but is probably
>better solved by complete reboot to completely get rid of all old things
>(s6 or not, updated together) in the memory.

  Yes, upgrading your init without rebooting is generally not worth
it. Note that s6-svscan could still be configured to do that with
clever use of SIG scripts; but restarting the s6-supervise processes
is a pain to do without restarting your whole supervision tree, so
it's probably better to just reboot.
  This is the case with every single init out there, so you can't paint
that as a drawback of s6. You can wish it were easier, and I agree
that it would be nice, but the necessary trade-offs to make rebootless
init upgrades viable are very much not worth it.

>>  all-in-all has just less of a "screwdriver and duct tape" feel than
>>  a bunch of execline (or rc ;)) scripts.
>I am very sorry, but I do feel a strong smell of systemd mindset here :(

  A systemd mindset in an attempt to be a drop-in replacement for
sysvinit. Yeah, right.

  More seriously, you're being unfair, because you're not locked in
at all. You can use the new s6-linux-init and *still* do everything
you were doing before:
  - you can manually edit your run-image
  - you can remove the runleveld service (which is only used for
telinit emulation) and even the shutdownd service
  - you can write SIG scripts to do shutdowns the preferred way
  - I absolutely recommend against doing this, but you *still* have
a place in stage 1 where you can fiddle with things: in the
init script before the call to the s6-linux-init binary.

  So basically, all you're complaining about is that s6-linux-init-maker
is not generating your preferred run-image layout out-of-the-box
anymore. Well, you're an advanced user, you know what you are doing;
the knobs and levers are *still all there*. The only binary that
kinda hardcodes things is s6-linux-init itself, and if you give it a
try, I'm pretty sure you'll like it, because there was never any reason
to modify the core of stage 1 in the first place and what it does is
what any kind of stage 1 needs to do, no matter what language it's
written in.
  And if you don't like it, you're still free to ditch the s6-linux-init
package entirely and keep using your own stage 1.

  Besides, when systemd advocates paint sysv-rc shell scripts as
"duct tape", they're *right*. sysv-rc (and OpenRC) scripts are loaded
with boilerplate that only exists to compensate for the lack of a
supervision infrastructure, and systemd, like any supervision system,
does away with that. systemd has 99 problems, but rightly calling out
oversized script scaffoldings ain't one. Its disingenuousness lies in
pretending that an overengineered, opaque, all-encompassing, unescapable
framework is better than the duct tape; and I think you'll find that
s6-linux-init isn't quite the monster you seem to believe it is.

--
  Laurent

* stage2 as a service [was: Some suggestions on old-fashioned usage with s6 2.10.x]
  2021-01-29 15:48         ` Laurent Bercot
@ 2021-01-31  7:49           ` s.karrmann
  2021-01-31 10:25             ` Laurent Bercot
  2021-02-15  8:36           ` Some suggestions on old-fashioned usage with s6 2.10.x Casper Ti. Vector
       [not found]           ` <YCoykUYGXVt+BAT9@caspervector>
  2 siblings, 1 reply; 24+ messages in thread
From: s.karrmann @ 2021-01-31  7:49 UTC (permalink / raw)
  To: supervision; +Cc: ska-supervision

Dear all,

After Laurent's explanation of the supervision tree in stages 2 and 3,
I got the idea to run stage 2 entirely as a normal supervised service:

> 2021-01-29.16:48
> From: "Laurent Bercot" <ska-supervision@skarnet.org>
> To: "Casper Ti. Vector" <caspervector@gmail.com>, supervision@list.skarnet.org
> Subject: Re: Some suggestions on old-fashioned usage with s6 2.10.x
> 
> [...]
> >There is some non-trivial trade-off: in short, the existence of the
> >supervision tree after stage 2 is by itself a kind of "special case"
> >(eg. search for "careful handling" in [1]).
> 
> I feel like you misinterpreted my meaning.
> The *absence* of a supervision tree after stage 2 is precisely what
> requires careful handling, and runit only works because Linux has
> that peculiarity that kill -9 -1 does not kill the emitter!
> Having a supervision tree in stage 3 actually *helps* with the
> late shutdown procedure: shutdownd dies right after the kill (which
> would make it usable even on a system without the Linux specialcase)
> and is restarted by the supervisor for stage 4.
> [...] 

$ cat /etc/s6/services/s6-rc-up/run 
#! /usr/bin/execlineb -P

s6-envdir /etc/s6/init-env
multisubstitute {
  importas SCANDIR SCANDIR
  importas LIVEDIR LIVEDIR      
  importas COMPILED COMPILED     
  importas RCDEFAULT RCDEFAULT                              
  importas PATH PATH                          
  }
export PATH ${PATH}

# optional:  -- Question: Is this necessary?
  redirfd -w 0 ${SCANDIR}/service/s6-svscan-log/fifo
  # now the catch all logger runs
  fdclose 0

foreground { mkdir -p ${LIVEDIR} }
foreground { s6-rc-init -l ${LIVEDIR}/live -c ${COMPILED} ${SCAN} }
foreground { s6-svc -O . } # don't restart me
foreground { s6-rc -l ${LIVEDIR}/live -t 10000 change ${RCDEFAULT} }
# notify s6-supervise:
fdmove 1 3
foreground { echo "s6-rc ready, stage 2 is up." }
fdclose 1  # -- Question: Is this necessary?
# NB: shutdown should create ./down here, to avoid race conditions
# NB: init must ensure that there is no ./down here at startup.
# That is automatically fulfilled, if copied from a repo to /run/...
### THE END #####################################################################

and my init is:

$ cat /etc/s6/init
#! /usr/bin/execlineb -P
   
cd /
s6-setsid -qb                                         
envfile /etc/s6/init-envfile                                                      
multisubstitute {
  importas SCANDIR SCANDIR
  importas LIVEDIR LIVEDIR      
  importas COMPILED COMPILED     
  importas RCDEFAULT RCDEFAULT                              
  importas PATH PATH                          
  }
export PATH ${PATH}                                                  
             
# stage 1 init as PID=1
 
ifelse -nX                                   
  { # basic initialization
      foreground { # a hooks
        elglob -s locals /etc/s6/init.d/stage1a.d/*
          forx -E local { ${locals} }
            ${local}
      }

      foreground {
        # cf. https://code.dogmap.org./fs/
        elglob fss /fs/*
        forx -E -p fs { ${fss} }
          if { test -e ${fs}/mount-at-boot }
          mount ${fs}/mount # todo fsmount ${fs}
      }

      foreground { # b hooks
        elglob -s locals /etc/s6/init.d/stage1b.d/*
          forx -E local { ${locals} }
            ${local}
      }

    foreground { ln -s /fs/run-s6/mount/${SCANDIR} /run/s6 }
    foreground { mkdir -p /fs/run-s6/mount/${SCANDIR} }
    cp -a ${REPO} ${SCANDIR}
  }
  { # fallback login
    sulogin --force -t 600 # timeout 600 seconds, i.e. 10 minutes.
    # kernel panic
  }

# by now /dev must contain some device files, e.g. /dev/null
redirfd -r 0 /dev/null  # useful for testing from a tty, i.e. don't consume input

execline-cd ${SCAN}
# catch all log also for stage 2
# s6-log duplicates it to console
redirfd -wnb 1 ./s6-svscan-log/fifo  # open fifo nonblocking
fdmove -c 2 1                        # copy it
emptyenv -P
exec -c
s6-svscan                         # start service scanner, i.e. PID=1 in stage 2
### THE END #####################################################################

I have three basic services:
- s6-linux-init-early-getty
- s6-rc-up
- s6-svscan-log

Everything else is up to s6-rc. Well, it will be. I'm still moving my Debian from systemd ("black box") to s6...
Also I may switch to s6-linux-init finally.

Kind regards,
Stefan

* Re: stage2 as a service [was: Some suggestions on old-fashioned usage with s6 2.10.x]
  2021-01-31  7:49           ` stage2 as a service [was: Some suggestions on old-fashioned usage with s6 2.10.x] s.karrmann
@ 2021-01-31 10:25             ` Laurent Bercot
  2021-01-31 20:51               ` stage2 as a service Stefan Karrmann
  0 siblings, 1 reply; 24+ messages in thread
From: Laurent Bercot @ 2021-01-31 10:25 UTC (permalink / raw)
  To: s.karrmann, supervision

  Hi Stefan,
  Long time no see!

  A few comments:

># optional:  -- Question: Is this necessary?
>   redirfd -w 0 ${SCANDIR}/service/s6-svscan-log/fifo
>   # now the catch all logger runs
>   fdclose 0

  I'm not sure what you're trying to do here. The catch-all logger
should be automatically unblocked when
${SCANDIR}/service/s6-svscan-log/run starts.
  The fifo trick should not be visible at all in stage 2: by the time
stage 2 is running, everything is clean and no trickery should take
place. The point of the fifo trick is to make the supervision tree
log to a service that is part of the same supervision tree; but once
the tree has started, no sleight of hand is required.

>foreground { s6-svc -O . } # don't restart me

  If you have to do this, it is the first sign that you're abusing
the supervision pattern; see below.

>foreground { s6-rc -l ${LIVEDIR}/live -t 10000 change ${RCDEFAULT} }
># notify s6-supervise:
>fdmove 1 3
>foreground { echo "s6-rc ready, stage 2 is up." }
>fdclose 1  # -- Question: Is this necessary?

  It's not strictly necessary to close the fd after notifying readiness,
but it's a good idea nonetheless since the fd is unusable afterwards.
However, readiness notification is only useful when your service is
actually providing a... service once it's ready; here, your "service"
dies immediately, and is not restarted.
  That's because it's really a oneshot that you're treating as a
longrun, which is abusing the pattern.

># NB: shutdown should create ./down here, to avoid race conditions

  And here is the final proof: in order to make your architecture work,
you have to *fight* supervision features, because they are getting in
your way instead of helping you.
  This shows that it's really not a good idea to run stage 2 as a
supervised service. Stage 2 is really a one-time initialization script
that should be run after the supervision tree is started, but *not*
supervised.

>   { # fallback login
>     sulogin --force -t 600 # timeout 600 seconds, i.e. 10 minutes.
>     # kernel panic
>   }

  Your need for sulogin here comes from the fact that you're doing quite
complex operations in stage 1: a user-defined set of hooks, then
several filesystem mounts, then another user-defined set of hooks.
And even then, you're running those in foreground blocks, so you're
not catching the errors; the only time your fallback activates is if
the cp -a from ${REPO} fails. Was that intended?
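
  To illustrate the point about foreground blocks, here is a rough
analogy in plain shell rather than execline. It is only a sketch: the
three function names are hypothetical stand-ins for the hooks, the
mounts and the final cp -a, and the first one is made to fail on
purpose.

```shell
#!/bin/sh
# Hypothetical stand-ins for the real stage 1 steps.
run_hooks() { return 1; }   # simulate a failing hook
mount_all() { return 0; }
copy_repo() { return 0; }

# Like a chain of `foreground { ... }` blocks: every exit status is
# discarded except the one of the last command.
stage1_foreground() {
  run_hooks
  mount_all
  copy_repo
}

# Like a chain of `if { ... }` blocks: the first failure aborts the chain.
stage1_if() {
  run_hooks || return 1
  mount_all || return 1
  copy_repo
}

if stage1_foreground; then
  echo "foreground-style chain: reported success, hook failure was lost"
fi
if ! stage1_if; then
  echo "if-style chain: hook failure caught, fallback can run"
fi
```

  With the first form, a fallback such as sulogin only triggers when
the last command fails; with the second, any failing step triggers it.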

  In any case, that's a lot of error-prone work that could be done in
stage 2 instead. If you keep stage 1 as barebones as possible (and
only mount one single writable filesystem for the service directories)
you should be able to do away with sulogin entirely. sulogin is a
horrible hack that was only written because sysvinit is complex enough
that it needs a special debugging tool if something breaks in the
middle.
  With an s6-based init, it's not the case. Ideally, any failure that
happens before your early getty is running can only be serious enough
that you have to init=/bin/sh anyway. And for everything else, you have
your early getty. No need for special tools.

>Also I may switch to s6-linux-init finally.

  It should definitely spare you a lot of work. That's what it's for :)

--
  Laurent

* Re: stage2 as a service
  2021-01-31 10:25             ` Laurent Bercot
@ 2021-01-31 20:51               ` Stefan Karrmann
  2021-02-01 10:35                 ` Laurent Bercot
  0 siblings, 1 reply; 24+ messages in thread
From: Stefan Karrmann @ 2021-01-31 20:51 UTC (permalink / raw)
  To: Laurent Bercot; +Cc: supervision

Hi Laurent,

Laurent Bercot @ 2021-01-31.10:25:22 +0000:
>  Hi Stefan,
>  Long time no see!

Yes, and you still remember me. I'm impressed!

>  A few comments:
>
> > # optional:  -- Question: Is this necessary?
> >   redirfd -w 0 ${SCANDIR}/service/s6-svscan-log/fifo
> >   # now the catch all logger runs
> >   fdclose 0
>
>  I'm not sure what you're trying to do here. The catch-all logger
> should be automatically unblocked when
> ${SCANDIR}/service/s6-svscan-log/run starts.

Yes, that's the idea.

>  The fifo trick should not be visible at all in stage 2: by the time
> stage 2 is running, everything is clean and no trickery should take
> place. The point of the fifo trick is to make the supervision tree
> log to a service that is part of the same supervision tree; but once
> the tree has started, no sleight of hand is required.

For the normal case you are absolutely right. But with stage 2 as a service
you have a race condition between stage 2 and s6-svscan-log. The usual
trick for stage 2 solves this problem.

> > foreground { s6-svc -O . } # don't restart me
>
>  If you have to do this, it is the first sign that you're abusing
> the supervision pattern; see below.

Well, running once has been a part of supervise from the start, by djb. It
was invented for oneshots.

> > foreground { s6-rc -l ${LIVEDIR}/live -t 10000 change ${RCDEFAULT} }
> > # notify s6-supervise:
> > fdmove 1 3
> > foreground { echo "s6-rc ready, stage 2 is up." }
> > fdclose 1  # -- Question: Is this necessary?
>
>  It's not strictly necessary to close the fd after notifying readiness,
> but it's a good idea nonetheless since the fd is unusable afterwards.
> However, readiness notification is only useful when your service is
> actually providing a... service once it's ready; here, your "service"
> dies immediately, and is not restarted.

You are right.

> That's because it's really a oneshot

Yes, as implemented since djb's daemontools.

> that you're treating as a longrun, which is abusing the pattern.
>
>
> > # NB: shutdown should create ./down here, to avoid race conditions
>
>  And here is the final proof: in order to make your architecture work,
> you have to *fight* supervision features, because they are getting in
> your way instead of helping you.

Well, s6-rc is using ./down, too. The shutdown is a very special case for
supervision.

>  This shows that it's really not a good idea to run stage 2 as a
> supervised service. Stage 2 is really a one-time initialization script
> that should be run after the supervision tree is started, but *not*
> supervised.

Stage 2 as a service allows us to restart it, if - accidentally - it is
necessary. Obviously, that should rarely be the case.

> >   { # fallback login
> >     sulogin --force -t 600 # timeout 600 seconds, i.e. 10 minutes.
> >     # kernel panic
> >   }
>
>  Your need for sulogin here comes from the fact that you're doing quite
> complex operations in stage 1: a user-defined set of hooks, then
> several filesystem mounts, then another user-defined set of hooks.
> And even then, you're running those in foreground blocks, so you're
> not catching the errors; the only time your fallback activates is if
> the cp -a from ${REPO} fails. Was that intended?

No, I should replace foreground with if.

Well, actually I don't use the hooks. But distribution maintainers often
want such things, e.g. to scan for mapped devices (raid, lvm, crypt).
On the other hand, I know of no distribution which uses Paul Jarc's
/fs/*.

>  In any case, that's a lot of error-prone work that could be done in
> stage 2 instead. If you keep stage 1 as barebones as possible (and
> only mount one single writable filesystem for the service directories)
> you should be able to do away with sulogin entirely. sulogin is a
> horrible hack that was only written because sysvinit is complex enough
> that it needs a special debugging tool if something breaks in the
> middle.

Reasonable. I mount only /run and /var, because the logs, even the
catch-all log, reside in /var/log/.

>  With an s6-based init, it's not the case. Ideally, any failure that
> happens before your early getty is running can only be serious enough
> that you have to init=/bin/sh anyway. And for everything else, you have
> your early getty. No need for special tools.

Okay, that's reasonable and simpler.

> > Also I may switch to s6-linux-init finally.
>
>  It should definitely spare you a lot of work. That's what it's for :)

I'm still migrating from systemd to s6{,-rc} with /fs/* step by step.
Therefore, I need more flexibility than s6-linux-init offers.

> --
>  Laurent

Kind regards,
--
Stefan Karrmann

* Re: stage2 as a service
  2021-01-31 20:51               ` stage2 as a service Stefan Karrmann
@ 2021-02-01 10:35                 ` Laurent Bercot
  0 siblings, 0 replies; 24+ messages in thread
From: Laurent Bercot @ 2021-02-01 10:35 UTC (permalink / raw)
  To: Stefan Karrmann; +Cc: supervision

>For the normal case you are absolutely right. But with stage 2 as a service
>you have a race condition between stage 2 and s6-svscan-log. The usual
>trick for stage 2 solves this problem.

  Ah, now I get it: stage 2 must not start before the catch-all logger
is ready, so you open the fifo for writing with the intent to block
until the reader has started. Yes, that makes sense and is needed.
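
  The blocking behaviour this relies on can be reproduced with a plain
fifo. The following is only a sketch in shell, not the actual stage 2
code: the function name, the temporary directory and the one-second
delay are all made up for the demonstration. The "logger" starts late,
and the "stage 2" side stays blocked in its open-for-writing until the
logger has opened the fifo for reading.

```shell
#!/bin/sh
# Demonstrates that opening a fifo for writing blocks until a reader
# has opened it. All names here are illustrative.
demo_fifo_block() {
  demo_dir=$(mktemp -d) || return 1
  mkfifo "$demo_dir/fifo" || return 1

  # "catch-all logger": starts one second late, then copies the fifo
  # contents to a log file.
  ( sleep 1; cat "$demo_dir/fifo" > "$demo_dir/log" ) &
  demo_reader=$!

  # "stage 2": the redirection blocks until the reader above has
  # opened the fifo; only then is the line written.
  echo "stage 2 started" > "$demo_dir/fifo"

  wait "$demo_reader"
  cat "$demo_dir/log"
  rm -rf "$demo_dir"
}
```

  Calling demo_fifo_block takes about one second and then prints the
line that went through the fifo, which shows that the writer did wait
for the reader.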

  However, I will list it as another drawback of your approach :P
The internal sauce is visible in your stage 2 script. Ideally, stage 2
would only run once everything is already in place and wouldn't have
to bother with any mechanism details. (s6-linux-init achieves this.)

>Well, running once has been a part of supervise from the start, by djb. It
>was invented for oneshots.

  I disagree. svc -o was made for short-lived processes that you may want
to run again. Or for testing a crashing daemon without having to deal
with automatic restarts and failure loops. Anything that you
potentially run more than once. It's "run once", but still in the
context of supervision; you would only use it on services that have
at least the potential to make use of supervision.

  For things that you will only ever run once, why even supervise them in
the first place? Supervision is a wonderful tool, but like any tool, it
should only be used when appropriate, and I don't think the one-time
initialization script is the place for it.

>Well, s6-rc is using ./down, too. The shutdown is a very special case for
>supervision.

  There is an important difference.
  s6-rc is using ./down files for services that it wants down,
independently from the machine's lifetime. Typically it's using them
at boot time, in order to have the supervisors start, but not the
services themselves (they'll be brought up later on according to the
dependency graph). s6-rc *wants* the services to be supervised, even
if it's not starting them at the same time as the supervisors.
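
  For readers unfamiliar with ./down files, here is a minimal sketch
of a service directory that gets a supervisor but is not started
automatically. The function name and the placeholder run script are
assumptions made for the example; s6-rc generates its own service
directories and does not use this helper.

```shell
#!/bin/sh
# Create a service directory whose supervisor starts, but whose service
# stays down until explicitly brought up. Names are illustrative.
make_down_service() {
  svcdir=$1
  mkdir -p "$svcdir" || return 1
  # Placeholder daemon: a long sleep stands in for a real service.
  printf '%s\n' '#!/bin/sh' 'exec sleep 1000' > "$svcdir/run" || return 1
  chmod 0755 "$svcdir/run" || return 1
  # The presence of ./down tells the supervisor not to start ./run
  # on its own.
  : > "$svcdir/down"
}
```

  Once such a directory is picked up by s6-svscan, the service stays
down until something sends it an up command, e.g. s6-svc -u on that
directory.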

  You're only using a ./down file at shutdown time because you have a
service that you know must not restart when the supervisor is killed,
and will never restart, and has not been restarted for the entire
machine lifetime. The presence of the supervisor here is not a feature,
it brings you no value, on the contrary - it's only making your life
more difficult.

>Stage 2 as a service allows us to restart it, if - accidentally - it is
>necessary. Obviously, that should rarely be the case.

  Honestly, I can't think of a single case where you'd need to restart
the initialization sequence of your machine. Anything you'd want to
restart once the system has booted should be handled by the service
manager.

>I'm still migrating from systemd to s6{,-rc} with /fs/* step by step.
>Therefore, I need more flexibility than s6-linux-init.

  The migration from systemd's service manager (to s6-rc or anything
else) is totally independent from the init system change. You can
make a systemd oneshot that launches s6-rc-init then s6-rc, and
convert all your systemd services one by one to s6-rc services; then,
once you don't depend on systemd for anything else than the early
boot and the s6-rc service, you can switch inits, and then you should
be able to use s6-linux-init.
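
  As a sketch of what such a oneshot could run, assuming an s6-svscan
process is already running on the scan directory (for instance as
another systemd service): every path and the bundle name below are
assumptions, and the two command variables only exist so the script
can be dry-run without s6-rc installed. The option syntax follows the
s6-rc-init and s6-rc invocations shown earlier in this thread.

```shell
#!/bin/sh
# Sketch of a script for a systemd oneshot unit. All defaults below are
# assumptions; adjust them to the local layout.
SCANDIR=${SCANDIR:-/run/service}
LIVE=${LIVE:-/run/s6-rc}
COMPILED=${COMPILED:-/etc/s6-rc/compiled}
BUNDLE=${BUNDLE:-default}
S6_RC_INIT=${S6_RC_INIT:-s6-rc-init}
S6_RC=${S6_RC:-s6-rc}

start_s6_rc() {
  # Set up the live directory on top of the running scan directory.
  "$S6_RC_INIT" -l "$LIVE" -c "$COMPILED" "$SCANDIR" || return 1
  # Bring up the chosen bundle and everything it depends on.
  "$S6_RC" -l "$LIVE" change "$BUNDLE"
}
```

  The unit would call start_s6_rc once at boot; services can then be
moved from systemd units into the s6-rc database one at a time.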

  I generally recommend doing the opposite: switching to s6-linux-init
first then converting services to s6-rc, because the latter is a lot
more work: for instance Adélie uses s6-linux-init but still has
OpenRC as its service manager, because I haven't done the conversion
work yet. However, it's different with systemd, because systemd cannot
be run as not-pid-1 - its service manager cannot be separated from its
early boot functionality. So you have to keep it as init until it's
not needed for anything else. They call it modular software ;)

--
  Laurent

* Re: Some suggestions on old-fashioned usage with s6 2.10.x
  2021-01-29 15:48         ` Laurent Bercot
  2021-01-31  7:49           ` stage2 as a service [was: Some suggestions on old-fashioned usage with s6 2.10.x] s.karrmann
@ 2021-02-15  8:36           ` Casper Ti. Vector
       [not found]           ` <YCoykUYGXVt+BAT9@caspervector>
  2 siblings, 0 replies; 24+ messages in thread
From: Casper Ti. Vector @ 2021-02-15  8:36 UTC (permalink / raw)
  To: supervision

(I am extremely sorry for delaying this mail so much.  I have just done
two major refactoring/overhaul projects in this vacation around the
Spring Festival, and still have one remaining.  These projects are a
part of my formal occupation, but I would not have much low-distraction
time best for this kind of work apart from vacations.  By the way, Happy
Chinese "Niu" Year.)

On Fri, Jan 29, 2021 at 03:48:09PM +0000, Laurent Bercot wrote:
>  Bear in mind that my eventual goal for s6 is distro adoption. And
> distro maintainers will find any and every excuse to reject it.
> Having a "shutdown" command that works exactly like sysvinit's
> shutdown is essential, because it deals with a major objection, which
> is incompatibility and user-unfriendliness.

I do not really understand their excuse here.  CLI incompatibility is
trivially solvable by creating links (or so) for `halt' / `poweroff' /
`reboot', and even the `shutdown' command can be a wrapper for an `atd'
based mechanism.  In case they complain about the implementation of the
CLI, the actual interface to `shutdownd' is not that similar to the
`telinit' interface (at least to the one I think it is) either.

>  The *absence* of a supervision tree after stage 2 is precisely what
> requires careful handling, and runit only works because Linux has
> that peculiarity that kill -9 -1 does not kill the emitter!
>  Having a supervision tree in stage 3 actually *helps* with the
> late shutdown procedure: shutdownd dies right after the kill (which
> would make it usable even on a system without the Linux specialcase)
> and is restarted by the supervisor for stage 4.

If I understand it correctly, letting `s6-svscan' exec() stage 3 also
achieves immunity to `kill -KILL -1'.  I also find this "old-fashioned"
approach conceptually and implementationally simpler than an army of
`s6-supervise' restarting only to be killed again, and a `shutdownd'
restarting to execute the halting procedure (see some kind of "state"
here?  Functional programmers do not hate it for nothing).  I know this
seems less recoverable than the `shutdownd' approach, but does that
count as a reason strong enough to warrant the latter approach, if the
halting procedure has already been distilled to its bare essentials
and is virtually immune to all non-fatal problems (that is, excluding
something as severe as the absence of a `reboot -f' implementation)?

> [...] More seriously, you're being unfair, because you're not locked
> in at all. You can use the new s6-linux-init and *still* do everything
> you were doing before: [...]
>  Besides, when systemd advocates paint sysv-rc shell scripts as
> "duct tape", they're *right*. sysv-rc (and OpenRC) scripts are loaded
> with boilerplate that only exists to compensate for the lack of a
> supervision infrastructure, and systemd, like any supervision system,
> does away with that. systemd has 99 problems, but rightly calling out
> oversized script scaffoldings ain't one. Its disingenuousness lies in
> pretending that an overengineered, opaque, all-encompassing, unescapable
> framework is better than the duct tape; and I think you'll find that
> s6-linux-init isn't quite the monster you seem to believe it is.

What I intend to express is that unconditionally correlating "a bunch
of [...] scripts" to "a 'screwdriver and duct tape' feel" is a typical
systemd fallacy.  You seemed to be confusing "scripts containing lots of
boilerplate" with "scripts that are minimised and clear".

>  So basically, all you're complaining about is that s6-linux-init-maker
> is not generating your preferred run-image layout out-of-the-box
> anymore. Well, you're an advanced user, you know what you are doing;
> the knobs and levers are *still all there*. The only binary that
> kinda hardcodes things is s6-linux-init itself, and if you give it a
> try, I'm pretty sure you'll like it, because there was never any reason
> to modify the core of stage 1 in the first place and what it does is
> what any kind of stage 1 needs to do, no matter what language it's
> written in.

According to Guillermo's observation about the behavioural similarity
between slew's `rc.boot'/`rc.halt' and the current mechanism with
s6-linux-init, if I understand the big picture correctly enough, the
fundamental difference between the approaches might be the difference in
languages (to avoid further digression, here I expressly avoid talking
about Lisp ;) and the attendant difference in dependencies.  Speaking of
the latter, I do not find declaring dependence on things like `rc' and
BusyBox really a problem to any packager of systemd.  Speaking of the
former, the "old-fashioned" approach is obviously more flexible; I have
also said that it is probably shorter and perhaps clearer.

-- 
My current OpenPGP key:
RSA4096/0x227E8CAAB7AA186C (expires: 2022.09.20)
7077 7781 B859 5166 AE07 0286 227E 8CAA B7AA 186C

[parent not found: <YCoykUYGXVt+BAT9@caspervector>]

[parent not found: <em949fd937-c7bc-43db-9b49-3cc235b8f2ad@elzian>]

* Re: Some suggestions on old-fashioned usage with s6 2.10.x
       [not found]             ` <em949fd937-c7bc-43db-9b49-3cc235b8f2ad@elzian>
@ 2021-02-16  8:53               ` Casper Ti. Vector
  0 siblings, 0 replies; 24+ messages in thread
From: Casper Ti. Vector @ 2021-02-16  8:53 UTC (permalink / raw)
  To: supervision

On Mon, Feb 15, 2021 at 02:54:52PM +0000, Laurent Bercot wrote:
>  The options! The options need to be all compatible. :) And for
> "shutdown", they would never implement a wrapper themselves, I would
> have to do it for them - which is exactly what I did, although it's
> a C program that actually implements shutdown, not a wrapper around an
> atd program I can't assume will be present on the system.

OK, now I understand their excuse.  Nevertheless I still do not think
all these necessarily require something like `shutdownd'; even in the
absence of `atd', chainloading a backgrounding timer for `shutdown' is
not a big exercise with execline (which is perhaps exactly what you have
already done in `s6-linux-init-maker').

>  What army? By the time the final kill happens, the service manager
> has brought everything down, and shutdownd has cleaned up the scandir,
> only leaving it with what *should* be restarted. You seem to think
> I haven't given these basic things the two minutes of attention they
> deserve.

Sorry then, I did not see that in the documentation; now the scandir
cleanup contributes some additional complexity.  Since the mechanism
behind `shutdownd' does not seem to be adequately explained, at least to
me, I explicitly refrain from concluding here whether this addition is
worthwhile.

>  Conceptually, the "old-fashioned" approach may be simpler, yes.
> Implementationally, I disagree that it is, and I'll give you a very
> simple example to illustrate it, but it's not the only thing that
> implementations must pay attention to, there are a few other quirks
> that I've stumbled upon and that disappear when s6-svscan remains
> pid 1 until the very end. [...] after ["wait { }"], you need to make
> sure to unmount filesystems immediately [...]

This is not exactly what older s6-linux-init versions actually do, and
slew mimics those older versions.  As long as the procedure between
`wait { }' and `umount' does not produce orphans, the `umount' will be
fine.  I have noticed you saying "a shell does not give ordering
guarantees when it gets a SIGCHLD", but it seems to me that the
no-orphan requirement can be verified by ensuring that none of the
commands involved gets backgrounded.  Of course, feel free to correct
that; more importantly, may I request you to list the quirks you have
encountered?  Only then can we really see how much the remaining
`s6-svscan' brings, in comparison with how much it costs (see my
paragraph above).

> If your shutdown sequence is e.g. written in Lisp, and your Lisp
> interpreter handles pid 1 duties correctly, okay, that's fair, but
> that's *two* programs that need to do it, when one would be enough. [...]
>  The fundamental difference is that the current s6-linux-init hardcodes
> a lot of things in stage 1, purposefully. Yes, it is less flexible -
> though you *still* have a stage 1 hook if you really need it - but the
> whole point is to make stage 1 entirely turnkey and foolproof [...]

When mentioning Lisp, I did not mean to imply Lisp interpreters, but
optimising Lisp compilers, which blur the border between scripts and
compiled programs (cf. `fdclose' and `fd_close()').  But you have said
the problem is not about scripting, so we do not disagree on this; given
that, I do not quite understand your emphasis on stage 1 in
s6-linux-init -- do you mean that stage 1 somewhere prepares for
`shutdownd'?

>  The "screwdriver and duct tape" feel does not come from the fact that
> those are scripts; it comes from the fact that the scripts run in a less
> forgiving environment where they have to provide the necessary guarantees
> themselves, as opposed to keeping using the framework that has been
> running for the whole lifetime of the system and that is still valid and
> helpful, even though for once you have to interact with it and tell it
> to stop supervising some services because we're shutting down - which is
> the exact kind of situation the supervision API was made for.

Now that scripting does not seem to be a major problem (which falsifies
my previous judgement that it was; sorry for that), the only crucial
issue is the cost and benefit of keeping the supervision tree when
halting.  So may I again request you to spare some time to explain the
detailed workflow behind `shutdownd', and the actual quirks that a
remaining `s6-svscan' helps to solve?  Perhaps current s6-linux-init and
older s6-linux-init (with derivatives like slew) are just software that
suits different niches (e.g. a sysvinit/systemd-minded audience vs.
those who accept daemontools-ish software well), which would be
perfectly fine.

>  I also disagree that the script approach is shorter and/or clearer.
> It may be clearer to people who read a script better than a doc page
> (or C code), but I don't think it should matter as long as the doc is
> accurate; if it's not, that's what should be fixed. And the source code
> may be shorter with a scripted stage 1, for sure, but the code paths
> taken by the CPU are way shorter with the C version, and make fewer
> assumptions. I'm confident that the current s6-linux-init breaks in
> significantly fewer situations than its previous incarnation.

Then the `shutdownd' documentation might need to be fixed; BTW, the "Is
it possible to write stage {1,3} init in a scripting language?" sections
from `s6-svscan-1.html' have not seen real changes since 2014 ;)

-- 
My current OpenPGP key:
RSA4096/0x227E8CAAB7AA186C (expires: 2022.09.20)
7077 7781 B859 5166 AE07 0286 227E 8CAA B7AA 186C


