supervision - discussion about system services, daemon supervision, init, runlevel management, and tools such as s6 and runit
* Re: Some suggestions on old-fashioned usage with s6 2.10.x
@ 2021-02-15 14:58 Laurent Bercot
  2021-02-15 14:59 ` Laurent Bercot
  0 siblings, 1 reply; 20+ messages in thread
From: Laurent Bercot @ 2021-02-15 14:58 UTC (permalink / raw)
  To: supervision


>I do not really understand their excuse here.  CLI incompatibility is
>trivially solvable by creating links (or so) for `halt' / `poweroff' /
>`reboot', and even the `shutdown' command can be a wrapper for an `atd'
>based mechanism.

  The options! The options need to be all compatible. :) And for
"shutdown", they would never implement a wrapper themselves, I would
have to do it for them - which is exactly what I did, although it's
a C program that actually implements shutdown, not a wrapper around an
atd program I can't assume will be present on the system.

  I'm not defending distros here, but it *is* true that a drop-in
replacement, in general, is a lot easier to deal with than a drop-in-
most-of-the-time-maybe-but-not-with-that-option replacement. Anyone
who has tried to replace GNU coreutils with busybox can relate.


>   In case they complain about the implementation of the
>CLI, the actual interface to `shutdownd' is not that similar to the
>`telinit' interface (at least to the one I think it is) either.

  Which is why s6-l-i also comes with a runleveld service, for people
who need the telinit interface. shutdownd is only for the actual
stages 3 and 4, not service management (which telinit is a now obsolete
forerunner of).


>If I understand it correctly, letting `s6-svscan' exec() stage 3 also
>achieves immunity to `kill -KILL -1'.  I also find this "old-fashioned"
>approach conceptually and implementationally simpler than an army of
>`s6-supervise' restarting only to be killed again

  What army? By the time the final kill happens, the service manager
has brought everything down, and shutdownd has cleaned up the scandir,
only leaving it with what *should* be restarted. You seem to think
I haven't given these basic things the two minutes of attention they
deserve.

  Conceptually, the "old-fashioned" approach may be simpler, yes.
Implementationally, I disagree that it is, and I'll give you a very
simple example to illustrate it, but it's not the only thing that
implementations must pay attention to, there are a few other quirks
that I've stumbled upon and that disappear when s6-svscan remains
pid 1 until the very end.

  You're going to kill every process. The zombies need to be reaped,
else you won't be able to unmount the filesystems. So your pid 1
needs to be able to wait for children it doesn't know it has
(foreground does not) and guarantee that it doesn't try unmounting
the filesystems before having reaped everything (a shell does not give
ordering guarantees when it gets a SIGCHLD, even though it works in
practice). So for this specific use I had to add a special case to
execline's wait command, "wait { }", that waits on *everything*, and
also make sure that wait doesn't die because it's going to run as pid 1,
even very briefly.
  And after that, you need to make sure to unmount filesystems
immediately, because if you spawn other processes, you would first have
to wait on them as well.
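
  A minimal sketch of the "wait on everything" requirement, in Python
rather than execline. It is not the implementation of execline's
"wait { }"; the function name reap_all is hypothetical and only
illustrates reaping every child, known or inherited, before moving on
to the unmount step.

```python
import os


def reap_all():
    """Block until every child of this process has been reaped.

    Returns the number of children reaped. A process acting as pid 1
    would call something like this before unmounting filesystems.
    """
    reaped = 0
    while True:
        try:
            # -1 means "any child", including ones inherited as orphans;
            # 0 means block until a child changes state.
            os.waitpid(-1, 0)
        except ChildProcessError:
            # ECHILD: no children remain, so nothing can still be a zombie.
            return reaped
        reaped += 1
```

  The loop only ends on ECHILD, so no ordering assumption about
SIGCHLD delivery is needed; whatever runs next must not spawn new
processes without waiting on them again.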

  For every process that may run as pid 1, you need extra special care.
Using an interpreter program as pid 1 means your interpreter needs to
have been designed for it. Using execline means every execline binary
that may run as pid 1 needs to be vetted for it. If your shutdown
sequence is e.g. written in Lisp, and your Lisp interpreter handles
pid 1 duties correctly, okay, that's fair, but that's *two* programs
that need to do it, when one would be enough.
  s6-svscan has already been designed for that and provides all the
guarantees you need. When s6-svscan is running as pid 1, it takes
a lot of mental burden off the shutdown sequence.


>  and a `shutdownd'
>restarting to execute the halting procedure (see some kind of "state"
>here?  Functional programmers do not hate it for nothing).

  Yes, there is one bit of state involved. I think our feeble human minds,
and a fortiori computers, can handle one bit of state.
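
  To make "one bit of state" concrete, here is a hedged Python sketch.
It is not how s6-linux-init-shutdownd is implemented: the marker file,
the function names and the stage labels are all assumptions, chosen
only to show a supervised process that is killed, restarted, and then
resumes from a single recorded bit.

```python
import os


def current_stage(marker_path):
    """Read the single bit: has the first half of the shutdown already run?"""
    return "stage 4" if os.path.exists(marker_path) else "stage 3"


def run_once(marker_path, log):
    """Simulate one lifetime of the daemon; the supervisor restarts it after the kill."""
    stage = current_stage(marker_path)
    if stage == "stage 3":
        # Hypothetical marker: record the bit before the final kill,
        # which also takes this process down.
        with open(marker_path, "w"):
            pass
    log.append(stage)
    return stage
```

  Calling run_once twice with the same marker path yields "stage 3"
and then "stage 4", which mirrors the restart described in this thread.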


>   I know this
>seems less recoverable than the `shutdownd' approach, but does that
>count as a reason strong enough to warrant the latter approach, if the
>halting procedure has already been distilled to its bare essentials
>and is virtually immune to all non-fatal problems (that is, excluding
>something as severe as the absence of a `reboot -f' implementation)?

  My point is that making the halting procedure virtually immune to all
non-fatal problems is *more difficult* when you tear down the
supervision tree early. I am more confident in the shutdownd approach,
because it is less fragile, more forgiving. If there's a bug in it, it
will be easy to fix.

  I understand that the barebones approach is intellectually more
satisfying - it's more minimalistic, more symmetrical, etc. But shutting
down a machine is *not* symmetrical to booting it. When you boot, you
start with nothing and need a precise sequence of instructions in order
to build up to a functional system. When you shut down, you have a fully
functional system already, that has proven to be working, and you just
need to clean up and make sure you don't stop with an incoherent state;
you don't need to deconstruct the working system you have in order to
poweroff with the minimal amount of stuff! As long as you can cleanly
unmount the filesystems, nobody cares what your process tree looks like
when the machine is going to be *down*.

  In this instance, the existence of a reliable pid 1 with well-known
behaviour is a strong guarantee that makes writing a shutdown sequence
easy enough. Voluntarily getting rid of that guarantee and making your
system more fragile because technically supervision is not *needed*
anymore may make sense from an academic perspective, and may be
aesthetically more pleasing, but from an engineering standpoint, it is
not a good idea.


>What I intend to express is that unconditionally correlating "a bunch
>of [...] scripts" to "a 'screwdriver and duct tape' feel" is a typical
>systemd fallacy.  You seemed to be confusing "scripts containing lots of
>boilerplate" with "scripts that are minimised and clear".

  The "screwdriver and duct tape" feel does not come from the fact that
those are scripts; it comes from the fact that the scripts run in a less
forgiving environment where they have to provide the necessary guarantees
themselves, as opposed to continuing to use the framework that has been
running for the whole lifetime of the system and that is still valid and
helpful, even though for once you have to interact with it and tell it
to stop supervising some services because we're shutting down - which is
the exact kind of situation the supervision API was made for.

  The distinction is similar to doing things in kernel space vs. in user
space. If I have a task to do and have a kernel running, I prefer to do
the task in user space - it's more comfortable and less error-prone, and
if someone wishes to do it in kernel space, my reaction will be "why?
this is more hackish, they're probably trying to flex their kernel
programmer muscles, good engineering says this belongs in user space".
Running naked scripts as pid 1 when you don't have to kinda gives me
the same feeling.


>According to Guillermo's observation about the behavioural similarity
>between slew's `rc.boot'/`rc.halt' and the current mechanism with
>s6-linux-init, if I understand the big picture correctly enough, the
>fundamental difference between the approaches might be the difference in
>languages (to avoid further digression, here I expressly avoid talking
>about Lisp ;) and the attendant difference in dependencies.  Speaking of
>the latter, I do not find declaring dependence on things like `rc' and
>BusyBox really a problem to any packager of systemd.  Speaking of the
>former, the "old-fashioned" approach is obviously more flexible; I have
>also said that it is probably shorter and perhaps clearer.

  The fundamental difference is that the current s6-linux-init hardcodes
a lot of things in stage 1, purposefully. Yes, it is less flexible -
though you *still* have a stage 1 hook if you really need it - but the
whole point is to make stage 1 entirely turnkey and foolproof, and only
hand off to the user when the supervision framework is in place and
they don't have to worry about basic things like not being able to log
into the system. Same reason why I prefer the shutdownd approach:
minimize and automate all the parts where the supervision tree is not
operational, so that users can always assume that nothing they do is
going to brick the system.

  It bears repeating that the main criticism I've received for the s6
ecosystem is, overwhelmingly, the *abundance* of moving parts, and the
difficulty of grasping the big picture. The current s6-linux-init helps
with this, by hiding a lot of fragile moving parts and making it
*easier* to switch to s6 as an init system without having to fully
understand the intricate details of stage 1.
  Of course, it's not necessarily perceived as a benefit by tinkerers
like you, who do not mind, or even enjoy, the extra DIY feel. I'm
sorry - but if you need that kind of flexibility in stage 1, you are
perfectly capable of building your own stage 1 without s6-linux-init.

  I also disagree that the script approach is shorter and/or clearer.
It may be clearer to people who read a script better than a doc page
(or C code), but I don't think it should matter as long as the doc is
accurate; if it's not, that's what should be fixed. And the source code
may be shorter with a scripted stage 1, for sure, but the code paths
taken by the CPU are way shorter with the C version, and make fewer
assumptions. I'm confident that the current s6-linux-init breaks in
significantly fewer situations than its previous incarnation.

--
  Laurent



* Re: Some suggestions on old-fashioned usage with s6 2.10.x
  2021-02-15 14:58 Some suggestions on old-fashioned usage with s6 2.10.x Laurent Bercot
@ 2021-02-15 14:59 ` Laurent Bercot
  0 siblings, 0 replies; 20+ messages in thread
From: Laurent Bercot @ 2021-02-15 14:59 UTC (permalink / raw)
  To: supervision

(Apologies for the broken threading, I originally sent my answer with
the incorrect From: and it was rightfully rejected.)



* Re: Some suggestions on old-fashioned usage with s6 2.10.x
       [not found]             ` <em949fd937-c7bc-43db-9b49-3cc235b8f2ad@elzian>
@ 2021-02-16  8:53               ` Casper Ti. Vector
  0 siblings, 0 replies; 20+ messages in thread
From: Casper Ti. Vector @ 2021-02-16  8:53 UTC (permalink / raw)
  To: supervision

On Mon, Feb 15, 2021 at 02:54:52PM +0000, Laurent Bercot wrote:
>  The options! The options need to be all compatible. :) And for
> "shutdown", they would never implement a wrapper themselves, I would
> have to do it for them - which is exactly what I did, although it's
> a C program that actually implements shutdown, not a wrapper around an
> atd program I can't assume will be present on the system.

OK, now I understand their excuse.  Nevertheless I still do not think
all these necessarily require something like `shutdownd'; even in the
absence of `atd', chainloading a backgrounding timer for `shutdown' is
not a big exercise with execline (which is perhaps exactly what you have
already done in `s6-linux-init-maker').

>  What army? By the time the final kill happens, the service manager
> has brought everything down, and shutdownd has cleaned up the scandir,
> only leaving it with what *should* be restarted. You seem to think
> I haven't given these basic things the two minutes of attention they
> deserve.

Sorry then, I did not see that in the documentation; now the scandir
cleanup contributes some additional complexity.  Since the mechanism
behind `shutdownd' does not seem to be adequately explained at least to
me, here I explicitly refrain from judging whether this addition is
worthwhile.

>  Conceptually, the "old-fashioned" approach may be simpler, yes.
> Implementationally, I disagree that it is, and I'll give you a very
> simple example to illustrate it, but it's not the only thing that
> implementations must pay attention to, there are a few other quirks
> that I've stumbled upon and that disappear when s6-svscan remains
> pid 1 until the very end. [...] after ["wait { }"], you need to make
> sure to unmount filesystems immediately [...]

This is not exactly what older s6-linux-init actually does, which has
been mimicked by slew.  As long as the procedure between `wait { }' and
`umount' does not produce orphans, the `umount' will be fine.  I have
noticed you saying "a shell does not give ordering guarantees when it
gets a SIGCHLD", but it seems to me that the no-orphan requirement can
be verified by ensuring no commands involved get backgrounded.  Of
course, feel free to correct that; more importantly, may I request you
to list the quirks you have encountered?  Only by that may we really see
how much the remaining `s6-svscan' brings, in comparison with how much
it takes (see my paragraph above).

> If your shutdown sequence is e.g. written in Lisp, and your Lisp
> interpreter handles pid 1 duties correctly, okay, that's fair, but
> that's *two* programs that need to do it, when one would be enough. [...]
>  The fundamental difference is that the current s6-linux-init hardcodes
> a lot of things in stage 1, purposefully. Yes, it is less flexible -
> though you *still* have a stage 1 hook if you really need it - but the
> whole point is to make stage 1 entirely turnkey and foolproof [...]

When mentioning Lisp, I did not mean to imply Lisp interpreters, but
optimising Lisp compilers, which blur the border between scripts and
compiled programs (cf. `fdclose' and `fd_close()').  But you have said
the problem is not about scripting, so we do not disagree on this; with
this background, I do not quite understand your emphasis on stage 1 in
s6-linux-init -- do you mean somewhere that it prepares for `shutdownd'?

>  The "screwdriver and duct tape" feel does not come from the fact that
> those are scripts; it comes from the fact that the scripts run in a less
> forgiving environment where they have to provide the necessary guarantees
> themselves, as opposed to continuing to use the framework that has been
> running for the whole lifetime of the system and that is still valid and
> helpful, even though for once you have to interact with it and tell it
> to stop supervising some services because we're shutting down - which is
> the exact kind of situation the supervision API was made for.

Now that scripting does not seem to be a major problem (which falsifies
my previous judgement that it was; sorry for that), the only crucial
issue is the costs and benefits of the supervision tree on halting.
So may I again request you to spare some time to explain the detailed
workflow behind `shutdownd', and the actual quirks that a remaining
`s6-svscan' helps to solve?  Perhaps current s6-linux-init and older
s6-linux-init (with derivatives like slew) are just software that suit
different niches (eg. sysvinit/systemd-minded audience vs. those who
accept daemontools-ish software well), which would be perfectly fine.

>  I also disagree that the script approach is shorter and/or clearer.
> It may be clearer to people who read a script better than a doc page
> (or C code), but I don't think it should matter as long as the doc is
> accurate; if it's not, that's what should be fixed. And the source code
> may be shorter with a scripted stage 1, for sure, but the code paths
> taken by the CPU are way shorter with the C version, and make fewer
> assumptions. I'm confident that the current s6-linux-init breaks in
> significantly fewer situations than its previous incarnation.

Then the `shutdownd' documentation might need to be fixed; BTW, the "Is
it possible to write stage {1,3} init in a scripting language?" sections
from `s6-svscan-1.html' have not seen real changes since 2014 ;)

-- 
My current OpenPGP key:
RSA4096/0x227E8CAAB7AA186C (expires: 2022.09.20)
7077 7781 B859 5166 AE07 0286 227E 8CAA B7AA 186C



* Re: Some suggestions on old-fashioned usage with s6 2.10.x
  2021-01-29 15:48         ` Laurent Bercot
@ 2021-02-15  8:36           ` Casper Ti. Vector
       [not found]           ` <YCoykUYGXVt+BAT9@caspervector>
  1 sibling, 0 replies; 20+ messages in thread
From: Casper Ti. Vector @ 2021-02-15  8:36 UTC (permalink / raw)
  To: supervision

(I am extremely sorry for delaying this mail so much.  I have just done
two major refactoring/overhaul projects in this vacation around the
Spring Festival, and still have one remaining.  These projects are a
part of my formal occupation, but I would not have much low-distraction
time best for this kind of work apart from vacations.  By the way, Happy
Chinese "Niu" Year.)

On Fri, Jan 29, 2021 at 03:48:09PM +0000, Laurent Bercot wrote:
>  Bear in mind that my eventual goal for s6 is distro adoption. And
> distro maintainers will find any and every excuse to reject it.
> Having a "shutdown" command that works exactly like sysvinit's
> shutdown is essential, because it deals with a major objection, which
> is incompatibility and user-unfriendliness.

I do not really understand their excuse here.  CLI incompatibility is
trivially solvable by creating links (or so) for `halt' / `poweroff' /
`reboot', and even the `shutdown' command can be a wrapper for an `atd'
based mechanism.  In case they complain about the implementation of the
CLI, the actual interface to `shutdownd' is not that similar to the
`telinit' interface (at least to the one I think it is) either.

>  The *absence* of a supervision tree after stage 2 is precisely what
> requires careful handling, and runit only works because Linux has
> that peculiarity that kill -9 -1 does not kill the emitter!
>  Having a supervision tree in stage 3 actually *helps* with the
> late shutdown procedure: shutdownd dies right after the kill (which
> would make it usable even on a system without the Linux specialcase)
> and is restarted by the supervisor for stage 4.

If I understand it correctly, letting `s6-svscan' exec() stage 3 also
achieves immunity to `kill -KILL -1'.  I also find this "old-fashioned"
approach conceptually and implementationally simpler than an army of
`s6-supervise' restarting only to be killed again, and a `shutdownd'
restarting to execute the halting procedure (see some kind of "state"
here?  Functional programmers do not hate it for nothing).  I know this
seems less recoverable than the `shutdownd' approach, but does that
count as a reason strong enough to warrant the latter approach, if the
halting procedure has already been distilled to its bare essentials
and is virtually immune to all non-fatal problems (that is, excluding
something as severe as the absence of a `reboot -f' implementation)?

> [...] More seriously, you're being unfair, because you're not locked
> in at all. You can use the new s6-linux-init and *still* do everything
> you were doing before: [...]
>  Besides, when systemd advocates paint sysv-rc shell scripts as
> "duct tape", they're *right*. sysv-rc (and OpenRC) scripts are loaded
> with boilerplate that only exists to compensate for the lack of a
> supervision infrastructure, and systemd, like any supervision system,
> does away with that. systemd has 99 problems, but rightly calling out
> oversized script scaffoldings ain't one. Its disingenuousness lies in
> pretending that an overengineered, opaque, all-encompassing, unescapable
> framework is better than the duct tape; and I think you'll find that
> s6-linux-init isn't quite the monster you seem to believe it is.

What I intend to express is that unconditionally correlating "a bunch
of [...] scripts" to "a 'screwdriver and duct tape' feel" is a typical
systemd fallacy.  You seemed to be confusing "scripts containing lots of
boilerplate" with "scripts that are minimised and clear".

>  So basically, all you're complaining about is that s6-linux-init-maker
> is not generating your preferred run-image layout out-of-the-box
> anymore. Well, you're an advanced user, you know what you are doing;
> the knobs and levers are *still all there*. The only binary that
> kinda hardcodes things is s6-linux-init itself, and if you give it a
> try, I'm pretty sure you'll like it, because there was never any reason
> to modify the core of stage 1 in the first place and what it does is
> what any kind of stage 1 needs to do, no matter what language it's
> written in.

According to Guillermo's observation about the behavioural similarity
between slew's `rc.boot'/`rc.halt' and the current mechanism with
s6-linux-init, if I understand the big picture correctly enough, the
fundamental difference between the approaches might be the difference in
languages (to avoid further digression, here I expressly avoid talking
about Lisp ;) and the attendant difference in dependencies.  Speaking of
the latter, I do not find declaring dependence on things like `rc' and
BusyBox really a problem to any packager of systemd.  Speaking of the
former, the "old-fashioned" approach is obviously more flexible; I have
also said that it is probably shorter and perhaps clearer.

-- 
My current OpenPGP key:
RSA4096/0x227E8CAAB7AA186C (expires: 2022.09.20)
7077 7781 B859 5166 AE07 0286 227E 8CAA B7AA 186C



* Re: Some suggestions on old-fashioned usage with s6 2.10.x
  2021-01-29 17:27     ` Guillermo
@ 2021-01-29 17:39       ` Guillermo
  0 siblings, 0 replies; 20+ messages in thread
From: Guillermo @ 2021-01-29 17:39 UTC (permalink / raw)
  To: Supervision

El vie, 29 ene 2021 a las 14:27, Guillermo escribió:
> [...]

Huh. Lots of apostrophes that shouldn't be there, and that I just
couldn't see without a fixed width font...

G.


* Re: Some suggestions on old-fashioned usage with s6 2.10.x
  2021-01-29  3:06   ` Casper Ti. Vector
@ 2021-01-29 17:27     ` Guillermo
  2021-01-29 17:39       ` Guillermo
  0 siblings, 1 reply; 20+ messages in thread
From: Guillermo @ 2021-01-29 17:27 UTC (permalink / raw)
  To: Supervision

El vie, 29 ene 2021 a las 0:07, Casper Ti. Vector escribió:
>
> Not using s6-linux-init has never been an explicit goal, [...]
>
> Currently I do not understand the `s6-linux-init-shutdown(d)' way
> well, so the old-fashioned way is retained at least for now, [...]

Forgive me if I'm misunderstanding aspects of the architecture, but
judging from a quick look at the Git repository, I *think* that you
could depend on s6-linux-init (the package):

* You could replace /etc/slew'/init/rc.boot with s6-linux-init (the
program), or a wrapper script around it. s6-linux-init does more or
less the same as the current script, except running load_clock.rc
(which you could do in /etc/slew'/init/rc.init).
* You could move the current /etc/slew/run to
/etc/s6-linux-init/current/run-image when installing slew. I'm not
sure what skalibs' hiercopy() would do if the latter is just a symlink
to the former.
* You could replace /etc/slew'/init/rc.halt with a service directory
for s6-linux-init-shutdownd in /etc/slew'/run/service, just like you
do now for the catch-all logger.
s6-linux-init-shutdownd does more or less the same as the current
script, except running save_log.rc, save_clock.rc and calling swapoff.
You can move that to /etc/slew'/init/rc.fin, or
/etc/s6-linux-init/current/rc.shutdown.final if you must do some of it
after killing all processes and unmounting filesystems.
* You could symlink /etc/s6-linux-init/current/rc.{init,shutdown} to
/etc/slew'/init/rc.{init,fin}, perhaps with some minimal
modifications.
* You could replace calls to s6-svscanctl in
/etc/slew'/run/service/.s6-svscan/SIG* with calls to s6-linux-init-hpr
if you want to be able to do e.g. 'busybox poweroff''.
* You could replace /etc/slew'/run/service/.s6-svscan/finish with
something simpler, instead of being a symlink to
/etc/slew/init/rc.halt.

G.


* Re: Some suggestions on old-fashioned usage with s6 2.10.x
       [not found]       ` <YBQcwHN1L/N2dedx@caspervector>
@ 2021-01-29 15:48         ` Laurent Bercot
  2021-02-15  8:36           ` Casper Ti. Vector
       [not found]           ` <YCoykUYGXVt+BAT9@caspervector>
  0 siblings, 2 replies; 20+ messages in thread
From: Laurent Bercot @ 2021-01-29 15:48 UTC (permalink / raw)
  To: Casper Ti. Vector, supervision

>But even `s6-reboot' from older s6-linux-init, or `busybox reboot'
>with slew can already do that...

  Yes. And as your sharp mind undoubtedly noticed, those commands are
not the same as "reboot".

  Which means burden on users.

  Yes, I also thought it was a small burden at first, but it's not.
It means that all sysvinit-compatible automation does not work, so
there is some porting work to do. And the gap between "a little work"
and "zero work" is HUGE. It's much bigger than the gap between
"a little work" and "a lot of work".

  Bear in mind that my eventual goal for s6 is distro adoption. And
distro maintainers will find any and every excuse to reject it.
Having a "shutdown" command that works exactly like sysvinit's
shutdown is essential, because it deals with a major objection, which
is incompatibility and user-unfriendliness.


>There is some non-trivial trade-off: in short, the existence of the
>supervision tree after stage 2 is by itself a kind of "special case"
>(eg. search for "careful handling" in [1]).

  I feel like you misinterpreted my meaning.
  The *absence* of a supervision tree after stage 2 is precisely what
requires careful handling, and runit only works because Linux has
that peculiarity that kill -9 -1 does not kill the emitter!
  Having a supervision tree in stage 3 actually *helps* with the
late shutdown procedure: shutdownd dies right after the kill (which
would make it usable even on a system without the Linux specialcase)
and is restarted by the supervisor for stage 4.


>   I am also thinking about
>an application scenario, where a supervision tree with a new s6 version
>replaces the active tree with an old version.  This is somewhat silly:
>it can be a little useful in case of major version bump, but is probably
>better solved by complete reboot to completely get rid of all old things
>(s6 or not, updated together) in the memory.

  Yes, upgrading your init without rebooting is generally not worth
it. Note that s6-svscan could still be configured to do that with
clever use of SIG scripts; but restarting the s6-supervise processes
is a pain to do without restarting your whole supervision tree, so
it's probably better to just reboot.
  This is the case with every single init out there, so you can't paint
that as a drawback of s6. You can wish it were easier, and I agree
that it would be nice, but the necessary trade-offs to make rebootless
init upgrades viable are very much not worth it.


>>  all-in-all has just less of a "screwdriver and duct tape" feel than
>>  a bunch of execline (or rc ;)) scripts.
>I am very sorry, but I do feel a strong smell of systemd mindset here :(

  A systemd mindset in an attempt to be a drop-in replacement for
sysvinit. Yeah, right.

  More seriously, you're being unfair, because you're not locked in
at all. You can use the new s6-linux-init and *still* do everything
you were doing before:
  - you can manually edit your run-image
  - you can remove the runleveld service (which is only used for
telinit emulation) and even the shutdownd service
  - you can write SIG scripts to do shutdowns the preferred way
  - I absolutely recommend against doing this, but you *still* have
a place in stage 1 where you can fiddle with things: in the
init script before the call to the s6-linux-init binary.

  So basically, all you're complaining about is that s6-linux-init-maker
is not generating your preferred run-image layout out-of-the-box
anymore. Well, you're an advanced user, you know what you are doing;
the knobs and levers are *still all there*. The only binary that
kinda hardcodes things is s6-linux-init itself, and if you give it a
try, I'm pretty sure you'll like it, because there was never any reason
to modify the core of stage 1 in the first place and what it does is
what any kind of stage 1 needs to do, no matter what language it's
written in.
  And if you don't like it, you're still free to ditch the s6-linux-init
package entirely and keep using your own stage 1.

  Besides, when systemd advocates paint sysv-rc shell scripts as
"duct tape", they're *right*. sysv-rc (and OpenRC) scripts are loaded
with boilerplate that only exists to compensate for the lack of a
supervision infrastructure, and systemd, like any supervision system,
does away with that. systemd has 99 problems, but rightly calling out
oversized script scaffoldings ain't one. Its disingenuousness lies in
pretending that an overengineered, opaque, all-encompassing, unescapable
framework is better than the duct tape; and I think you'll find that
s6-linux-init isn't quite the monster you seem to believe it is.

--
  Laurent



* Re: Some suggestions on old-fashioned usage with s6 2.10.x
  2021-01-29  9:57     ` Laurent Bercot
@ 2021-01-29 14:33       ` Casper Ti. Vector
       [not found]       ` <YBQcwHN1L/N2dedx@caspervector>
  1 sibling, 0 replies; 20+ messages in thread
From: Casper Ti. Vector @ 2021-01-29 14:33 UTC (permalink / raw)
  To: supervision

On Fri, Jan 29, 2021 at 09:57:43AM +0000, Laurent Bercot wrote:
>  It may cost more *to you*, but there is real and significant value
> in following existing interfaces that people are familiar with. Being
> able to just use "reboot" instead of the, uh, slightly less intuitive
> "s6-svscanctl -6 /run/service" to reboot your machine, is one fewer
> obstacle on the way to mainstream s6 adoption.

But even `s6-reboot' from older s6-linux-init, or `busybox reboot'
with slew can already do that...

>  Additionally, and maybe more to your liking, there are also technical
> benefits to never killing s6-svscan. Being able to assume that a
> supervision tree will be operational at all times, including during
> shutdown (and even in stage 4!), is really comfortable, it cuts down
> on a lot of specialcasing, it makes shutdown procedures recoverable,
> integration into various configurations easier (I'm thinking
> containers with or without a catch-all logger, for instance), and

There is some non-trivial trade-off: in short, the existence of the
supervision tree after stage 2 is by itself a kind of "special case"
(eg. search for "careful handling" in [1]).  I am also thinking about
an application scenario, where a supervision tree with a new s6 version
replaces the active tree with an old version.  This is somewhat silly:
it can be a little useful in case of major version bump, but is probably
better solved by complete reboot to completely get rid of all old things
(s6 or not, updated together) in the memory.

[1] <https://skarnet.org/software/s6/s6-svscan-1.html>.

> all-in-all has just less of a "screwdriver and duct tape" feel than
> a bunch of execline (or rc ;)) scripts.

I am very sorry, but I do feel a strong smell of systemd mindset here :(

-- 
My current OpenPGP key:
RSA4096/0x227E8CAAB7AA186C (expires: 2022.09.20)
7077 7781 B859 5166 AE07 0286 227E 8CAA B7AA 186C



* Re: Some suggestions on old-fashioned usage with s6 2.10.x
       [not found]   ` <YBN7zfp/MmbcHOCF@caspervector>
@ 2021-01-29  9:57     ` Laurent Bercot
  2021-01-29 14:33       ` Casper Ti. Vector
       [not found]       ` <YBQcwHN1L/N2dedx@caspervector>
  0 siblings, 2 replies; 20+ messages in thread
From: Laurent Bercot @ 2021-01-29  9:57 UTC (permalink / raw)
  To: Casper Ti. Vector, supervision

>Currently I do not understand the `s6-linux-init-shutdown(d)' way
>well, so the old-fashioned way is retained at least for now, given its
>simplicity in implementation and seemingly better flexibility.  Frankly
>it is my intuition that the new way costs more than the old way, but
>does not provide that much in return.  (Feel free to prove me wrong.)

  It may cost more *to you*, but there is real and significant value
in following existing interfaces that people are familiar with. Being
able to just use "reboot" instead of the, uh, slightly less intuitive
"s6-svscanctl -6 /run/service" to reboot your machine, is one fewer
obstacle on the way to mainstream s6 adoption.

  Additionally, and maybe more to your liking, there are also technical
benefits to never killing s6-svscan. Being able to assume that a
supervision tree will be operational at all times, including during
shutdown (and even in stage 4!), is really comfortable, it cuts down
on a lot of specialcasing, it makes shutdown procedures recoverable,
integration into various configurations easier (I'm thinking
containers with or without a catch-all logger, for instance), and
all-in-all has just less of a "screwdriver and duct tape" feel than
a bunch of execline (or rc ;)) scripts.

--
  Laurent



* Re: Some suggestions on old-fashioned usage with s6 2.10.x
       [not found]       ` <YBN2p2UkIiP8lMQy@caspervector>
@ 2021-01-29  9:36         ` Laurent Bercot
  0 siblings, 0 replies; 20+ messages in thread
From: Laurent Bercot @ 2021-01-29  9:36 UTC (permalink / raw)
  To: Casper Ti. Vector, supervision

>Actually I do visit the CGit web interface fairly often

  Oh, my bad, the links in the skaware documents actually point to
https://git.skarnet.org/something. Fair enough then, I have made
git.skarnet.org an explicit alias to skarnet.org.


>  Perhaps I need to batch change all
><https://git.skarnet.org/> references in the UP2020 document to
><https://skarnet.org/>...

  No need - I'll own that one, and keep the alias explicitly working.
It's not like subdomains are a scarce resource.

--
  Laurent



* Re: Some suggestions on old-fashioned usage with s6 2.10.x
  2021-01-29  1:41 ` Guillermo
@ 2021-01-29  3:06   ` Casper Ti. Vector
  2021-01-29 17:27     ` Guillermo
       [not found]   ` <YBN7zfp/MmbcHOCF@caspervector>
  1 sibling, 1 reply; 20+ messages in thread
From: Casper Ti. Vector @ 2021-01-29  3:06 UTC (permalink / raw)
  To: supervision

On Thu, Jan 28, 2021 at 10:41:24PM -0300, Guillermo wrote:
> Out of curiosity, do you have a reason for wanting to keep the
> "old-fashioned way"? Is it a goal of your project to depend on s6 and
> s6-rc, but not current s6-linux-init? It seems to me that doing so
> would be easier. It even looks like you could use the current
> /etc/slew/init/rc.{init,fin} scripts (perhaps with minor adjustments)
> as s6-linux-init's rc.init and rc.shutdown for slew, respectively.

Not using s6-linux-init has never been an explicit goal, but using
static scripts was a natural choice when s6-linux-init only provided
`s6-linux-init-maker', which produced scripts that were not that
flexible.

Currently I do not understand the `s6-linux-init-shutdown(d)' way
well, so the old-fashioned way is retained at least for now, given its
simplicity in implementation and seemingly better flexibility.  Frankly
it is my intuition that the new way costs more than the old way, but
does not provide that much in return.  (Feel free to prove me wrong.)

-- 
My current OpenPGP key:
RSA4096/0x227E8CAAB7AA186C (expires: 2022.09.20)
7077 7781 B859 5166 AE07 0286 227E 8CAA B7AA 186C



* Re: Some suggestions on old-fashioned usage with s6 2.10.x
  2021-01-29  0:07     ` Laurent Bercot
@ 2021-01-29  2:44       ` Casper Ti. Vector
       [not found]       ` <YBN2p2UkIiP8lMQy@caspervector>
  1 sibling, 0 replies; 20+ messages in thread
From: Casper Ti. Vector @ 2021-01-29  2:44 UTC (permalink / raw)
  To: supervision

On Fri, Jan 29, 2021 at 12:07:11AM +0000, Laurent Bercot wrote:
>  I may change it back, but I don't think the current state is broken,
> because you're not supposed to access git.skarnet.org via HTTP(S)! :P

Actually I do visit the CGit web interface fairly often, using it as
a poor man's GitHub workalike :)  Perhaps I need to batch change all
<https://git.skarnet.org/> references in the UP2020 document to
<https://skarnet.org/>...

-- 
My current OpenPGP key:
RSA4096/0x227E8CAAB7AA186C (expires: 2022.09.20)
7077 7781 B859 5166 AE07 0286 227E 8CAA B7AA 186C



* Re: Some suggestions on old-fashioned usage with s6 2.10.x
  2021-01-28 10:08 Casper Ti. Vector
  2021-01-28 11:09 ` Casper Ti. Vector
@ 2021-01-29  1:41 ` Guillermo
  2021-01-29  3:06   ` Casper Ti. Vector
       [not found]   ` <YBN7zfp/MmbcHOCF@caspervector>
  1 sibling, 2 replies; 20+ messages in thread
From: Guillermo @ 2021-01-29  1:41 UTC (permalink / raw)
  To: Supervision

El jue, 28 ene 2021 a las 7:08, Casper Ti. Vector escribió:
>
> I did not actively follow the recent evolution of s6, and have just been
> bitten badly by s6 2.10.x on my Alpine servers (where slew [1] is used
> of course) when it comes along with other updates.
>
> [1] <https://gitea.com/CasperVector/slew>
> [...]
> it will be nice if the old-fashioned way (with stage 1 and stage 3 as
> static scripts) is supported as well [...]

Out of curiosity, do you have a reason for wanting to keep the
"old-fashioned way"? Is it a goal of your project to depend on s6 and
s6-rc, but not current s6-linux-init? It seems to me that doing so
would be easier. It even looks like you could use the current
/etc/slew/init/rc.{init,fin} scripts (perhaps with minor adjustments)
as s6-linux-init's rc.init and rc.shutdown for slew, respectively.

G.


* Re: Some suggestions on old-fashioned usage with s6 2.10.x
       [not found]   ` <YBMWuUCUTVjUNinQ@caspervector>
@ 2021-01-29  0:07     ` Laurent Bercot
  2021-01-29  2:44       ` Casper Ti. Vector
       [not found]       ` <YBN2p2UkIiP8lMQy@caspervector>
  0 siblings, 2 replies; 20+ messages in thread
From: Laurent Bercot @ 2021-01-29  0:07 UTC (permalink / raw)
  To: Casper Ti. Vector, supervision

>BTW, <https://git.skarnet.org/> seems to be returning empty HTTP replies
>now; both <https://skarnet.org/> and <http://git.skarnet.org/> work as
>expected though.

  That is a side effect of a recent s6-networking addition, where
s6-tlsd passes the SNI server name to the application via an
environment variable. That allows me to serve virtual hosts even with
an HTTP/1.0 server, but only under TLS. Fun experiment. :)
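
As a rough illustration of name-based virtual hosting driven by an SNI name, the sketch below picks a document root from a hostname passed as an argument. The directory paths are hypothetical, and the name of the environment variable that s6-tlsd exports is not given in this thread, so the caller is expected to pass its value in explicitly.

```shell
#!/bin/sh
# Sketch only: choose a document root from the server name the TLS layer
# received via SNI. Both paths are hypothetical placeholders.
docroot_for_host() {
    case "$1" in
        git.skarnet.org) printf '%s\n' /srv/www/git ;;
        # Unknown or missing names fall back to the main site.
        *)               printf '%s\n' /srv/www/main ;;
    esac
}
```

Since the name comes from the client, matching it against a fixed list (rather than using it to build a path) keeps untrusted input out of the filesystem lookup.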

  I may change it back, but I don't think the current state is broken,
because you're not supposed to access git.skarnet.org via HTTP(S)! :P

--
  Laurent



* Re: Some suggestions on old-fashioned usage with s6 2.10.x
  2021-01-28 17:21 ` Laurent Bercot
  2021-01-28 19:08   ` Roy Lanek
@ 2021-01-28 19:55   ` Casper Ti. Vector
       [not found]   ` <YBMWuUCUTVjUNinQ@caspervector>
  2 siblings, 0 replies; 20+ messages in thread
From: Casper Ti. Vector @ 2021-01-28 19:55 UTC (permalink / raw)
  To: supervision

On Thu, Jan 28, 2021 at 05:21:59PM +0000, Laurent Bercot wrote:
>  There is no really good solution, and I prefer a short, sharp pain
> (when things break) followed by relief (when they're fixed) to a long
> dull ache (maintaining compat code).

I see.  I personally prefer to retain compat code if said code is so
small that it can hardly be incorrect, especially when the breakage
(like kernel panics) can be very severe.  Arguably a major stylistic
difference.

>  You seem to have found the proper way of managing this with SIG files,
> but just in case: "s6-svscanctl -tb" will net you the old behaviour.

Now I see; thanks.  I also realised that the revised `s6-svc -X'
proposal would result in wrong behaviour when there exists a `./finish'
script, because the supervisor would exit early (and prematurely).

BTW, <https://git.skarnet.org/> seems to be returning empty HTTP replies
now; both <https://skarnet.org/> and <http://git.skarnet.org/> work as
expected though.

-- 
My current OpenPGP key:
RSA4096/0x227E8CAAB7AA186C (expires: 2022.09.20)
7077 7781 B859 5166 AE07 0286 227E 8CAA B7AA 186C



* Re: Some suggestions on old-fashioned usage with s6 2.10.x
  2021-01-28 17:21 ` Laurent Bercot
@ 2021-01-28 19:08   ` Roy Lanek
  2021-01-28 19:55   ` Casper Ti. Vector
       [not found]   ` <YBMWuUCUTVjUNinQ@caspervector>
  2 siblings, 0 replies; 20+ messages in thread
From: Roy Lanek @ 2021-01-28 19:08 UTC (permalink / raw)
  To: supervision

> major version upgrades may break things.
As plain as day.

> I prefer a short, sharp pain (when things break) followed by
> relief (when they're fixed) to a long dull ache (maintaining
> compat code).

I could not agree more. Compat code would bring nothing anyway except
extra, likely convoluted, code, which only increases the risk of
introducing new errors, even once it is removed.

/Roy Lanek (Yogyakarta)
-- 
555  5 l 4 c K W 4 r 3  L1NuX  555   air tenang menghanyutkan
555  5 l 4 c K W 4 r 3  L1NuX  555   still water runs deep


* Re: Some suggestions on old-fashioned usage with s6 2.10.x
       [not found] <YBKNJEuGeYag91Q1@caspervector>
@ 2021-01-28 17:21 ` Laurent Bercot
  2021-01-28 19:08   ` Roy Lanek
                     ` (2 more replies)
  0 siblings, 3 replies; 20+ messages in thread
From: Laurent Bercot @ 2021-01-28 17:21 UTC (permalink / raw)
  To: supervision


>I did not actively follow the recent evolution of s6, and have just been
>bitten badly by s6 2.10.x on my Alpine servers (where slew [1] is used
>of course) when it comes along with other updates.

  Sorry. This bears repeating: major version upgrades may break things.

  Compatibility is a good thing, that's why I try to keep major version
changes few and far between; but the other side of the coin is that
when I'm doing one, I want to make use of it and cram all the
incompatible changes that may be needed in the foreseeable future.
  So, you have to pay attention less often, but when it happens, you do
have to pay attention. Previous major version changes may have gone
smoothly - I try to keep it as smooth as possible when there's no need
to break UX - but it's no guarantee that it will always be smooth
sailing. This time, there were very visible user changes; sorry for
the inconvenience, but I reserve the right to do this, and I try to
document the breaking changes in the release notes.

  It is, admittedly, a drawback of distributions that they make major
version upgrades very silent - so, if you have local software that
relies on an old API, and the distro updates it under your feet,
you're caught unaware. I don't have a satisfying solution to this;
maybe I should have added a post-upgrade file printing red blinking
bold text, but that doesn't address automated or afk updates.


>better if we kept the option supported for a transition period, and
>only removed it from the manual pages while urging users to get rid of
>it.  After all, in this case, silently ignoring `-s' is behaviourally
>similar to (if not perfectly compatible with) old `s6-svscan'.

  It's always a delicate balance, because "better" is not one-dimensional.
It would be better UX, yes, definitely. But also legacy code to maintain
until the next major update (which can take a while), and I tend to
assign a *very* high cost to legacy code in s6-svscan and s6-supervise,
for obvious reasons. And in my experience, few people (and you,
Casper, certainly belong to them!) actually bother changing their
scripts as long as they keep working - most only spring into action when
something breaks. A compromise I've found relatively efficient was to
add nagging warnings on deprecated option use, but 1. that's even more
code that will be removed, and 2. I hate nagware, with a passion, in
all its forms.
  There is no really good solution, and I prefer a short, sharp pain
(when things break) followed by relief (when they're fixed) to a long
dull ache (maintaining compat code). Especially when I'm not the one
experiencing the sharp pain ;)


>Second, `s6-svscan' now waits for its `s6-supervise' children to exit
>before exec()ing `.s6-svscan/finish'

  You seem to have found the proper way of managing this with SIG files,
but just in case: "s6-svscanctl -tb" will net you the old behaviour.
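
A minimal sketch of how the `-tb' invocation quoted above might be wrapped for use in a shutdown script. The function name, the default scan directory and the `S6_SVSCANCTL' override are assumptions made for illustration; only the `-tb' options themselves come from this message.

```shell
#!/bin/sh
# Sketch only: request the shutdown behaviour that the message above
# describes as "the old behaviour", using the quoted -tb options.
# S6_SVSCANCTL is a hypothetical override so the call can be stubbed out.
old_style_stop() {
    scandir=${1:-/run/service}
    "${S6_SVSCANCTL:-s6-svscanctl}" -tb "$scandir"
}
```

Such a function could be called as `old_style_stop /run/service' from whatever script triggers the halt; where exactly that call belongs depends on the init scripts in use.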

--
  Laurent



* Re: Some suggestions on old-fashioned usage with s6 2.10.x
  2021-01-28 11:09 ` Casper Ti. Vector
@ 2021-01-28 14:05   ` Casper Ti. Vector
  0 siblings, 0 replies; 20+ messages in thread
From: Casper Ti. Vector @ 2021-01-28 14:05 UTC (permalink / raw)
  To: supervision

On Thu, Jan 28, 2021 at 07:09:08PM +0800, Casper Ti. Vector wrote:
> Moreover, `.s6-svscan/finish' (linked to `rc.halt') will still need its
> $1 set to `reboot', `halt' or `poweroff' by `s6-svscan' on exec().

I did not realise that the great simplification to the command line
options of `s6-svscanctl' would not have been possible if s6-svscan(ctl)
needed to, for example, know about halt, poweroff and reboot.  Here I
retract the quoted statement; instead, I will rework the mechanism around
`.s6-svscan/SIG*' in slew, while still attempting to make the behaviour
mostly backwards compatible.
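
One way such a rework could look, sketched under assumptions: each `.s6-svscan/SIG*' handler records which action it stands for, and a later stage reads that back instead of receiving it as $1. The signal-to-action assignment and the `STATE_FILE' path are chosen for illustration and are not taken from slew or from the s6 documentation.

```shell
#!/bin/sh
# Sketch only: translate the signal a `.s6-svscan/SIG*' handler stands for
# into the word a finish script used to receive as $1.
# This assignment is illustrative, not the one slew uses.
action_for_signal() {
    case "$1" in
        SIGUSR1)        printf '%s\n' halt ;;
        SIGUSR2)        printf '%s\n' poweroff ;;
        SIGTERM|SIGINT) printf '%s\n' reboot ;;
        *)              return 1 ;;
    esac
}

# Record the action where a later stage can read it back.
# STATE_FILE is a hypothetical path.
record_action() {
    action=$(action_for_signal "$1") || return 1
    printf '%s\n' "$action" > "${STATE_FILE:-/run/shutdown-action}"
}
```

A handler would then call, for instance, `record_action SIGUSR2' before asking `s6-svscan' to terminate, and the final stage would read the file to decide between halt, poweroff and reboot.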

-- 
My current OpenPGP key:
RSA4096/0x227E8CAAB7AA186C (expires: 2022.09.20)
7077 7781 B859 5166 AE07 0286 227E 8CAA B7AA 186C



* Re: Some suggestions on old-fashioned usage with s6 2.10.x
  2021-01-28 10:08 Casper Ti. Vector
@ 2021-01-28 11:09 ` Casper Ti. Vector
  2021-01-28 14:05   ` Casper Ti. Vector
  2021-01-29  1:41 ` Guillermo
  1 sibling, 1 reply; 20+ messages in thread
From: Casper Ti. Vector @ 2021-01-28 11:09 UTC (permalink / raw)
  To: supervision

On Thu, Jan 28, 2021 at 06:08:36PM +0800, Casper Ti. Vector wrote:
> then move the `s6-svc -X' invocation from `rc.halt' into `rc.fin'

The `s6-svc -a' invocation in `rc.halt' needs to be moved accordingly.
Moreover, `.s6-svscan/finish' (linked to `rc.halt') will still need its
$1 set to `reboot', `halt' or `poweroff' by `s6-svscan' on exec().

-- 
My current OpenPGP key:
RSA4096/0x227E8CAAB7AA186C (expires: 2022.09.20)
7077 7781 B859 5166 AE07 0286 227E 8CAA B7AA 186C



* Some suggestions on old-fashioned usage with s6 2.10.x
@ 2021-01-28 10:08 Casper Ti. Vector
  2021-01-28 11:09 ` Casper Ti. Vector
  2021-01-29  1:41 ` Guillermo
  0 siblings, 2 replies; 20+ messages in thread
From: Casper Ti. Vector @ 2021-01-28 10:08 UTC (permalink / raw)
  To: supervision

I did not actively follow the recent evolution of s6, and have just been
bitten badly by s6 2.10.x on my Alpine servers (where slew [1] is used
of course) when it comes along with other updates.

[1] <https://gitea.com/CasperVector/slew>

First, kernel panic on booting.  With some tentative echo(1) invocations
(with I/O redirections to /dev/console when necessary) and messing with
console resolution (so I could see the outputs before the panic), I
found the problem occurred with `s6-svscan' exiting because of the
legacy `-s' option in [2].  The fix itself is trivial, but it would be
better if we kept the option supported for a transition period, and
only removed it from the manual pages while urging users to get rid of
it.  After all, in this case, silently ignoring `-s' is behaviourally
similar to (if not perfectly compatible with) old `s6-svscan'.

[2] <https://gitea.com/CasperVector/slew/src/commit/
             fe32c2f1e3bf5cf700ff99d13eb13720353823bb/init/rc.boot>
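
For boot scripts that must run against both old and new `s6-svscan', one local workaround is to filter the legacy option out before the exec. The helper below is a hypothetical sketch of that idea: it drops only a standalone `-s' and does not try to handle combined forms such as `-st'.

```shell
#!/bin/sh
# Sketch only: print the given arguments one per line, leaving out a
# standalone "-s". Combined option clusters are deliberately not handled,
# and arguments containing newlines are not supported.
strip_legacy_s() {
    for arg in "$@"; do
        case "$arg" in
            -s) ;;                        # drop the legacy option
            *)  printf '%s\n' "$arg" ;;
        esac
    done
}
```

The simpler fix, as noted above, is just to delete `-s' from the script; a filter like this only makes sense when the same script has to work across versions.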

Second, `s6-svscan' now waits for its `s6-supervise' children to exit
before exec()ing `.s6-svscan/finish', so it hangs forever (save for
magic SysRq) due to the catch-all logger on halting.  I do know that the
recommended way to shut down is to use `s6-linux-init-shutdown', but
it will be nice if the old-fashioned way (with stage 1 and stage 3 as
static scripts) is supported as well after minimal modifications to both
s6 and (for instance) slew.  I also understand that `s6-svc -X' has been
removed, and that the invocation in [3] would no longer work anyway
because [3] is exec()ed by `s6-svscan'.  However, I think the following
way is practical yet minimal: introduce an option (perhaps still `-X')
of `s6-svc', but that tells `s6-supervise' to exit normally *upon
receiving SIGTERM or SIGHUP* (this is where the behaviour differs from
the old `s6-svc -X') without waiting for the children to exit; then move
the `s6-svc -X' invocation from `rc.halt' into `rc.fin' (where `s6-rc -d
change all' is also spawned).

[3] <https://gitea.com/CasperVector/slew/src/commit/
             fe32c2f1e3bf5cf700ff99d13eb13720353823bb/init/rc.halt>

Any suggestions?

-- 
My current OpenPGP key:
RSA4096/0x227E8CAAB7AA186C (expires: 2022.09.20)
7077 7781 B859 5166 AE07 0286 227E 8CAA B7AA 186C



