From: Stefan Karrmann <S.Karrmann@web.de>
To: Laurent Bercot <ska-supervision@skarnet.org>
Cc: supervision@list.skarnet.org
Subject: Re: stage2 as a service
Date: Sun, 31 Jan 2021 21:51:55 +0100
Message-ID: <20210131205155.GA26069@web.de>
In-Reply-To: <em16c497f5-cdc4-49bb-9b61-ae39745c4b6f@elzian>

Hi Laurent,

Laurent Bercot @ 2021-01-31.10:25:22 +0000:
>  Hi Stefan,
>  Long time no see!

Yes, but you still remember me. I'm impressed!

>  A few comments:
>
> > # optional:  -- Question: Is this necessary?
> >   redirfd -w 0 ${SCANDIR}/service/s6-svscan-log/fifo
> >   # now the catch all logger runs
> >   fdclose 0
>
>  I'm not sure what you're trying to do here. The catch-all logger
> should be automatically unblocked when
> ${SCANDIR}/service/s6-svscan-log/run starts.

Yes, that's the idea.

>  The fifo trick should not be visible at all in stage 2: by the time
> stage 2 is running, everything is clean and no trickery should take
> place. The point of the fifo trick is to make the supervision tree
> log to a service that is part of the same supervision tree; but once
> the tree has started, no sleight of hand is required.

For the normal case you are absolutely right. But with stage 2 as a
service there is a race condition between stage 2 and s6-svscan-log. The
usual fifo trick solves this problem for stage 2.

> > foreground { s6-svc -O . } # don't restart me
>
>  If you have to do this, it is the first sign that you're abusing
> the supervision pattern; see below.

Well, running once has been a part of supervise from the very start, by
djb. It was invented for oneshots.

> > foreground { s6-rc -l ${LIVEDIR}/live -t 10000 change ${RCDEFAULT} }
> > # notify s6-supervise:
> > fdmove 1 3
> > foreground { echo "s6-rc ready, stage 2 is up." }
> > fdclose 1  # -- Question: Is this necessary?
>
>  It's not strictly necessary to close the fd after notifying readiness,
> but it's a good idea nonetheless since the fd is unusable afterwards.
> However, readiness notification is only useful when your service is
> actually providing a... service once it's ready; here, your "service"
> dies immediately, and is not restarted.

You are right.

> That's because it's really a oneshot

Yes, as implemented since djb's daemontools.

> that you're treating as a longrun, which is abusing the pattern.
>
>
> > # NB: shutdown should create ./down here, to avoid race conditions
>
>  And here is the final proof: in order to make your architecture work,
> you have to *fight* supervision features, because they are getting in
> your way instead of helping you.

Well, s6-rc uses ./down, too. Shutdown is a very special case for
supervision.

>  This shows that it's really not a good idea to run stage 2 as a
> supervised service. Stage 2 is really a one-time initialization script
> that should be run after the supervision tree is started, but *not*
> supervised.

Stage 2 as a service allows us to restart it if that ever turns out to
be necessary. Obviously, that should rarely be the case.
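
A minimal sketch of such a manual restart, assuming the stage 2 service
directory lives at the hypothetical path /run/service/stage2 (the name
is for illustration only):

```execline
#!/bin/execlineb -P
# Hypothetical location of the stage 2 service directory.
define STAGE2 /run/service/stage2
# -o means "once": start the service, and do not restart it when it
# exits.
s6-svc -o ${STAGE2}
```

The service is started a single time and s6-supervise leaves it down
afterwards, which matches the "don't restart me" behaviour of the run
script.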

> >   { # fallback login
> >     sulogin --force -t 600 # timeout 600 seconds, i.e. 10 minutes.
> >     # kernel panic
> >   }
>
>  Your need for sulogin here comes from the fact that you're doing quite
> complex operations in stage 1: a user-defined set of hooks, then
> several filesystem mounts, then another user-defined set of hooks.
> And even then, you're running those in foreground blocks, so you're
> not catching the errors; the only time your fallback activates is if
> the cp -a from ${REPO} fails. Was that intended?

No, I should replace foreground with if.
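
A minimal sketch of the difference, with a hypothetical hook path and
mount command standing in for the real stage 1 steps:

```execline
#!/bin/execlineb -P
# Hypothetical hook: foreground ignores its exit code and carries on.
foreground { /etc/init/hooks/pre-mount }
# if stops the script (exit code 1 by default) when the block fails,
# so nothing after this line runs if the mount did not succeed.
if { mount -t tmpfs -o mode=0755 tmpfs /run }
echo "stage 1 mounts done"
```

With if in place of foreground, a failed step is no longer silently
skipped, so an enclosing fallback can actually see the error.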

Well, actually I don't use the hooks. But distribution maintainers often
want such things. E.g. they can scan for mapped devices (raid, lvm,
crypt). On the other hand, I know of no distribution that uses Paul
Jarc's /fs/*.

>  In any case, that's a lot of error-prone work that could be done in
> stage 2 instead. If you keep stage 1 as barebones as possible (and
> only mount one single writable filesystem for the service directories)
> you should be able to do away with sulogin entirely. sulogin is a
> horrible hack that was only written because sysvinit is complex enough
> that it needs a special debugging tool if something breaks in the
> middle.

Reasonable. I mount only /run and /var, because the logs, even the
catch-all log, reside in /var/log/.

>  With an s6-based init, it's not the case. Ideally, any failure that
> happens before your early getty is running can only be serious enough
> that you have to init=/bin/sh anyway. And for everything else, you have
> your early getty. No need for special tools.

Okay, that's reasonable and simpler.

> > Also I may switch to s6-linux-init finally.
>
>  It should definitely spare you a lot of work. That's what it's for :)

I'm still migrating from systemd to s6{,-rc} with /fs/* step by step.
Therefore, I need more flexibility than s6-linux-init offers.

> --
>  Laurent

Kind regards,
--
Stefan Karrmann
