From: Stefan Karrmann
To: Laurent Bercot
Cc: supervision@list.skarnet.org
Subject: Re: stage2 as a service
Date: Sun, 31 Jan 2021 21:51:55 +0100
Message-ID: <20210131205155.GA26069@web.de>

Hi Laurent,

Laurent Bercot @ 2021-01-31.10:25:22 +0000:
> Hi Stefan,
> Long time no see!

Yes, and yet you still remember me. I'm impressed!

> A few comments:
>
> > # optional: -- Question: Is this necessary?
> > redirfd -w 0 ${SCANDIR}/service/s6-svscan-log/fifo
> > # now the catch all logger runs
> > fdclose 0
>
> I'm not sure what you're trying to do here. The catch-all logger
> should be automatically unblocked when
> ${SCANDIR}/service/s6-svscan-log/run starts.

Yes, that's the idea.

> The fifo trick should not be visible at all in stage 2: by the time
> stage 2 is running, everything is clean and no trickery should take
> place. The point of the fifo trick is to make the supervision tree
> log to a service that is part of the same supervision tree; but once
> the tree has started, no sleight of hand is required.

For the normal case you are absolutely right. But with stage 2 as a
service there is a race condition between stage 2 and s6-svscan-log.
The usual trick for stage 2 solves this problem: opening the fifo for
writing blocks stage 2 until the logger is reading from it.

> > foreground { s6-svc -O . } # don't restart me
>
> If you have to do this, it is the first sign that you're abusing
> the supervision pattern; see below.

Well, running once has been a part of supervise from the start, by djb.
It was invented for oneshots.

> > foreground { s6-rc -l ${LIVEDIR}/live -t 10000 change ${RCDEFAULT} }
> > # notify s6-supervise:
> > fdmove 1 3
> > foreground { echo "s6-rc ready, stage 2 is up." }
> > fdclose 1 # -- Question: Is this necessary?
>
> It's not strictly necessary to close the fd after notifying readiness,
> but it's a good idea nonetheless since the fd is unusable afterwards.
> However, readiness notification is only useful when your service is
> actually providing a... service once it's ready; here, your "service"
> dies immediately, and is not restarted.

You are right.

> That's because it's really a oneshot

Yes, as implemented since djb's daemontools.

> that you're treating as a longrun, which is abusing the pattern.
>
> > # NB: shutdown should create ./down here, to avoid race conditions
>
> And here is the final proof: in order to make your architecture work,
> you have to *fight* supervision features, because they are getting in
> your way instead of helping you.

Well, s6-rc uses ./down, too. Shutdown is a very special case for
supervision.

> This shows that it's really not a good idea to run stage 2 as a
> supervised service. Stage 2 is really a one-time initialization script
> that should be run after the supervision tree is started, but *not*
> supervised.

Stage 2 as a service allows us to restart it if that accidentally
becomes necessary. Obviously, that should rarely be the case.

> > { # fallback login
> > sulogin --force -t 600 # timeout 600 seconds, i.e. 10 minutes.
> > # kernel panic
> > }
>
> Your need for sulogin here comes from the fact that you're doing quite
> complex operations in stage 1: a user-defined set of hooks, then
> several filesystem mounts, then another user-defined set of hooks.
> And even then, you're running those in foreground blocks, so you're
> not catching the errors; the only time your fallback activates is if
> the cp -a from ${REPO} fails. Was that intended?

No, I should replace foreground with if.

Well, actually I don't use the hooks. But distribution maintainers often
want such things; e.g. they can scan for mapped devices (RAID, LVM,
crypt). On the other hand, I know of no distribution that uses
Paul Jarc's /fs/*.

> In any case, that's a lot of error-prone work that could be done in
> stage 2 instead. If you keep stage 1 as barebones as possible (and
> only mount one single writable filesystem for the service directories)
> you should be able to do away with sulogin entirely. sulogin is a
> horrible hack that was only written because sysvinit is complex enough
> that it needs a special debugging tool if something breaks in the
> middle.

Reasonable. I mount only /run and /var, because the logs, even the
catch-all log, reside in /var/log/.

> With an s6-based init, it's not the case. Ideally, any failure that
> happens before your early getty is running can only be serious enough
> that you have to init=/bin/sh anyway. And for everything else, you have
> your early getty. No need for special tools.

Okay, that's reasonable and simpler.

> > Also I may switch to s6-linux-init finally.
>
> It should definitely spare you a lot of work. That's what it's for :)

I'm still migrating from systemd to s6{,-rc} with /fs/* step by step.
Therefore, I need more flexibility than s6-linux-init offers.

> --
> Laurent

Kind regards,
--
Stefan Karrmann
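
A minimal sketch of what replacing foreground with if in stage 1 could
look like. It is not the script discussed in this thread: the mount
arguments, the cp destination and the assumption that REPO and SCANDIR
are set in the environment are all illustrative guesses. The point is
only that a chain of if blocks stops at the first failing command, so
the fallback runs on any error rather than only when the final cp -a
fails.

```
#!/bin/execlineb -P
# Sketch only. Assumes REPO and SCANDIR are present in the environment;
# importas -i aborts if either one is missing.
importas -i REPO REPO
importas -i SCANDIR SCANDIR

# ifelse -n runs the second block when the first block exits non-zero,
# and continues with the rest of the script otherwise.
ifelse -n
{
  # Each "if" exits non-zero as soon as its block fails, so the chain
  # stops at the first error instead of carrying on as "foreground" would.
  if { mount -t tmpfs -o mode=0755 tmpfs /run }   # mount options are an assumption
  if { mount /var }                               # assumes an fstab entry for /var
  cp -a ${REPO} ${SCANDIR}                        # destination is an assumption
}
{
  # fallback login; once it returns, pid 1 exits and the kernel panics
  sulogin --force -t 600
}
# normal path: hand over to the supervision tree
s6-svscan ${SCANDIR}/service
```

With foreground in place of if, the exit codes of the two mount commands
would be discarded, and only the last command in the block would decide
whether the fallback is taken.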