From: "Laurent Bercot"
Newsgroups: gmane.comp.sysutils.supervision.general
Subject: Re: interesting claims
Date: Thu, 16 May 2019 08:32:08 +0000
To: "supervision@list.skarnet.org"

>The Question: As a newbie outsider I wonder, after following the
>discussion of supervision and tasks on stages (1,2,3), that there is a
>restrictive linear progression that prevents reversal.
>In terms of pid1 that I may not totally understand, is there a way
>that an admin can reduce the system back to pid1 and restart processes
>instead of taking the system down and restarting? If a glitch is found,
>usually it is corrected and we find it simple to just do a reboot. What
>if you can fix the problem and do it on the fly. The question would be
>why (or why not), and I am not sure I can answer it, but if you
>theoretically can do so, then can you also kill pid2 while pid10 is
>still running. With my limited vision I see stages as one-way check
>valves in a series of fluid linear flow.

I'm not sure I understand your question, but I think there are really
two different questions here; I'll try to reformulate them, correct me
if I'm wrong.

1. Is booting a system a linear process where every step is reversible?
2. Is it possible to restart a system "from scratch" without rebooting?

The answer to both questions is "not really, but it doesn't matter".

We've been talking a lot about stages 1, 2 and 3 (and sometimes 4)
lately because I've been working on s6-linux-init, which focuses on
booting and especially on stage 1. But it's a very narrow, very specific
thing to focus on. Stage 1 is a critical part of the booting process,
obviously, and has to be done right, but once it is, you can basically
forget about it.

Most of the machine's lifetime, including most of the booting sequence,
happens in stage 2. Stage 1 is just early preparation, the very basic
minimum things you should be able to assume, such as "there is a
supervision tree running and I can add services to it"; for all intents
and purposes, stage 2 is where you will be working, even if your focus
is to bring the machine up, e.g. if you're writing a service manager.

Stage 1 isn't reversible; once it's done, you never touch it again, you
don't need to "reverse" it. It would be akin to also unloading the
kernel from memory before shutting down - it's just not necessary.

Stage 2 is where things happen.
But what happens in stage 2 isn't really reversible either: there is
still a certain amount of one-time initialization that needs to be done
at boot time and doesn't need to be undone at shutdown time.

Booting and shutting down can be made symmetric up to a point, but never
entirely; the most obvious example is mounting filesystems. There is a
point in the boot sequence where the filesystems are mounted; however,
*unmounting* filesystems cannot be done at the symmetrical point in the
shutdown sequence - it has to be done at the very end of the shutdown
sequence, in stage 4, right before the power goes off. Why? Because
during shutdown, you may still have user processes running that prevent
filesystems from being unmounted, so you can only unmount filesystems
after killing everything, which happens at the end. Whereas during the
boot sequence, you don't have random user processes yet, you have a much
more controlled environment.

Booting and shutting down can't be made 100% symmetric. But that's not a
problem, because *symmetry is not a goal*. The goal of the boot sequence
is to make the machine operational; the goal of the shutdown sequence is
to make sure the plug can be pulled without causing problems.

Symmetry makes sense in a service manager, because it helps to see a
service as being "up" or "down", and there is a hierarchy of
dependencies between services that makes it natural to bring services
"up" or "down" in a certain, reversible order. But service management
isn't all there is, and in the bigger picture, a machine's lifetime
isn't perfectly symmetrical. And that's okay.

As for restarting a system from scratch without rebooting, the question
is what you want to achieve.

- If you want to be able to go through the whole shutdown procedure,
  bringing down services etc. but *not* doing the actual hardware
  reboot, and then bring up the whole system again from pid 1, yes, it
  is theoretically possible, but not particularly useful.
  The shutdown procedure is designed to make the system ready for
  poweroff, and it's quite a waste if you're not going to poweroff. The
  boot procedure is designed to get the system from a just-powered-on
  state to a fully operational state, and it's also quite a waste if the
  system is already fully operational. There aren't many problems to
  which doing this is the right solution.

- If you want to kill every process but pid 1 and have the system
  reconstruct itself from there, then yes, it is possible, and that is
  the whole point of having a supervision tree rooted in pid 1. When you
  kill every process, the supervision tree respawns, so you always have
  a certain set of services running, and the system can always recover
  from whatever you throw at it. Try it: grab a machine with a
  supervision tree and a root shell, run "kill -9 -1", see what happens.

--
Laurent
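
For illustration, the respawn behaviour described in the last point can
be sketched as a tiny supervisor loop. This is a minimal Python sketch
for POSIX systems, not s6 code: the names supervise, DYING_SERVICE and
max_spawns are hypothetical, the "service" is a stand-in that kills
itself with SIGKILL to mimic what "kill -9 -1" does to it, and the loop
is bounded only so that the demo terminates (a real supervisor never
stops respawning).

```python
import signal
import subprocess
import sys

# Hypothetical stand-in for a service: a process that immediately dies
# from SIGKILL, as a real service would after "kill -9 -1".
DYING_SERVICE = [
    sys.executable,
    "-c",
    "import os, signal; os.kill(os.getpid(), signal.SIGKILL)",
]


def supervise(argv, max_spawns):
    """Start argv, and start it again every time it dies.

    A real supervisor loops forever; max_spawns is an assumption made
    here so the sketch terminates. Returns one (pid, exit_status) pair
    per instance that was started.
    """
    exits = []
    for _ in range(max_spawns):
        child = subprocess.Popen(argv)
        # wait() blocks until this instance is gone, however it died.
        # On POSIX a negative status -N means "killed by signal N".
        exits.append((child.pid, child.wait()))
    return exits


if __name__ == "__main__":
    for pid, status in supervise(DYING_SERVICE, 3):
        print(f"instance {pid} ended with status {status}")
```

The supervisor does not care why the child went away; it only notices
that it is gone and starts a new instance. That is why a supervision
tree whose root is pid 1 (which survives "kill -9 -1") can rebuild the
set of services underneath it.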