From: "Laurent Bercot"
To: Supervision
Subject: possible s6-rc redesign (was: [request for review] Port of s6 documentation to mdoc(7))
Date: Tue, 01 Sep 2020 10:00:22 +0000

>I have only seen one new feature committed to the Git repository so
>far. Is it too early to ask what are you planning to change?

The new feature is orthogonal - or, rather, it will be used if I end up *not* redesigning s6-rc.
The trend with distributions is to make service managers reactive to external events: typically NetworkManager and systemd-networkd because the network is the primary source of dynamic events, but even local events such as the ones produced by a device manager, or basically anything sent by the kernel on the netlink, are getting integrated into that model.

s6-rc is, by essence, static: the set of services is known in advance, and there is no reacting to external events - there is only the admin starting and stopping services. This has advantages - a compile-time analysis is possible, with early cycle detection, etc.; but the model doesn't integrate well with modern distro needs.

So, I've been thinking about ways to add dynamic event management to s6-rc; and I've found two options.

Option 1 is to add dynamic event management *on top of* s6-rc. That is my natural instinct; that is what I've always done with software, that's what keeps the various parts of my software as clean and simple as possible. Here, it would mean:

 - having a classic s6-rc database for "static" services
 - having an additional "dynamic" database for services that can be triggered by external events. (The database is static in essence, but I call it "dynamic" because it would host the services that can be started dynamically.)
 - having a s6-rc-eventd daemon listening to events and executing s6-rc commands on the dynamic database depending on the events it receives. Paired with a s6-rc-event program that sends events to s6-rc-eventd, meant to be invoked in situations such as udevd/mdevd rules, a netlink listener, etc.

This model works in my head, the s6-rc-event[d] programs would be quite simple to write, it would solve the problem in a modular way just like the skarnet.org usual, so it seems like a no-brainer.

Except for one thing: I don't think anybody would use this. Only me, you, and the other 6 hardcore people in the world who actually like this kind of design.
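To make option 1 a bit more concrete, here is a minimal sketch, in Python, of the event-to-command mapping that a daemon like s6-rc-eventd could perform. Nearly everything in it is an assumption made for illustration: the event names, the rule table, the /run/s6-rc-dynamic live directory and the function names are all hypothetical, and nothing like this exists yet. The only part taken from the existing tool is the general shape of the `s6-rc -l live -u|-d change` invocation. The sketch only builds argument lists; it runs nothing unless a runner is passed in.

```python
import shlex

# Hypothetical rule table: event name -> (direction, services).
# "up" becomes `s6-rc -u change`, "down" becomes `s6-rc -d change`.
RULES = {
    "net.eth0.up": ("up", ["dhcp-eth0"]),
    "net.eth0.down": ("down", ["dhcp-eth0"]),
    "usb.storage.add": ("up", ["automount"]),
}

# Assumed live directory of the separate "dynamic" database.
DYNAMIC_LIVE = "/run/s6-rc-dynamic"


def command_for(event, rules=RULES, live=DYNAMIC_LIVE):
    """Return the s6-rc argv for an event, or None if no rule matches."""
    rule = rules.get(event)
    if rule is None:
        return None
    direction, services = rule
    flag = "-u" if direction == "up" else "-d"
    return ["s6-rc", "-l", live, flag, "change", *services]


def dispatch(events, run=None):
    """Translate each event into a command; call run(argv) if given."""
    commands = []
    for event in events:
        argv = command_for(event)
        if argv is None:
            continue  # events without a rule are ignored
        commands.append(argv)
        if run is not None:
            run(argv)
    return commands


if __name__ == "__main__":
    # Print the commands instead of executing them.
    for argv in dispatch(["net.eth0.up", "bogus", "net.eth0.down"]):
        print(shlex.join(argv))
```

Passing `subprocess.run` as the `run` argument would actually execute the commands. The point of the sketch is how thin this layer is: all the real work stays in s6-rc, and the policy lives entirely in the rule table that somebody has to write.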
If there's one thing that has been made painfully obvious to me these past few years, it is that most people, and especially most *distro* people - which are the ones I would like to reach -, perceive the s6 stack as very complex. They're intimidated by it; they find the abundance of moving parts off-putting and difficult to get into. With very few exceptions, the people who actually take the plunge and make the time and energy investment necessary to understand the model, what the various parts do and how they fit together, those people all love it, and are very enthusiastic about it, and they're what keeps me going. But the initial barrier of potential, the ultra-steep learning curve, is indisputably the limiting factor in the spread of the s6 ecosystem.

s6 as a supervision suite? okay, people will use it; but it's already perceived as a bit complex, because there are a lot of binaries. It's on the high end of the acceptable difficulty range.

s6 as an init system? "what is this s6-linux-init thing? why do I need this? runit is simpler, I'll stick to runit." Even though runit has problems, has less functionality, and is barely maintained. There are, for instance, several people in Void Linux who are interested in switching to s6, but despite s6 being an almost drop-in replacement for runit, the switch has not been made, because it requires learning s6 and s6-linux-init, and most Void people do not feel the effort is worth it.

s6-rc? "waah I don't like the source directory format, I want text files, and why is it so different from 'service foo start'? And why doesn't it come with integrated policy like OpenRC or systemd?" People understand the benefit in separating mechanism from policy, in theory, but in practice nobody wants to write policy. (Can't blame them: I find it super boring, too.)

Having the tool is not enough; it needs to be gift-wrapped as well, it needs to be nice to use.
If I add a s6-rc-event family of binaries to s6-rc, the fact that it is yet another layer of functionality, that you now need *two* databases, etc., will make a whole additional category of people just give up. The outreach will be, mark my words, *zero*. If not negative.

The fact is that a full-featured init system *is* a complex beast, and the s6 stack does nothing more than is strictly needed, but it exposes all the tools, all the entrails, all the workings of the system, and that is a lot for non-specialists to handle. Integrated init systems, such as systemd, are significantly *more* complex than the s6 stack, but they do a better job of *hiding* the complexity, and presenting a relatively simple interface. That is why, despite being technically inferior (on numerous metrics: bugs, length of code paths, resource consumption, actual modularity, flexibility, portability, etc.), they are more easily accepted: they are just less intimidating.

As a friend told me, and it was an enlightening moment: you are keeping the individual parts simple, but in doing so, you are moving the complexity to the *interactions* between the parts, and are burdening the user with that complexity. You are keeping the code simple, which has undeniable maintainability benefits, but you are making the administration more difficult, and the trade-off is not good enough for a lot of users.

For a while, my answer to that has been: this is all an interface problem. I need to work on s6-frontend, in order to provide a unified, user-friendly interface; then, people who want simplicity can use the high-level interface, and advanced users can lift the hood and manually tweak the engine. I still believe that is a good model and a good idea.
However, having worked for a couple months on a user-friendly interface for service management with s6-rc that could be a prototype for a part of s6-frontend, and having started to think about details of s6-frontend, I've come to realize that shrinkwrapping the s6 ecosystem as it is today *will already be pretty hard*, and a lot, and I mean a lot, of work is going to go into that interface. And adding more moving parts in the engine will require even more work for the interface to control those moving parts. We're reaching levels of kitchensinkery I'm not comfortable with.

In the end, what risks happening? A neat, slick, thrifty engine, with a lot of knobs, and a big fat complex interface on top of it - and unless you're a specialist, you *need* the interface, because there are so many knobs that you otherwise need a degree to understand what everything does. And what good is it to have such a satisfying engine if you can't use it without a thick layer of bloat?

So, I think my software design needs to be rebalanced, and complexity needs to be spread more evenly. I'm certainly not going to write monoprocess behemoths, that's not what I do, but I need to stop yolo adding small binaries to address some functionality and say "there you go, here's the mechanism, how to use it is left as an exercise to the reader." Which is exactly what would happen with s6-rc-eventd.

So, option 2 is to take a step back and say: a service manager is one (complex) functionality, and if I want a full-fledged service manager, I need to design it as such from the beginning, instead of having a static service manager with a program to handle dynamic stuff added next to it as an afterthought and the complexity needing to be managed by users or by s6-frontend. And that means a s6-rc redesign.

I haven't made a decision yet: I'm in the process of *exploring* what a s6-rc redesign would look like.
But so far, this is what I think a full service manager should do:

 - Be similar in concept to Upstart. The Upstart implementation is bad, but the foundational ideas are actually quite good.
   * That means: event-based, transitions are triggered by events, and events can have several sources: a transition finishing, but also external events such as ones coming from a network manager, or internal events coming from the daemon itself (this is necessary for Upstart because it's an init system, I don't think it is necessary for a pure service manager).
 - Perform as much static analysis and upfront checking as possible, just like the current s6-rc. I would like to keep the same level of guarantees for a fully static set of services, and ideally be able to offer some guarantees as well for dynamic ones, although it's obviously impossible to do a full analysis for them.
 - Support disjunctions in service trigger conditions! If I'm going to rewrite the engine, might as well allow for alternatives without forcing the user to recompile a database.
 - Support instances. After a lot of brainstorming and several attempts, I've been unable to find a good way to add instantiation to the current s6-rc model. If we want instantiation, it definitely needs to be a part of service manager design from the start, so this would be the opportunity.

So here you are. In the weeks to come, I'll keep thinking about the details of option 2, and build an outline of the various necessary parts. And eventually, if I think I can write this, with all the functionality, while still sticking to my standards of code simplicity, then it's what I'll do. If not, and in particular, if I can't get all the static analysis guarantees that I want, then I'll just go with option 1, which will do a decent job for a lot less work but will definitely not help the perception of the s6 ecosystem by normal people.

--
Laurent
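As an illustration of two items from the list above, upfront checking and disjunctive trigger conditions, here is a small Python sketch. It is only a sketch under stated assumptions: nothing has been decided about how a redesigned s6-rc would represent dependencies or triggers, so the data layout (a dict of dependency lists, and a condition written as a list of alternative event sets) and all names are hypothetical. The cycle check is a plain depth-first search, not the algorithm s6-rc-compile uses.

```python
def find_cycle(deps):
    """Return one dependency cycle as a list of names, or None.

    deps maps a service name to the names it depends on (assumed layout).
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    color = {name: WHITE for name in deps}
    path = []

    def visit(name):
        color[name] = GREY
        path.append(name)
        for dep in deps.get(name, ()):
            state = color.get(dep, WHITE)
            if state == GREY:
                # dep is already on the current path: that is a cycle
                return path[path.index(dep):] + [dep]
            if state == WHITE:
                cycle = visit(dep)
                if cycle:
                    return cycle
        path.pop()
        color[name] = BLACK
        return None

    for name in list(deps):
        if color[name] == WHITE:
            cycle = visit(name)
            if cycle:
                return cycle
    return None


def triggered(condition, seen):
    """True if any alternative in condition is fully contained in seen.

    condition is a list of sets of event names: an OR of ANDs (assumed).
    """
    return any(alternative <= seen for alternative in condition)


if __name__ == "__main__":
    print(find_cycle({"a": ["b"], "b": ["c"], "c": ["a"]}))
    # Start on "network up", or on "loopback up" plus a manual request.
    condition = [{"net.up"}, {"lo.up", "manual"}]
    print(triggered(condition, {"lo.up", "manual"}))
```

The first function can run at database compile time, which is the kind of guarantee a fully static set of services allows. The second has to be evaluated at run time as events arrive, which is why a full analysis is not possible for the dynamic part.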