From: "Laurent Bercot"
To: "supervision@list.skarnet.org"
Subject: Re: s6 bites noob
Date: Mon, 04 Feb 2019 13:52:11 +0000

>But run not existing when supervise starts is a different case from run disappearing after supervise is already running.

No, it's not. First, s6-supervise starts and initializes the state of the service to "down, wanted up" (unless there's a ./down file, in which case the service isn't wanted up). Then, s6-supervise enters its main loop, where it sees that the service is down + wanted up, so it tries to start it. If there is no run file, it's a temporary failure. No matter whether or not it's the first time it tries.

Adding a special case where s6-supervise aborts when it tries to start ./run for the first time would make the code more complex, especially when you try to answer questions such as "what do I do if there is a down file and the service is started later" (which is the case when s6-supervise is driven by s6-rc), or "what do I do when ./run exists but is non-executable", or other questions in the same vein. And the end benefit of having such a special case would be very dubious.

>Another example of orneriness: supervise automatically does its own initialization, but the s6-rc program (not the eponymous suite) doesn't. Instead, the suite has a separate init program, s6-rc-init, that's normally run at boot time. But if it isn't run at boot time (which is a policy decision), s6-rc doesn't automatically run it if necessary. If rc shouldn't auto-initialize, neither should supervise.

Now you are just acting in bad faith. Different programs, with different goals, obviously have different requirements and different behaviours; you can't seriously suggest that apples should behave like oranges.

If a system uses s6-rc as its service manager, then early on in the boot process, it *will* run s6-rc-init. That's not so much a policy decision as an s6-rc mechanism. It's running a command line program in the system initialization sequence, it's not exactly difficult or convoluted.

>Another one: the -d option to s6-rc is overloaded. When used with change, it means to down the selected services.
>But when used with list, it means to invert the selection. I'm going to repeatedly forget this.

You know what's interesting? It initially did not do this. And then people complained that the behaviour wasn't intuitive, and that s6-rc -d list *should* invert the selection. I thought about it and realized that it made sense, so I implemented it. I guess you can't make everyone happy.

>One more: the doc for the s6-rc program says it's meant to be the one-stop shop of service management, after compilation and initialization are done. It has subcommands list, listall, diff, and change. But s6-rc-update is a separate program, not a subcommand of s6-rc. I suppose there's a reason for this, but it complicates the user interface with a seemingly arbitrary distinction of whether to put a dash between "s6-rc" and the subcommand depending on what the particular subcommand is.

The s6-rc command operates the service management engine, relying on a stable compiled service database. It will read the database and perform operations in that context. It is the command to use when querying the database, and starting/stopping services, in a normal production environment.

The s6-rc-update command does not fit in that model. It is an administration command: it changes the context in which s6-rc runs. It is not used as commonly, it is heavier and more dangerous. It switches databases! Think atomically regenerating the openrc cache after modifying your services in /etc/init.d. (OpenRC offers no such thing, and it's a mess reliability-wise.) This is a fundamentally different thing from running the engine.

>The docs advise entirely copying the service repository to a ramdisk, then using (a link to) the copy as the scan directory. This makes the running system independent of the original repo.
>But the doc for s6-rc-init says the rc system remains dependent on the original compiled database, and there's no explanation of why it isn't also copied in order to make the running system independent.

The point of operating off a copy of some data is that system operation won't be disturbed when the user modifies the original data, until they decide to commit/flush a batch of changes - which the system should then pick up as atomically as possible.

A service directory is data that the user can modify. That is why it is better to run s6-supervise on a copy of a service directory (separate "live" data from "stock" data).

The compiled database is already a copy. It's not data that the user modifies (except potentially for adding or removing bundles, but those are a layer on top of the core database, which remains untouched). What the user will modify is the source directory, that will be compiled into a different database, and changes will not happen to the live system until the user calls s6-rc-update, which is the atomic commit/flush operation.

>Ok, Colin Booth mentioned permission issues when running as non-root. It shouldn't be a problem, since all of this (including svscan) is running as the same user. Permission problems should only come into play when trying to do things inter-user. Anyway, I checked the s6-rc-compile doc. Looks like -h won't be necessary, since it defaults to the owner of the svscan proc. But -u is needed, since it defaults to allowing only root--even though I've never run any of this as root, and I've never asked it to try to do anything as root, and I've never told it that it should expect to be root, or even mentioned root at all.

Okay, so this one requires a bit of a longer explanation.

At s6-rc run time, *all* the services, be they oneshots or longruns, when they are run, are run with the uid/gid of the s6 supervision tree whose scandir was given as an argument to s6-rc-init.
In normal usage, the supervision tree runs as root. So it is important to implement access control, so that a non-root user running s6-rc does not get unauthorized access to running services.

For longruns, access control is done automatically via permissions on the s6-supervise control pipe. So if you start longruns via s6-rc as the user who owns the supervision tree, everything will work fine.

For oneshots, the same mechanism cannot be used, because oneshots are not about sending s6-svc commands to an s6-supervise process, they are about fetching a script from the compiled database and executing it. s6-rc connects to the s6rc-oneshot-runner service (via a Unix socket, managed by s6-ipcserver), telling it what script to run; the s6-rc-oneshot-run program, spawned by s6-ipcserver, will read the database, extract the script, and run it. (This may sound overly complex, but it's the only way to ensure that oneshots, like longruns, are always run with the same, reproducible, environment.)

In this case, access control is done at the Unix socket level. s6-ipcserver-access checks the client (s6-rc)'s credentials against an ACL; if the credentials are good, s6-rc-oneshot-run is allowed to run, else the connection is denied.

The thing is, the ACL is created at s6-rc-compile time, and s6-rc-compile does not know what uid/gid the supervision tree is going to run under. The common, normal case for s6-rc is that the supervision tree is going to run as root, even when you create the service database as a normal user. This is a feature: writing and compiling the service database do not require root privileges, so it is expected that a database created by a normal user is still going to be operated by root.

And so, the default ACL created by s6-rc-compile says "only root is going to be able to access oneshots". Which is a sane and safe default.
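
As an illustration only, the default described above can be modelled with a small Python sketch. This is a toy model, not s6 code: the function names compile_acl and may_run_oneshot are hypothetical, the dictionary layout is invented for the example, and the real check is performed by s6-ipcserver-access against rules stored in the compiled database. The only behaviour taken from the explanation above is that, without -u or -g, only root is accepted.

```python
# Toy model of the oneshot access decision; NOT actual s6 code.
# compile_acl and may_run_oneshot are hypothetical names for illustration.

def compile_acl(ok_uids=None, ok_gids=None):
    """Model the ACL that is fixed at compile time.

    With neither list given (no -u / -g), only root (uid 0) is accepted.
    What happens to root when a list IS given is an assumption of this
    model and is not taken from the s6-rc documentation.
    """
    if not ok_uids and not ok_gids:
        return {"uids": frozenset({0}), "gids": frozenset()}
    return {"uids": frozenset(ok_uids or ()), "gids": frozenset(ok_gids or ())}


def may_run_oneshot(acl, client_uid, client_gid):
    """Accept the connection if the client's uid or gid is listed."""
    return client_uid in acl["uids"] or client_gid in acl["gids"]


default_acl = compile_acl()             # database compiled without -u / -g
user_acl = compile_acl(ok_uids=[1000])  # database compiled with -u 1000

print(may_run_oneshot(default_acl, 1000, 1000))  # False: only root is listed
print(may_run_oneshot(user_acl, 1000, 1000))     # True: uid 1000 is listed
```

The point of the sketch is that the list is frozen when the database is compiled, so it cannot adapt to whichever user later runs the supervision tree.
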
If you want to change that, the -u and -g options to s6-rc-compile are what you need; and they are prominent in the s6-rc-compile documentation page. If you don't provide these options to s6-rc-compile, the s6rc-oneshot-runner service in your compiled database will only accept connections from root.

That said, it is true that there is an inconsistency between oneshots and longruns as far as access control is performed, and that it is not obvious. The problem is, in order to be consistent here, s6-ipcserver-access would need, by default, to accept "the uid/gid that the server is running as", instead of a static list of credentials: that would make running oneshots as a normal user as transparent as running longruns. But it's currently not possible with s6-ipcserver-access. This is a good point, though, and I will think about the best way to add that feature. Least surprise is good.

>Anyway, recompile with -u 1000, re-update, and try again. Now, I can't even do s6-rc -a list; I get:
>s6-rc fatal: unable to take locks: Permission denied

Hmmm, that's weird. If all the previous operations have been done as the same user, you should never get EACCES. Have you run something as root before?

> After reading more of the docs than I expected to be necessary

Uh-huh. You initially expected *no* docs to be necessary. So, yes, you're going to be disappointed. :P

> I'm still unable to get s6 to do the basic job I need: manage a small group of services, and funnel and log their output. It's especially frustrating having to fight with software that generates gratuitous intra-user permission errors.

When it comes to security, I'd rather be on the conservative side - it was either that or allow users to run services as root, which most people probably do not want. :P

But yes, I'll find a way to make access control consistent between oneshots and longruns.

--
Laurent