From: "Laurent Bercot"
Newsgroups: gmane.comp.sysutils.supervision.general
Subject: Re: s6 bites noob
Date: Thu, 31 Jan 2019 20:19:28 +0000
To: supervision@list.skarnet.org

>mkdir test
>s6-svscan --help
>Well, that was surprising and unpleasant.
>It ignores unknown arguments, blithely starts a supervision tree in the
>current dir (my home dir), and spams me with a bunch of supervise errors.
>Ok, kill it.
>
>Next test:
>s6-svscan test

Do you always run programs you don't know in your home directory with
random arguments before reading the documentation? Because if you do,
then yes, you're bound to experience a few unpleasant surprises, and
s6-svscan is pretty mild in that aspect. I think you should be thankful
that it didn't erase all the files in your home directory. :)

>What purpose is served by supervise automatically creating the supervise
>and event subdirs if there's no run file? It seems to accomplish nothing
>but errors and confusion. Instead of creating the subdirs, and then
>barfing on the absence of a run file, why not just create nothing until
>a run file appears?

It is impossible to portably wait for the appearance of a file.
And testing the existence of the file first, before creating the
subdirs, wouldn't help, because it would be a TOCTOU.

As you have noticed and very clearly reported, s6 is not user-friendly -
or rather, its friendliness is not expressed in a way you have been
lulled into thinking was good by other programs. Its friendliness comes
from the fact that it does not mistake you for an idiot; it assumes that
you know what you are doing, and does not waste code in performing
redundant checks. That's how it avoids bloat, among other things.

You may find it unpleasant that s6 does not hold your hand. That is
understandable. But I assure you that as soon as you get a little
experience with it (and that can even be achieved by just reading the
documentation *before* launching a command ;)), all the hand-holding
becomes entirely unnecessary because you know what to do.

>The doc for svscan at least says that it creates the .s6-svscan subdir.
>The doc for supervise says nothing about creating the supervise subdir,
>though the doc for servicedir does say it.

I agree, the documentation isn't perfect. I'll make sure to add a note
in the s6-supervise page to mention the creation of subdirs.

>Next problem. The doc for s6-svc indicates that
>s6-svc -wu serv/foo
>
>will wait until it's up. But that's not what happens. Instead, it exits
>immediately.

Right. I know why this happens, and it's not exactly a bug, but I can
understand why it's confusing - and your expectation is legitimate. So
I will change the behaviour so "s6-svc -wu serv/foo" does what you
thought it would do.

> It also doesn't even try to start the service unless -u is also given,
> which is surprising, but technically not in contradiction of the doc.

Well *that* is perfectly intentional.

>And if -u is given, then -wu waits forever, even after the service is up.
>In serv/foo/run I have:
>#/bin/bash
>echo starting; sleep 2; echo dying
>
>s6-svc -wu -u serv/foo/ will start it, but never exits. Likewise,
>s6-svc -wd -d serv/foo/ will stop it, but never exits.

Now that is probably due to your setup, because yours is the only
report I have of it not working. Please pastebin the output of
"strace -vf -s 256 s6-svc -uwu serv/foo" somewhere, and post the URL:
I, or other people here, will be able to tell you exactly what's going
wrong. Also, just in case, please also pastebin your sysdeps
(by default: /usr/lib/skalibs/sysdeps/sysdeps).

>So, I tried s6-rc. Set up service definition dir, compile database,
>create link, run s6-rc-init, etc, then finally
>s6-rc -u change foo
>
>It starts immediately, but rc then waits while foo goes through 12 to 15
>start/sleep/die cycles before rc finally exits with code 0. (And foo
>continues cycling.) But if I press ^C on rc before it exits on its own,
>then it kills foo, writes a warning that it was unable to start the
>service because foo crashed with signal 2, and exits with code 1.

This is directly related to your issue with s6-svc above.
"s6-rc -u change foo" precisely calls "s6-svc -uwu" on foo's service
directory, and waits for it to return. Fixing s6-svc's behaviour in
your installation will also fix s6-rc's behaviour.

>So I tried it again, and this time pressed ^C on rc immediately after
>running it, before foo had a chance to die for the first time. It
>reported the same warning! The prophecy is impressive, but still,
>shouldn't rc just exit immediately after foo starts, and let the
>supervision tree independently handle foo's future death?

That is normally what happens, except that in your case s6-svc never
returns, so from s6-rc's point of view, the service is still starting.
It's the exact same issue.

>Next test: I moved run to up, changed type to oneshot, recompiled,
>created new link, ran s6-rc-update, and tried foo again. This time, rc
>hangs forever, and up is never executed at all. When I eventually press
>^C on rc, though, it doesn't say unable to start foo; it says unable to
>start s6rc-oneshot-runner.

Related to the same issue as well. Oneshots are executed through a
longrun service named s6rc-oneshot-runner, so when you tell s6-rc to
start foo, it starts s6rc-oneshot-runner first, and since s6-svc never
returns, it fails in the same way as before.

>How to bring all up?

The absence of an option to bring up _everything in your database_ is
intentional. In the usage I have in mind, the database is added to and
subtracted from by a distribution's package manager: when you install
a service, you add this service's definition to the database (and
recompile it). That means there can be way more services in a database
than the user ever intends to run at the same time - and it also means
that the definition of "everything" can be pretty volatile, so having
a "bring up everything" command would likely do more harm than good.

The intended usage is for you to create a bundle explicitly containing
all the services you want to bring up, and to call s6-rc -u change on
this bundle.
(You can name the bundle "everything" if you like.) That way, you know
exactly what services you are starting, no matter what additions are
made to the database.

>
>And a question about the advice in the docs. if svscan's rescan is 0,
>and /tmp is RAM, what's the advantage of having the scan directory be
>/tmp/service with symlinks to service directory copies in /tmp/services,
>instead of simply having /tmp/services directly be the scan directory?
>
>I guess an answer might be that there can be a race between svscan's
>initial scan at system startup and the populating of /tmp/services, so
>it sees partially copied service directories. But wouldn't a simpler
>solution be to either delay svscan's start until the populating is
>complete, or add an option to disable its initial scan?

The problem isn't only the initial scan, but _any_ scan, which can come
at any time via a s6-svscanctl -a command for instance. Even -t0 does
not protect you against an admin, or a script, requesting a scan
without you being aware of it. That's why it is better to make sure
that service directories only appear in a scandir when they are
complete - which is achieved by creating them elsewhere then
atomically symlinking them.

You make good points, and I'm sure your initial impression of s6 would
have been better if you hadn't bumped against this weird s6-svc
problem. So, let's solve this fast and soothe the bite. :)

--
Laurent
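
To illustrate the "create elsewhere, then atomically symlink" advice
above, here is a minimal sketch. The scratch layout (a "services" store
next to a "service" scandir) and the service name "foo" are only
assumptions for the example, not paths that s6 mandates, and the script
does not start or talk to any s6 program.

```shell
#!/bin/sh
set -eu

# Hypothetical scratch layout; substitute your real store and scandir.
base=$(mktemp -d)
store="$base/services"    # complete service directories live here
scandir="$base/service"   # the directory s6-svscan would be run on
mkdir -p "$store" "$scandir"

# 1. Build the service directory outside the scandir. However long this
#    takes, a scan cannot observe it half-written.
mkdir "$store/foo"
cat > "$store/foo/run" <<'EOF'
#!/bin/sh
echo starting; sleep 2; echo dying
EOF
chmod +x "$store/foo/run"

# 2. Publish it with a single symlink creation: the link either exists
#    and points at a complete directory, or does not exist at all.
ln -s "$store/foo" "$scandir/foo"

# 3. On a live system, this is the point where a rescan could be
#    requested (for instance with s6-svscanctl -a on the scandir).
ls -l "$scandir"
```

The ordering is the whole point: everything slow happens in the store,
and the only change made inside the scandir is one symlink creation.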