From: "Laurent Bercot"
Newsgroups: gmane.comp.sysutils.supervision.general
To: supervision
Subject: Re: further claims
Date: Tue, 30 Apr 2019 08:56:41 +0000
In-Reply-To: <15044531556573627@iva6-ff1651a9aa83.qloud-c.yandex.net>

>haven't you claimed process #1 should supervise long running
>child processes ? runit fulfils exactly this requirement by
>supervising the supervisor.

Not exactly, no. If something kills runsvdir, then runit immediately
enters stage 3, and reboots the system. This is an acceptable response
to the scanner dying, but is not the same thing as supervising it.
If runsvdir's death is accidental, the system goes through an
unnecessary reboot.

>this lengthens the supervision chain but also has the additional
>advantage of a supervised supervisor. ;-)

No.

>maybe runsvdir was not made to run as process #1 and this was
>just a hack its author came up with to replace (SysV) init totally.

Gerrit may correct me here, but I think that was the idea, yes.
runit predates s6, and its goal was to provide a daemontools-like
supervision suite that could also be used as an init system. No more,
no less; and I think it succeeded.

>sure, if (s6-)svscan dies one is in deep shit as well, so what is the point
>here ?

If s6-svscan dies, the pipes are still maintained in the s6-supervise
processes. You would need to kill the supervisor *and* the scanner for
the pipe to disappear. With runit, by contrast, the pipe disappears,
and you can lose logs, as soon as you kill the supervisor.
And of course, if s6-svscan runs as process 1, you cannot kill it.

> runsv gets restarted by runsvdir but the pipe is gone (are pipes
>really closed when the opening (parent) process exits without closing
>them itself and subprocesses still use that very pipe ?)

The problematic case is when the consumer (i.e. the logger) dies
while the producer (i.e. the service) is still outputting logs. When
that happens, you need a process to hold the reading end of the logging
pipe. If you don't have such a process, the pipe is closed when the
consumer dies, and any data that is still in transit is lost.

When the logging pipe is held by runsv and runsv dies, this situation
becomes possible. Of course nothing goes wrong as long as the logger
stays alive; but if the logger dies, the service has to die first so
that the logging pipe can be properly recreated without any log loss.

When the logging pipe is held by s6-svscan and you have one supervisor
per process, then any of the supervisors or the supervised processes
may die at any time, but the logging pipe is never broken.
You'd have to go back and kill s6-svscan in order to have a chance at
ever losing logs.

> [perpd]
>but from a design perspective it seems as reliable as s6-svscan ?
>or not since it uses a more integrated design/approach ?

I trust Wayne to have written perpd correctly. However, from a pure
design perspective, perpd is unarguably more complex, since it has to
perform the job of the scanner + N supervisors in one process, so it is
naturally more difficult to make sure there are no bugs in it. The
state machine in s6-supervise is complex enough; I wouldn't want to
maintain N similar constructs in one single process. It's doable, of
course, but it requires more effort to write, debug, and maintain.

>this design simplifies communication since tasks are not
>implemented in other tools running as its (direct) subprocesses.

Yes, that is the classic trade-off of multiprocess designs. It's mostly
a question of taste. I tend to favor multiprocess designs because the
cost of having more (and more complex) communication is usually far
outweighed by the benefit of having significantly less code and simpler
code paths.

--
Laurent
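The logging-pipe argument above can be illustrated with plain POSIX pipes. What follows is a minimal Python sketch, not code from s6 or runit. The parent process stands in for the fd holder, the role the message assigns to s6-svscan (or to runsv in runit). The helper names (record, run_child, producer, logger) and the fixed 16-byte record size are hypothetical simplifications. The sketch assumes a POSIX system (os.fork) and CPython's default of ignoring SIGPIPE.

```python
import os

# Hypothetical fixed record size: every log line is padded to RECORD bytes,
# so a single read() of RECORD bytes consumes exactly one line.
RECORD = 16


def record(text: bytes) -> bytes:
    """Pad text to a fixed-size, newline-terminated log record."""
    return text.ljust(RECORD - 1) + b"\n"


def run_child(fn, *args, close=()) -> None:
    """Fork a child running fn(*args) and wait for it, so steps run in order."""
    pid = os.fork()
    if pid == 0:
        code = 0
        try:
            for fd in close:  # drop inherited fds this role must not hold
                os.close(fd)
            fn(*args)
        except BaseException:
            code = 1
        os._exit(code)  # leave the child without running the parent's cleanup
    _, status = os.waitpid(pid, 0)
    if status != 0:
        raise RuntimeError(f"child {fn.__name__} failed (status {status})")


def producer(log_w: int, text: bytes) -> None:
    # Hypothetical "service": writes one record to its log pipe, then exits.
    os.write(log_w, record(text))


def logger(log_r: int, out_w: int) -> None:
    # Hypothetical "logger": consumes one record, forwards it, then dies.
    os.write(out_w, os.read(log_r, RECORD))


def held_pipe_demo() -> bytes:
    """The read end outlives the logger because this process keeps it open."""
    log_r, log_w = os.pipe()  # this process plays the fd holder: it keeps both ends
    out_r, out_w = os.pipe()  # side channel where loggers report what they read
    as_producer = (log_r, out_r, out_w)  # the service never holds the read end
    as_logger = (log_w, out_r)

    run_child(producer, log_w, b"record one", close=as_producer)
    run_child(logger, log_r, out_w, close=as_logger)  # logger #1 dies here
    # No logger is alive now, but this process still holds log_r: no EPIPE,
    # the record simply waits in the pipe buffer.
    run_child(producer, log_w, b"record two", close=as_producer)
    run_child(logger, log_r, out_w, close=as_logger)  # replacement logger

    os.close(out_w)
    seen = b""
    while chunk := os.read(out_r, 4096):
        seen += chunk
    for fd in (log_r, log_w, out_r):
        os.close(fd)
    return seen


def unheld_pipe_demo() -> str:
    """Nobody holds the read end once the only reader is gone."""
    log_r, log_w = os.pipe()
    os.close(log_r)  # the logger died and no other process kept the read end
    try:
        os.write(log_w, record(b"record three"))
        return "written"
    except BrokenPipeError:  # SIGPIPE is ignored, so write() fails with EPIPE
        return "lost"
    finally:
        os.close(log_w)


if __name__ == "__main__":
    print("held:  ", held_pipe_demo())
    print("unheld:", unheld_pipe_demo())
```

held_pipe_demo() should return both records, even though the first logger died before the second record was written. unheld_pipe_demo() should return "lost", because with no process holding the read end, write() fails with EPIPE. Running each child to completion before starting the next is a deliberate simplification to keep the sketch deterministic; real supervisors run these processes concurrently.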