From mboxrd@z Thu Jan  1 00:00:00 1970
X-Msuck: nntp://news.gmane.io/gmane.comp.sysutils.supervision.general/2555
Path: news.gmane.org!.POSTED.blaine.gmane.org!not-for-mail
From: "Laurent Bercot" <ska-supervision@skarnet.org>
Newsgroups: gmane.comp.sysutils.supervision.general
Subject: Re: further claims
Date: Tue, 30 Apr 2019 08:56:41 +0000
Message-ID: <em33d79abe-979e-471b-b7a7-63b8d895a8fa@elzian>
References: <15044531556573627@iva6-ff1651a9aa83.qloud-c.yandex.net>
Reply-To: "Laurent Bercot" <ska-supervision@skarnet.org>
Mime-Version: 1.0
Content-Type: text/plain; format=flowed; charset=utf-8
Content-Transfer-Encoding: quoted-printable
Injection-Info: blaine.gmane.org; posting-host="blaine.gmane.org:195.159.176.226";
	logging-data="27466"; mail-complaints-to="usenet@blaine.gmane.org"
User-Agent: eM_Client/7.2.34711.0
To: supervision <supervision@list.skarnet.org>
Original-X-From: supervision-return-2145-gcsg-supervision=m.gmane.org@list.skarnet.org Tue Apr 30 10:55:47 2019
Return-path: <supervision-return-2145-gcsg-supervision=m.gmane.org@list.skarnet.org>
Envelope-to: gcsg-supervision@m.gmane.org
Original-Received: from alyss.skarnet.org ([95.142.172.232])
	by blaine.gmane.org with smtp (Exim 4.89)
	(envelope-from <supervision-return-2145-gcsg-supervision=m.gmane.org@list.skarnet.org>)
	id 1hLOYF-00070U-F0
	for gcsg-supervision@m.gmane.org; Tue, 30 Apr 2019 10:55:47 +0200
Original-Received: (qmail 20022 invoked by uid 89); 30 Apr 2019 08:56:13 -0000
Mailing-List: contact supervision-help@list.skarnet.org; run by ezmlm
Original-Sender: <supervision@list.skarnet.org>
Precedence: bulk
List-Post: <mailto:supervision@list.skarnet.org>
List-Help: <mailto:supervision-help@list.skarnet.org>
List-Unsubscribe: <mailto:supervision-unsubscribe@list.skarnet.org>
List-Subscribe: <mailto:supervision-subscribe@list.skarnet.org>
List-Id: <supervision.list.skarnet.org>
Original-Received: (qmail 20015 invoked from network); 30 Apr 2019 08:56:13 -0000
In-Reply-To: <15044531556573627@iva6-ff1651a9aa83.qloud-c.yandex.net>
Xref: news.gmane.org gmane.comp.sysutils.supervision.general:2555
Archived-At: <http://permalink.gmane.org/gmane.comp.sysutils.supervision.general/2555>


>haven't you claimed process #1 should supervise long running
>child processes ? runit fulfils exactly this requirement by
>supervising the supervisor.

Not exactly, no.
If something kills runsvdir, then runit immediately enters
stage 3, and reboots the system. This is an acceptable response
to the scanner dying, but is not the same thing as supervising
it. If runsvdir's death is accidental, the system goes through
an unnecessary reboot.


>this lengthens the supervision chain but also has the additional
>advantage of a supervised supervisor. ;-)

No.


>maybe runsvdir was not made to run as process #1 and this was
>just a hack its author came up with to replace (SysV) init totally.

Gerrit may correct me here, but I think that was the idea, yes.
runit predates s6 and its goal was to provide a daemontools-like
supervision suite that could also be used as an init system. No
more, no less; and I think it succeeded.


>sure, if (s6-)svscan dies one is in deep shit aswell, so what is the point
>here ?

If s6-svscan dies, the pipes are still maintained in the
s6-supervise processes. You would need to kill the supervisor *and*
the scanner for the pipe to disappear, whereas with runit, the pipe
disappears and you can lose logs as soon as you kill the supervisor.
And of course, if s6-svscan runs as process 1, you cannot kill it.


>  runsv gets restarted by runsvdir but the pipe is gone (are pipes
>really closed when the opening (parent) process exits without closing
>them itself and subprocesses still use that very pipe ?)

  The problematic case is when the consumer (i.e. the logger) dies
while the producer (i.e. the service) is still outputting logs.
When that happens, you need a process to hold the reading end
of the logging pipe. If you don't have such a process, the pipe
is closed when the consumer dies, and any data that is still
in transit is lost.

  When the logging pipe is held by runsv, if runsv dies, then
this situation is possible. Of course nothing goes wrong as
long as the logger stays alive, but when the logger dies, the
service needs to die first, in order for the logging pipe to be
properly recreated without any log loss.

  When the logging pipe is held by s6-svscan and you have one
supervisor per process, then any of the supervisors or the
supervised processes may die at any time, but the logging pipe
is never broken. You'd have to go back and kill s6-svscan in
order to have a chance at ever losing logs.
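
  To make the pipe-holding argument concrete, here is a minimal
sketch in Python. It is not taken from s6 or runit; the function
name and the single-process layout are assumptions made purely for
illustration. The calling process plays both the service (writer)
and, optionally, the extra holder of the reading end; a forked child
stands in for a logger that dies. It needs a Unix system, since it
relies on os.fork().

```python
import os


def log_line_survives(read_end_held: bool) -> bool:
    """Write one log line after the logger died; report whether it can still be read."""
    read_fd, write_fd = os.pipe()

    # Stand-in for the logger: it inherits the pipe, then dies at once.
    pid = os.fork()
    if pid == 0:
        os.close(write_fd)
        os._exit(0)
    os.waitpid(pid, 0)

    if not read_end_held:
        # Nobody else kept a copy of the reading end, so the logger's
        # death leaves the pipe without any reader.
        os.close(read_fd)

    try:
        # The service keeps producing output after the logger's death.
        # CPython ignores SIGPIPE by default, so a write to a pipe with
        # no reader raises BrokenPipeError instead of killing us.
        os.write(write_fd, b"log line\n")
    except BrokenPipeError:
        return False
    finally:
        os.close(write_fd)

    # A restarted logger would inherit read_fd and find the line waiting.
    data = os.read(read_fd, 1024)
    os.close(read_fd)
    return data == b"log line\n"


if __name__ == "__main__":
    print(log_line_survives(read_end_held=True))
    print(log_line_survives(read_end_held=False))
```

  With the reading end held by a surviving process, the line written
after the logger's death stays in the pipe buffer for the next
logger; without a holder, the write fails and the line is gone.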


>[perpd]
>but from a design perspective it seems as reliable as s6-svscan ?
>or not since it uses a more integrated desing/approach ?

I trust Wayne to have written perpd correctly. However, from a
pure design perspective, perpd is unarguably more complex, since
it has to perform the job of the scanner + N supervisors in one
process, so it's naturally more difficult to make sure there are
no bugs in it.
The state machine in s6-supervise is complex enough. I wouldn't
want to maintain N similar constructs in one unique process. It's
doable, of course, but requires more effort to write, debug, and
maintain.


>this design simplifies communication since tasks are not
>implemented in other tools running as its (direct) subprocesses.

  Yes, that is the classic trade-off of multiprocess designs.
It's mostly a question of taste. I tend to favor multiprocess designs
because the cost of having more - and more complex - communication
is usually far outweighed by the benefits of having significantly
less code and simpler code paths.

--
  Laurent