From: Colin Booth
Newsgroups: gmane.comp.sysutils.supervision.general
Subject: Re: further claims
Date: Thu, 2 May 2019 00:30:11 +0000
Message-ID: <20190502003011.7wzyyms6ew74vvxf@cathexis.xen.prgmr.com>
References: <15044531556573627@iva6-ff1651a9aa83.qloud-c.yandex.net>
To: supervision@list.skarnet.org

On Wed, May 01, 2019 at 08:09:58PM -0300, Guillermo wrote:
> On Tue, 30 Apr 2019 at 5:55, Laurent Bercot wrote:
> >
> > >haven't you claimed process #1 should supervise long running
> > >child processes ? runit fulfils exactly this requirement by
> > >supervising the supervisor.
> >
> > Not exactly, no.
> > If something kills runsvdir, then runit immediately enters
> > stage 3, and reboots the system.
> > This is an acceptable response
> > to the scanner dying, but is not the same thing as supervising
> > it. If runsvdir's death is accidental, the system goes through
> > an unnecessary reboot.
>
> If the /etc/runit/2 process exits with code 111 or gets killed by a
> signal, the runit program is actually supposed to respawn it,
> according to its man page. I believe this counts as supervising at
> least one process, so it would put runit in the "correct init" camp :)
>
> There is code that checks the 'wstat' value returned by a
> wait_nohang(&wstat) call that reaps the /etc/runit/2 process, however,
> it is executed only if wait_exitcode(wstat) != 0. On my computer,
> wait_exitcode() returns 0 if its argument is the wstat of a process
> killed by a signal, so runit indeed spawns /etc/runit/3 instead of
> respawning /etc/runit/2 when, for example, I point a gun at runsvdir
> on purpose and use a kill -int command specifying its PID. Changing
> the condition to wait_crashed(wstat) || (wait_exitcode(wstat) != 0)
> makes things work as intended.
>
> G.

Moving the goalposts a few feet here, but the duties of a proper init
are either to supervise one or more other things, or to bring down the
system if its one thing goes away. runit does both: it'll restart 2 in
some cases (correct, properly supervising one or more things), and
it'll bring down the system in other cases (also correct).

Honestly, it might be better to define what a bad init is and then say
a proper init is one that doesn't do that thing. A bad init is one that
allows a system to enter a totally vegetable state. By this
redefinition, a good init is one that doesn't allow systems to go
vegetable, either by having something it restarts, or by totally
freaking out and burning down the world if the one thing it started
ever vanishes.

Hell, sinit could be made proper by forking a thing and then issuing
the reboot(2) syscall any time its child vanished. Annoyingly
aggressive on the restarts, but proper.

-- 
Colin Booth