From: Wayne Marshall
Newsgroups: gmane.comp.sysutils.supervision.general
Subject: Re: [announce] perp-2.03: persistent process supervision
Date: Mon, 14 Mar 2011 18:39:24 +0100
Organization: b0llix.net: un!x for the deranged
Message-ID: <20110314183924.4dc40065@b0llix.net>
In-Reply-To: <20110314164741.GA7248@skarnet.org>
To: Laurent Bercot
Cc: supervision@list.skarnet.org

On Mon, 14 Mar 2011 17:47:41 +0100
Laurent Bercot wrote:

> > First, perpd(8) will not die (TM).
>
> Of course it will not - not in normal circumstances.
> Neither will svscan, or runsvdir, or s6-svscan.
> I trust your programming ability in that matter as much as
> mine - this is not a concern at all.
>
> The concern is that you don't always have the say. There's
> this playful thing called the Linux OOM killer. I hear the
> heuristics have been fixed in recent kernel releases, but for
> a long time, the OOM killer had the amusing habit of shooting
> processes at random, and very much failing to locate the
> process that is actually responsible for the memory outage.
> There are still a whole lot of broken OOM killers out there.

It is like worrying, what if init(8) should die?

>
> Of course, this is not a normal condition, and under careful
> administration it never happens. But the point is, when you
> are designing a supervision tool, you should assume that you
> can get a random SIGKILL (Headshot. Do not pass Go. Do not
> call your cleanup routines.) at any time.
>

If a system is delivering random SIGKILL, one should select
another system. There is no peaceful, confident sleeping at
night otherwise, no matter what supervisory framework you
choose.

> That is why I asked my question. In other supervision
> schemes, tasks are de-centralized, so if one process randomly
> dies, it generally does not have much impact on the rest of
> the system. (If runsvdir dies, it's annoying, but things keep
> working until the admin can come clean things up.) perpd,
> however, looks like a neural hub, centralizing a lot of info
> into its memory. IOW, a SPOF, and you can be sure that the
> next broken system tool will love to play Doom with it.
>

If we talk in terms of daemontools, svscan(8) already keeps a
table of supervise(8) processes, and svscan itself functions as
a supervisor of those multiple supervise(8)s. So it is not much
of a conceptual jump, nor extra info, to simply eliminate the
supervise(8) "middlemen", and have svscan supervise the services
directly.
This is all that perpd(8) does (and what init/minit/ninit do,
too).

perpd does provide redundant supervision with perpboot/inittab
by default when installed with perp-setup(8). Imagining any
extra security from additional layers of supervision is merely a
placebo, but you are certainly welcome to it if your base system
is so fundamentally flawed.

For example, you can run one perpd instance per service if you
like. Or you can set up your perpetrate(5) service definitions
to exec services under supervision of rundeux(8).

Of course you can always revert to perp-0.00, too, if you
prefer. It has all the same perp usability, but with a
supervisory architecture that may be more familiar to you.

Wayne