From: Lars Kellogg-Stedman
Newsgroups: gmane.comp.sysutils.supervision.general
Subject: Re: Respawn limit for runsv?
Date: Sat, 12 Feb 2005 08:04:59 -0500

> > There is no general class of problem unless you can provide a single
> > instance.

Seriously, I would like to live in your perfect world, but in the past
couple of decades I've seen a variety of situations that would have been
easier to deal with (or that, in fact, *were* easier to deal with)
because of a limiter. Let's make up some examples:

- A piece of hardware on which a program depends goes bad, causing the
  program to exit immediately upon startup. Suppose this program has a
  high initial startup cost: not only is it respawning pointlessly, it
  is also driving up system load.

- The disk fills up, causing your X startup to fail. But because of the
  continuous respawning, you can't log in on the console!

- A program bug causes a crash and concomitant database corruption.
  Subsequent startups fail immediately, but since you're logging
  through svlogd, the original crash messages disappear into the ether
  because the roughly 3600 respawns/hour have pushed them out of the
  logs.

- Or, in any of the above scenarios, maybe you're *not* logging through
  svlogd, and the error messages fill up a partition and bring the
  system to a screeching halt, or at least give it a noticeable limp.

Sure, yes, the root problem here is not unlimited respawning, but
unlimited respawning exacerbates the problem. Some sort of limit makes
diagnosis easier and keeps resource consumption in check.

Exponential back-off would probably be just fine, and represents a
fairly common solution to this class of problem. SysV init simply pauses
for a few minutes if the respawn rate exceeds a certain threshold, and
then tries again. Either behavior would be helpful.

If you've never encountered a situation in which this would be useful,
then by all means, don't partake of whatever, if anything, I ultimately
manage to produce.
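
To make the two behaviors above concrete, here is a rough sketch of each.
It is not runsv or SysV init code: the names backoff_delay and
RespawnThrottle are hypothetical, and the default numbers (1-second base,
5-minute cap, 10 respawns per 2 minutes) are illustrative placeholders
only. A supervisor would call one of these before each restart and sleep
for the returned number of seconds.

```python
from collections import deque


def backoff_delay(failures, base=1.0, cap=300.0):
    """Exponential back-off: seconds to wait before the next respawn.

    `failures` is the number of consecutive quick exits seen so far.
    The delay doubles with each failure and is clamped to `cap`.
    All defaults are assumptions chosen for illustration.
    """
    if failures <= 0:
        return 0.0
    return min(cap, base * 2 ** (failures - 1))


class RespawnThrottle:
    """Threshold-style limiter: pause when respawns come too quickly.

    If more than `max_respawns` starts fall within `window` seconds,
    delay() returns `pause` once and the history is reset, so the
    service is tried again after the pause.
    """

    def __init__(self, max_respawns=10, window=120.0, pause=300.0):
        self.max_respawns = max_respawns
        self.window = window
        self.pause = pause
        self.starts = deque()  # timestamps of recent starts

    def delay(self, now):
        # Forget starts that have aged out of the window.
        while self.starts and now - self.starts[0] > self.window:
            self.starts.popleft()
        self.starts.append(now)
        if len(self.starts) > self.max_respawns:
            self.starts.clear()
            return self.pause
        return 0.0
```

With back-off, a service that fails on every start is retried after 1, 2,
4, 8, ... seconds rather than once per second, so the first crash
messages are far less likely to be rotated out of the log. The failure
counter would presumably be reset once the service stays up for some
reasonable time; that policy is left out of the sketch.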