From: Charlie Brady
Newsgroups: gmane.comp.sysutils.supervision.general
Subject: Re: Respawn limit for runsv?
Date: Sun, 13 Feb 2005 13:42:23 -0500 (EST)
Cc: supervision@list.skarnet.org

On Sat, 12 Feb 2005, Lars Kellogg-Stedman wrote:

> > > There is no general class of problem unless you can provide a single
> > > instance.
>
> Seriously, I would like to live in your perfect world, but in the past
> couple of decades I've seen a variety of situations that would have been
> easier to deal with (or that, in fact, *were* easier to deal with)
> because of a limiter.

I only asked for one :-)

> Let's make up some examples:

I didn't want made up examples :-)

> - A piece of hardware on which a program depends goes bad, causing the
> program to exit immediately upon startup. Suppose that this program has
> a high initial startup cost -- so not only is it respawning pointlessly,
> but it's driving up system load.
>
> - The disk fills up, causing your X startup to fail. But because of the
> continuous respawning, you can't log in on the console!

OK, I'm convinced.

> Exponential back-off would probably be just fine, and represents a
> fairly common solution to this class of problem. SysV init simply
> pauses for a few minutes if the respawn rate exceeds a certain
> threshold, and then tries again.
>
> Either behavior would be helpful.

But perhaps both are unnecessarily complicated. Would it not be sufficient
for runsv to have a configurable "dead time" after starting the supervised
process, during which it would not respawn the child? We want it to start a
new child quickly, but we don't want it to do so often. So how about:
restart "within one second" of an exit, but "at most every ten seconds".

It would also be useful to have a mechanism to distinguish between a process
dying in response to a request from runsv and a program dying unexpectedly.
Perhaps have "finish" and "unexpected_finish" scripts. I'd certainly like to
have a mechanism to run a finish script if a service is taken down, but not
if it just died unexpectedly. The "unexpected_finish" script could introduce
the programmed delay you want, notify the admin, preserve any essential
logs, etc.

---
Charlie
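
A minimal sketch of the "dead time" policy described above, written in Python for brevity (runsv itself is written in C). The names `DEAD_TIME`, `respawn_delay` and `simulate_starts` are hypothetical and not part of runit. The sketch only computes when the next start may happen; it does not fork, exec or signal anything.

```python
"""Hypothetical sketch of a "dead time" respawn limit for a supervisor.

Assumption: this is not runsv's actual behaviour; all names are illustrative.
"""

DEAD_TIME = 10.0  # assumed configurable minimum number of seconds between two starts


def respawn_delay(last_start: float, exited_at: float,
                  dead_time: float = DEAD_TIME) -> float:
    """Seconds to wait after the child exits before starting it again.

    A child that ran for at least `dead_time` seconds is restarted at once;
    one that died sooner is held back until `dead_time` seconds have passed
    since it was last started.
    """
    earliest_next_start = last_start + dead_time
    return max(0.0, earliest_next_start - exited_at)


def simulate_starts(run_time: float, starts: int,
                    dead_time: float = DEAD_TIME) -> list[float]:
    """Start times for a child that always exits `run_time` seconds after starting."""
    times = []
    t = 0.0
    for _ in range(starts):
        times.append(t)
        exited_at = t + run_time
        # Next start: immediately after the exit, unless the dead time forbids it.
        t = exited_at + respawn_delay(t, exited_at, dead_time)
    return times
```

With a child that dies half a second after every start (the broken-hardware case), `simulate_starts(0.5, 4)` gives starts at 0, 10, 20 and 30 seconds instead of a tight loop, while a child that runs for a minute before exiting is restarted immediately each time. Unlike exponential back-off, there is no state to reset: the limit depends only on the time of the last start.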