From: Alex Efros
To: supervision@list.skarnet.org
Newsgroups: gmane.comp.sysutils.supervision.general
Subject: Re: Option for runsv/runsvdir to specify how many times to restart a service in a certain time period before giving up?
Date: Mon, 30 Oct 2006 16:24:20 +0200
Message-ID: <20061030142420.GC23323@home.power>
In-Reply-To: <20061030135834.GA26907@skarnet.org>

Hi!

On Mon, Oct 30, 2006 at 02:58:34PM +0100, Laurent Bercot wrote:
>  Another approach to the throttle feature that doesn't require notification
> from runit would be to have a short-lived program, designed to be called in
> the finish script, that stores its information (last calling time and such)
> in the filesystem. Maybe it's what you were thinking about. But I'm not

Yep, I'm thinking this way.

> sure how to make it reliable; storing short-lived information in the
> filesystem is very error-prone, that's the .pid way, which is precisely
> what supervision tools were designed to avoid.

I don't see any trouble making it reliable. DJB showed how to write
reliable shell scripts a long time ago: use atomic operations like 'mv'
for updating files.

    # reliable counter in bash
    prev=$(< .counter)              # read the current value
    next=$(( prev + 1 ))
    echo "$next" > ".counter.$$"    # write to a per-process temp file
    mv ".counter.$$" .counter       # atomically replace the old value

For this task we need something more complex than a plain counter, because
we have to count restarts within some time interval, but that can also be
done in a reliable way.

P.S. If my script dies before `mv`, it can leave the .counter.$$ file on
disk. If this is important, we can add cleanup code. If it's important not
to run two such scripts simultaneously, we can use something like
`chpst -l`. Etc... Reliable shell scripts are a reality, let's face it. ;-)

-- 
WBR, Alex.
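
The "count restarts within some time interval" idea mentioned above could
look roughly like the sketch below. It is only an illustration, not a tested
runit add-on: the function name `record_restart`, its arguments, and the
state-file layout (one Unix timestamp per line) are all made up here, and it
assumes a `date` that supports `+%s`. It keeps the same write-then-`mv`
pattern as the counter.

```shell
#!/bin/sh
# Hypothetical helper (not part of runit): record one restart and report
# whether the service is still within its restart budget.
# Usage: record_restart STATE_FILE MAX_RESTARTS WINDOW_SECONDS [NOW]
# Returns 0 while within the limit, 1 once exceeded, 2 on I/O error.
record_restart() {
    state=$1
    max=$2
    window=$3
    now=${4:-$(date +%s)}      # assumption: date supports +%s
    tmp=$state.$$

    # Copy only the timestamps still inside the window, then add this one.
    {
        if [ -f "$state" ]; then
            while read -r ts; do
                if [ "$ts" -gt $(( now - window )) ]; then
                    echo "$ts"
                fi
            done < "$state"
        fi
        echo "$now"
    } > "$tmp" || return 2

    # Arithmetic expansion strips any padding wc may add.
    count=$(( $(wc -l < "$tmp") + 0 ))

    # Atomic replace: a reader sees either the old file or the new one.
    mv "$tmp" "$state" || return 2

    [ "$count" -le "$max" ]
}
```

A finish script might then do something like
`record_restart ./restarts 5 60 || sv down .` (the file name and limits are
arbitrary, and whether `sv down` is the right reaction is for the admin to
decide). As with the counter, a crash before `mv` can leave a stray temp
file, and concurrent runs would need a lock such as `chpst -l`.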