From: Uffe Jakobsen
Newsgroups: gmane.comp.sysutils.supervision.general
Subject: Re: Should svwaitup/down be built again, or how to make sv do this?
Date: Tue, 18 Jul 2006 10:38:04 +0200
Message-ID: <44BC9DEC.90500@uffe.org>

Kevin wrote:
> On Thursday 13 July 2006 03:44, Gerrit Pape wrote:
>>> We cannot make sv block like svwaitup did. Either we're
>>> misunderstanding how to use it, or something.
>>>
>>> An example service script is included below:
>>>
>>> #!/bin/sh
>>> svwaitup ~/service/sfsagent
>>> exec chpst -e env runthis.sh
>> svwaitup did wait two seconds by default after the sfsagent service
>> daemon has been started, this seems to be what you relied on. Indeed,
>> sv doesn't do that by default, but it supports the ./check script
>> instead.
>
> svwaitup waited until the service had been running for at least 2 seconds
> before exiting, no?
>
>
>> If you want 'sv start ~/service/sfsagent' or 'sv check
>> ~/service/sfsagent' to wait for two seconds before returning, simply do
>> 'sleep 2' in ~/service/sfsagent/check.
>
> That would be a useless check. What we want is for the sfsagent service
> to be running, not just wait 2 seconds before starting.
>
>> Better yet, find out how to check for what the sfsagent service daemon
>> needs to provide before runthis.sh can be started, and check for that
>> in ~/service/sfsagent/check.
>
> So, it sounds like you're saying that the functionality of checking for a
> service to be running, and waiting until that service is running before
> continuing no longer exists in runit as it is presently built. Is this
> correct?

I may be wrong here, but there seems to be some confusion between the
internal state of a service and the external runit state (running/not
running).

The runit supervisor framework can only see whether a service (process) is
running; it has no idea whether that service is functional yet. Runit
assumes that if the process is running, then it is ok. It knows nothing
about how long it will take for the process to become operational (as a
service) from the point where it is initialized.

You could have services that perform 'deep and long' integrity checks
during initialization before they start, e.g., listening and providing
their services. And how long will 'deep and long' integrity checks during
initialization take? Seconds, minutes, hours or even days?
runit has no chance of knowing that: as long as the supervised process is
running, everything is ok from runit's perspective.

My guess is that until now you've just been lucky that your service
typically takes a little less than 2 seconds to initialize and become
operational.

That is why Gerrit suggests that you implement a test in your check script
that can determine whether your service has actually become operational
(functional) yet. A typical check would be to test whether the service is
listening on a socket/port.

Let's hope I'm not all wrong :-)

Kind regards
Uffe
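
P.S. To make the check-script idea concrete, here is a minimal sketch of
what ~/service/sfsagent/check might look like. It is only an illustration:
I am assuming that the daemon creates a Unix socket once it is operational,
and the socket path, the SFSAGENT_SOCKET variable and the socket_ready
helper are all made up. Note that it only tests that the socket file
exists, not that anything is accepting connections on it.

```shell
#!/bin/sh
# Hypothetical ~/service/sfsagent/check script (a sketch, not a tested setup).
# sv runs ./check for 'sv start' and 'sv check'; exit status 0 means
# "up and available", any other status means "not ready yet".

# ASSUMPTION: the daemon creates a Unix socket at this path once it is
# operational. The path is invented for illustration; use the real one.
SOCKET="${SFSAGENT_SOCKET:-/tmp/sfsagent.sock}"

# Returns 0 only if the given path exists and is a socket.
socket_ready() {
    [ -S "$1" ]
}

# The status of this last command becomes the exit status of ./check.
socket_ready "$SOCKET"
```

With something like this in place, the dependent run script could use
'sv start ~/service/sfsagent || exit 1' where it used svwaitup before. As
far as I know, sv waits up to 7 seconds by default for the check to succeed
(adjustable with -w), so a slow daemon may need a longer timeout.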