From: Laurent Bercot
Newsgroups: gmane.comp.sysutils.supervision.general
Subject: Re: hello - hanging services
Date: Wed, 18 Aug 2010 12:57:35 +0200
Message-ID: <20100818105735.GA13364@skarnet.org>
References: <20100817190803.41e8257f.jean.bruenn@ip-minds.de> <20100817192422.a157e85f.jean.bruenn@ip-minds.de>
To: supervision@list.skarnet.org

>> which is running a command all X seconds and if it gets a response it
>> knows "ah okay, the service is still running" and if it gets no
>> response "oh, the service seems to have died, let's restart it"?
>>
>> Difficult to implement?
>
> Yes.
More precisely, it's not so much "difficult to implement" (I've done it for a paying customer's project) as "impossible to do without specific support in the service you're trying to manage".

In other words, what Jean-Michel wants is a software watchdog. It can be done, but it's pretty intrusive: it requires a library, a daemon, and library calls added to the managed process's source code, and those calls send messages to the daemon. The daemon is configured with a policy that decides "the service is running fine" or "the service has hung" depending on how often it receives those messages.

It's doable, and a watchdog library/daemon may even have its place in a supervision suite (I'll think about it), but it certainly has nothing to do with purely external process management tools such as runsvdir/runsv or svscan/supervise. It's a whole piece of software in its own right.

I'm certain that a lot of open source software watchdogs already exist out there. I'm also certain that none of them is as lightweight and easy to use as I'd like, but that's another story.

-- 
Laurent
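To make the heartbeat-based watchdog described in this message a little more concrete, here is a minimal sketch in Python. Everything in it is hypothetical and is not the implementation mentioned in the message: the names HeartbeatPolicy, send_heartbeat, drain and FakeClock are made up, a Unix datagram socket pair stands in for the transport, and the policy is assumed to be "a service is hung once no heartbeat has arrived for more than `timeout` seconds". A real watchdog would also need per-service configuration and an action to take on a hung service (for instance killing it, so that supervise or runsv restarts it).

```python
import socket
import time


class HeartbeatPolicy:
    """Hypothetical watchdog policy: a service is considered hung when more
    than `timeout` seconds have passed since its last heartbeat (or since it
    was registered, if it never sent one)."""

    def __init__(self, timeout, clock=time.monotonic):
        self.timeout = timeout
        self.clock = clock          # injectable so the policy can be tested
        self.last_seen = {}         # service name -> time of last heartbeat

    def register(self, name):
        self.last_seen[name] = self.clock()

    def beat(self, name):
        self.last_seen[name] = self.clock()

    def hung(self):
        """Return the sorted names of services whose heartbeats stopped."""
        now = self.clock()
        return sorted(n for n, t in self.last_seen.items()
                      if now - t > self.timeout)


def send_heartbeat(sock, name):
    """Library side (hypothetical API): the managed process calls this from
    its own main loop, so a hung loop means no more heartbeats."""
    sock.send(name.encode())


def drain(sock, policy):
    """Daemon side: record every pending heartbeat without blocking."""
    while True:
        try:
            data = sock.recv(256)
        except BlockingIOError:
            return                  # no more queued datagrams
        policy.beat(data.decode())


class FakeClock:
    """Manually advanced clock, used only for the demonstration below."""

    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now


if __name__ == "__main__":
    clock = FakeClock()
    policy = HeartbeatPolicy(timeout=5.0, clock=clock)
    policy.register("healthy")
    policy.register("stuck")

    # Daemon and managed process share a Unix datagram socket pair here;
    # a real setup would more likely use a named socket the daemon binds.
    daemon_end, client_end = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
    daemon_end.setblocking(False)

    clock.now = 4.0
    send_heartbeat(client_end, "healthy")   # only "healthy" checks in
    drain(daemon_end, policy)

    clock.now = 7.0
    print(policy.hung())  # ['stuck']
```

The point the sketch tries to illustrate is that the heartbeat is sent from inside the service's own code path. That is why this kind of check needs cooperation from the managed program and cannot be bolted onto runsv or supervise from the outside.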