From: Charlie Brady
To: Laurent Bercot
Cc: supervision@list.skarnet.org
Newsgroups: gmane.comp.sysutils.supervision.general
Subject: Re: runsv failing when starting up logger - missing pipe - failure of logpipe init?
Date: Tue, 17 Aug 2010 07:56:11 -0400 (EDT)
In-Reply-To: <20100817075720.GA9754@skarnet.org>
References: <20100817075720.GA9754@skarnet.org>

On Tue, 17 Aug 2010, Laurent Bercot wrote:

> The pipe creation part looks correct.
> The part where the error occurs looks correct.
> Okay, so is there a place where the pipe might be closed? Sure enough,
> there is: right at the end, if svd[0].want == W_EXIT, svd[0].state == DOWN,
> svd[1].pid != 0 and svd[1].want != W_EXIT, then logpipe[1] and logpipe[0]
> both get closed. And this is the only place where it can happen.
>
> My bet is that at some point, your runsv ran through that code, but
> somehow managed to live and the services didn't die, i.e. another control
> message was sent and processed before the exit condition was reached, and
> runsv is still trying to supervise things - but runs into trouble with the
> closed logpipe. I have no time to investigate further right now, but earlier
> in your strace, you should see stuff such as the control messages arriving,
> the logpipe getting closed, etc.
> If my bet is correct, then the bug is that there's a case where runsv can
> close the logpipe and still keep going, whereas it should exit as soon as
> the logger dies no matter what (or just exit on the spot and let the logger
> die on its own).

Thanks Laurent, for your pointer. Unfortunately the strace won't help,
since it wasn't started until long after the runsv process was already
malfunctioning.

Presumably Gerrit will have a good think about possible execution paths.
I'll look further too.

---
Charlie