From mboxrd@z Thu Jan 1 00:00:00 1970
X-Msuck: nntp://news.gmane.io/gmane.comp.sysutils.supervision.general/2026
Path: news.gmane.org!not-for-mail
From: Laurent Bercot
Newsgroups: gmane.comp.sysutils.supervision.general
Subject: Re: runsv failing when starting up logger - missing pipe - failure of logpipe init?
Date: Tue, 17 Aug 2010 09:57:20 +0200
Message-ID: <20100817075720.GA9754@skarnet.org>
NNTP-Posting-Host: lo.gmane.org
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Trace: dough.gmane.org 1282031718 2032 80.91.229.12 (17 Aug 2010 07:55:18 GMT)
X-Complaints-To: usenet@dough.gmane.org
NNTP-Posting-Date: Tue, 17 Aug 2010 07:55:18 +0000 (UTC)
To: supervision@list.skarnet.org
Original-X-From: supervision-return-2261-gcsg-supervision=m.gmane.org@list.skarnet.org Tue Aug 17 09:55:17 2010
Envelope-to: gcsg-supervision@lo.gmane.org
Original-Received: from antah.skarnet.org ([212.85.147.14])
	by lo.gmane.org with smtp (Exim 4.69) (envelope-from )
	id 1OlH0v-0000k1-Nu for gcsg-supervision@lo.gmane.org;
	Tue, 17 Aug 2010 09:55:17 +0200
Original-Received: (qmail 12999 invoked by uid 76); 17 Aug 2010 07:57:20 -0000
Mailing-List: contact supervision-help@list.skarnet.org; run by ezmlm
Original-Received: (qmail 12991 invoked by uid 1000); 17 Aug 2010 07:57:20 -0000
Mail-Followup-To: supervision@list.skarnet.org
Content-Disposition: inline
User-Agent: Mutt/1.4i
Xref: news.gmane.org gmane.comp.sysutils.supervision.general:2026

>> Has anyone else seen this error condition or can posit a situation where
>> it might be seen?
>
> The next question to ponder is where the bug lies. The runsv process here
> has no fd 5 and fd 6 - IOW, logpipe[0] is 5, but isn't a valid fd. Are
> there circumstances where a pipe can just cease to be? Should runsv have
> detected this issue (where pipe() did not return -1, but the fds returned
> were not valid)?
>
> Is this a linux kernel bug?

Before accusing the Linux kernel, let's check the runsv code and see
whether there's a possible execution path that leads to the situation
you're describing...

The pipe creation part looks correct. The part where the error occurs
looks correct. Okay, so is there a place where the pipe might be closed?
Sure enough, there is: right at the end, if svd[0].want == W_EXIT,
svd[0].state == DOWN, svd[1].pid != 0 and svd[1].want != W_EXIT, then
logpipe[1] and logpipe[0] both get closed. And this is the only place
where it can happen.

My bet is that at some point, your runsv ran through that code, but
somehow managed to live and the services didn't die, i.e. another control
message was sent and processed before the exit condition was reached, and
runsv is still trying to supervise things - but runs into trouble with
the closed logpipe.

I have no time to investigate further right now, but earlier in your
strace, you should see stuff such as the control messages arriving, the
logpipe getting closed, etc.

If my bet is correct, then the bug is that there's a case where runsv can
close the logpipe and still keep going, whereas it should exit as soon as
the logger dies no matter what (or just exit on the spot and let the
logger die on its own).

-- 
Laurent
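
For anyone who wants to see the suspected failure mode in isolation, here is a minimal sketch in Python. It is not runsv's code: the constants, closes_logpipe() and use_after_close() are hypothetical names made up for illustration. The first function only restates the condition described above as a predicate, and the second shows what a process gets back from the kernel when it keeps using a pipe descriptor it has already closed itself.

```python
import errno
import os

# Hypothetical stand-ins for runsv's "want" and state values (not its real definitions).
W_UP, W_DOWN, W_EXIT = range(3)
S_DOWN, S_RUN = range(2)


def closes_logpipe(svc_want, svc_state, log_pid, log_want):
    """Restate the condition from the message above as a plain predicate.

    svc_* model svd[0] (the service), log_* model svd[1] (the logger).
    """
    return (svc_want == W_EXIT and svc_state == S_DOWN
            and log_pid != 0 and log_want != W_EXIT)


def use_after_close():
    """Create a pipe, close both ends, then touch the old read end again.

    Returns the errno the kernel reports for the stale descriptor,
    or 0 if the call unexpectedly succeeds.
    """
    logpipe = os.pipe()       # (read_fd, write_fd), like pipe(logpipe) in C
    os.close(logpipe[1])      # the exit path closes the write end...
    os.close(logpipe[0])      # ...and the read end
    try:
        os.fstat(logpipe[0])  # any later use of the old descriptor number
    except OSError as e:
        return e.errno
    return 0


if __name__ == "__main__":
    # Service wants to exit and is down, logger still running: pipe gets closed.
    print(closes_logpipe(W_EXIT, S_DOWN, 1234, W_UP))
    # Using the descriptor afterwards fails with EBADF.
    print(errno.errorcode[use_after_close()])
```

The point of the sketch is only that a pipe does not "cease to be" on its own: pipe() can succeed and the descriptors can still be invalid later if the same process closed them in between. If the bet above is right, the strace should show the two close() calls some time before the first failure on fd 5.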