From mboxrd@z Thu Jan  1 00:00:00 1970
X-Msuck: nntp://news.gmane.io/gmane.comp.sysutils.supervision.general/2031
Path: news.gmane.org!not-for-mail
From: Charlie Brady <charlieb-supervision@budge.apana.org.au>
Newsgroups: gmane.comp.sysutils.supervision.general
Subject: Re: runsv failing when starting up logger - missing pipe - failure
 of logpipe init?
Date: Tue, 17 Aug 2010 16:51:43 -0400 (EDT)
Message-ID: <Pine.LNX.4.64.1008171529440.4362@e-smith.charlieb.ott.istop.com>
References: <Pine.LNX.4.64.1008161622390.20374@e-smith.charlieb.ott.istop.com>
 <Pine.LNX.4.64.1008161922530.25187@e-smith.charlieb.ott.istop.com>
 <20100817075720.GA9754@skarnet.org>
NNTP-Posting-Host: lo.gmane.org
Mime-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
X-Trace: dough.gmane.org 1282078313 16386 80.91.229.12 (17 Aug 2010 20:51:53 GMT)
X-Complaints-To: usenet@dough.gmane.org
NNTP-Posting-Date: Tue, 17 Aug 2010 20:51:53 +0000 (UTC)
To: supervision@list.skarnet.org
Original-X-From: supervision-return-2266-gcsg-supervision=m.gmane.org@list.skarnet.org Tue Aug 17 22:51:52 2010
Return-path: <supervision-return-2266-gcsg-supervision=m.gmane.org@list.skarnet.org>
Envelope-to: gcsg-supervision@lo.gmane.org
Original-Received: from antah.skarnet.org ([212.85.147.14])
	by lo.gmane.org with smtp (Exim 4.69)
	(envelope-from <supervision-return-2266-gcsg-supervision=m.gmane.org@list.skarnet.org>)
	id 1OlT8K-0001eQ-AO
	for gcsg-supervision@lo.gmane.org; Tue, 17 Aug 2010 22:51:44 +0200
Original-Received: (qmail 3447 invoked by uid 76); 17 Aug 2010 20:53:47 -0000
Mailing-List: contact supervision-help@list.skarnet.org; run by ezmlm
List-Post: <mailto:supervision@list.skarnet.org>
List-Help: <mailto:supervision-help@list.skarnet.org>
List-Unsubscribe: <mailto:supervision-unsubscribe@list.skarnet.org>
List-Subscribe: <mailto:supervision-subscribe@list.skarnet.org>
List-Archive: <http://www.skarnet.org/lists/>
Original-Received: (qmail 3437 invoked from network); 17 Aug 2010 20:53:47 -0000
X-X-Sender: charlieb@e-smith.charlieb.ott.istop.com
In-Reply-To: <20100817075720.GA9754@skarnet.org>
Xref: news.gmane.org gmane.comp.sysutils.supervision.general:2031
Archived-At: <http://permalink.gmane.org/gmane.comp.sysutils.supervision.general/2031>


On Tue, 17 Aug 2010, Laurent Bercot wrote:

>  Before accusing the Linux kernel, let's check the runsv code and see
> whether there's a possible execution path that leads to the situation
> you're describing...
> 
>  The pipe creation part looks correct.
>  The part where the error occurs looks correct.
>  Okay, so is there a place where the pipe might be closed? Sure enough,
> there is: right at the end, if svd[0].want == W_EXIT, svd[0].state == DOWN,
> svd[1].pid != 0 and svd[1].want != W_EXIT, then logpipe[1] and logpipe[0]
> both get closed. And this is the only place where it can happen.
> 
>  My bet is that at some point, your runsv ran through that code, but
> somehow managed to live and the services didn't die, i.e. another control
> message was sent and processed before the exit condition was reached, and
> runsv is still trying to supervise things - but runs into trouble with the
> closed logpipe. I have no time to investigate further right now, but earlier
> in your strace, you should see stuff such as the control messages arriving,
> the logpipe getting closed, etc.
>  If my bet is correct, then the bug is that there's a case where runsv can
> close the logpipe and still keep going, whereas it should exit as soon as
> the logger dies no matter what (or just exit on the spot and let the logger
> die on its own).

Some additional information. There have also been some runsv fatal startup 
errors, due to failure to flock supervise/lock. The most recent of those 
post-dates the most recent ./log/run failure.

[root@mbgvm1 ~]# tr \\0 \\n < /proc/1913/cmdline 
runsvdir
-P
/service
log: : unable to lock supervise/lock: temporary failure
runsv tug-metrics: fatal: unable to setup filedescriptor for ./log/run: 
file descriptor not open
runsv tug-metrics: fatal: unable to lock supervise/lock: temporary failure
runsv tug-metrics: fatal: unable to setup filedescriptor for ./log/run: 
file descriptor not open
runsv tug-metrics: fatal: unable to lock supervise/lock: temporary failure

[root@mbgvm1 ~]# 

Although we don't have timestamps on those log messages, I'm prepared to 
bet the were generated close to Aug 10 15:21:54 - that's when new versions 
of /service/tug* symlinks were installed via rpm:

[root@mbgvm1 ~]# ps -eo pid,lstart,cmd | grep runsv
 1913 Mon Aug  9 14:02:24 2010 runsvdir -P /service log: : unable to lock 
supervise/lock: temporary failure?runsv tug-metrics: fatal: unable to 
setup filedescriptor for ./log/run: file descriptor not open?runsv 
tug-metrics: fatal: unable to lock supervise/lock: temporary failure?runsv 
tug-metrics: fatal: unable to setup filedescriptor for ./log/run: file 
descriptor not open?runsv tug-metrics: fatal: unable to lock 
supervise/lock: temporary failure?
 1940 Mon Aug  9 14:02:24 2010 runsv oidentd
 1942 Mon Aug  9 14:02:24 2010 runsv radiusd
 1947 Mon Aug  9 14:02:24 2010 runsv eventq
 1948 Mon Aug  9 14:02:24 2010 runsv haldaemon
 1949 Mon Aug  9 14:02:24 2010 runsv dhcpd
 1950 Mon Aug  9 14:02:24 2010 runsv qdiskd
 1951 Mon Aug  9 14:02:24 2010 runsv replicate-master
 1958 Mon Aug  9 14:02:24 2010 runsv qmail
 1962 Mon Aug  9 14:02:24 2010 runsv squid
 1963 Mon Aug  9 14:02:24 2010 runsv soap-wrapper
 1964 Mon Aug  9 14:02:24 2010 runsv smtp-auth-proxy
 1973 Mon Aug  9 14:02:24 2010 runsv httpd-admin
 1984 Mon Aug  9 14:02:24 2010 runsv cvm-unix-local
 1985 Mon Aug  9 14:02:24 2010 runsv dnscache.forwarder
 1986 Mon Aug  9 14:02:24 2010 runsv ulogd
 1987 Mon Aug  9 14:02:24 2010 runsv qpsmtpd
 1992 Mon Aug  9 14:02:24 2010 runsv wan
 2000 Mon Aug  9 14:02:24 2010 runsv sshd
 2005 Mon Aug  9 14:02:24 2010 runsv tug-metrics
 2006 Mon Aug  9 14:02:24 2010 runsv ntpd
 2007 Mon Aug  9 14:02:24 2010 runsv tomcat
 2010 Mon Aug  9 14:02:24 2010 runsv heartbeat
 2011 Mon Aug  9 14:02:24 2010 runsv certmgmt
 2012 Mon Aug  9 14:02:24 2010 runsv dnscache
 2015 Mon Aug  9 14:02:24 2010 runsv snmpd
 2016 Mon Aug  9 14:02:24 2010 runsv alarmd
 2017 Mon Aug  9 14:02:24 2010 runsv vmware-guestd
 2018 Mon Aug  9 14:02:24 2010 runsv mntftpserver
 2019 Mon Aug  9 14:02:24 2010 runsv httpd-e-smith
 2020 Mon Aug  9 14:02:24 2010 runsv ldap
 2021 Mon Aug  9 14:02:24 2010 runsv clusterip
 2022 Mon Aug  9 14:02:24 2010 runsv pptpd
 2023 Mon Aug  9 14:02:24 2010 runsv tinydns
 2024 Mon Aug  9 14:02:24 2010 runsv postgresql
 2025 Mon Aug  9 14:02:24 2010 runsv raidmonitor
 4506 Tue Aug 17 16:13:25 2010 grep runsv
14262 Tue Aug 10 15:21:54 2010 runsv tug
14263 Tue Aug 10 15:21:54 2010 runsv tug-eventd
[root@mbgvm1 ~]# 

I suspect there is a race condition in runsvdir when a /service/xxx 
symlink is replaced with another. If I read the code correctly, runsvdir 
will detect that the inode number of the symlink is different, send 
sigterm to the runsv process, and then forget about it. On the next scan, 
it will start a new runsv process - which might fail if the old one still 
persists, and has a lock on the supervise/lock file.

The runsv process, on receipt of the sigterm, will send itself the 'x' 
control command. As Laurent points out, this will cause runsv to close the 
logging pipe. 

Larent also says:

 whereas it should exit as soon as the logger dies no matter what (or just
 exit on the spot and let the logger die on its own).

however I don't see that in the code. runsv will exit if it reaches this 
code and has recorded the logger as having exited, via waitpid() 
processing:

...
    if ((svd[0].want == W_EXIT) && (svd[0].state == S_DOWN)) {
      if (svd[1].pid == 0) _exit(0);
      if (svd[1].want != W_EXIT) {
        svd[1].want =W_EXIT;
        /* stopservice(&svd[1]); */
        update_status(&svd[1]);
        if (close(logpipe[1]) == -1) warn("unable to close logpipe[1]");
        if (close(logpipe[0]) == -1) warn("unable to close logpipe[0]");
      }
    } 
...

I think it is depending on closing the pipe to terminate the logger, and 
is depending on waitpid() processing to terminate the runsv process. Am I 
reading this correctly?