From: Alex Efros
Newsgroups: gmane.comp.sysutils.supervision.general
Subject: Re: runit not collecting zombies
Date: Tue, 18 Sep 2007 14:33:25 +0300
Organization: asdfGroup Inc., http://powerman.asdfGroup.com/
Message-ID: <20070918113325.GE1531@home.power>
In-Reply-To: <20070918081441.20488.qmail@1a6f0ddc0befcc.315fe32.mid.smarden.org>
To: supervision@list.skarnet.org

Hi!

On Tue, Sep 18, 2007 at 08:14:41AM +0000, Gerrit Pape wrote:
> The situation would have been cleaned up on your systems once any child
> process gets re-parented to process 1 before it terminates, and then
> exits, causing runit to get a SIGCHLD; which apparently didn't happen.

The interesting question is: why didn't it happen? Or rather, why did it
stop happening after 25 May 2007 on Gentoo systems?

For example, in this case the parent process exits before the child, so it
has no chance to intercept SIGCHLD, and SIGCHLD must be delivered to runit:

    # ps ax | grep Z | wc
        280    1681   12360
    # perl -e 'fork || sleep 1; print "pid $$ exit\n"'
    pid 4977 exit
    # pid 4979 exit
    # sleep 15; ps ax | grep 4979
     4979 pts/1    Z      0:00 [perl]
    # ps ax | grep Z | wc
        283    1699   12488

> I prepare a new version of runit that looks for and reaps zombies not
> only if it knows that there are some, but also after a 14 seconds
> timeout, there seems to be no way around that.

Maybe it makes sense to check how sysvinit handles zombies?

-- 
WBR, Alex.
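
P.S. To make the "reap on a timer as well as on SIGCHLD" idea concrete, here is a rough sketch in Python. It is only an illustration of the general technique (a non-blocking waitpid loop called periodically), not runit's or sysvinit's actual code. The function names and the polling interval are made up for the example, and the children here are direct children of the script rather than orphans re-parented to process 1.

```python
import os
import time


def reap_zombies():
    """Collect every child that has already exited, without blocking.

    Returns a list of (pid, status) pairs. An init-like process could call
    this from its SIGCHLD handling and again from a periodic timer.
    """
    reaped = []
    while True:
        try:
            pid, status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break  # ECHILD: this process has no children at all
        if pid == 0:
            break  # children exist, but none of them has exited yet
        reaped.append((pid, status))
    return reaped


def spawn_short_lived(n):
    """Fork n children that exit at once; return their pids (hypothetical helper)."""
    pids = []
    for _ in range(n):
        pid = os.fork()
        if pid == 0:
            os._exit(0)  # child: exits and stays a zombie until the parent waits
        pids.append(pid)
    return pids


if __name__ == "__main__":
    expected = set(spawn_short_lived(3))
    deadline = time.time() + 5.0
    # Timer-driven reaping: poll even though no signal told us to.
    while expected and time.time() < deadline:
        for pid, status in reap_zombies():
            expected.discard(pid)
            print("reaped pid %d, status %d" % (pid, status))
        time.sleep(0.1)  # stand-in for the periodic timeout
```

The point of the loop in reap_zombies() is that a single SIGCHLD may stand for several exited children, so one wait call per signal is not enough; the timer only adds a safety net for the case where the signal is never seen.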