From: Alex Efros
Organization: asdfGroup Inc., http://powerman.asdfGroup.com/
Subject: Re: runit not collecting zombies
Date: Sat, 15 Sep 2007 19:02:21 +0300
Message-ID: <20070915160221.GG30650@home.power>
To: supervision@list.skarnet.org

Hi!

On Sat, Sep 15, 2007 at 11:47:02AM -0400, Charlie Brady wrote:
> Yes, runit should reap that status, but that doesn't change the fact that
> ssh is wrong.
> Note also that SIGCHLD is delivered to sshd process, not to
> runit, because 14926 terminates before 14925.
>
> IMO this is a bug in the privilege separation code in openssh.

Yep. But this isn't the ssh mailing list, and I'm not much worried about this ssh bug. I'm worried about zombies. Using this bug as my chance to fix all the software in the world that may generate zombies is a cool idea, but I have no time for this.

Right now I took another server, also a fresh Gentoo installation, without any load, which has 2 days 23 hours of uptime and NO zombies. And ran this:

    perl -e '$i=1000; $i-- || exit while fork(); sleep 1'

And I got 1000 unreaped zombies.

I rebooted this server and ran this again, again and again:

    perl -e '$i=1000; $i-- || exit while fork(); sleep 1'
    perl -e '$i=1000; $i-- || exit while fork(); sleep 1'
    for i in $(seq 1 100); do perl -e '$i=1000; $i-- || exit while fork(); sleep 1'; done
    for i in $(seq 1 100); do perl -e '$i=1000; $i-- || exit while fork(); sleep 1'; sleep 1; done

No effect. All zombies were reaped by runit.

Probably I should try running this every hour... :( Maybe this experiment will give us additional information.

-- 
WBR, Alex.