From mboxrd@z Thu Jan  1 00:00:00 1970
From: Gerrit Pape
Newsgroups: gmane.comp.sysutils.supervision.general
Subject: Re: runit not collecting zombies
Date: Tue, 19 Jun 2007 18:13:25 +0000
Message-ID: <20070619181325.23252.qmail@a92f927aabd53f.315fe32.mid.smarden.org>
References: <46561ABE.7030008@podgorny.cz> <20070526103517.GD24895@home.power> <20070603111056.15978.qmail@3deb4a0e5d8414.315fe32.mid.smarden.org> <20070611131112.GA1576@home.power> <20070618134516.GA1560@home.power>
In-Reply-To: <20070618134516.GA1560@home.power>
To: supervision@list.skarnet.org
Mail-Followup-To: supervision@list.skarnet.org
Content-Type: text/plain; charset=us-ascii

On Mon, Jun 18, 2007 at 04:45:16PM +0300, Alex Efros wrote:
> On Mon, Jun 11, 2007 at 04:11:12PM +0300, Alex Efros wrote:
> > > Can you please test the patch below?:
> >
> > I've just installed runit-1.7.2 with that patch on my servers. I think if
> > after a week or two there will be no zombies, then it working. In last
> > week I didn't install it because I gather some statistics: looks like my
> > servers start producing uncollected zombies after ~3 days of work.
>
> No, patch don't fixed this bug. :(
>
> I've just rebooted my home workstation because of this issue (~6 days
> uptime), and my servers (~2.5 days uptime) already started generating
> uncollected zombies (640 zombies on one server, 160 on another), so I
> expect I should reboot them in about 10-12 hours.
>
> Rebooting servers every 2-3 days in unacceptable! I need instructions
> how to help you debug and fix this issue.

Hi Alex,

after checking the code, I currently cannot say whether or how runit
could fail to reap zombies that detached and were re-parented to pid 1.
On Linux, running strace on pid 1 isn't supported AFAIK.

To be sure that runit is at fault, can you please check the kernel
versions on your two machines? Could they have changed around the time
the problem popped up? Does upgrading the Linux kernel to a more recent
version change anything?

Thanks, Gerrit.
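
Since strace on pid 1 is not an option, one way to gather the data asked for above is to read it from /proc. The following is only a minimal sketch, not part of runit: it assumes a Linux /proc filesystem, where the field after the parenthesised command name in /proc/<pid>/stat is the process state and the next one is the parent pid. The function names are made up for illustration. It prints the running kernel release and counts zombies per parent, which shows whether the zombies really belong to pid 1 or to some other process.

```python
import os
import platform
from collections import Counter


def parse_stat(text):
    """Return (pid, state, ppid) from the contents of /proc/<pid>/stat."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')' characters, so split around the last ')'.
    head, _, tail = text.rpartition(")")
    pid = int(head.split("(", 1)[0])
    fields = tail.split()
    return pid, fields[0], int(fields[1])


def zombies_by_parent(proc="/proc"):
    """Count processes in state 'Z' (zombie), grouped by parent pid."""
    counts = Counter()
    for name in os.listdir(proc):
        if not name.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc, name, "stat")) as f:
                _, state, ppid = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process went away while we were scanning
        if state == "Z":
            counts[ppid] += 1
    return counts


if __name__ == "__main__" and os.path.isdir("/proc"):
    print("kernel:", platform.release())
    for ppid, n in zombies_by_parent().most_common():
        print(f"{n} zombie(s) with parent pid {ppid}")
```

If most zombies are reported with parent pid 1, they were re-parented to pid 1 and it is not reaping them. If they have a different parent, that process is the one failing to wait for its children.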