From mboxrd@z Thu Jan 1 00:00:00 1970
X-Msuck: nntp://news.gmane.io/gmane.comp.sysutils.supervision.general/1460
Path: news.gmane.org!not-for-mail
From: Radek Podgorny
Newsgroups: gmane.comp.sysutils.supervision.general
Subject: Re: runit not collecting zombies
Date: Mon, 02 Jul 2007 13:23:23 +0200
Message-ID: <4688E02B.70108@podgorny.cz>
References: <20070526103517.GD24895@home.power>
 <20070603111056.15978.qmail@3deb4a0e5d8414.315fe32.mid.smarden.org>
 <20070611131112.GA1576@home.power>
 <20070618134516.GA1560@home.power>
 <20070619181325.23252.qmail@a92f927aabd53f.315fe32.mid.smarden.org>
 <20070619190751.GC27090@home.power>
 <20070620162325.26345.qmail@7d91355cde742c.315fe32.mid.smarden.org>
 <20070620165736.GC12963@home.power>
 <20070620183532.4571.qmail@9f638fd8b69905.315fe32.mid.smarden.org>
 <46876927.5020108@podgorny.cz>
 <20070702082801.27191.qmail@b7ca43d472c5fa.315fe32.mid.smarden.org>
NNTP-Posting-Host: lo.gmane.org
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
X-Trace: sea.gmane.org 1183375407 7472 80.91.229.12 (2 Jul 2007 11:23:27 GMT)
X-Complaints-To: usenet@sea.gmane.org
NNTP-Posting-Date: Mon, 2 Jul 2007 11:23:27 +0000 (UTC)
To: supervision@list.skarnet.org
Original-X-From: supervision-return-1697-gcsg-supervision=m.gmane.org@list.skarnet.org Mon Jul 02 13:23:26 2007
Return-path:
Envelope-to: gcsg-supervision@gmane.org
Original-Received: from antah.skarnet.org ([212.85.147.14]) by lo.gmane.org with smtp (Exim 4.50) id 1I5K04-00043g-TF for gcsg-supervision@gmane.org; Mon, 02 Jul 2007 13:23:24 +0200
Original-Received: (qmail 2721 invoked by uid 76); 2 Jul 2007 11:23:46 -0000
Mailing-List: contact supervision-help@list.skarnet.org; run by ezmlm
List-Post:
List-Help:
List-Unsubscribe:
List-Subscribe:
List-Archive:
Original-Received: (qmail 2711 invoked from network); 2 Jul 2007 11:23:45 -0000
User-Agent: Thunderbird 2.0.0.4 (X11/20070615)
In-Reply-To:
 <20070702082801.27191.qmail@b7ca43d472c5fa.315fe32.mid.smarden.org>
X-Enigmail-Version: 0.95.1
Xref: news.gmane.org gmane.comp.sysutils.supervision.general:1460
Archived-At:

Well, actually I'm the original poster. Unfortunately I can't do any
tests, since the affected systems are all in production and the problem
does not occur on my testing setups. :-(

The number of processes doesn't have to be "huge" and they don't need
to be "short lived" either (AFAIK). The parent pid of the zombies is 1.

As I said before, I can't do any thorough testing, so I can't give you
much feedback about the patches. I can only gather "passive" info
(versions, ...).

All my systems are Gentoo. Some of them are amd64, some x86. The
problem appears on both architectures. My "unstable" laptop is amd64
and does not suffer from the problem, so it may seem to be a problem of
different package versions (glibc, whatever...). Unfortunately, some of
the "stable" systems do not have the problem either. :-( So the only
difference may be the kernel, which I can check if you want...

...or something completely different. :-(

Sincerely
Radek Podgorny

Gerrit Pape wrote:
> On Sun, Jul 01, 2007 at 10:43:19AM +0200, Radek Podgorny wrote:
>> Hi! What is the status of this? Did the "reap zombies every 5secs"
>> help? One of my servers just passed away again and I really need this
>> issue to be fixed... :-(
>
> Not sure, no response from Alex yet. I didn't know that there's someone
> else having this problem. What triggers the problem on your system?,
> also a huge amount of short running processes that detached to have
> parent pid 1? Did you check ppid of the zombies? Can you give some
> information on how it can be reproduced?
>
> Thanks, Gerrit.
>
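
The "check ppid of the zombies" step asked about above can be done passively on a production box. What follows is a minimal sketch, not something posted in the thread: it assumes a Linux system with /proc mounted, and the function names `parse_stat` and `list_zombies` are made up for illustration. It only reads /proc/<pid>/stat and reports processes in state "Z" together with their parent pid; it does not reap anything.

```python
import os


def parse_stat(text):
    """Parse one /proc/<pid>/stat line into (pid, comm, state, ppid)."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    lparen = text.index("(")
    rparen = text.rindex(")")
    pid = int(text[:lparen])
    comm = text[lparen + 1:rparen]
    fields = text[rparen + 1:].split()
    # After comm, the first field is the state and the second is the ppid.
    return pid, comm, fields[0], int(fields[1])


def list_zombies(proc="/proc"):
    """Return a list of (pid, ppid, comm) for every process in state Z."""
    if not os.path.isdir(proc):
        return []  # no procfs here (assumption: Linux-style /proc)
    zombies = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                pid, comm, state, ppid = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process went away or the line was unreadable
        if state == "Z":
            zombies.append((pid, ppid, comm))
    return zombies


if __name__ == "__main__":
    for pid, ppid, comm in list_zombies():
        print(f"zombie pid={pid} ppid={ppid} comm={comm}")
```

Run periodically (for example from cron), the output shows whether the zombies really accumulate under pid 1 and which commands they were, which may help narrow down what triggers the problem without touching the running services.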