From: Alex Efros
Newsgroups: gmane.comp.sysutils.supervision.general
Subject: Re: runit not collecting zombies
Date: Mon, 16 Jul 2007 03:09:28 +0300
Organization: asdfGroup Inc., http://powerman.asdfGroup.com/
Message-ID: <20070716000927.GY23517@home.power>
To: supervision@list.skarnet.org

Hi!

On Sun, Jul 15, 2007 at 07:23:13PM -0400, Charlie Brady wrote:
> So there are two problems there - the processes which are outliving their
> parents, and runit as process 1. Most people here seem to be ignoring the
> first problem, and instead are just looking for a magic fix by someone
> solving problem 2.

Ohh. Okay, okay, I think we all agree with you that 'generating zombies'
is a Bad Thing (tm). But the real world is slightly different from the
ideal world. In the real world we have 'zombie processes', which are part
of the *NIX architecture, and the problem can't be solved just by no
longer generating zombies: there are a lot of existing applications (like
OpenSSH) which already generate zombies, and there are cases where zombies
may and will be generated anyway.

In this situation, the Right Thing is to solve this issue between runit
and the Linux kernel.

So. If this is a race condition bug in Linux kernel 2.6.20, how do we
debug it? Maybe with some sort of patch which adds debug printf()'s to
both the kernel AND runit? Or maybe the bug is not in the kernel, but in
glibc's wrapper for wait() or somewhere else?

I'm not a C programmer, so it's hard for me to debug this myself. :(

--
WBR, Alex.
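
For readers who, like the poster, are not C programmers, here is a minimal sketch of what "collecting zombies" means on a Unix-like system. It is not runit's code (runit is written in C); the function names reap_children and spawn_short_lived are hypothetical and exist only for this illustration. A child that has exited stays in the process table as a zombie until its parent calls one of the wait() family; process 1 has to do the same for orphans it inherits.

```python
import os
import time


def reap_children():
    """Collect every child that has already exited, without blocking.

    Returns the list of PIDs that were reaped on this call.
    """
    reaped = []
    while True:
        try:
            # -1 means "any child"; WNOHANG makes the call return at once.
            pid, _status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break  # this process has no children at all
        if pid == 0:
            break  # children exist, but none has exited yet
        reaped.append(pid)
    return reaped


def spawn_short_lived(n):
    """Fork n children that exit immediately.

    Each one remains a zombie until the parent waits for it.
    """
    pids = []
    for i in range(n):
        pid = os.fork()
        if pid == 0:
            os._exit(i)  # child: exit at once, skipping Python cleanup
        pids.append(pid)
    return pids


if __name__ == "__main__":
    started = spawn_short_lived(3)
    time.sleep(0.2)  # give the children time to exit and become zombies
    print("started:", started)
    print("reaped: ", reap_children())
```

A supervisor would typically run a loop like reap_children() whenever it receives SIGCHLD. A zombie that survives such a loop would point at the kind of kernel or libc problem asked about above, rather than at the applications that created the zombies.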