From: Alex Efros
Newsgroups: gmane.comp.sysutils.supervision.general
Subject: Re: [LONG] Re: runit not collecting zombies
Date: Tue, 15 Feb 2011 17:00:25 +0200
Organization: http://powerman.name/
Message-ID: <20110215150025.GB3430@home.power>
In-Reply-To: <20110215131218.GA18284@skarnet.org>
To: supervision@list.skarnet.org
User-Agent: Mutt/1.5.20 (2009-06-14)

Hi!

On Tue, Feb 15, 2011 at 02:12:18PM +0100, Laurent Bercot wrote:
> runit ran perfectly without polling for lots of people except Radek and
> Alex. Until Gerrit had to add a polling mechanism just for them.

AFAIR this polling mechanism didn't solve the issue for me, but I may be
wrong because that was a long time ago. Anyway, I'm still using a
`kill -CONT 1` hack in /etc/cron.hourly/ to work around this issue on all
my systems.

> I ran the following command while stracing my own process 1 (s4-svscan,
> which does not poll) on a Linux 2.6.36.1 kernel:

Again, this was a long time ago and I may be wrong, but AFAIR this simple
trick with two processes wasn't the right way to reproduce this issue.

> The problem Radek and Alex had was most likely caused by a kernel bug:
> in some cases, when a zombie is reparented to process 1, process 1 does
> not get notified with a SIGCHLD, as it should be.

OK, there's no harm in trying. I repeated your test with strace: on my
current system runit got SIGCHLD. I'm using kernel 2.6.36-hardened-r9 and
runit 2.0.0.

I've just switched off the hourly `kill -CONT 1` workaround, so we'll see
in a couple of days whether everything is fine. If there is no growing
army of zombies on my system after that, I'll be glad to test a modified
runit version without polling if someone sends me the patch.

--
WBR, Alex.
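
One way to watch for the "growing army of zombies" mentioned above, once the hourly workaround is off, is to count processes in state Z whose parent is PID 1. The following is only a minimal sketch, not something from this thread: it assumes a Linux /proc filesystem, and the function name `find_zombies` and its `proc_root` parameter are made up for illustration.

```python
import os


def find_zombies(proc_root="/proc"):
    """Return sorted (pid, ppid) pairs for processes in state Z.

    proc_root is a hypothetical parameter so the function can be
    pointed at a fake directory tree for testing.
    """
    zombies = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        stat_path = os.path.join(proc_root, entry, "stat")
        try:
            with open(stat_path, errors="replace") as f:
                data = f.read()
        except OSError:
            continue  # process went away between listdir and open
        # The command name is wrapped in parentheses and may itself
        # contain spaces or ')', so split on the last ')'.
        fields = data.rsplit(")", 1)[-1].split()
        # After the command name: fields[0] is the state, fields[1] the ppid.
        if len(fields) >= 2 and fields[0] == "Z":
            zombies.append((int(entry), int(fields[1])))
    return sorted(zombies)


if __name__ == "__main__":
    orphans = [pid for pid, ppid in find_zombies() if ppid == 1]
    print(f"{len(orphans)} zombie(s) parented to PID 1: {orphans}")
```

Run from cron or by hand every few hours, a count that keeps rising would suggest process 1 is still missing SIGCHLD; a count that stays at or near zero would suggest the workaround is no longer needed.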