From: Gerrit Pape
Newsgroups: gmane.comp.sysutils.supervision.general
Subject: Re: runit not collecting zombies
Date: Sun, 3 Jun 2007 11:10:56 +0000
Message-ID: <20070603111056.15978.qmail@3deb4a0e5d8414.315fe32.mid.smarden.org>
References: <46561ABE.7030008@podgorny.cz> <20070526103517.GD24895@home.power>
In-Reply-To: <20070526103517.GD24895@home.power>
To: supervision@list.skarnet.org

On Sat, May 26, 2007 at 01:35:17PM +0300, Alex Efros wrote:
> On Fri, May 25, 2007 at 01:07:42AM +0200, Radek Podgorny wrote:
> > I'm experiencing a weird behaviour where runit (1.5.0) does not collect
> > zombies.
> > AFAIK the init (pid=1) process should take care of these and
> > make them quit "correctly". I know that when zombies show up, the
> > initiating application is to blame, but in the real world I need a
> > stable server where I don't run out of PIDs after just a few hours :-(.
>
> I've just hit the same issue. This is the first time I've seen runit fail
> to collect zombies. Server uptime is 28 days, runit 1.5.0 (upgraded from
> 1.4.1 on 21 Apr). The server generates a huge number of short-lived
> processes, so maybe some integer overflow in runit or something similar
> causes this issue.
>
> Right now I have 8114 zombies, and the user who runs all these scripts is
> now unable to start new processes:
> bash: fork: Resource temporarily unavailable
>
> Looks like I'll have to reboot to fix this issue for a while. :-(

Hi Alex, the runit program didn't change from 1.4.1 to 1.5.0 or 1.5.1, so
downgrading should not help.

I'm not yet completely sure what the actual problem is, but I have an idea.
Can you please test the patch below?

 cd /package/admin/runit
 patch -p0 0)
+ if (child == pid) break;
 /* reget stderr */
 if ((ttyfd =open_write("/dev/console")) != -1) {
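The surviving patch line `if (child == pid) break;` suggests the fix concerns how runit's wait loop handles the pid it is watching. The general requirement is that orphaned processes get reparented to pid 1, and each one stays a zombie (holding its PID) until pid 1 waits for it. Below is a minimal sketch of a non-blocking reaping loop, assuming plain POSIX `waitpid(-1, &status, WNOHANG)` rather than runit's own wrappers. `reap_children()` and `spawn_exiting_child()` are hypothetical names, and this is not the actual runit patch:

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (not runit code): fork a child that exits at once
 * with `code`, imitating a short-lived process. Returns the child's pid
 * in the parent, or -1 if fork() failed. */
static pid_t spawn_exiting_child(int code)
{
    pid_t p = fork();
    if (p == 0)
        _exit(code);          /* child: terminate immediately */
    return p;                 /* parent: child pid, or -1 on error */
}

/* Hypothetical helper (not runit code): reap every child that has already
 * exited, without blocking. Returns 1 if `watched` was among them and
 * stores its wait status in *watched_status; returns 0 otherwise. */
static int reap_children(pid_t watched, int *watched_status)
{
    int found = 0;

    for (;;) {
        int status;
        pid_t child = waitpid(-1, &status, WNOHANG);

        if (child > 0) {
            if (child == watched) {
                found = 1;
                if (watched_status != NULL)
                    *watched_status = status;
            }
            continue;         /* more zombies may be queued: keep reaping */
        }
        if (child == -1 && errno == EINTR)
            continue;         /* interrupted by a signal: retry */
        break;                /* 0: none exited yet; -1 (e.g. ECHILD): no children */
    }
    return found;
}
```

The loop keeps calling `waitpid()` even after it has seen the watched pid, so one wakeup clears every pending zombie. Because SIGCHLD deliveries can coalesce, a loop that reaps only one child per wakeup can fall behind a high fork rate and let zombies pile up.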