From: Alex Efros
Organization: asdfGroup Inc., http://powerman.asdfGroup.com/
Newsgroups: gmane.comp.sysutils.supervision.general
To: supervision@list.skarnet.org
Subject: Re: runit not collecting zombies
Date: Sat, 26 May 2007 13:35:17 +0300
Message-ID: <20070526103517.GD24895@home.power>
References: <46561ABE.7030008@podgorny.cz>

Hi!

On Fri, May 25, 2007 at 01:07:42AM +0200, Radek Podgorny wrote:
> I'm experiencing a weird behaviour where runit (1.5.0) does not collect
> zombies. AFAIK the init (pid=1) process should take care of these and
> make them quit "correctly". I know that when zombies show up, the
> initiating application is to blame but in the real world, I need a
> stable server where I don't run out of PIDs after just a few hours :-(.

I've just hit the same issue. This is the first time I've seen runit fail
to collect zombies. Server uptime is 28 days, running runit 1.5.0
(upgraded from 1.4.1 on 21 Apr). The server generates a huge number of
short-lived processes, so maybe an integer overflow in runit, or
something similar, is causing this.

Right now I have 8114 zombies, and the user that runs all these scripts
is unable to start new processes:

    bash: fork: Resource temporarily unavailable

Looks like I have to reboot to fix this for a while. :-(

I have no idea how to provide more information; strace doesn't work:

    # strace -p 1
    attach: ptrace(PTRACE_ATTACH, ...): Operation not permitted

So all I can provide is partial `ps axf` output:

      PID TTY      STAT   TIME COMMAND
        1 ?        T      2:59 runit
        2 ?        SN     0:00 [ksoftirqd/0]
        3 ?        S<     0:00 [events/0]
        4 ?        S<     0:00 [khelper]
        5 ?        S<     0:00 [kthread]
        7 ?        S<     0:34  \_ [kblockd/0]
       61 ?        S<     0:00  \_ [aio/0]
      132 ?        S<     0:00  \_ [kseriod]
      161 ?        S<     0:00  \_ [kpsmoused]
      174 ?        S<     0:01  \_ [reiserfs/0]
    16498 ?        S      0:08  \_ [pdflush]
    26144 ?        S      0:03  \_ [pdflush]
       60 ?        S      1:53 [kswapd0]
     1091 ?        Ss     0:00 /bin/sh /etc/runit/2
    24888 ?        S      0:00  \_ runsvdir /var/service log: ......................
    ...
    17454 ?        ZN     0:00 [SpiderAuto]
     5187 ?        ZN     0:00 [SpiderAuto]
    22364 ?        ZN     0:00 [SpiderAuto]
    18907 ?        ZN     0:00 [SpiderAuto]
    ...

-- 
			WBR, Alex.
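
When strace cannot attach to PID 1, the same kind of information that `ps axf` shows can be read directly from /proc. The following is a minimal sketch, not something from the original message: it assumes a Linux /proc layout where each /proc/<pid>/stat begins with "pid (comm) state ppid", and the function names (parse_stat, zombies_by_parent, init_state) are hypothetical helpers chosen for illustration. It reports the state of PID 1 and counts zombies per parent PID. One detail it could help confirm: the listing above shows PID 1 with STAT "T", which ps documents as stopped or traced, and a stopped process cannot reap its children. Whether that is the cause here is not established by the message.

```python
import os
from collections import Counter


def parse_stat(text):
    """Parse the start of a /proc/<pid>/stat record.

    Returns (pid, comm, state, ppid). The comm field is wrapped in
    parentheses and may itself contain spaces or ')', so split on the
    first '(' and the last ')'.
    """
    lpar = text.index("(")
    rpar = text.rindex(")")
    pid = int(text[:lpar])
    comm = text[lpar + 1:rpar]
    fields = text[rpar + 1:].split()
    return pid, comm, fields[0], int(fields[1])


def zombies_by_parent(proc="/proc"):
    """Count processes in state 'Z' (zombie), grouped by parent PID."""
    counts = Counter()
    for name in os.listdir(proc):
        if not name.isdigit():
            continue  # skip non-process entries such as 'meminfo'
        try:
            with open(os.path.join(proc, name, "stat")) as f:
                _pid, _comm, state, ppid = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process vanished or record was unreadable
        if state == "Z":
            counts[ppid] += 1
    return counts


def init_state(proc="/proc"):
    """Return the one-letter state of PID 1 (for example 'S' or 'T')."""
    with open(os.path.join(proc, "1", "stat")) as f:
        return parse_stat(f.read())[2]
```

On an affected host, printing init_state() and zombies_by_parent().most_common(5) would show whether PID 1 is stopped and whether the zombies are parented to PID 1 or to some other process that is failing to wait for them.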