From: Alex Efros
Newsgroups: gmane.comp.sysutils.supervision.general
Subject: Re: runit not collecting zombies
Date: Sat, 2 Jun 2007 17:55:53 +0300
Organization: asdfGroup Inc., http://powerman.asdfGroup.com/
To: supervision@list.skarnet.org
Message-ID: <20070602145553.GA1496@home.power>
References: <46561ABE.7030008@podgorny.cz> <20070526103517.GD24895@home.power>

Hi!

On Sat, May 26, 2007 at 01:01:14PM -0400, Paul Jarc wrote:
> Alex Efros wrote:
> > So all I can provide is partial `ps axf` output:
>
> Check "ps -ef" to verify that these zombies have 1 as their PPID.

It just happened again!! :-( And yes, all of these processes have PPID=1:

UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  0 May26 ?        00:00:40 runit
...
bets     16454     1  0 04:14 ?        00:00:00 [SpiderAuto]
bets     22692     1  0 04:15 ?        00:00:00 [SpiderAuto]
bets      2027     1  0 04:15 ?        00:00:00 [SpiderAuto]
bets     17471     1  0 04:15 ?        00:00:00 [SpiderAuto]
...
bets     21649     1  0 09:25 ?        00:00:01 [SpiderAuto]
bets     22188     1  0 09:25 ?        00:00:00 [SpiderAuto]
ebook     4304     1  0 09:46 ?        00:00:00 [chpst]
ebook    30650     1  0 09:51 ?        00:00:00 [chpst]
...
ebook     5492     1  0 12:46 ?        00:00:00 [chpst]
sshd     20961     1  0 13:01 ?        00:00:00 [sshd]
sshd     20915     1  0 13:01 ?        00:00:00 [sshd]
sshd      4653     1  0 13:01 ?        00:00:00 [sshd]
...
sshd     18475     1  0 13:05 ?        00:00:00 [sshd]
sshd     18954     1  0 13:05 ?        00:00:00 [sshd]
ebook    13994     1  0 13:11 ?        00:00:00 [chpst]
ebook    27178     1  0 13:31 ?        00:00:00 [chpst]
...

I have no idea what to do. Rebooting the server every week isn't a good
idea. As for rolling back to runit-1.4.1, I'm not sure it would work well
with Linux kernel 2.6.20 (if I remember correctly, there was some
discussion about issues between runit and newer Linux kernels, which were
fixed in 1.5.0).

Could somebody give me instructions on how to debug this issue the next
time it happens? strace can't attach to runit.

P.S. Not sure whether it matters, but the kernel currently in use is
2.6.16, and I want to upgrade to 2.6.20 soon.

-- 
WBR, Alex.
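The "ps -ef" check Paul suggested can also be automated, so the zombies are noticed as soon as they start piling up rather than after a week. Below is a minimal C sketch, assuming a Linux system with /proc mounted; the function name count_zombies_of() is hypothetical and not part of runit. It scans /proc/<pid>/stat and reports every process in state Z whose parent is the given PID (PID 1 for zombies that runit itself should have reaped).

```c
#define _POSIX_C_SOURCE 200809L

#include <ctype.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h> /* pid_t */

/* Hypothetical helper (Linux-specific assumption: /proc is mounted).
   Prints and counts zombie processes whose parent PID is `ppid`.
   Returns the count, or -1 if /proc cannot be opened. */
int count_zombies_of(pid_t ppid)
{
    DIR *proc = opendir("/proc");
    if (proc == NULL)
        return -1;

    int count = 0;
    struct dirent *ent;
    while ((ent = readdir(proc)) != NULL) {
        /* Only numeric directory names correspond to processes. */
        if (!isdigit((unsigned char)ent->d_name[0]))
            continue;

        char path[300];
        snprintf(path, sizeof path, "/proc/%s/stat", ent->d_name);
        FILE *f = fopen(path, "r");
        if (f == NULL)
            continue; /* the process exited while we were scanning */

        char buf[1024];
        size_t n = fread(buf, 1, sizeof buf - 1, f);
        fclose(f);
        buf[n] = '\0';

        /* Format: "pid (comm) state ppid ...". comm may contain spaces
           or ')', so parse the fields after the LAST ')'. */
        char *lp = strchr(buf, '(');
        char *rp = strrchr(buf, ')');
        char state;
        int parent;
        if (lp == NULL || rp == NULL || rp < lp)
            continue;
        if (sscanf(rp + 1, " %c %d", &state, &parent) != 2)
            continue;

        if (state == 'Z' && parent == (int)ppid) {
            printf("zombie: pid %s %.*s\n",
                   ent->d_name, (int)(rp - lp + 1), lp);
            count++;
        }
    }
    closedir(proc);
    return count;
}
```

Calling count_zombies_of(1) from a small main (for example, run periodically from cron) would print one line per zombie parented by PID 1 and return how many there are, which gives a timestamp for when runit stopped reaping. It only detects the symptom; it cannot show why runit is not calling wait on those children.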