From mboxrd@z Thu Jan 1 00:00:00 1970
X-Msuck: nntp://news.gmane.io/gmane.comp.sysutils.supervision.general/1420
Path: news.gmane.org!not-for-mail
From: Alex Efros
Newsgroups: gmane.comp.sysutils.supervision.general
Subject: Re: runit not collecting zombies
Date: Sat, 26 May 2007 16:03:15 +0300
Organization: asdfGroup Inc., http://powerman.asdfGroup.com/
Message-ID: <20070526130315.GG24895@home.power>
References: <46561ABE.7030008@podgorny.cz> <20070526103517.GD24895@home.power>
NNTP-Posting-Host: lo.gmane.org
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Trace: sea.gmane.org 1180184602 16727 80.91.229.12 (26 May 2007 13:03:22 GMT)
X-Complaints-To: usenet@sea.gmane.org
NNTP-Posting-Date: Sat, 26 May 2007 13:03:22 +0000 (UTC)
To: supervision@list.skarnet.org
Original-X-From: supervision-return-1657-gcsg-supervision=m.gmane.org@list.skarnet.org Sat May 26 15:03:21 2007
Return-path:
Envelope-to: gcsg-supervision@gmane.org
Original-Received: from antah.skarnet.org ([212.85.147.14]) by lo.gmane.org with smtp (Exim 4.50) id 1HrvvR-0001rt-1d for gcsg-supervision@gmane.org; Sat, 26 May 2007 15:03:17 +0200
Original-Received: (qmail 20883 invoked by uid 76); 26 May 2007 13:03:38 -0000
Mailing-List: contact supervision-help@list.skarnet.org; run by ezmlm
List-Post:
List-Help:
List-Unsubscribe:
List-Subscribe:
List-Archive:
Original-Received: (qmail 20877 invoked from network); 26 May 2007 13:03:38 -0000
Mail-Followup-To: supervision@list.skarnet.org
Content-Disposition: inline
In-Reply-To:
User-Agent: Mutt/1.5.13 (2006-08-11)
Xref: news.gmane.org gmane.comp.sysutils.supervision.general:1420
Archived-At:

Hi!

On Sat, May 26, 2007 at 08:55:33AM -0400, Charlie Brady wrote:
> Post the run script of the service which is creating zombies.

SpiderAuto isn't a service. It's just a Perl script that is executed by cron.

> Do you have a reason for believing that it is runit which is creating and
> not reaping zombies rather than a specific service daemon (e.g.
> SpiderAuto)?

SpiderAuto generates zombies by design. It works this way:
1) The main script is started from cron every minute.
2) The main script analyzes the current situation: it finds hung worker
   processes and kills them, then finds the tasks currently in the queue
   and starts several worker processes to do those tasks.
3) The main script exits (and so leaves several child/worker processes
   without a parent).

Worker processes may run for several minutes.

I don't think this architecture is good, but it has worked for more than
4 years without trouble, and I have never seen zombie processes, because
runit collects them.

-- 
WBR, Alex.
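
To illustrate steps 1) to 3) above, here is a minimal sketch of a parent that starts a worker and exits without waiting for it. It is written in Python only for illustration (SpiderAuto itself is a Perl script and its code is not shown in this thread); the function name `orphan_demo` and the use of a pipe to report the result are assumptions made for the example. It shows that once the "main script" exits, the worker is handed over to another process, which on a runit system would normally be PID 1.

```python
import os
import struct
import time


def orphan_demo():
    """Return (main_pid, parent PID seen by the worker after main exits)."""
    read_fd, write_fd = os.pipe()

    main_pid = os.fork()
    if main_pid == 0:
        # Stand-in for the "main script": start one worker, then exit
        # immediately without calling wait() on it.
        try:
            os.close(read_fd)
            my_pid = os.getpid()
            if os.fork() == 0:
                # Worker: poll until the kernel has re-parented us
                # (give up after 5 seconds so the demo cannot hang).
                deadline = time.monotonic() + 5.0
                while os.getppid() == my_pid and time.monotonic() < deadline:
                    time.sleep(0.01)
                os.write(write_fd, struct.pack("q", os.getppid()))
        finally:
            # Both the main script and the worker leave here, skipping
            # any cleanup inherited from the calling process.
            os._exit(0)

    # Stand-in for cron: it reaps the main script, but knows nothing
    # about the worker that the main script left behind.
    os.close(write_fd)
    os.waitpid(main_pid, 0)
    data = os.read(read_fd, struct.calcsize("q"))
    os.close(read_fd)
    return main_pid, struct.unpack("q", data)[0]


if __name__ == "__main__":
    main_pid, new_parent = orphan_demo()
    print("main script pid:", main_pid)
    print("worker's parent after main exited:", new_parent)
```

Whichever process adopts the worker (PID 1, or on newer Linux kernels possibly a designated subreaper) has to call wait() when the worker finally exits; if it does not, the worker stays in the process table as a zombie.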