From: Charlie Brady
Newsgroups: gmane.comp.sysutils.supervision.general
Subject: Re: runit not collecting zombies
Date: Wed, 12 Sep 2007 10:55:18 -0400 (EDT)
To: Alex Efros
Cc: supervision@list.skarnet.org
In-Reply-To: <20070912143557.GC12043@home.power>

On Wed, 12 Sep 2007, Alex Efros wrote:

> We all already know what you think about this issue. There IS a bug
> somewhere (runit/kernel/somewhere else) and you don't help us to fix it.
> The idea is: no matter what user are doing, there shouldn't be increasing
> number of unreaped zombies in the system.

Sure, but until more detail is known about the exact circumstances in which
such unreaped zombies appear, there's little chance that anyone can fix the
bug.

> ... because there different software which also
> produce unreaped zombies (like ssh).

You keep saying that, but I continue to doubt it. If you can document that
it occurs, I'm sure that the ssh maintainers will want to fix the bug.

> Your recommendation sounds like 'start less short-living processes', which
> is idiocy!

No, that's not my recommendation. My recommendation is that you do not
deploy software which creates zombies.

> Server should work, and if it work is to run a lot of
> short-living processes - then it should do this in reliable manner without
> requiring reboot every several days.

Agreed. Short-living processes are fine, and if their parent process reaps
their status, they won't become zombies.

> Sorry for my emotions - now I've a
> lot of Linux servers which work just like Windows - from reboot to reboot -
> and that makes me a little angry...

My advice is that you don't get angry, but that you fix the problem.

Please go back to the discussion of your cron script on June 12. I still
can't see any reason why you are using cron. Just run runsvdir as a
supervised process. Your process tree will be something like:

runit
 \_ runsvdir -P /service log: ..................
     \_ runsv soft.p
         \_ runsvdir -P /var/www/soft.p/html/.lib/service
             \_ runsv soft.p.service1
                 \_ service1
             \_ runsv soft.p.service2
                 \_ service2
             \_ runsv soft.p.service3
                 \_ service3
     \_ runsv soft.q
         \_ runsvdir -P /var/www/soft.q/html/.lib/service
     \_ runsv dnscache
     ...

Trying to start a new unsupervised runsvdir in
/var/www/soft.p/html/.lib/service via cron (as you were doing) is just
asking for trouble - as well as doing lots of unnecessary work.
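
To make the point about reaping concrete, here is a minimal sketch in
Python. It is only an illustration, assuming a Unix-like system where
os.fork is available. The helper names spawn_short_lived and reap are
hypothetical and have nothing to do with runit's own code. A parent that
starts many short-lived children and then collects each exit status with
waitpid leaves no zombies behind.

```python
import os


def spawn_short_lived(n):
    """Fork n children that exit immediately; return their pids.

    Hypothetical helper, for illustration only.
    """
    pids = []
    for _ in range(n):
        pid = os.fork()
        if pid == 0:
            # Child: exit at once, skipping the parent's cleanup handlers.
            os._exit(0)
        pids.append(pid)
    return pids


def reap(pids):
    """Wait for each child and return a {pid: exit_code} mapping.

    Until waitpid() is called for it, a terminated child stays in the
    process table as a zombie. Collecting its status removes the entry.
    """
    codes = {}
    for pid in pids:
        _, status = os.waitpid(pid, 0)  # blocks until this child has exited
        codes[pid] = os.WEXITSTATUS(status)
    return codes


if __name__ == "__main__":
    children = spawn_short_lived(5)
    print(reap(children))
```

If the call to reap is left out while the parent keeps running, the five
children remain as defunct entries in the process table until the parent
exits, which is the situation a long-running parent has to avoid.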