From: Alex Efros
Newsgroups: gmane.comp.sysutils.supervision.general
Subject: Re: runit not collecting zombies
Date: Wed, 20 Jun 2007 19:57:36 +0300
To: supervision@list.skarnet.org
Message-ID: <20070620165736.GC12963@home.power>
In-Reply-To: <20070620162325.26345.qmail@7d91355cde742c.315fe32.mid.smarden.org>

Hi!

On Wed, Jun 20, 2007 at 04:23:25PM +0000, Gerrit Pape wrote:
> # gcc test.c
> # ./a.out

On my home workstation this test exits without leaving zombies and doesn't
output anything (if you remember, I had to reboot the workstation because of
the same issue a few days ago). For now the issue doesn't happen on the
workstation (yet, I think: uptime is just 2 days and it doesn't spawn new
processes as often as the servers do).

Then I ran the test on a server which already has this issue but hasn't yet
reached 8192 zombies for a single user account, so I haven't rebooted it yet.
Before running the test the server had:

    # date; ps ax | grep Z | wc
    Wed Jun 20 16:42:18 GMT 2007
       1259    7555   55496

The test printed several 'f'; here is the full output:

    $ ./a.out
    f f f f f f f f f f f f f f f f f f f f f f f f f f f f f f
    $

and now there are a lot of zombies:

    # date; ps ax | grep Z | wc
    Wed Jun 20 16:42:39 GMT 2007
      17586  105517  790218

Several minutes later the situation hasn't changed:

    # date; ps ax | grep Z | wc
    Wed Jun 20 16:49:04 GMT 2007
      17587  105523  790263

> If not, can you provide this service daemon that produced these amount
> of detached short-living processes?

On my home workstation most of the zombie processes were 'chpst', executed by
dcron every minute using lines like this one:

    */1 * * * * ( cd /var/www/soft.p/html && exec chpst -L .lib/var/.lock.service runsvdir .lib/service/ &>/dev/null ) &

(I use runsvdir to run services in my web projects, and the only way to
guarantee these services are started after a reboot is a cron configuration
like this one. I don't like to use root access to start services for web
projects.)

Also I see a lot of zombie 'sshd' processes on my servers. So I don't think
this issue is in my perl scripts or other applications; it's somewhere in
runit and/or the kernel.

> And I have another patch to try attached.

Thanks, I'll try it.
If I understand correctly, I should try this patch instead of the previous
one, not together with it?

-- 
WBR, Alex.
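
The counts in this message come from `ps ax | grep Z | wc`, which matches a capital Z anywhere on the line, so it can also pick up command names containing a Z and the `grep Z` process itself. A narrower count looks only at the process state column. The helper below is a minimal sketch, not part of the original thread: the name `count_zombies` is hypothetical, and it assumes its input is one process state per line, as printed by a procps- or BSD-style `ps` with `-o stat=`.

```shell
# Hypothetical helper: count zombie entries in a list of process states.
# Reads one state string per line on stdin (e.g. "Z", "Z+", "Ss", "R")
# and prints how many of them start with "Z", i.e. zombie processes.
count_zombies() {
    # "n + 0" forces numeric output, so "0" is printed when nothing matches.
    awk '$1 ~ /^Z/ { n++ } END { print n + 0 }'
}
```

On a system whose `ps` accepts `-o stat=`, a possible invocation is `date; ps ax -o stat= | count_zombies`. The result may differ slightly from the `grep Z | wc` figures above, because lines that merely contain a Z somewhere else are no longer counted.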