From: Charlie Brady
Newsgroups: gmane.comp.sysutils.supervision.general
Subject: Re: runit not collecting zombies
Date: Sat, 15 Sep 2007 11:58:07 -0400 (EDT)
To: Alex Efros
Cc: supervision@list.skarnet.org
In-Reply-To: <20070915153648.GE30650@home.power>

On Sat, 15 Sep 2007, Alex Efros wrote:

> And, Charlie, I wish to repeat this again: forget about ssh. This issue
> isn't related to ssh itself.

Obviously I need to repeat myself as well. You have two problems.
Various programs, including sshd and your silly cron script, are
generating zombies, and also runit as proc 1 is not reaping those zombies.

Since nobody is solving the runit problem (which could lie in the kernel),
you can minimise your inconvenience by reducing the incidence of zombies
reparented to proc 1.

Note that some zombies may exist temporarily which are not and will not be
reparented to proc 1. Those zombies are processes which have exited whose
parents are still running and have not yet reaped the child's status.

> # ps -ef axf | tail -n 3
> sshd     14804     1  0 13:50 ?        Z      0:00 [sshd]
> sshd     14926     1  0 15:23 ?        Z      0:00 [sshd]
> root     14954     1  0 15:31 pts/1    Z      0:00 [perl]
>
> Starting from some point (usually after 2-7 days uptime), process N1 stop
> reaping zombies. Any zombies. After that point. That's all. Nothing about
> ssh in this equation.

I assume you mean process 1 when you say process N1.

I'm not denying that runit as proc 1 seems to have a problem on your
system. But since your problem is accumulated zombies, if you stop
generating them, that problem becomes unimportant. If you reduce the
generation of zombies, the runit problem becomes less important (or at
least less urgent).

Have I now made myself clear?
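
For illustration only, here is a minimal sketch of the "temporary" kind of zombie mentioned above: a child that has exited while its parent is still running and has not yet collected its status. It is Python and Linux-only, since it assumes /proc is mounted. The names child_state and demo are made up for this example and are not taken from runit, sshd or anything else in this thread.

```python
import os
import time


def child_state(pid):
    """Return the state letter from /proc/<pid>/stat, or None if the PID is gone.

    Assumes a Linux /proc filesystem.
    """
    try:
        with open(f"/proc/{pid}/stat") as f:
            data = f.read()
    except (FileNotFoundError, ProcessLookupError):
        return None
    # The state letter is the first field after the parenthesised command name.
    return data.rsplit(")", 1)[1].split()[0]


def demo():
    """Fork a child that exits at once; report its state before and after reaping."""
    pid = os.fork()
    if pid == 0:
        os._exit(0)  # child: exit immediately without running cleanup handlers

    # Parent: the child stays a zombie ("Z") until we wait for it.
    deadline = time.monotonic() + 5.0
    state = child_state(pid)
    while state != "Z" and time.monotonic() < deadline:
        time.sleep(0.01)
        state = child_state(pid)
    before = state

    os.waitpid(pid, 0)  # reap: collect the exit status, freeing the process slot
    after = child_state(pid)
    return before, after


if __name__ == "__main__":
    print(demo())
```

If the parent exited without calling waitpid, the dead child would be reparented to process 1, and it would then be up to process 1 to do the wait. That is the step which runit, as proc 1, appears not to be doing on the system discussed here. A parent that reaps its own children before it exits never hands that work to process 1.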