From: Alex Efros
Organization: asdfGroup Inc., http://powerman.asdfGroup.com/
Newsgroups: gmane.comp.sysutils.supervision.general
To: supervision@list.skarnet.org
Subject: Re: runit not collecting zombies
Date: Sat, 15 Sep 2007 16:36:42 +0300
Message-ID: <20070915133641.GA30650@home.power>

Hi!

On Wed, Sep 12, 2007 at 03:18:02PM -0400, Charlie Brady wrote:
>> I don't see how fixing ssh will solve my issue with servers, but I'll try
>> to gather more information about ssh next time this issue happens on my
>> servers.
> It won't, but if you can fix it it will reduce the severity of your problem
> with runit process 1. If you fix your runsvdir related cron job problem
> (which leaves all the chpst zombies), then that will further reduce the
> severity of your problem.

Ok, and here are the first results.

I have two unused dedicated servers: we bought them, I installed Gentoo,
and they are waiting until I install our projects on them.

I've installed sysvinit on one of these servers and rebooted BOTH servers,
so they have the same uptime and are on the same hosting. They are 100%
identical except for IP/MAC and sysvinit/runit.

Now, the server with runit has 350 ssh zombies (it has only ssh zombies
because I haven't installed our project with cron/chpst, etc. there).
The server with sysvinit has no zombies yet.

The full `ps -ef axf` output is here:
    http://powerman.asdfgroup.com/tmp/ps.txt

I've started `strace -f -ff -p PID` (you can see it in the `ps` output),
but for now there have been no connections to ssh, so its output is
still empty.

-- 
WBR, Alex.