From: Alex Efros
Newsgroups: gmane.comp.sysutils.supervision.general
Subject: Re: runit-1.8.0 available
Date: Sat, 20 Oct 2007 22:59:50 +0300
Organization: asdfGroup Inc., http://powerman.asdfGroup.com/
Message-ID: <20071020195950.GB25023@home.power>
In-Reply-To: <20071016033818.GE18461@run.duo>
To: supervision@list.skarnet.org

Hi!

On Mon, Oct 15, 2007 at 11:38:18PM -0400, George Georgalis wrote:
> This thread(s) is so long it's become difficult to follow. Maybe
> you could consolidate the important details into a summary. What
> is the simplest way to reproduce the problem. What has been tried?
> What factors are determined not related. What hypothesis, if any,
> for resolution?

The only way I know to reproduce the problem: install a new Gentoo server
and wait about a week until sshd zombies appear (a result of ssh worms
trying to brute-force ssh from time to time).

Tried? I switched from runit-init to sysvinit, and that solved the issue.
Gerrit also suggested a workaround: running
'chmod -x /etc/runit/stopit; kill -CONT 1' on a system with unreaped
zombies does two things: first, all zombies are reaped, and second, runit
starts reaping zombies again... but after several days it stops reaping
again, and the chmod/kill has to be repeated.

Not related... Several factors were determined to be unrelated (like the
grsecurity kernel patches), but that was while I wasn't yet sure this was
a bug in runit.

The strangest thing is that this happens to at least two people, starting
at the same time after a Gentoo upgrade. And that upgrade didn't touch
runit or the toolchain; nothing in it seems suspicious.

The only hypothesis I have: the issue is related to date/time. It usually
happens at the same time on all my servers (it looks tied to global
date/time, not to server uptime), and it usually repeats every 5-7 days.

I think the easiest way to solve this would be for Gerrit to provide me
with a test/debug version of runit that, for example, writes its
state/actions to a log file; he could then analyse that log to find out
what is going wrong. It looks like he is unable to find this bug just by
looking at the code.

-- 
WBR, Alex.
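For anyone who wants to check whether a machine is in the "not reaping" state described above, here is a rough sketch of a zombie counter. It is not from the thread. It assumes the unreaped zombies have been reparented to PID 1 (the message does not say so explicitly), and the function name count_init_zombies is made up for illustration. The function reads "STAT PPID PID COMM" lines on stdin, so it can be tried on canned input as well as on live ps output.

```shell
#!/bin/sh
# Hypothetical helper (not part of runit): count zombie processes whose
# parent is PID 1. Expects lines of the form "STAT PPID PID COMM" on stdin.
count_init_zombies() {
    # A zombie has a STAT field starting with "Z"; field 2 is the parent PID.
    # "n + 0" makes awk print 0 instead of an empty string when nothing matches.
    awk '$1 ~ /^Z/ && $2 == 1 { n++ } END { print n + 0 }'
}
```

On a live system the input could come from something like `ps -e -o stat= -o ppid= -o pid= -o comm= | count_init_zombies`. A count that keeps growing between runs would suggest that PID 1 has stopped reaping, which would be the moment to apply the chmod/kill workaround quoted above.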