From: Charlie Brady
Newsgroups: gmane.comp.sysutils.supervision.general
Subject: Re: runit not collecting zombies
Date: Sat, 15 Sep 2007 11:20:57 -0400 (EDT)
To: Alex Efros
Cc: supervision@list.skarnet.org
In-Reply-To: <20070915135749.GB30650@home.power>

On Sat, 15 Sep 2007, Alex Efros wrote:

> Strace output with all details about PIDs 939 (ssh server), 14803 and 14804
> (unreaped zombie) is here: http://powerman.asdfgroup.com/tmp/ssh_strace.txt

Here's (at least part of) your problem:

...
[pid 14804] socket(PF_FILE, SOCK_DGRAM, 0) = 6
[pid 14804] fcntl64(6, F_SETFD, FD_CLOEXEC) = 0
[pid 14804] connect(6, {sa_family=AF_FILE, path="/dev/log"}, 110) = -1 ENOENT (No such file or directory)
[pid 14804] close(6) = 0
[pid 14804] exit_group(255) = ?
Process 14804 detached
[pid 14803] <... read resumed> 0x5f9b54fc, 4) = ? ERESTARTSYS (To be restarted)
[pid 14803] --- SIGCHLD (Child exited) @ 0 (0) ---
[pid 14803] read(6, "", 4) = 0
[pid 14803] exit_group(255) = ?
Process 14803 detached
...

You are running sshd with privilege separation. Process 14804 is running
chrooted into /var/empty. It's trying to syslog to /dev/log in the chroot,
and failing, then exiting. Its parent exits without doing waitpid (when it
gets a 0 byte read from the pipe to the child).

Tell syslog to listen on /var/empty/dev/log and you'll learn more.
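
If you want to reproduce the failing connect() outside of sshd, here is a minimal Python sketch. It is not taken from sshd or from any syslog library; the names try_syslog_connect and demo are made up for illustration, and a fresh temporary directory stands in for the empty chroot (/var/empty). It opens the same kind of socket as in the trace (a Unix datagram socket) and connects it to a dev/log path that does not exist under that directory.

```python
import errno
import os
import socket
import tempfile


def try_syslog_connect(log_path):
    """Connect a Unix datagram socket to log_path.

    Returns 0 on success, or the errno of the failed connect().
    (Hypothetical helper, for illustration only.)
    """
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
    try:
        sock.connect(log_path)
        return 0
    except OSError as exc:
        return exc.errno
    finally:
        sock.close()


def demo():
    # Assumption: an empty temporary directory plays the role of the
    # chroot directory; nothing has created dev/log inside it.
    with tempfile.TemporaryDirectory() as fake_root:
        missing = os.path.join(fake_root, "dev", "log")
        return try_syslog_connect(missing)


if __name__ == "__main__":
    err = demo()
    # Print the symbolic errno name if the connect failed.
    print(errno.errorcode.get(err, err))
```

Once a syslog daemon is listening on a socket at that path inside the chroot, the same connect should succeed, which is why having syslog listen on /var/empty/dev/log should let the chrooted process log what it is complaining about.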