From mboxrd@z Thu Jan 1 00:00:00 1970
X-Msuck: nntp://news.gmane.io/gmane.comp.sysutils.supervision.general/327
Path: main.gmane.org!not-for-mail
From: Gerrit Pape
Newsgroups: gmane.comp.sysutils.supervision.general
Subject: Re: runit-1.0.0 release
Date: Thu, 19 Feb 2004 16:15:23 +0000
Message-ID: <20040219161550.3252.qmail@32660119fc094a.315fe32.mid.smarden.org>
References: <20040210153548.16191.qmail@56dafcc887170f.315fe32.mid.smarden.org>
 <20040211192355.17755561@rad1.109bean.org.uk>
 <20040211204212.12241.qmail@771d3fc1654724.315fe32.mid.smarden.org>
 <20040212174120.26bcb697@rad1.109bean.org.uk>
 <20040212204520.26049.qmail@c032e53e4f6f05.315fe32.mid.smarden.org>
 <20040212211510.GA11016@socomep>
 <20040214122150.5818.qmail@8f3ccfd96ce13e.315fe32.mid.smarden.org>
 <40313E78.30705@geeks.cl>
NNTP-Posting-Host: deer.gmane.org
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Trace: sea.gmane.org 1077207348 6186 80.91.224.253 (19 Feb 2004 16:15:48 GMT)
X-Complaints-To: usenet@sea.gmane.org
NNTP-Posting-Date: Thu, 19 Feb 2004 16:15:48 +0000 (UTC)
Original-X-From: supervision-return-565-gcsg-supervision=m.gmane.org@list.skarnet.org Thu Feb 19 17:15:33 2004
Return-path:
Original-Received: from antah.skarnet.org ([212.43.221.114])
 by deer.gmane.org with smtp (Exim 3.35 #1 (Debian))
 id 1Atqpp-0001F7-00 for ; Thu, 19 Feb 2004 17:15:33 +0100
Original-Received: (qmail 8393 invoked by uid 76); 19 Feb 2004 16:15:50 -0000
Mailing-List: contact supervision-help@list.skarnet.org; run by ezmlm
List-Post:
List-Help:
List-Unsubscribe:
List-Subscribe:
List-Archive:
Original-Received: (qmail 8388 invoked from network); 19 Feb 2004 16:15:50 -0000
Original-To: supervision@list.skarnet.org
Mail-Followup-To: supervision@list.skarnet.org
Content-Disposition: inline
In-Reply-To: <40313E78.30705@geeks.cl>
Xref: main.gmane.org gmane.comp.sysutils.supervision.general:327
X-Report-Spam: http://spam.gmane.org/gmane.comp.sysutils.supervision.general:327

On Mon, Feb 16, 2004 at
07:04:40PM -0300, Alejandro Mery wrote:
> Gerrit Pape wrote:
> >On Thu, Feb 12, 2004 at 06:15:10PM -0300, Alejandro Mery wrote:
> >>i'm just waiting for the last basic tools of sysvinit, for some
> >What exactly are they?  I think runit is rather complete, you don't
> >necessarily need pidof or local login accounting.  Also sulogin is not
> >mandatory, you could also use some getty, e.g. fgetty.
> well only reboot and halt/poweroff are a must have. because they need to
> run inside stages 3 to actually do the thing.

If stage 3 exits without halting or rebooting the kernel, the runit
program should finally do this.  Did you try it out?

> >>missing tools from daemontools like fghack (damn nullmailer) and to
> >Which ones beside fghack?  I personally don't use fghack, as it doesn't
> i don't know how fghack works but can i use it to warranty the restart
> of the daemon if it crashes? can i use it to kill the daemon when needed?

Yes, it will be restarted, no, you cannot control the daemon through
signals.

> >>finily trace why runit-3 waits for ever the active ssh sessions to
> >This is a feature.  Most probably you run sshd with a log service.
> >runsv, when asked to exit, makes sure that all logs get written by the
> >log service it additionally monitors for this service.  This is one
> >reason why the service directory and the service/log directory are
> >monitored by a single supervisor with runit.
> >
> >If runsv is told to take the sshd service down, it sends a term signal
> >to the main sshd process.  The main sshd process then exits, but it has
> >spawned children for active ssh connections, and leaves them alone.  If
> >you run the sshd service with a appendant log service, the children have
> >inherited filedescriptor 1, which is connected to the log service
> >through a pipe.  The sshd/log service waits for data as long as there
> >still are processes possibly writing data; you can set a timeout through
> >svwaitdown though.
> >
> i have...
> echo 'Waiting for services to stop...'
> svwaitdown -xk -t350 /service/*
>
> and i got a *very* unconfortable state when one box had to reboot but it
> passed the whole weekend in stage 3 just because one damn user didn't
> close his ssh client.

Yes, I now see that this is a problem.  svwaitdown doesn't properly
complete its task.  It sends a kill signal to the service daemon after
the timeout, but doesn't stop the log service explicitly.  For daemons
like sshd that fork children which detach and daemonize, there may still
be processes left that hold a file descriptor to the log pipe open.

But why is your stage 3 script still waiting?  svwaitdown has returned,
so stage 3 should continue and finally exit, causing runit to halt or
reboot the kernel.

> there is a change to fix svwaitdown to signal grandsons after the

Unfortunately sshd creates its children in such a way that the
supervisor cannot know the process ids of its grandchildren, so killing
them does not work; blame the service daemon.  But I can change
svwaitdown to explicitly take the log service down after the timeout.
For now you can run

  runsvctrl term /var/service/sshd/log

right after svwaitdown in your stage 3 script.

> >Without a sshd/log service, the behavior should change, but you lose the
> >guarantee for your log.
> as i see it i completely lost the log

Oops, how so?

Regards, Gerrit.
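
P.S.  The pipe behavior described above can be reproduced without runit.
The following is only a rough sketch, not runit code: the function name
"daemon" and the 2 second sleep are made up for illustration, and cat
merely stands in for the log service.  The "daemon" exits immediately,
but a background child it leaves behind still holds the write end of the
pipe, so the reader sees no end-of-file until that child is gone.

```shell
#!/bin/sh
# Hypothetical stand-in for a daemon such as sshd: it exits right away,
# but leaves a background child that inherited file descriptor 1.
daemon() {
    ( sleep 2 ) &          # child keeps the write end of the pipe open
    echo "daemon exiting"  # the main process writes one line and returns
}

start=$(date +%s)
# cat plays the role of the log service: it reads until end-of-file,
# which only arrives once every writer has closed the pipe.
daemon | cat > /dev/null
end=$(date +%s)

elapsed=$((end - start))
echo "reader waited ${elapsed}s for the leftover child"
```

With a real sshd the leftover writers are the per-connection children,
which is why the log service, and with it runsv, keeps waiting for as
long as a session stays open.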