From: "Daniel Clark"
Newsgroups: gmane.comp.sysutils.supervision.general
To: supervision@list.skarnet.org
Subject: Re: How to kill runsv, no matter what?
Date: Fri, 23 Feb 2007 12:32:37 -0500
Message-ID: <5422d5e60702230932q609f8ea8n76a3856c8b6cb3cc@mail.gmail.com>
In-Reply-To: <20070223140504.17459.qmail@3f646761ee1f68.315fe32.mid.smarden.org>

On 2/23/07, Gerrit Pape wrote:
> On Thu, Feb 22, 2007 at 10:51:50PM -0500, Daniel Clark wrote:
> > I made a simple test case that should make this bug (or my error in
> > using the software) easy to reproduce. I'm attaching it since it is so
> > tiny; it is also available from
> > http://opensysadmin.com/bugs/runit/test1-service.tar.bz2
>
> When asked to exit, the runsv supervisor makes sure that all logs are
> written to the log service before terminating; it first sends TERM to
> the main service, then waits for it to terminate, and finally waits for
> the log service to terminate, before runsv exits itself.
>
> In the case of your example service, the main run script execs into a
> shell script that starts a 'sleep' subprocess. Now when runsv is told
> to exit, it sends the service (the ./test1-sv.sh shell script) a TERM
> signal, the shell script terminates (fine), but is leaving behind the
> 'sleep' subprocess. The log service's run script execs into a svlogd
> process, svlogd will terminate as soon as it sees end-of-file on the
> pipe connected to its standard input.
> Because there's still the 'sleep'
> subprocess running with its output connected to the pipe, and so to
> svlogd's standard input, svlogd will wait; it might well be that there's
> still data available on the pipe to be written to the logs. Once the
> 'sleep' subprocess exits, runsv should exit too.

Ah, that makes a lot of sense. However, I'm not seeing how this behavior
can mesh with package management systems. For example:

(a) I install a "runit" package, which starts up a runsvdir process.

(b) I link some services into my runsvdir /var/service directory. In
    many cases I can't really control whether those services start
    child processes; let's say one of them is like my example service.
    (In practice, I'm guessing there is probably some way I can get my
    shell script to capture TERM and kill the 'sleep' process before
    exiting itself.)

(c) I remove the "runit" package. Since I am no longer going to have
    "runit" installed, I think it follows that all "runit" processes,
    such as svlogd, need to be gracefully shut down, no matter what
    their state.

(d) runit is removed, but there are some svlogd processes still around,
    and therefore also still some files tracking runit state in my
    /etc/sv directory.

(e) I install runit again.

(f) I want to re-enable my service, so I again link the service into my
    /var/service directory. However, since there is still a svlogd
    process running (or I killed it manually), there is still lingering
    state information in /etc/sv, so runit is confused and complains.

So I guess my question is: is there any way to handle the
install-remove-install case cleanly with runit? In practice this may not
be an issue, but I'm running into it all the time in testing. The
previously running svlogd causes failure in two ways: (1) the state in
/etc/sv/servicename confuses runit, and (2) it wants to write to the
same log file as any new svlogd daemons that start up.
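For what it's worth, here is a minimal sketch of the TERM-trapping idea
from (b), assuming a POSIX shell. The function names term_handler and
run_service and the 1000-second default are hypothetical; this is not
the script from the test1-service tarball, just one way the service
script might clean up its 'sleep' child so nothing keeps the log pipe
open after TERM.

```shell
#!/bin/sh
# Hypothetical stand-in for a service script that spawns a child.
# Assumption: the only child is a single background 'sleep'.

# On TERM, pass the signal on to the child, reap it, then exit.
# Once the child is gone, no process holds the write end of the log pipe.
term_handler() {
    if [ -n "$child" ]; then
        kill -TERM "$child" 2>/dev/null
        wait "$child" 2>/dev/null
    fi
    exit 0
}

# Start the child in the background and wait for it, so the shell
# stays able to run the trap handler while the child is running.
run_service() {
    trap term_handler TERM
    sleep "${1:-1000}" &
    child=$!
    wait "$child"
}
```

In a real service script the last line would be a call to run_service.
The child has to run in the background with the shell sitting in
'wait'; with a foreground 'sleep' the shell would not run the handler
until the sleep had finished on its own.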

Actually, wouldn't this also be a problem if I just wanted to
force-restart a service that spawns child processes? If the service is
restarted but the old logging daemon doesn't get force-killed, don't I
run into the same situation as in the install-remove-install case (two
conflicting svlogd processes)?

Thanks,
-- 
Daniel Clark # http://dclark.us # http://opensysadmin.com