From: Dražen Kačar
Newsgroups: gmane.comp.sysutils.supervision.general
Subject: Re: graceful restart under runit
Date: Mon, 20 Nov 2006 19:27:33 +0100

Paul Jarc wrote:
> So, for zero-unavailability: there is a metaserver which listens on a

And then it's just a small matter of implementing the metaserver? :-)

It seems a bit complex to me. You'd have to implement a protocol for
starting a new metaserver version (which boils down to passing all
those file descriptors to the new metaserver). Then you'd need to
implement something to take care of metaserver crashes, probably a way
for servers to pass listening sockets back to the new metaserver. Then
servers would need a way to wait a bit if they want to restart while
the metaserver is being restarted. Maybe a few more things as well. I
suppose it's doable, but it seems like a can of worms and races.

> While a server is handling connections, it would have to use
> select()/poll() to notice activity on either the listening socket or
> the filesystem connection;

And that isn't very nice.

> If the requestor exits, and no other requestors are around to pass the
> listening socket to, the metaserver could close it immediately, or
> could keep it open for a few seconds to see if a new requestor shows
> up. So quick, non-overlapping restarts would be transparent to the
> end clients.

How is it supposed to know that the requestor exited?

> To trigger the switchover, you wouldn't need any signals - just make
> runsv forget about the old process using Gerrit's patch. When the new
> process starts up and connects to the filesystem socket, that will
> trigger everything else.

I meant to implement (when the time comes) something simpler. Either a
FIFO or a Unix domain socket[1] is used as a communications channel
for passing the listening socket, but without additional daemons.
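For the Unix domain socket variant, handing a listening socket to another process is done with an SCM_RIGHTS control message on sendmsg()/recvmsg(). What follows is a minimal sketch in C, assuming a connected AF_UNIX stream socket between the running and the new instance; send_fd, recv_fd, make_listener and local_port are hypothetical helper names, not part of runit or any library. The received descriptor refers to the same open socket, so the new server can accept() on it without binding again.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send descriptor fd over the connected AF_UNIX socket chan.
   Returns 0 on success, -1 on failure. */
static int send_fd(int chan, int fd)
{
    char byte = 'F';   /* send one real data byte; some systems drop ancillary data on empty messages */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {            /* the union keeps the control buffer cmsghdr-aligned */
        struct cmsghdr hdr;
        char buf[CMSG_SPACE(sizeof(int))];
    } ctl;
    struct msghdr msg;

    memset(&msg, 0, sizeof msg);
    memset(&ctl, 0, sizeof ctl);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof ctl.buf;

    struct cmsghdr *cm = CMSG_FIRSTHDR(&msg);
    cm->cmsg_level = SOL_SOCKET;
    cm->cmsg_type = SCM_RIGHTS;          /* "pass these descriptors" */
    cm->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cm), &fd, sizeof(int));

    return sendmsg(chan, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one descriptor from chan. Returns it, or -1 on error/EOF. */
static int recv_fd(int chan)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        struct cmsghdr hdr;
        char buf[CMSG_SPACE(sizeof(int))];
    } ctl;
    struct msghdr msg;
    int fd;

    memset(&msg, 0, sizeof msg);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof ctl.buf;

    if (recvmsg(chan, &msg, 0) != 1 || (msg.msg_flags & MSG_CTRUNC))
        return -1;
    struct cmsghdr *cm = CMSG_FIRSTHDR(&msg);
    if (cm == NULL || cm->cmsg_level != SOL_SOCKET || cm->cmsg_type != SCM_RIGHTS)
        return -1;
    memcpy(&fd, CMSG_DATA(cm), sizeof(int));
    return fd;
}

/* Hypothetical stand-in for the server's network socket:
   a TCP listener on 127.0.0.1 with a kernel-chosen port. */
static int make_listener(void)
{
    struct sockaddr_in a;
    int s = socket(AF_INET, SOCK_STREAM, 0);
    if (s < 0)
        return -1;
    memset(&a, 0, sizeof a);
    a.sin_family = AF_INET;
    a.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    a.sin_port = 0;
    if (bind(s, (struct sockaddr *)&a, sizeof a) < 0 || listen(s, 16) < 0) {
        close(s);
        return -1;
    }
    return s;
}

/* Local TCP port of socket s, or -1 on error. */
static int local_port(int s)
{
    struct sockaddr_in a;
    socklen_t len = sizeof a;
    if (getsockname(s, (struct sockaddr *)&a, &len) < 0)
        return -1;
    return ntohs(a.sin_port);
}
```

In the scheme described here, the running instance would call send_fd() on its listening socket after receiving the signal and then close its own copy. The socket stays open in the kernel while the descriptor is in flight, so pending connections keep queueing in its backlog during the handover.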
The new server starts, acquires all resources necessary to run except
the listening socket and the PID file, then tries to connect to the
file system channel.

If there's no writer, it binds to the network socket, writes the PID
file, becomes the writer on the file system channel and starts doing
its job.

If there is a writer, it's presumed to be an already running server
instance. The new server reads the PID file, signals the running
instance and blocks in read on the file system channel. The running
instance receives the signal, passes the listening socket, performs
whatever cleanup needs to be done[2] and then either exits immediately
or waits for its current sessions to finish and then exits. The new
server reads the file descriptor, becomes the writer on the file
system channel, writes the new PID file and starts doing its job.

[1] FIFOs are nasty because O_RDONLY opens block if there are no
writers and O_WRONLY opens block if there are no readers. O_NONBLOCK
allows a reader to attach without a writer, but it doesn't allow a
writer to attach when there is no reader (the open fails with ENXIO).
I'm not sure the required mumbo-jumbo can be done portably with FIFOs
(also, some OSes have bugs in this area, IIRC). But if a FIFO isn't
good enough, a Unix domain socket should suffice.

[2] At a minimum, it needs to close the listening socket and its
writing end of the file system channel, so that the new server can
become the writer there.

-- 
 .-.   .-.    Yes, I am an agent of Satan, but my duties are largely
(_  \ /  _)   ceremonial.
     |
     |        dave@fly.srk.fer.hr
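The FIFO behaviour in footnote [1], and the "is there a writer?" probe the new server performs first, can be checked with a short C sketch. It assumes POSIX FIFO semantics: an empty FIFO opened O_RDONLY|O_NONBLOCK reads 0 bytes (end-of-file) when no process has it open for writing, and read() fails with EAGAIN when one does. make_channel, fifo_has_writer and attach_writer are hypothetical names. The sketch covers only the probe and the open semantics; it does not pass a descriptor through the FIFO.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create a private directory from the mkdtemp()
   template dir and a FIFO named "chan" inside it; path receives its name. */
static int make_channel(char *dir, char *path, size_t len)
{
    if (mkdtemp(dir) == NULL)
        return -1;
    if (snprintf(path, len, "%s/chan", dir) >= (int)len)
        return -1;
    return mkfifo(path, 0600);
}

/* The new server's "is there a writer?" probe. Opening O_RDONLY|O_NONBLOCK
   never blocks. On an empty FIFO, read() then returns 0 (end-of-file) when
   nobody has it open for writing, and fails with EAGAIN when someone does.
   Returns 1 (writer present), 0 (no writer) or -1 (error, or the FIFO held
   data, in which case one byte was consumed). */
static int fifo_has_writer(const char *path)
{
    char c;
    int fd = open(path, O_RDONLY | O_NONBLOCK);
    if (fd < 0)
        return -1;
    ssize_t n = read(fd, &c, 1);
    int saved = errno;
    close(fd);
    if (n == 0)
        return 0;
    if (n < 0 && (saved == EAGAIN || saved == EWOULDBLOCK))
        return 1;
    return -1;
}

/* Footnote [1]: with O_NONBLOCK a writer does not wait for a reader;
   if none is present, open() fails with ENXIO instead. */
static int attach_writer(const char *path)
{
    return open(path, O_WRONLY | O_NONBLOCK);
}
```

The probe's answer can be stale as soon as it is returned (a writer may attach or exit right afterwards), so a real server would still have to cope with losing that race.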