From: prj@po.cwru.edu (Paul Jarc)
Newsgroups: gmane.comp.sysutils.supervision.general
Subject: Re: graceful restart under runit
Date: Mon, 20 Nov 2006 14:32:40 -0500
Cc: supervision@list.skarnet.org

Dražen Kačar wrote:
> And then it's just a small matter of implementing the metaserver? :-)

Right. :)

> It seems a bit complex to me. You'd have to implement a protocol for
> starting a new metaserver version (which boils down to passing all
> those file descriptors to the new metaserver)

Well, it depends how much downtime you can tolerate. Restarting the
metaserver would probably be pretty infrequent - less frequent than
restarting the servers that use it - so you might accept some downtime
in that event for the sake of making the metaserver simpler.

But actually, it could be fairly simple if you're willing to restart
all other services when you restart the metaserver. A new instance of
the metaserver could request listening sockets from the old one using
the same method that other servers use. First it would connect to the
old metaserver through the filesystem socket, then listen on a new
filesystem socket, and rename() that to atomically replace the old
one. Then, since the old metaserver has passed the listening sockets
to a new process, it will revoke its leases to the old servers. They
will all exit and be automatically restarted, re-requesting their
sockets from the new metaserver. But the listening sockets will never
be completely closed through all this, so connections will not be
rejected.

The only new kind of conversation needed over the filesystem
connection is for the new metaserver to ask for all open listening
sockets, instead of individual sockets that may or may not already be
open. There is a race condition here if two new metaservers start at
the same time when there is no old metaserver already running, but it
only results in an extra process hanging around doing nothing, which
isn't harmful.

> Then you'd need to implement something to take care of metaserver crashes.
> Probably a way for servers to pass listening sockets back to the
> new metaserver.

I think that's beyond the point of diminishing returns. The problem
can never be completely solved, since the metaserver and other servers
could crash at the same time, or you could lose power, etc. You have
to give up at some point.

> Then servers would need a way to wait a bit if they want to restart while
> the metaserver is being restarted.

They could just exit and let supervise/runsv restart them.

>> While a server is handling connections, it would have to use
>> select()/poll() to notice activity on either the listening socket or
>> the filesystem connection;
>
> And that isn't very nice.

Well, I'd probably do that anyway, if I wanted to handle signals,
since I'd use the self-pipe technique to notice when signals arrived.

>> If the requestor exits, and no other requestors are around to pass the
>> listening socket to, the metaserver could close it immediately
>
> How is it supposed to know that the requestor exited?

The connection over the filesystem socket would be closed. That could
happen without the server exiting, but the metaserver can treat both
cases the same way. If a server closes the filesystem connection and
still expects to accept new connections, it's misbehaving.

> I meant to implement (when the time comes) something simpler. Either a
> FIFO or a Unix domain socket[1] is used as a communications channel for
> passing the listening socket, but without additional daemons.

That was my first thought too, but I couldn't come up with any
satisfying way to handle the race conditions gracefully. Open file
descriptors can only be passed over sockets, not pipes. Also, using a
socket means you have two-way communication, so you don't need signals
or PID files, which are subject to race conditions.
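
For illustration only, here is a minimal sketch of handing a listening
socket to another process over a Unix domain socket. It assumes Python
3.9 or later on a Unix system (for socket.send_fds and
socket.recv_fds); the function names send_listener and recv_listener
are hypothetical and not part of any program discussed in this thread.

```python
import socket


def send_listener(channel: socket.socket, listener: socket.socket) -> None:
    """Send the listener's file descriptor over a Unix domain socket."""
    # Descriptors travel as ancillary data (SCM_RIGHTS), so at least one
    # byte of ordinary data has to accompany them.
    socket.send_fds(channel, [b"L"], [listener.fileno()])


def recv_listener(channel: socket.socket) -> socket.socket:
    """Receive one descriptor and wrap it in a socket object."""
    _data, fds, _flags, _addr = socket.recv_fds(channel, 1, 1)
    if not fds:
        raise ConnectionError("peer closed without passing a descriptor")
    # With fileno=, Python detects the family and type from the descriptor.
    return socket.socket(fileno=fds[0])
```

The sender can close its own copy afterwards; the socket keeps
listening as long as any process still holds a descriptor for it.
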
But without signals, you'll still have to use select()/poll() even
with all the functionality contained in one program, or else when you
start a new server to replace the old one, the old one will wait
indefinitely for one more client connection before waking up and
noticing that it should hand the listening sockets over to the new
server.

One problem with filesystem sockets is that you have to unlink the
socket before listening on it, so the operation "listen on this
socket, which may or may not already exist" isn't atomic. If two
processes start at the same time, one of them can delete the other's
socket without knowing that anything was listening on it. So it may
be useful to atomically acquire some other dummy resource as a
mutual-exclusion checkpoint before listening on the filesystem socket.

Another benefit of making the metaserver a separate program: you can
also write a library for LD_PRELOAD that masks the listen() function
to make existing programs use the metaserver instead of opening their
listening sockets directly.


paul
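
One possible shape for the mutual-exclusion checkpoint described
above, as a sketch rather than a definitive implementation: take a
non-blocking flock() on a separate lock file, and only unlink and
re-create the filesystem socket once the lock is held. This assumes
Python 3.9 or later on a Unix system, and that every process using the
socket path also uses the same lock file. The function name
listen_exclusive and both path parameters are hypothetical.

```python
import fcntl
import os
import socket


def listen_exclusive(sock_path: str, lock_path: str) -> tuple[socket.socket, int]:
    """Acquire the lock first, then replace and listen on the socket.

    Raises BlockingIOError if another holder already has the lock, in
    which case the existing socket file is left untouched.
    """
    lock_fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        # The "dummy resource": an exclusive, non-blocking advisory lock.
        fcntl.flock(lock_fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        os.close(lock_fd)
        raise
    srv = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        # Holding the lock means nobody who follows this convention is
        # listening, so a leftover socket file can be removed safely.
        try:
            os.unlink(sock_path)
        except FileNotFoundError:
            pass
        srv.bind(sock_path)
        srv.listen(16)
    except OSError:
        srv.close()
        os.close(lock_fd)
        raise
    # Keep lock_fd open for as long as the socket is in use; closing it
    # releases the lock.
    return srv, lock_fd
```

A second process that starts at the same time fails at the flock()
call and can simply exit and let supervise/runsv retry it later.
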