From: Dražen Kačar
Newsgroups: gmane.comp.sysutils.supervision.general
Subject: Re: graceful restart under runit
Date: Thu, 23 Nov 2006 13:25:57 +0100
Message-ID: <20061123122557.GA17067@fly.srk.fer.hr>

Paul Jarc wrote:
> Dražen Kačar wrote:
> > Paul Jarc wrote:
> >> Also, using a socket means you have two-way communication, so you
> >> don't need signals or PID files, which are subject to race conditions.
> >
> > Files maybe are, but signals?
>
> Unless you're sending signals to your own child process, there's a
> chance that the process you're signaling has already died, and its PID
> has been reused for a new process. This is true no matter how you
> obtain the PID to send signals to; PID files are just one case of
> that. The only exception is for the parent, which knows that the
> child's PID hasn't been recycled because, even if the child has
> exited, the parent hasn't wait()ed for the child yet.

Ah, that. Well, you'd just have to rely on a usually undocumented OS
feature. PIDs are not recycled quickly in practice, so that would have to
be good enough.

A somewhat unportable guarantee could be obtained via /proc. You know the
PID, so you stop the process via /proc or ptrace() or whatever is available
for debuggers (something will be available), check via /proc that the PID
is associated with the correct executable, send your signal (now the
process won't go away) and then detach from the process.

Checking whether the PID corresponds to the correct executable is the messy
part, and I don't know if it can be handled in a reasonable way for this
purpose. I'd just live with the race condition and rely on the OS not to
reuse PIDs too fast.

> > A server binds to the file system socket after it got the network socket,
> > either by a direct bind() or by a passover from an existing server. That
> > should be enough, I think. It's not atomic, but it's a locking protocol.
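
A minimal sketch of the ordering quoted above, assuming a TCP listener on
the loopback address and a Unix-domain control socket. The function name
acquire_sockets, the port and the control path are hypothetical; removing a
stale control path before binding is also an assumption, not something
stated in the thread. Only the direct bind() case is shown, not the
passover from an existing server.

```python
import os
import socket


def acquire_sockets(port, control_path):
    """Take the network socket first, then the filesystem socket.

    Returns (net, ctl) on success, or None if another process already
    holds the network socket (the first "lock" in the protocol).
    """
    net = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # bind() either succeeds or fails with EADDRINUSE, so this step
        # acts as the atomic first lock.
        net.bind(("127.0.0.1", port))
    except OSError:
        net.close()
        return None
    net.listen()

    # Only the holder of the network socket reaches this point, so it is
    # the only one allowed to replace the control socket (assumption:
    # a leftover path from a dead server is simply removed).
    try:
        os.unlink(control_path)
    except FileNotFoundError:
        pass
    ctl = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    ctl.bind(control_path)
    ctl.listen()
    return net, ctl
```

A second caller asking for the same port gets None back and never touches
the control path, which is the point of fixing the order.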
>
> Actually, obtaining the network socket can be atomic enough - bind()

I meant for the whole thing: obtain the network socket and then obtain the
file system socket. Mandating that the network socket must be obtained first
and the file system socket second is a locking protocol. And the first lock
in the locking protocol must be atomic.

> So I guess there's no big advantage either way between the metaserver
> vs. keeping all the handoff functionality in one program.

Unless the LD_PRELOAD method can work. Then the metaserver has a distinct
advantage for those who need it.

> > That's a good one. But shouldn't that mask the bind() function?
>
> Probably, but it won't actually work either way, since the server
> needs to notice traffic on the filesystem connection.
>
> > As for the lease problem, couldn't metaserver just SIGTERM the existing
> > server?
>
> That suffers from the PID-recycling problem above.

Well, I'd just ignore that problem. Or go through the unportable debugger
interface when feeling paranoid.

-- 
 .-.   .-.    Yes, I am an agent of Satan, but my duties are largely
(_  \ /  _)   ceremonial.
     |
     |        dave@fly.srk.fer.hr
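
For the paranoid variant, a rough Linux-only sketch of the
stop/check/signal sequence. It substitutes SIGSTOP and SIGCONT for the
debugger interface, so it only narrows the race window rather than closing
it: a stopped process can still be killed and reaped by someone else. The
function name signal_if_exe_matches and its parameters are hypothetical,
and comparing /proc/<pid>/exe against an expected path is just one possible
way to do the "messy part".

```python
import os
import signal


def signal_if_exe_matches(pid, expected_exe, sig=signal.SIGTERM):
    """Stop pid, check /proc/<pid>/exe, send sig only if it matches.

    Returns True if the signal was sent, False otherwise.
    """
    try:
        # Freeze whatever currently owns this PID. If the PID was
        # recycled, an unrelated process is paused for a moment.
        os.kill(pid, signal.SIGSTOP)
    except (ProcessLookupError, PermissionError):
        return False
    try:
        try:
            actual = os.readlink(f"/proc/{pid}/exe")
        except OSError:
            return False
        if actual != expected_exe:
            return False
        # The signal stays pending until the process is continued.
        os.kill(pid, sig)
        return True
    finally:
        # Always resume the process, whether or not it was signalled.
        try:
            os.kill(pid, signal.SIGCONT)
        except (ProcessLookupError, PermissionError):
            pass
```

Typical use would be signal_if_exe_matches(pid, "/path/to/server"), with
the path being whatever the supervised server's executable resolves to.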