Re: graceful restart under runit

From: prj@po.cwru.edu (Paul Jarc)
Cc: supervision@list.skarnet.org
Subject: Re: graceful restart under runit
Date: Mon, 20 Nov 2006 14:32:40 -0500
Message-ID: <m3zmalc3rk.fsf@multivac.cwru.edu>
In-Reply-To: <20061120182733.GA629@fly.srk.fer.hr> (Dražen Kačar's message of "Mon, 20 Nov 2006 19:27:33 +0100")

Dražen Kačar <dave@fly.srk.fer.hr> wrote:
> And then it's just a small matter of implementing the metaserver? :-)

Right. :)

> It seems a bit complex to me. You'd have to implement a protocol for
> starting a new metaserver version (which boils down to passing all
> those file descriptors to the new metaserver)

Well, it depends on how much downtime you can tolerate.  Restarting the
metaserver would probably be pretty infrequent - less frequent than
restarting the servers that use it - so you might accept some downtime
in that event for the sake of making the metaserver simpler.

But actually, it could be fairly simple if you're willing to restart
all other services when you restart the metaserver.  A new instance of
the metaserver could request listening sockets from the old one using
the same method that other servers use.  First it would connect to the
old metaserver through the filesystem socket, then listen on a new
filesystem socket, and rename() that to atomically replace the old
one.  Then, since the old metaserver has passed the listening sockets
to a new process, it will revoke its leases to the old servers.  They
will all exit and be automatically restarted, re-requesting their
sockets from the new metaserver.  But the listening sockets will never
be completely closed through all this, so connections will not be
rejected.  The only new kind of conversation needed over the
filesystem connection is for the new metaserver to ask for all open
listening sockets, instead of individual sockets that may or may not
already be open.
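
A minimal sketch of the "listen on a new filesystem socket, then
rename() it over the old one" step, assuming Python 3 on a Unix
system.  The function name and the temporary-path scheme are
illustrative and do not come from the message; fetching the listening
sockets from the old metaserver is not shown here.

```python
import os
import socket


def listen_and_replace(final_path, backlog=16):
    """Listen on a temporary socket path, then rename() it over final_path."""
    # Hypothetical naming scheme: a per-process temporary name in the same
    # directory, so the rename stays within one filesystem.
    tmp_path = "%s.%d.tmp" % (final_path, os.getpid())
    try:
        os.unlink(tmp_path)  # clear a stale leftover from a crashed run
    except FileNotFoundError:
        pass
    listener = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    listener.bind(tmp_path)
    listener.listen(backlog)
    # rename() atomically replaces the directory entry.  Clients that
    # connect() afterwards reach this listener; the old listener stays
    # open in the old process until that process closes it.
    os.rename(tmp_path, final_path)
    return listener
```

Because the path is never absent, a client that connects during the
switch reaches either the old instance or the new one, never a missing
socket.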

There is a race condition here if two new metaservers start at the
same time when there is no old metaserver already running, but it only
results in an extra process hanging around doing nothing, which isn't
harmful.

> Then you'd need to implement something to take care of metaserver crashes.
> Probably a way for servers to pass listening sockets back to the
> new metaserver.

I think that's beyond the point of diminishing returns.  The problem
can never be completely solved, since the metaserver and other servers
could crash at the same time, or you could lose power, etc.  You have
to give up at some point.

> Then servers would need a way to wait a bit if they want to restart while
> the metaserver is being restarted.

They could just exit and let supervise/runsv restart them.

>> While a server is handling connections, it would have to use
>> select()/poll() to notice activity on either the listening socket or
>> the filesystem connection;
>
> And that isn't very nice.

Well, I'd probably do that anyway, if I wanted to handle signals,
since I'd use the self-pipe technique to notice when signals arrived.
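
For illustration, here is a rough sketch of the self-pipe technique
combined with select(), assuming Python 3 on a Unix system.  The
function names, the event names, and the choice of signals are
assumptions made for this example, not part of any existing server.

```python
import os
import select
import signal
import socket


def make_self_pipe(signums=(signal.SIGTERM, signal.SIGHUP)):
    """Create a non-blocking pipe and arrange for signals to write to it."""
    read_fd, write_fd = os.pipe()
    os.set_blocking(read_fd, False)
    os.set_blocking(write_fd, False)

    def handler(signum, frame):
        try:
            os.write(write_fd, b"\0")  # one byte is enough to wake select()
        except BlockingIOError:
            pass  # pipe already full, so a wakeup is pending anyway

    for signum in signums:
        signal.signal(signum, handler)
    return read_fd, write_fd


def wait_for_events(listener, control, wake_fd, timeout=None):
    """Block until a watched descriptor is readable; return event names."""
    readable, _, _ = select.select([listener, control, wake_fd], [], [], timeout)
    events = set()
    if listener in readable:
        events.add("client")      # a new connection can be accepted
    if control in readable:
        events.add("metaserver")  # activity on the filesystem connection
    if wake_fd in readable:
        os.read(wake_fd, 4096)    # drain the wakeup bytes
        events.add("signal")
    return events
```

The same loop then covers signals, the listening socket, and the
filesystem connection, so watching the extra descriptor costs little.
Python also offers signal.set_wakeup_fd() as a built-in variant of the
same idea.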

>> If the requestor exits, and no other requestors are around to pass the
>> listening socket to, the metaserver could close it immediately
>
> How is it supposed to know that the requestor exited?

The connection over the filesystem socket would be closed.  That could
happen without the server exiting, but the metaserver can treat both
cases the same way.  If a server closes the filesystem connection and
still expects to accept new connections, it's misbehaving.
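
One way the metaserver side of this could look, as a sketch in Python 3
under assumptions of my own: `leases` is a hypothetical dict that maps
each requestor's filesystem-socket connection to the listening socket
leased through it.  A closed connection shows up as a zero-length read.

```python
import select
import socket


def reap_closed_leases(leases):
    """Drop leases whose control connection was closed by the requestor.

    Returns the leased listening sockets that are now free to pass to
    another requestor or to close.
    """
    released = []
    readable, _, _ = select.select(list(leases), [], [], 0)
    for conn in readable:
        try:
            # Peek, so that a real request stays queued for normal handling.
            closed = conn.recv(1, socket.MSG_PEEK) == b""
        except ConnectionResetError:
            closed = True
        if closed:
            released.append(leases.pop(conn))
            conn.close()
    return released
```

The check cannot tell an exited server from one that merely closed the
connection, which matches the point above: both cases are handled the
same way.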

> I meant to implement (when the time comes) something simpler. Either a
> FIFO or a Unix domain socket[1] is used as a communications channel for
> passing the listening socket, but without additional daemons.

That was my first thought too, but I couldn't come up with any
satisfying way to handle the race conditions gracefully.

Open file descriptors can only be passed over sockets, not pipes.
Also, using a socket means you have two-way communication, so you
don't need signals or PID files, which are subject to race conditions.
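
As a concrete illustration of descriptor passing, here is a small
sketch using SCM_RIGHTS over a Unix domain socket.  It assumes Python
3.9 or later, where socket.send_fds() and socket.recv_fds() are
available; the function names and the b"listener" label are made up for
the example.

```python
import socket


def send_listener(control, listener, name=b"listener"):
    """Send a listening socket's descriptor over a connected AF_UNIX socket."""
    # The descriptor travels as ancillary data; at least one byte of
    # ordinary data has to accompany it, so a short label is sent too.
    socket.send_fds(control, [name], [listener.fileno()])


def recv_listener(control):
    """Receive one descriptor and wrap it in a socket object."""
    name, fds, _flags, _addr = socket.recv_fds(control, 1024, 1)
    if not fds:
        raise ConnectionError("peer closed without passing a descriptor")
    # socket.socket(fileno=...) detects family and type from the descriptor.
    return name, socket.socket(fileno=fds[0])
```

The receiver gets its own descriptor for the same open socket, so the
sender can close its copy without the listening socket ever being
closed.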

But without signals, you'll still have to use select()/poll(), even
with all the functionality contained in one program.  Otherwise, when
you start a new server to replace the old one, the old one will wait
indefinitely for one more client connection before waking up and
noticing that it should hand the listening sockets over to the new
server.

One problem with filesystem sockets is that you have to unlink the
socket before listening on it, so the operation "listen on this
socket, which may or may not already exist" isn't atomic.  If two
processes start at the same time, one of them can delete the other's
socket without knowing that anything was listening on it.  So it may
be useful to atomically acquire some other dummy resource as a
mutual-exclusion checkpoint before listening on the filesystem socket.
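
A sketch of one possible "dummy resource", assuming Python 3 on a Unix
system: an flock() lock on a separate lock file, taken before the
unlink.  The ".lock" suffix and the function name are assumptions for
this example, and the scheme only protects processes that all follow
the same convention.

```python
import fcntl
import os
import socket


def listen_exclusively(sock_path, backlog=16):
    """Return (listener, lock_fd), or None if another process holds the lock."""
    lock_fd = os.open(sock_path + ".lock", os.O_RDWR | os.O_CREAT, 0o600)
    try:
        fcntl.flock(lock_fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(lock_fd)
        return None
    # Only the lock holder gets here, so the unlink cannot remove a socket
    # that another cooperating process is listening on.
    try:
        os.unlink(sock_path)
    except FileNotFoundError:
        pass
    listener = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    listener.bind(sock_path)
    listener.listen(backlog)
    return listener, lock_fd  # keep lock_fd open for as long as you listen
```

The kernel releases the lock when the holder exits or crashes, so no
stale-lock cleanup is needed.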

Another benefit of making the metaserver a separate program: you can
also write a library for LD_PRELOAD that overrides the listen() function
to make existing programs use the metaserver instead of opening their
listening sockets directly.

paul