supervision - discussion about system services, daemon supervision, init, runlevel management, and tools such as s6 and runit
* graceful restart under runit
@ 2006-11-15 11:47 Dražen Kačar
  2006-11-15 16:08 ` Alex Efros
  0 siblings, 1 reply; 20+ messages in thread
From: Dražen Kačar @ 2006-11-15 11:47 UTC


Say I have a TCP server which listens for incoming connections on some TCP
port. Occasionally I'd like to install and run a new version of the server
executable. The server source is under my control, for all intents and
purposes.

Normally I'd use SIGUSR1 to make the server close the socket on which it
listens, finish processing current client sessions (depending on the
protocol, that might take seconds, minutes or hours) and exit.

Right after sending SIGUSR1 I'd start the new server version, which would
handle all new client connections.
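A minimal C sketch of that drain step, assuming a single-threaded server with one listening descriptor; `drain_requested`, `install_drain_handler()` and `stop_listening_if_draining()` are hypothetical names, not taken from any real server. The handler only sets a flag; the main loop closes the listening socket once it sees the flag, keeps serving the sessions it already has, and exits when the last one ends.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Set asynchronously by the SIGUSR1 handler, read by the main loop. */
static volatile sig_atomic_t drain_requested = 0;

static void on_usr1(int signo)
{
    (void)signo;
    drain_requested = 1;          /* only async-signal-safe work in here */
}

/* SA_RESTART is left out so that a blocked accept() or poll() returns
   EINTR and the main loop gets a chance to look at the flag. */
static int install_drain_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_usr1;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(SIGUSR1, &sa, NULL);
}

/* Called by the main loop after every wakeup. Once a drain has been
   requested, the listening socket is closed (freeing the port for the new
   version) and -1 is returned; sessions already accepted are untouched. */
static int stop_listening_if_draining(int listen_fd)
{
    if (drain_requested && listen_fd >= 0) {
        close(listen_fd);
        return -1;
    }
    return listen_fd;
}
```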

If the server is managed by runit, things get complicated because runit
won't start the new server until the old one exits, so I either have to
abort existing client connections or suffer some time without service.

Is there a way to get around this?

-- 
 .-.   .-.    Yes, I am an agent of Satan, but my duties are largely
(_  \ /  _)   ceremonial.
     |
     |        dave@fly.srk.fer.hr



* Re: graceful restart under runit
  2006-11-15 11:47 graceful restart under runit Dražen Kačar
@ 2006-11-15 16:08 ` Alex Efros
  2006-11-16 15:24   ` Dražen Kačar
  0 siblings, 1 reply; 20+ messages in thread
From: Alex Efros @ 2006-11-15 16:08 UTC
  Cc: Dražen Kačar

Hi!

On Wed, Nov 15, 2006 at 12:47:54PM +0100, Dražen Kačar wrote:
> Say I have a TCP server which listens for incoming connections on some TCP
> port. Occasionally I'd like to install and run a new version of the server
> executable. The server source is under my control, for all intents and
> purposes.
[...] 
> Is there a way to get around this?

You can probably just fork() after receiving SIGUSR1 and exit from the
parent, leaving the child to process the existing connections.
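A sketch of that suggestion in C, assuming a single-threaded server; `detach_from_supervisor()` is a hypothetical helper. The listening socket is closed first so the replacement can bind the port; the original process then exits, which runsv sees as the service going down (and, with the service wanted up, it starts the new version), while the forked child keeps the already accepted connections.

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* On SIGUSR1: give the port up, fork, and let the supervised process exit.
   Returns 0 in the child, which keeps every already accepted connection and
   should exit once they are finished. The parent never returns: its exit is
   what runsv observes. Returns -1 (listening socket already closed) if
   fork() fails. Only safe in a single-threaded process, since fork()
   copies just the calling thread. */
static int detach_from_supervisor(int listen_fd)
{
    close(listen_fd);             /* let the new version bind the port */
    pid_t pid = fork();
    if (pid < 0) {
        perror("fork");
        return -1;
    }
    if (pid > 0)
        _exit(0);                 /* parent: runsv sees the service exit */
    return 0;                     /* child: reparented, finishes sessions */
}
```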

-- 
			WBR, Alex.



* Re: graceful restart under runit
  2006-11-15 16:08 ` Alex Efros
@ 2006-11-16 15:24   ` Dražen Kačar
  2006-11-17  0:15     ` Alex Efros
  2006-11-17 13:14     ` Gerrit Pape
  0 siblings, 2 replies; 20+ messages in thread
From: Dražen Kačar @ 2006-11-16 15:24 UTC


Alex Efros wrote:
> On Wed, Nov 15, 2006 at 12:47:54PM +0100, Dražen Kačar wrote:
> > Say I have a TCP server which listens for incoming connections on some TCP
> > port. Occasionally I'd like to install and run a new version of the server
> > executable. The server source is under my control, for all intents and
> > purposes.
> [...] 
> > Is there a way to get around this?
> 
> You can probably just fork() after receiving SIGUSR1 and exit from the
> parent, leaving the child to process the existing connections.

Servers which use a process per connection do something like that already
(the parent process signals the children, exits and leaves them to finish
their sessions, and then they exit too).

However, there are multithreaded monsters which can't do that. fork()
replicates just the calling thread[1], so it's not an option, and exit()
will terminate all threads (i.e. all sessions).

[1] It's possible to replicate all threads on Solaris, but that's too
    unportable for my purposes. Besides, calling fork() from an MT process
    usually uncovers bugs in various libraries which aren't prepared to
    deal with that.

-- 
 .-.   .-.    Yes, I am an agent of Satan, but my duties are largely
(_  \ /  _)   ceremonial.
     |
     |        dave@fly.srk.fer.hr



* Re: graceful restart under runit
  2006-11-16 15:24   ` Dražen Kačar
@ 2006-11-17  0:15     ` Alex Efros
  2006-11-17  0:48       ` Paul Jarc
  2006-11-17 13:14     ` Gerrit Pape
  1 sibling, 1 reply; 20+ messages in thread
From: Alex Efros @ 2006-11-17  0:15 UTC
  Cc: Dražen Kačar

Hi!

On Thu, Nov 16, 2006 at 04:24:46PM +0100, Dražen Kačar wrote:
> However, there are multithreaded monsters which can't do that. fork()

:-/

Another option: you can ask runsv to 'x' (Exit) instead of 't' (Term).
In this case runsv will send SIGTERM to your process, which can handle it
by just closing the listening socket, waiting until the existing connections
finish, and then exiting.
After a few (up to 5) seconds runsv will be started again by runsvdir, and
it will start a second process of that server (which will open the
listening socket again).

You can probably even convert 't' to 'x' using the file ./control/t, so
that you can use 't' instead of 'x' for restarting this service just as for
any other service.


P.S. Of course, a better solution is not to develop multithreaded monsters :)
or to split such a monster into two processes: one for accepting connections
and a second one for processing them. That architecture also scales much
better, because you can run multiple "second" processes, each multithreaded
and processing many connections; this has proven to perform better than a
single multithreaded process or many single-threaded processes.

-- 
			WBR, Alex.



* Re: graceful restart under runit
  2006-11-17  0:15     ` Alex Efros
@ 2006-11-17  0:48       ` Paul Jarc
  2006-11-17 13:34         ` Alex Efros
  0 siblings, 1 reply; 20+ messages in thread
From: Paul Jarc @ 2006-11-17  0:48 UTC
  Cc: Dražen Kačar

Alex Efros <powerman@powerman.asdfGroup.com> wrote:
> Another option: you can ask runsv to 'x' (Exit) instead of 't' (Term).
> In this case runsv will send SIGTERM to your process, which can handle it
> by just closing the listening socket, waiting until the existing connections
> finish, and then exiting.
> After a few (up to 5) seconds runsv will be started again by runsvdir, and
> it will start a second process of that server (which will open the
> listening socket again).

This seems worse than t.  In either case, new connections are refused
while the old process cleans up its current connections, but with x,
new connections are also refused for up to 5 seconds more.


paul



* Re: graceful restart under runit
  2006-11-16 15:24   ` Dražen Kačar
  2006-11-17  0:15     ` Alex Efros
@ 2006-11-17 13:14     ` Gerrit Pape
  1 sibling, 0 replies; 20+ messages in thread
From: Gerrit Pape @ 2006-11-17 13:14 UTC
  Cc: Dražen Kačar

On Thu, Nov 16, 2006 at 04:24:46PM +0100, Dražen Kačar wrote:
> Alex Efros wrote:
> > On Wed, Nov 15, 2006 at 12:47:54PM +0100, Dražen Kačar wrote:
> > > Say I have a TCP server which listens for incoming connections on some TCP
> > > port. Occasionally I'd like to install and run a new version of the server
> > > executable. The server source is under my control, for all intents and
> > > purposes.
> > [...] 
> > > Is there a way to get around this?
> > 
> > You can probably just fork() after receiving SIGUSR1 and exit from the
> > parent, leaving the child to process the existing connections.

Yes.

> Servers which use a process per connection do something like that already
> (the parent process signals the children, exits and leaves them to finish
> their sessions, and then they exit too).
> 
> However, there are multithreaded monsters which can't do that. fork()
> replicates just the calling thread[1], so it's not an option, and exit()
> will terminate all threads (i.e. all sessions).

Hm, even though I too dislike "multithreaded monsters", we could add
some detach support to runsv, e.g. the patch below.  You can test this
with

 $ printf f >./supervise/control

After this, runsv forgets about the child, and considers the service to
be terminated; ./control/f, if it exists, will be run before detaching.

Regards, Gerrit.


Index: src/runsv.c
===================================================================
RCS file: /cvs/runit/src/runsv.c,v
retrieving revision 1.26
diff -u -r1.26 runsv.c
--- src/runsv.c 24 Jul 2006 21:01:37 -0000      1.26
+++ src/runsv.c 17 Nov 2006 12:58:34 -0000
@@ -359,6 +359,15 @@
     update_status(s);
     if (! s->pid) startservice(s);
     break;
+  case 'f': /* forget, detach */
+    if (! s->pid) break;
+    custom(s, c);
+    s->pid =0;
+    s->state =S_DOWN;
+    s->ctrl =C_NOOP;
+    pidchanged =1;
+    update_status(s);
+    break;
   case 'a': /* sig alarm */
     if (s->pid && ! custom(s, c)) kill(s->pid, SIGALRM);
     break;



* Re: graceful restart under runit
  2006-11-17  0:48       ` Paul Jarc
@ 2006-11-17 13:34         ` Alex Efros
  2006-11-17 14:53           ` Charlie Brady
  0 siblings, 1 reply; 20+ messages in thread
From: Alex Efros @ 2006-11-17 13:34 UTC


Hi!

On Thu, Nov 16, 2006 at 07:48:55PM -0500, Paul Jarc wrote:
> > Another option: you can ask runsv to 'x' (Exit) instead of 't' (Term).
> > In this case runsv will send SIGTERM to your process, which can handle it
> > by just closing the listening socket, waiting until the existing connections
> > finish, and then exiting.
> > After a few (up to 5) seconds runsv will be started again by runsvdir, and
> > it will start a second process of that server (which will open the
> > listening socket again).
> This seems worse than t.  In either case, new connections are refused
> while the old process cleans up its current connections, but with x,
> new connections are also refused for up to 5 seconds more.

If the old server continues accepting new connections for 5 seconds after
receiving SIGTERM, this solves the 'connection refused' issue. (If the new
server is started after 1 second, for example, then for the next 4 seconds
both servers will have an open listening socket, and some connections will
be accepted by the first server and some by the second, AFAIK. I don't see
anything really wrong with this.)

-- 
			WBR, Alex.



* Re: graceful restart under runit
  2006-11-17 13:34         ` Alex Efros
@ 2006-11-17 14:53           ` Charlie Brady
  2006-11-17 15:39             ` Gerrit Pape
  2006-11-18  0:22             ` Alex Efros
  0 siblings, 2 replies; 20+ messages in thread
From: Charlie Brady @ 2006-11-17 14:53 UTC
  Cc: supervision


On Fri, 17 Nov 2006, Alex Efros wrote:

> On Thu, Nov 16, 2006 at 07:48:55PM -0500, Paul Jarc wrote:
>>> Another option: you can ask runsv to 'x' (Exit) instead of 't' (Term).
>>> In this case runsv will send SIGTERM to your process, which can handle it
>>> by just closing the listening socket, waiting until the existing connections
>>> finish, and then exiting.
>>> After a few (up to 5) seconds runsv will be started again by runsvdir, and
>>> it will start a second process of that server (which will open the
>>> listening socket again).
>> This seems worse than t.  In either case, new connections are refused
>> while the old process cleans up its current connections, but with x,
>> new connections are also refused for up to 5 seconds more.
>
> If the old server continues accepting new connections for 5 seconds after
> receiving SIGTERM, this solves the 'connection refused' issue. (If the new
> server is started after 1 second, for example, then for the next 4 seconds
> both servers will have an open listening socket, and some connections will
> be accepted by the first server and some by the second, AFAIK. I don't see
> anything really wrong with this.)

The new server will get an "Address in use" error when it attempts to open 
the socket, if it is still in use by the old server. It will likely then 
die, and you will have to wait again for runsv to start a new one. You 
will still have a period of time when connections will not be accepted.

Gerrit, the tcpsvd man page doesn't mention how tcpsvd responds to signals, 
but I would guess it doesn't go into the background and die without 
terminating its children, in response to SIGUSR1. Would you consider 
adding that behaviour?



* Re: graceful restart under runit
  2006-11-17 14:53           ` Charlie Brady
@ 2006-11-17 15:39             ` Gerrit Pape
  2006-11-18  0:22             ` Alex Efros
  1 sibling, 0 replies; 20+ messages in thread
From: Gerrit Pape @ 2006-11-17 15:39 UTC


On Fri, Nov 17, 2006 at 09:53:28AM -0500, Charlie Brady wrote:
> Gerrit, the tcpsvd man page doesn't mention how tcpsvd responds to signals, 
> but I would guess it doesn't go into the background and die without 
> terminating its children, in response to SIGUSR1. Would you consider 
> adding that behaviour?

tcpsvd forks for each connection. If it receives a TERM (or USR1)
signal, it terminates and leaves running the children that handle the
existing connections.

Regards, Gerrit.



* Re: graceful restart under runit
  2006-11-17 14:53           ` Charlie Brady
  2006-11-17 15:39             ` Gerrit Pape
@ 2006-11-18  0:22             ` Alex Efros
  2006-11-18  1:34               ` Charlie Brady
  1 sibling, 1 reply; 20+ messages in thread
From: Alex Efros @ 2006-11-18  0:22 UTC


Hi!

On Fri, Nov 17, 2006 at 09:53:28AM -0500, Charlie Brady wrote:
> The new server will get an "Address in use" error when it attempts to open 
> the socket, if it is still in use by the old server. It will likely then 
> die, and you will have to wait again for runsv to start a new one. You 
> will still have a period of time when connections will not be accepted.

Servers usually use setsockopt(SO_REUSEADDR) to work around this.

-- 
			WBR, Alex.



* Re: graceful restart under runit
  2006-11-18  0:22             ` Alex Efros
@ 2006-11-18  1:34               ` Charlie Brady
  2006-11-18 12:31                 ` Alex Efros
  0 siblings, 1 reply; 20+ messages in thread
From: Charlie Brady @ 2006-11-18  1:34 UTC
  Cc: supervision


On Sat, 18 Nov 2006, Alex Efros wrote:

> On Fri, Nov 17, 2006 at 09:53:28AM -0500, Charlie Brady wrote:
>> The new server will get an "Address in use" error when it attempts to open
>> the socket, if it is still in use by the old server. It will likely then
>> die, and you will have to wait again for runsv to start a new one. You
>> will still have a period of time when connections will not be accepted.
>
> Servers usually use setsockopt(SO_REUSEADDR) to work around this.

Not as I understand it. SO_REUSEADDR will allow the address to be reused
while an old socket is in the TIME_WAIT state. It won't allow multiple
processes to bind to the same address and listen for connections.

-bash-3.00$ tcpsvd localhost 5000 echo foo &
[1] 7520
-bash-3.00$ tcpsvd localhost 5000 echo foo
tcpsvd: fatal: unable to bind socket: address already used
-bash-3.00$
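The same check as the transcript above, done in C. A minimal sketch with a hypothetical helper `listen_tcp()`: both sockets set SO_REUSEADDR, and the second bind() to a port that already has a listener still fails with EADDRINUSE.

```c
#define _GNU_SOURCE
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Bind a TCP listener on 127.0.0.1:port with SO_REUSEADDR set (port 0
   picks a free ephemeral port). Returns the descriptor, or -1 with errno
   preserved from the call that failed. */
static int listen_tcp(unsigned short port)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    int one = 1;
    struct sockaddr_in sin;
    memset(&sin, 0, sizeof sin);
    sin.sin_family = AF_INET;
    sin.sin_port = htons(port);
    sin.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof one) < 0 ||
        bind(fd, (struct sockaddr *)&sin, sizeof sin) < 0 ||
        listen(fd, 16) < 0) {
        int saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}

/* Port a listener is bound to, or 0 on error. */
static unsigned short bound_port(int fd)
{
    struct sockaddr_in sin;
    socklen_t len = sizeof sin;
    if (getsockname(fd, (struct sockaddr *)&sin, &len) < 0)
        return 0;
    return ntohs(sin.sin_port);
}
```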




* Re: graceful restart under runit
  2006-11-18  1:34               ` Charlie Brady
@ 2006-11-18 12:31                 ` Alex Efros
  2006-11-18 19:30                   ` Paul Jarc
  0 siblings, 1 reply; 20+ messages in thread
From: Alex Efros @ 2006-11-18 12:31 UTC


Hi!

On Fri, Nov 17, 2006 at 08:34:18PM -0500, Charlie Brady wrote:
> Not as I understand it. SO_REUSEADDR will allow the address to be reused
> while an old socket is in the TIME_WAIT state. It won't allow multiple
> processes to bind to the same address and listen for connections.

Yep, looks like you're right. I think I confused this case with the case
where a process with a listening socket forks, which leaves the same
listening socket for one IP/port open in two different processes.

-- 
			WBR, Alex.



* Re: graceful restart under runit
  2006-11-18 12:31                 ` Alex Efros
@ 2006-11-18 19:30                   ` Paul Jarc
  2006-11-20 18:27                     ` Dražen Kačar
  0 siblings, 1 reply; 20+ messages in thread
From: Paul Jarc @ 2006-11-18 19:30 UTC


Alex Efros <powerman@powerman.asdfGroup.com> wrote:
> Yep, looks like you're right. I think I confused this case with the case
> where a process with a listening socket forks, which leaves the same
> listening socket for one IP/port open in two different processes.

Now that you mention this, I think there is a way to hand off to a new
server process with no unavailability at all.  But if you can tolerate
a small outage during the switchover, you may be better off with the
simpler method of sending a signal to make the old process close its
listening socket, then forgetting the old process using Gerrit's patch
and starting a new one, which will just open a listening socket as
usual.

So, for zero-unavailability: there is a metaserver which listens on a
filesystem socket and takes care of opening other listening sockets
for other servers.  Instead of opening a listening socket directly,
another server would connect to this filesystem socket and ask the
metaserver to open it.  If there is not yet any socket open for the
requested address, the metaserver opens one, and passes the descriptor
to the requestor over the filesystem socket connection with sendmsg().
(The metaserver also keeps the listening socket open for itself, but
never calls accept() on it.)  Both sides keep the filesystem socket
connection open for as long as the requestor is accepting connections.
If the metaserver receives a request for a socket that is already
open, it notifies the previous requestor, which still holds the
listening socket, over the filesystem connection.  The old process can
then close its listening socket (new connections will only be delayed
at this point, not refused, since the metaserver still has the
listening socket open as well), close the filesystem connection,
finish servicing its current connections, and exit.  Once the
metaserver sees that the old requestor's filesystem connection has
been closed, it sends the listening socket to the new requestor.
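The descriptor passing this relies on uses sendmsg()/recvmsg() with an SCM_RIGHTS control message over a Unix domain ("filesystem") socket. A minimal sketch; `send_fd()` and `recv_fd()` are hypothetical helper names, and the single data byte is there because portable code sends at least one byte of ordinary data along with the control message.

```c
#define _GNU_SOURCE
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send descriptor fd over the connected Unix domain socket sock. */
static int send_fd(int sock, int fd)
{
    char byte = 'F';
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {                           /* correctly aligned control buffer */
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } u;
    struct msghdr msg;
    memset(&msg, 0, sizeof msg);
    memset(&u, 0, sizeof u);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof u.buf;

    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one descriptor from sock; returns the new descriptor or -1. */
static int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } u;
    struct msghdr msg;
    memset(&msg, 0, sizeof msg);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof u.buf;

    if (recvmsg(sock, &msg, 0) != 1 || (msg.msg_flags & MSG_CTRUNC))
        return -1;
    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS ||
        cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
        return -1;
    int fd;
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}
```

In this scheme the metaserver would call send_fd() with the listening TCP socket, and the server would use the descriptor returned by recv_fd() as if it had called socket(), bind() and listen() itself.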

While a server is handling connections, it would have to use
select()/poll() to notice activity on either the listening socket or
the filesystem connection; it couldn't just block on accept() in a
loop.  (Well, it could, but that would mean that when a switchover
starts, it wouldn't be completed until the next client connected.)
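A sketch of that wait using poll(); `listen_fd` (the leased listening socket), `ctl_fd` (the connection to the metaserver) and `wait_for_work()` are hypothetical names. Control traffic, including the metaserver hanging up, is checked before new clients, so a pending switchover is noticed even while clients keep arriving.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <poll.h>
#include <unistd.h>

enum wake_reason { WAKE_ERROR = -1, WAKE_CLIENT = 1, WAKE_CONTROL = 2 };

/* Block until a client is waiting on listen_fd or the metaserver has sent
   something (or hung up) on ctl_fd. Control events are reported first so a
   pending switchover is not starved by a steady stream of clients. */
static int wait_for_work(int listen_fd, int ctl_fd)
{
    struct pollfd pfd[2] = {
        { .fd = listen_fd, .events = POLLIN },
        { .fd = ctl_fd,    .events = POLLIN },
    };
    for (;;) {
        if (poll(pfd, 2, -1) < 0) {
            if (errno == EINTR)
                continue;         /* e.g. a signal; just wait again */
            return WAKE_ERROR;
        }
        if (pfd[1].revents & (POLLIN | POLLHUP | POLLERR))
            return WAKE_CONTROL;  /* lease revoked or metaserver gone */
        if (pfd[0].revents & POLLIN)
            return WAKE_CLIENT;   /* a client is waiting to be accepted */
        if (pfd[0].revents | pfd[1].revents)
            return WAKE_ERROR;    /* POLLNVAL, or an error on the listener */
    }
}
```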

If the requestor exits, and no other requestors are around to pass the
listening socket to, the metaserver could close it immediately, or
could keep it open for a few seconds to see if a new requestor shows
up.  So quick, non-overlapping restarts would be transparent to the
end clients.

To trigger the switchover, you wouldn't need any signals - just make
runsv forget about the old process using Gerrit's patch.  When the new
process starts up and connects to the filesystem socket, that will
trigger everything else.


paul



* Re: graceful restart under runit
  2006-11-18 19:30                   ` Paul Jarc
@ 2006-11-20 18:27                     ` Dražen Kačar
  2006-11-20 19:32                       ` Paul Jarc
  0 siblings, 1 reply; 20+ messages in thread
From: Dražen Kačar @ 2006-11-20 18:27 UTC


Paul Jarc wrote:

> So, for zero-unavailability: there is a metaserver which listens on a

And then it's just a small matter of implementing the metaserver? :-)

It seems a bit complex to me. You'd have to implement a protocol for
starting a new metaserver version (which boils down to passing all
those file descriptors to the new metaserver).

Then you'd need to implement something to take care of metaserver crashes.
Probably a way for servers to pass listening sockets back to the
new metaserver.

Then servers would need a way to wait a bit if they want to restart while
the metaserver is being restarted.

Maybe a few more things as well. I suppose it's doable, but it seems like
a can of worms and races.

> While a server is handling connections, it would have to use
> select()/poll() to notice activity on either the listening socket or
> the filesystem connection;

And that isn't very nice.

> If the requestor exits, and no other requestors are around to pass the
> listening socket to, the metaserver could close it immediately, or
> could keep it open for a few seconds to see if a new requestor show
> up.  So quick, non-overlapping restarts would be transparent to the
> end clients.

How is it supposed to know that the requestor exited?

> To trigger the switchover, you wouldn't need any signals - just make
> runsv forget about the old process using Gerrit's patch.  When the new
> process starts up and connects to the filesystem socket, that will
> trigger everything else.

I meant to implement (when the time comes) something simpler. Either a
FIFO or a Unix domain socket[1] is used as a communications channel for
passing the listening socket, but without additional daemons.

The new server starts, acquires all resources necessary to run except the
listening socket and the PID file, then tries to connect to the file
system channel.

If there's no writer, it binds to the network socket, writes the PID file,
becomes the writer on the file system channel and starts doing its job.

If there is a writer, it's supposed to be an already running server
instance. Then the new server reads the PID file, signals the running
instance and blocks in read on the file system channel.

The running instance receives the signal, passes the listening socket,
performs whatever cleanup needs to be done[2] and then either exits or
waits for the current sessions to finish and then exits.

The new server reads the file descriptor, becomes the writer on the file
system channel, writes the new PID file and starts doing its job.
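With a Unix domain socket as the channel, the "is there a writer?" test above becomes a connect() attempt. A minimal sketch; `probe_running_instance()` and the socket path are hypothetical. ENOENT means no socket file exists, ECONNREFUSED means a stale file with nobody listening, and in both cases the new server would bind the TCP port itself.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Try to reach an already running instance through the hand-over socket
   at path. Returns a connected descriptor if one is listening, or -1 with
   errno set (ENOENT, ECONNREFUSED, ...) if nobody is. */
static int probe_running_instance(const char *path)
{
    struct sockaddr_un addr;
    if (strlen(path) >= sizeof addr.sun_path) {
        errno = ENAMETOOLONG;
        return -1;
    }
    memset(&addr, 0, sizeof addr);
    addr.sun_family = AF_UNIX;
    strcpy(addr.sun_path, path);

    int fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    if (connect(fd, (struct sockaddr *)&addr, sizeof addr) < 0) {
        int saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}
```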

[1] FIFOs are nasty because O_RDONLY opens block if there are no writers
    and O_WRONLY opens block if there are no readers. O_NONBLOCK allows a
    reader to attach without the writer, but it doesn't allow a writer to
    attach when there is no reader. I'm not sure if the required
    mumbo-jumbo can be portably done with FIFOs (also, some OSs have bugs
    in this area IIRC). But if a FIFO isn't good enough, a Unix domain
    socket should suffice.

[2] At least it needs to close the listening socket and the writing part
    of the file system channel to enable the new server to become a writer
    there.
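The asymmetry described in footnote [1] is easy to observe. A minimal POSIX sketch with a hypothetical helper `fifo_has_reader()`: a non-blocking write-only open fails with ENXIO while nobody has the FIFO open for reading, whereas a non-blocking read-only open succeeds even when there is no writer.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Returns 1 if some process has the FIFO at path open for reading, 0 if
   not, -1 on any other error. Relies on the POSIX rule that an
   O_WRONLY | O_NONBLOCK open fails with ENXIO when there is no reader.
   Side effect: the probe is briefly a writer, and its close() can make a
   reader blocked on an otherwise writer-less FIFO see end-of-file. */
static int fifo_has_reader(const char *path)
{
    int fd = open(path, O_WRONLY | O_NONBLOCK);
    if (fd >= 0) {
        close(fd);
        return 1;
    }
    return errno == ENXIO ? 0 : -1;
}
```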

-- 
 .-.   .-.    Yes, I am an agent of Satan, but my duties are largely
(_  \ /  _)   ceremonial.
     |
     |        dave@fly.srk.fer.hr



* Re: graceful restart under runit
  2006-11-20 18:27                     ` Dražen Kačar
@ 2006-11-20 19:32                       ` Paul Jarc
  2006-11-20 19:43                         ` Paul Jarc
  2006-11-22 19:25                         ` Dražen Kačar
  0 siblings, 2 replies; 20+ messages in thread
From: Paul Jarc @ 2006-11-20 19:32 UTC
  Cc: supervision

Dražen Kačar <dave@fly.srk.fer.hr> wrote:
> And then it's just a small matter of implementing the metaserver? :-)

Right. :)

> It seems a bit complex to me. You'd have to implement a protocol for
> starting a new metaserver version (which boils down to passing all
> those file descriptors to the new metaserver).

Well, it depends how much downtime you can tolerate.  Restarting the
metaserver would probably be pretty infrequent - less frequent than
restarting the servers that use it - so you might accept some downtime
in that event for the sake of making the metaserver simpler.

But actually, it could be fairly simple if you're willing to restart
all other services when you restart the metaserver.  A new instance of
the metaserver could request listening sockets from the old one using
the same method that other servers use.  First it would connect to the
old metaserver through the filesystem socket, then listen on a new
filesystem socket, and rename() that to atomically replace the old
one.  Then, since the old metaserver has passed the listening sockets
to a new process, it will revoke its leases to the old servers.  They
will all exit and be automatically restarted, re-requesting their
sockets from the new metaserver.  But the listening sockets will never
be completely closed through all this, so connections will not be
rejected.  The only new kind of conversation needed over the
filesystem connection is for the new metaserver to ask for all open
listening sockets, instead of individual ones that may or may not
already be open.

There is a race condition here if two new metaservers start at the
same time when there is no old metaserver already running, but it only
results in an extra process hanging around doing nothing, which isn't
harmful.

> Then you'd need to implement something to take care of metaserver crashes.
> Probably a way for servers to pass listening sockets back to the
> new metaserver.

I think that's beyond the point of diminishing returns.  The problem
can never be completely solved, since the metaserver and other servers
could crash at the same time, or you could lose power, etc.  You have
to give up at some point.

> Then servers would need a way to wait a bit if they want to restart while
> the metaserver is being restarted.

They could just exit and let supervise/runsv restart them.

>> While a server is handling connections, it would have to use
>> select()/poll() to notice activity on either the listening socket or
>> the filesystem connection;
>
> And that isn't very nice.

Well, I'd probably do that anyway, if I wanted to handle signals,
since I'd use the self-pipe technique to notice when signals arrived.
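For reference, a minimal sketch of the self-pipe technique in C; `self_pipe`, `self_pipe_init()` and `self_pipe_drain()` are hypothetical names. The handler only writes a byte into a non-blocking pipe, and the read end is polled together with the sockets, so signals and socket events arrive through one poll() call.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

static int self_pipe[2] = { -1, -1 };

/* The handler only writes one byte. If the pipe is full, a wakeup is
   already pending, so the failed write can be ignored. errno is restored
   because the interrupted code may be about to inspect it. */
static void on_signal(int signo)
{
    int saved = errno;
    unsigned char b = (unsigned char)signo;
    ssize_t r = write(self_pipe[1], &b, 1);
    (void)r;
    errno = saved;
}

static int set_nonblock(int fd)
{
    int fl = fcntl(fd, F_GETFL);
    return fl < 0 ? -1 : fcntl(fd, F_SETFL, fl | O_NONBLOCK);
}

/* Create the pipe and route signo into it. Both ends are non-blocking so
   the handler can never block and self_pipe_drain() always terminates. */
static int self_pipe_init(int signo)
{
    if (pipe(self_pipe) < 0 ||
        set_nonblock(self_pipe[0]) < 0 || set_nonblock(self_pipe[1]) < 0)
        return -1;
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    return sigaction(signo, &sa, NULL);
}

/* Call after poll() reports self_pipe[0] readable: empties the pipe and
   returns how many signal bytes were consumed. */
static int self_pipe_drain(void)
{
    unsigned char buf[64];
    int total = 0;
    ssize_t n;
    while ((n = read(self_pipe[0], buf, sizeof buf)) > 0)
        total += (int)n;
    return total;
}
```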

>> If the requestor exits, and no other requestors are around to pass the
>> listening socket to, the metaserver could close it immediately
>
> How is it supposed to know that the requestor exited?

The connection over the filesystem socket would be closed.  That could
happen without the server exiting, but the metaserver can treat both
cases the same way.  If a server closes the filesystem connection and
still expects to accept new connections, it's misbehaving.
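A sketch of how the metaserver might tell that a requestor has let go, assuming it polls each requestor connection; `requestor_gone()` is a hypothetical name. A recv() that returns 0 means end-of-file, i.e. the peer closed its end, whether or not the process itself exited; MSG_PEEK leaves any real request in the buffer for the normal protocol code.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Call when poll() reports conn readable. Returns 1 if the requestor has
   closed its end, 0 if request data is waiting (left unread), -1 on any
   other error. */
static int requestor_gone(int conn)
{
    char c;
    ssize_t n;
    do
        n = recv(conn, &c, 1, MSG_PEEK);
    while (n < 0 && errno == EINTR);
    if (n > 0)
        return 0;                 /* a request is waiting */
    if (n == 0 || errno == ECONNRESET)
        return 1;                 /* end-of-file: the lease is released */
    return -1;
}
```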

> I meant to implement (when the time comes) something simpler. Either a
> FIFO or a Unix domain socket[1] is used as a communications channel for
> passing the listening socket, but without additional daemons.

That was my first thought too, but I couldn't come up with any
satisfying way to handle the race conditions gracefully.

Open file descriptors can only be passed over sockets, not pipes.
Also, using a socket means you have two-way communication, so you
don't need signals or PID files, which are subject to race conditions.

But without signals, you'll still have to use select()/poll() even
with all the functionality contained in one program, or else when you
start a new server to replace the old one, the old one will wait
indefinitely for one more client connection before waking up and
noticing that it should hand the listening sockets over to the new
server.

One problem with filesystem sockets is that you have to unlink the
socket before listening on it, so the operation "listen on this
socket, which may or may not already exist" isn't atomic.  If two
processes start at the same time, one of them can delete the other's
socket without knowing that anything was listening on it.  So it may
be useful to atomically acquire some other dummy resource as a
mutual-exclusion checkpoint before listening on the filesystem socket.
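A sketch of that checkpoint, assuming a lock file next to the socket serves as the dummy resource (the paths and `listen_unix_locked()` are hypothetical). It uses flock(), which is BSD rather than POSIX but available on Linux; fcntl() record locks would be the POSIX alternative. The lock is held for as long as the process listens, so a second instance fails before it can unlink the first one's socket.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/file.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Take an exclusive, non-blocking lock on lock_path, then replace any stale
   socket at sock_path and listen on it. The lock descriptor is returned in
   *lock_fd and must stay open while listening. Fails with EWOULDBLOCK if
   another instance holds the lock. */
static int listen_unix_locked(const char *lock_path, const char *sock_path,
                              int *lock_fd)
{
    struct sockaddr_un addr;
    if (strlen(sock_path) >= sizeof addr.sun_path) {
        errno = ENAMETOOLONG;
        return -1;
    }
    int lk = open(lock_path, O_RDWR | O_CREAT, 0600);
    if (lk < 0)
        return -1;
    if (flock(lk, LOCK_EX | LOCK_NB) < 0) {
        int saved = errno;
        close(lk);
        errno = saved;
        return -1;
    }
    memset(&addr, 0, sizeof addr);
    addr.sun_family = AF_UNIX;
    strcpy(addr.sun_path, sock_path);

    /* Only the lock holder gets here, so unlink() cannot hit a live peer. */
    int fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0 || (unlink(sock_path) < 0 && errno != ENOENT) ||
        bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(fd, 16) < 0) {
        int saved = errno;
        if (fd >= 0)
            close(fd);
        close(lk);
        errno = saved;
        return -1;
    }
    *lock_fd = lk;
    return fd;
}
```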

Another benefit of making the metaserver a separate program: you can
also write a library for LD_PRELOAD that masks the listen() function
to make existing programs use the metaserver instead of opening their
listening sockets directly.


paul



* Re: graceful restart under runit
  2006-11-20 19:32                       ` Paul Jarc
@ 2006-11-20 19:43                         ` Paul Jarc
  2006-11-22 19:25                         ` Dražen Kačar
  1 sibling, 0 replies; 20+ messages in thread
From: Paul Jarc @ 2006-11-20 19:43 UTC
  Cc: supervision

I wrote:
> Another benefit of making the metaserver a separate program: you can
> also write a library for LD_PRELOAD that masks the listen() function
> to make existing programs use the metaserver instead of opening their
> listening sockets directly.

Oops, that won't work, since the server has to notice when its lease
on the listening socket has expired, and clean up accordingly.


paul



* Re: graceful restart under runit
  2006-11-20 19:32                       ` Paul Jarc
  2006-11-20 19:43                         ` Paul Jarc
@ 2006-11-22 19:25                         ` Dražen Kačar
  2006-11-22 19:51                           ` Paul Jarc
  1 sibling, 1 reply; 20+ messages in thread
From: Dražen Kačar @ 2006-11-22 19:25 UTC


Paul Jarc wrote:
> Dražen Kačar <dave@fly.srk.fer.hr> wrote:

> Well, it depends how much downtime you can tolerate.
[...]

> > Then you'd need to implement something to take care of metaserver crashes.
> > Probably a way for servers to pass listening sockets back to the
> > new metaserver.
> 
> I think that's beyond the point of diminishing returns.  The problem
> can never be completely solved, since the metaserver and other servers
> could crash at the same time, or you could lose power, etc.  You have
> to give up at some point.

Well, I'm thinking about systems with 99.999% availability (for fun you
can calculate how many seconds per year that is :-). Clustered systems can
do that. There's a heartbeat, and if one component fails then the processing
waits until the living ones reach the recovery point, but after that
processing continues and the client only sees a brief (or not) pause.
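For the record, the arithmetic for a 365-day year:

```latex
\begin{aligned}
(1 - 0.99999) \times 365 \times 86\,400\ \text{s}
  &= 10^{-5} \times 31\,536\,000\ \text{s} \\
  &\approx 315\ \text{s} \approx 5.3\ \text{minutes per year}
\end{aligned}
```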

But that's for distributed systems where you just don't have a single
point on which you can rely to work properly. On one machine there's the
kernel. If it crashes, all processes will also crash and burn, so that
would be the point at which I'd give up. :-)

> >> While a server is handling connections, it would have to use
> >> select()/poll() to notice activity on either the listening socket or
> >> the filesystem connection;
> >
> > And that isn't very nice.
> 
> Well, I'd probably do that anyway, if I wanted to handle signals,
> since I'd use the self-pipe technique to notice when signals arrived.

I'm trying to use sig_atomic_t flag in signal handlers whenever I can.
Things are a bit simpler that way, at least to me.

For threaded code there'd be a signal handling thread, so that's allegedly
a non-issue. Just a small matter of inter-thread synchronization (yuck).

> > I meant to implement (when the time comes) something simpler. Either a
> > FIFO or a Unix domain socket[1] is used as a communications channel for
> > passing the listening socket, but without additional daemons.
> 
> That was my first thought too, but I couldn't come up with any
> satisfying way to handle the race conditions gracefully.
> 
> Open file descriptors can only be passed over sockets, not pipes.

Right. Passing them through a FIFO is a SYSV feature. I forgot it was not
portable.

> Also, using a socket means you have two-way communication, so you
> don't need signals or PID files, which are subject to race conditions.

Files maybe are, but signals? You can end up losing some signals if
they are sent in rapid succession, but for this purpose you just need one
to trigger an action and shortly after receiving it the server is supposed
to exit, so the possibility of losing other signals (of the same kind)
doesn't matter.

> But without signals, you'll still have to use select()/poll() even
> with all the functionality contained in one program, or else when you
> start a new server to replace the old one, the old one will wait
> indefinitely for one more client connection before waking up and
> noticing that it should hand the listening sockets over to the new
> server.

I'd use a signal to get out of accept(). :-)

> One problem with filesystem sockets is that you have to unlink the
> socket before listening on it, so the operation "listen on this
> socket, which may or may not already exist" isn't atomic.  If two
> processes start at the same time, one of them can delete the other's
> socket without knowing that anything was listening on it.  So it may
> be useful to atomically acquire some other dummy resource as a
> mutual-exclusion checkpoint before listening on the filesystem socket.

A server binds to the file system socket after it got the network socket,
either by a direct bind() or by a passover from an existing server. That
should be enough, I think. It's not atomic, but it's a locking protocol.

My description had: "If there's no writer [on a file system channel], it
binds to the network socket, writes the PID file [...]"

If the bind to the network socket fails because something else is
listening, then it can try again on the file system and bail out with an
error if there's no writer. After all, that's not supposed to happen.

> Another benefit of making the metaserver a separate program: you can
> also write a library for LD_PRELOAD that masks the listen() function
> to make existing programs use the metaserver instead of opening their
> listening sockets directly.

That's a good one. But shouldn't that mask the bind() function?

As for the lease problem, couldn't metaserver just SIGTERM the existing
server? It needs to know the PID, but that can be passed to it when the
server connects to the file system socket and before it gets the network
socket from the metaserver.




* Re: graceful restart under runit
  2006-11-22 19:25                         ` Dražen Kačar
@ 2006-11-22 19:51                           ` Paul Jarc
  2006-11-23 12:25                             ` Dražen Kačar
  0 siblings, 1 reply; 20+ messages in thread
From: Paul Jarc @ 2006-11-22 19:51 UTC
  Cc: supervision

Dražen Kačar <dave@fly.srk.fer.hr> wrote:
> Paul Jarc wrote:
>> Also, using a socket means you have two-way communication, so you
>> don't need signals or PID files, which are subject to race conditions.
>
> Files maybe are, but signals?

Unless you're sending signals to your own child process, there's a
chance that the process you're signaling has already died, and its PID
has been reused for a new process.  This is true no matter how you
obtain the PID to send signals to; PID files are just one case of
that.  The only exception is for the parent, which knows that the
child's PID hasn't been recycled because, even if the child has
exited, the parent hasn't wait()ed for the child yet.

> A server binds to the file system socket after it got the network socket,
> either by a direct bind() or by a passover from an existing server. That
> should be enough, I think. It's not atomic, but it's a locking protocol.

Actually, obtaining the network socket can be atomic enough - bind()
is atomic, and each server process can limit itself to passing the
network socket to at most one other server, so there's no chance of
one server getting a network socket while another is in the middle of
receiving it in a handoff.  That's all the atomicity we need.

It can get messy if two servers start at the same time - A will get
the network socket, so B will try to connect to the filesystem socket,
but A might not be listening there yet.  B could handle this by
looping, waiting for A to either die and free up the network socket,
or start listening on the filesystem socket.  That might work, but
looping doesn't feel right.  I'm not sure if there are any problems
lurking there.

> If the bind to the network socket fails because something else is
> listening, then it can try again on the file system and bail out with an
> error if there's no writer. After all, that's not supposed to happen.

That could work too.  Looping has the advantage that if A dies just
after binding the network socket, B will go back and try again, so the
service will come up as long as B survives.  But with supervision, it
will get restarted anyway, so it's not really a big difference.
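To make the two variants concrete, a sketch of the acquisition order discussed above: bind the network port first (bind() is the atomic first lock), then either take over the filesystem socket or, if the port is busy, connect to the current holder. With `retries == 0` it bails out as Dražen suggests; with `retries > 0` it loops. All names are hypothetical, the socket path is up to the caller, and the actual descriptor handoff over the control connection is not shown.

```c
#include <errno.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

enum acquire { ACQ_BOUND, ACQ_IN_USE, ACQ_ERROR };

/* Step 1, the first lock: bind() either succeeds or fails with EADDRINUSE. */
static enum acquire bind_network(uint16_t port, int *out)
{
    int one = 1;
    struct sockaddr_in a;
    memset(&a, 0, sizeof a);
    a.sin_family = AF_INET;
    a.sin_addr.s_addr = htonl(INADDR_ANY);
    a.sin_port = htons(port);
    int s = socket(AF_INET, SOCK_STREAM, 0);
    if (s < 0)
        return ACQ_ERROR;
    (void)setsockopt(s, SOL_SOCKET, SO_REUSEADDR, &one, sizeof one);
    if (bind(s, (struct sockaddr *)&a, sizeof a) < 0 || listen(s, 128) < 0) {
        int e = errno;
        close(s);
        return e == EADDRINUSE ? ACQ_IN_USE : ACQ_ERROR;
    }
    *out = s;
    return ACQ_BOUND;
}

static int fill_unix_addr(struct sockaddr_un *u, const char *path)
{
    if (strlen(path) >= sizeof u->sun_path)
        return -1;
    memset(u, 0, sizeof *u);
    u->sun_family = AF_UNIX;
    strcpy(u->sun_path, path);
    return 0;
}

/* Step 2, only for the holder of the network socket: replace any leftover
   control socket file and listen for a future replacement server. */
static int bind_control(const char *path)
{
    struct sockaddr_un u;
    if (fill_unix_addr(&u, path) < 0)
        return -1;
    unlink(path);
    int s = socket(AF_UNIX, SOCK_STREAM, 0);
    if (s < 0)
        return -1;
    if (bind(s, (struct sockaddr *)&u, sizeof u) < 0 || listen(s, 8) < 0) {
        close(s);
        return -1;
    }
    return s;
}

/* Fallback when the port is busy: contact the current holder. */
static int connect_control(const char *path)
{
    struct sockaddr_un u;
    if (fill_unix_addr(&u, path) < 0)
        return -1;
    int s = socket(AF_UNIX, SOCK_STREAM, 0);
    if (s < 0)
        return -1;
    if (connect(s, (struct sockaddr *)&u, sizeof u) < 0) {
        close(s);
        return -1;
    }
    return s;
}

/* 0: we bound the port (*net set).  1: connected to the holder for a
   handoff (*ctl set; the handoff itself is not shown).  -1: gave up. */
static int acquire(uint16_t port, const char *path, int retries,
                   int *net, int *ctl)
{
    for (;;) {
        enum acquire r = bind_network(port, net);
        if (r == ACQ_BOUND)
            return 0;
        if (r == ACQ_ERROR)
            return -1;
        *ctl = connect_control(path);
        if (*ctl >= 0)
            return 1;
        if (retries-- <= 0)
            return -1;    /* port busy, nobody on the control socket */
        sleep(1);         /* holder may be starting up or about to die */
    }
}
```

Only a process that holds the network socket (bound directly or received in a handoff) should call bind_control(); that ordering is what keeps its unlink() of a leftover socket file from racing another starting server.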

So I guess there's no big advantage either way between the metaserver
vs. keeping all the handoff functionality in one program.

>> Another benefit of making the metaserver a separate program: you can
>> also write a library for LD_PRELOAD that masks the listen() function
>> to make existing programs use the metaserver instead of opening their
>> listening sockets directly.
>
> That's a good one. But shouldn't that mask the bind() function?

Probably, but it won't actually work either way, since the server
needs to notice traffic on the filesystem connection.

> As for the lease problem, couldn't metaserver just SIGTERM the existing
> server?

That suffers from the PID-recycling problem above.


paul



* Re: graceful restart under runit
  2006-11-22 19:51                           ` Paul Jarc
@ 2006-11-23 12:25                             ` Dražen Kačar
  2006-11-24 21:22                               ` Paul Jarc
  0 siblings, 1 reply; 20+ messages in thread
From: Dražen Kačar @ 2006-11-23 12:25 UTC


Paul Jarc wrote:
> Dražen Kačar <dave@fly.srk.fer.hr> wrote:
> > Paul Jarc wrote:
> >> Also, using a socket means you have two-way communication, so you
> >> don't need signals or PID files, which are subject to race conditions.
> >
> > Files maybe are, but signals?
> 
> Unless you're sending signals to your own child process, there's a
> chance that the process you're signaling has already died, and its PID
> has been reused for a new process.  This is true no matter how you
> obtain the PID to send signals to; PID files are just one case of
> that.  The only exception is for the parent, which knows that the
> child's PID hasn't been recycled because, even if the child has
> exited, the parent hasn't wait()ed for the child yet.

Ah, that. Well, you'd just have to rely on an OS behavior that is usually
undocumented. PIDs are not recycled fast in practice, so that would have to
be good enough.

A somewhat unportable guarantee could be obtained via /proc.  You know the
PID, so you stop the process via /proc or ptrace() or whatever is
available for debuggers (something will be available), check that the PID
is associated with the correct executable via /proc, send your signal (now
the process won't go away) and then detach from the process.

Checking whether the PID corresponds to the correct executable is the
messy part and I don't know if it can be handled in a reasonable way for
this purpose.

I'd just live with the race condition and rely on the OS not to reuse PIDs
too fast.

> > A server binds to the file system socket after it got the network socket,
> > either by a direct bind() or by a passover from an existing server. That
> > should be enough, I think. It's not atomic, but it's a locking protocol.
> 
> Actually, obtaining the network socket can be atomic enough - bind()

I meant the whole thing: obtain the network socket and then obtain the
file system socket. Mandating that the network socket must be obtained
first and the file system socket second is a locking protocol, and the
first lock in a locking protocol must be atomic.

> So I guess there's no big advantage either way between the metaserver
> vs. keeping all the handoff functionality in one program.

Unless the LD_PRELOAD method can work. Then the metaserver has a distinct
advantage for those who need it.

> > That's a good one. But shouldn't that mask the bind() function?
> 
> Probably, but it won't actually work either way, since the server
> needs to notice traffic on the filesystem connection.
> 
> > As for the lease problem, couldn't metaserver just SIGTERM the existing
> > server?
> 
> That suffers from the PID-recycling problem above.

Well, I'd just ignore that problem, or go through the unportable debugger
interface when feeling paranoid.




* Re: graceful restart under runit
  2006-11-23 12:25                             ` Dražen Kačar
@ 2006-11-24 21:22                               ` Paul Jarc
  0 siblings, 0 replies; 20+ messages in thread
From: Paul Jarc @ 2006-11-24 21:22 UTC
  Cc: supervision

Dražen Kačar <dave@fly.srk.fer.hr> wrote:
> PIDs are not recycled fast in practice, so that would have to be
> good enough.

They certainly are recycled fast in practice, although maybe not
often.  For example, OpenBSD can assign PIDs in random order instead
of sequentially, so a PID has a chance of being reused for the very
next process after it exits.  The same problem can hit any OS if it
spawns short-lived processes at a high rate.

For me, at least, it's well worth using poll()/select() to avoid this
risk.  It's a one-time task for the programmer, but PID recycling is a
constant danger for every user.
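A sketch of the poll() version: the running server waits on both its listening socket and a control socket, so a replacement server announces itself simply by connecting, and no signals or PIDs are involved. `wait_for_event` and the `WAKE_*` values are hypothetical names.

```c
#include <errno.h>
#include <poll.h>
#include <sys/socket.h>   /* accept() on whichever socket is ready */

enum wake { WAKE_CLIENT = 1, WAKE_CONTROL = 2 };

/* Wait until a client connects on listen_fd and/or a replacement server
   connects on control_fd. Returns a bitmask of enum wake (0 on timeout),
   or -1 on error. timeout_ms < 0 waits indefinitely. */
static int wait_for_event(int listen_fd, int control_fd, int timeout_ms)
{
    struct pollfd fds[2] = {
        { .fd = listen_fd,  .events = POLLIN },
        { .fd = control_fd, .events = POLLIN },
    };
    int n;
    do {
        /* Retried on EINTR; in this sketch the timeout simply restarts. */
        n = poll(fds, 2, timeout_ms);
    } while (n < 0 && errno == EINTR);
    if (n < 0)
        return -1;
    int mask = 0;
    if (fds[0].revents & POLLIN)
        mask |= WAKE_CLIENT;
    if (fds[1].revents & POLLIN)
        mask |= WAKE_CONTROL;
    return mask;
}
```

On WAKE_CONTROL the server would accept() the replacement's connection, hand over the listening socket once, close its own copy and then only finish its existing client sessions.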

> A somewhat unportable guarantee could be obtained via /proc.  You know the
> PID, so you stop the process via /proc or ptrace() or whatever is
> available for debuggers (something will be available), check that the PID
> is associated with the correct executable via /proc,

Even if it's the right program, that doesn't guarantee it's the right
process.  This seems like more work than poll()/select(), with worse
results.


paul



end of thread

Thread overview: 20+ messages
2006-11-15 11:47 graceful restart under runit Dražen Kačar
2006-11-15 16:08 ` Alex Efros
2006-11-16 15:24   ` Dražen Kačar
2006-11-17  0:15     ` Alex Efros
2006-11-17  0:48       ` Paul Jarc
2006-11-17 13:34         ` Alex Efros
2006-11-17 14:53           ` Charlie Brady
2006-11-17 15:39             ` Gerrit Pape
2006-11-18  0:22             ` Alex Efros
2006-11-18  1:34               ` Charlie Brady
2006-11-18 12:31                 ` Alex Efros
2006-11-18 19:30                   ` Paul Jarc
2006-11-20 18:27                     ` Dražen Kačar
2006-11-20 19:32                       ` Paul Jarc
2006-11-20 19:43                         ` Paul Jarc
2006-11-22 19:25                         ` Dražen Kačar
2006-11-22 19:51                           ` Paul Jarc
2006-11-23 12:25                             ` Dražen Kačar
2006-11-24 21:22                               ` Paul Jarc
2006-11-17 13:14     ` Gerrit Pape
