supervision - discussion about system services, daemon supervision, init, runlevel management, and tools such as s6 and runit
From: prj@po.cwru.edu (Paul Jarc)
Cc: supervision@list.skarnet.org
Subject: Re: graceful restart under runit
Date: Wed, 22 Nov 2006 14:51:13 -0500
Message-ID: <m3ac2j9s53.fsf@multivac.cwru.edu>
In-Reply-To: <20061122192506.GA24958@fly.srk.fer.hr> (Dražen Kačar's message of "Wed, 22 Nov 2006 20:25:06 +0100")

Dražen Kačar <dave@fly.srk.fer.hr> wrote:
> Paul Jarc wrote:
>> Also, using a socket means you have two-way communication, so you
>> don't need signals or PID files, which are subject to race conditions.
>
> Files maybe are, but signals?

Unless you're sending signals to your own child process, there's a
chance that the process you're signaling has already died, and its PID
has been reused for a new process.  This is true no matter how you
obtain the PID to send signals to; PID files are just one case of
that.  The only exception is for the parent, which knows that the
child's PID hasn't been recycled because, even if the child has
exited, the parent hasn't wait()ed for the child yet.
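
To make the parent's special case concrete, here is a minimal sketch in
Python, assuming Linux and a Python with os.waitid().  The function name
is made up for illustration.  The child exits right away, but until the
parent reaps it the kernel keeps its PID reserved, so the kill() can only
ever reach that (zombie) child.

```python
import os
import signal
import subprocess
import sys

def signal_unreaped_child():
    """Signal a child that has exited but has not been wait()ed for yet.

    Returns (delivered, exit_status).
    """
    # The child exits almost immediately.
    child = subprocess.Popen([sys.executable, "-c", "pass"])
    # Block until it has exited, but do NOT reap it: WNOWAIT leaves the
    # zombie in place, so its PID cannot be handed to a new process.
    os.waitid(os.P_PID, child.pid, os.WEXITED | os.WNOWAIT)
    try:
        # Safe: this PID can only refer to our own unreaped child.
        os.kill(child.pid, signal.SIGTERM)
        delivered = True
    except ProcessLookupError:
        delivered = False
    # Reaping releases the PID; after this point it may be reused.
    return delivered, child.wait()
```

Any process other than the parent has no such guarantee: between reading
the PID and calling kill(), the child may have been reaped and the number
reassigned.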

> A server binds to the file system socket after it got the network socket,
> either by a direct bind() or by a passover from an existing server. That
> should be enough, I think. It's not atomic, but it's a locking protocol.

Actually, obtaining the network socket can be atomic enough - bind()
is atomic, and each server process can limit itself to passing the
network socket to at most one other server, so there's no chance of
one server getting a network socket while another is in the middle of
receiving it in a handoff.  That's all the atomicity we need.
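
As a rough illustration of bind() acting as the lock, the sketch below
(Python; try_bind and the loopback address are assumptions of mine, not
part of any existing server) either gets the address or learns that some
other socket already holds it.  It deliberately does not set SO_REUSEPORT,
which would let several sockets share the address.

```python
import errno
import socket

def try_bind(port, host="127.0.0.1"):
    """Return a listening socket, or None if the address is already taken."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # The kernel grants the address to exactly one caller.
        sock.bind((host, port))
    except OSError as exc:
        sock.close()
        if exc.errno == errno.EADDRINUSE:
            return None  # somebody else owns the network socket
        raise
    sock.listen(16)
    return sock
```

A server that gets None back knows it has to ask the current owner for
the socket rather than open its own.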

It can get messy if two servers start at the same time - A will get
the network socket, so B will try to connect to the filesystem socket,
but A might not be listening there yet.  B could handle this by
looping, waiting for A to either die and free up the network socket,
or start listening on the filesystem socket.  That might work, but
looping doesn't feel right.  I'm not sure if there are any problems
lurking there.
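
For what it's worth, B's loop might look something like the following
sketch.  It assumes Python 3.9+ (for socket.recv_fds), a handoff done by
passing the descriptor over the filesystem socket, and made-up names and
defaults (acquire_listener, attempts, delay); none of this is an existing
protocol.

```python
import errno
import socket
import time

def acquire_listener(port, handoff_path, attempts=50, delay=0.1):
    """Return (listening_socket, how), where how is "bind" or "handoff"."""
    for _ in range(attempts):
        # First try to own the network socket outright.
        net = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            net.bind(("127.0.0.1", port))
            net.listen(16)
            return net, "bind"
        except OSError as exc:
            net.close()
            if exc.errno != errno.EADDRINUSE:
                raise
        # Someone else holds it; ask for a handoff on the filesystem socket.
        ctl = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        try:
            ctl.connect(handoff_path)
            _, fds, _, _ = socket.recv_fds(ctl, 1, 1)
            if fds:
                # Wrap the received descriptor in a socket object.
                return socket.socket(fileno=fds[0]), "handoff"
        except (FileNotFoundError, ConnectionRefusedError):
            pass  # the holder is not listening there (yet)
        finally:
            ctl.close()
        # Neither worked: the holder may be starting up or dying.  Retry.
        time.sleep(delay)
    raise TimeoutError("could not obtain the network socket")
```

With attempts=1 this degenerates into trying each step once and failing,
so the retry count is the only difference between looping and giving up.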

> If the bind to the network socket fails because something else is
> listening, then it can try again on the file system and bail out with an
> error if there's no writer. After all, that's not supposed to happen.

That could work too.  Looping has the advantage that if A dies just
after binding the network socket, B will go back and try again, so the
service will come up as long as B survives.  But with supervision, it
will get restarted anyway, so it's not really a big difference.

So I guess there's no big advantage either way between the metaserver
vs. keeping all the handoff functionality in one program.

>> Another benefit of making the metaserver a separate program: you can
>> also write a library for LD_PRELOAD that masks the listen() function
>> to make existing programs use the metaserver instead of opening their
>> listening sockets directly.
>
> That's a good one. But shouldn't that mask the bind() function?

Probably, but it won't actually work either way, since the server
needs to notice traffic on the filesystem connection.

> As for the lease problem, couldn't metaserver just SIGTERM the existing
> server?

That suffers from the PID-recycling problem above.


paul

