supervision - discussion about system services, daemon supervision, init, runlevel management, and tools such as s6 and runit
* [announce] perp-2.03: persistent process supervision
@ 2011-03-14 10:39 Wayne Marshall
  2011-03-14 13:17 ` Laurent Bercot
  0 siblings, 1 reply; 14+ messages in thread
From: Wayne Marshall @ 2011-03-14 10:39 UTC (permalink / raw)
  To: supervision

Announcing the latest release of perp, perp-2.03, a persistent
process supervisor:

 http://b0llix.net/perp/

Tarball:

 http://b0llix.net/perp/distfiles/perp-2.03.tar.gz


What's New (As if You Care):

The big news for the "second generation" perp-2.* series:

 * scanner/supervisor/controller runs as a single process

 * all context switching for multiple supervisor processes is
   eliminated

 * ipc for control/status clients now via single domain socket

 * perpd(8) creates a mere two file system objects at startup --
   a lockfile and domain socket -- and otherwise generates no
   disk activity during runtime, perfect for read-only file
   systems and embedded applications!
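
A minimal sketch of the control/status arrangement in the list above, assuming a made-up one-line protocol: a client sends a service name over a Unix domain socket, and the server answers with the state it holds in memory. The socket name, the `serve_once` and `query` helpers, and the protocol itself are illustrative assumptions, not perpd's actual interface.

```python
import os
import socket
import tempfile
import threading


def serve_once(sock, states):
    """Answer a single status query from the in-memory table."""
    conn, _ = sock.accept()
    with conn:
        name = conn.recv(256).decode().strip()
        conn.sendall(states.get(name, "unknown").encode())


def query(path, name):
    """Hypothetical client: ask the server for one service's state."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as client:
        client.connect(path)
        client.sendall(name.encode() + b"\n")
        return client.recv(256).decode()


with tempfile.TemporaryDirectory() as base:
    path = os.path.join(base, "control.sock")   # hypothetical socket name
    states = {"wpa_supplicant": "down"}         # state lives only in memory
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as server:
        server.bind(path)
        server.listen(1)
        worker = threading.Thread(target=serve_once, args=(server, states))
        worker.start()
        answer = query(path, "wpa_supplicant")
        worker.join()

print(answer)
```

The only file system object the sketch creates is the socket itself; everything the client learns comes from the server's memory.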


About (The Usual Outrageous Claims and Assertions):

perp is a service supervisor similar in purpose to the venerable
daemontools package, providing a modern update with many
advantages:

 * easy configuration: in place service activation and no
   symlinks!

 * everything administered in /etc/perp

 * fully FHS compatible

 * service reset capability

 * pretty good troff -man documentation

 * colorized(!) service lister, readable timestamps...

 * no slashpackage, no slashcommand, no slashdoc...


Contact (Hah!):

 perp[At Sign]b0llix[Dot]net



* Re: [announce] perp-2.03: persistent process supervision
  2011-03-14 10:39 [announce] perp-2.03: persistent process supervision Wayne Marshall
@ 2011-03-14 13:17 ` Laurent Bercot
  2011-03-14 14:02   ` Wayne Marshall
  0 siblings, 1 reply; 14+ messages in thread
From: Laurent Bercot @ 2011-03-14 13:17 UTC (permalink / raw)
  To: supervision


 Hi Wayne,

> Announcing the latest release of perp, perp-2.03, a persistent
> process supervisor:

 Good news ! :)


 I just have a question about your design:

>  * easy configuration: in place service activation and no
>    symlinks!

 Does that mean that perpd stores all the service states in memory ?
To control or check on services, perpctl and other utilities connect
to perpd via the Unix domain socket, right ?

 So... the dreaded question... what happens if perpd dies ? Will
perpboot restore a sane supervision tree ?

-- 
 Laurent



* Re: [announce] perp-2.03: persistent process supervision
  2011-03-14 13:17 ` Laurent Bercot
@ 2011-03-14 14:02   ` Wayne Marshall
  2011-03-14 14:23     ` Robin Bowes
  0 siblings, 1 reply; 14+ messages in thread
From: Wayne Marshall @ 2011-03-14 14:02 UTC (permalink / raw)
  To: Laurent Bercot; +Cc: supervision

Hi Laurent,

>  I just have a question about your design:
> 
> >  * easy configuration: in place service activation and no
> >    symlinks!
> 
>  Does that mean that perpd stores all the service states in
> memory ? To control or check on services, perpctl and other
> utilities connect to perpd via the Unix domain socket, right ?

Yes.
 
>  So... the dreaded question... what happens if perpd dies ?
> Will perpboot restore a sane supervision tree ?
> 

Yes.

First, perpd(8) won't die :)

If perpd(8) receives SIGTERM, it runs a controlled shutdown of
all services under its supervision, and then terminates itself.

Under normal (default) configurations, whenever perpd(8)
terminates it is restarted by either perpboot(8), or init(8) with
a "respawn" configuration in inittab(5).  perpd(8) then
restarts all services marked for activation in /etc/perp.

Wayne



* Re: [announce] perp-2.03: persistent process supervision
  2011-03-14 14:02   ` Wayne Marshall
@ 2011-03-14 14:23     ` Robin Bowes
  2011-03-14 14:34       ` Wayne Marshall
  2011-03-14 15:03       ` Charlie Brady
  0 siblings, 2 replies; 14+ messages in thread
From: Robin Bowes @ 2011-03-14 14:23 UTC (permalink / raw)
  To: supervision

On 14/03/11 14:02, Wayne Marshall wrote:

> Under normal (default) configurations, whenever perpd(8)
> terminates it is restarted by either perpboot(8), or init(8) with
> a "respawn" configuration in inittab(5).  perpd(8) then
> restarts all services marked for activation in /etc/perp.

So, if I have a service that is normally running, ie. starts at boot,
but I have taken it down manually for whatever reason, and perpd dies,
then my service will also be re-started?

R.
-- 
"Feed that ego and you starve the soul" - Colonel J.D. Wilkes
http://www.theshackshakers.com/



* Re: [announce] perp-2.03: persistent process supervision
  2011-03-14 14:23     ` Robin Bowes
@ 2011-03-14 14:34       ` Wayne Marshall
  2011-03-14 16:47         ` Laurent Bercot
  2011-03-14 15:03       ` Charlie Brady
  1 sibling, 1 reply; 14+ messages in thread
From: Wayne Marshall @ 2011-03-14 14:34 UTC (permalink / raw)
  To: Robin Bowes; +Cc: supervision

> 
> > Under normal (default) configurations, whenever perpd(8)
> > terminates it is restarted by either perpboot(8), or init(8)
> > with a "respawn" configuration in inittab(5).  perpd(8) then
> > restarts all services marked for activation in /etc/perp.
> 
> So, if I have a service that is normally running, ie. starts
> at boot, but I have taken it down manually for whatever
> reason, and perpd dies, then my service will also be
> re-started?
>

First, perpd(8) will not die (TM).

If you deactivate your service (chmod -t myservice), or delete
it from /etc/perp, or touch flag.down in the service directory,
then it will not be restarted.

If you take your service down with perpctl(8) -- without doing
any of the above -- and in the interim perpd(8) is restarted,
then the service will be restarted.

So it is up to you to decide the intent of taking a service down.

Wayne
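
The rules in the message above can be modeled with a short sketch. It assumes only what the message states: the sticky bit on the service directory marks a service as activated, and a `flag.down` file keeps an activated service from starting automatically. The function name `state_after_restart` and the temporary directory are illustrative; this is not perpd's source.

```python
import os
import stat
import tempfile


def state_after_restart(service_dir):
    """State a freshly started supervisor would give this service,
    following the rules described in the thread (an assumption-based model)."""
    mode = os.stat(service_dir).st_mode
    if not mode & stat.S_ISVTX:
        return "inactive"      # sticky bit clear: not activated
    if os.path.exists(os.path.join(service_dir, "flag.down")):
        return "down"          # activated, but not started automatically
    return "up"                # activated and started


with tempfile.TemporaryDirectory() as base:
    svc = os.path.join(base, "myservice")
    os.mkdir(svc)
    results = []

    os.chmod(svc, 0o755 | stat.S_ISVTX)                 # like: chmod +t myservice
    results.append(state_after_restart(svc))

    open(os.path.join(svc, "flag.down"), "w").close()   # like: touch flag.down
    results.append(state_after_restart(svc))

    os.chmod(svc, 0o755)                                # like: chmod -t myservice
    results.append(state_after_restart(svc))

print(results)
```

Nothing in the function can see a manual `perpctl down`, because that changes only the supervisor's in-memory state. In this model, that is why a service taken down by hand comes back up after the supervisor restarts.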



* Re: [announce] perp-2.03: persistent process supervision
  2011-03-14 14:23     ` Robin Bowes
  2011-03-14 14:34       ` Wayne Marshall
@ 2011-03-14 15:03       ` Charlie Brady
  2011-03-14 15:35         ` Wayne Marshall
  2011-03-14 17:02         ` Laurent Bercot
  1 sibling, 2 replies; 14+ messages in thread
From: Charlie Brady @ 2011-03-14 15:03 UTC (permalink / raw)
  To: Robin Bowes; +Cc: supervision


On Mon, 14 Mar 2011, Robin Bowes wrote:

> On 14/03/11 14:02, Wayne Marshall wrote:
> 
> > Under normal (default) configurations, whenever perpd(8)
> > terminates it is restarted by either perpboot(8), or init(8) with
> > a "respawn" configuration in inittab(5).  perpd(8) then
> > restarts all services marked for activation in /etc/perp.
> 
> So, if I have a service that is normally running, ie. starts at boot,
> but I have taken it down manually for whatever reason, and perpd dies,
> then my service will also be re-started?

And presumably the converse will apply as well. This is a problem with 
runit (and daemontools) - if a service has a 'down' file, but has been 
later started, a dying runsv (e.g. if killed by the OoM killer, or by a 
service which kills its process group) will be replaced by runsvdir, but 
the service will stay down.



* Re: [announce] perp-2.03: persistent process supervision
  2011-03-14 15:03       ` Charlie Brady
@ 2011-03-14 15:35         ` Wayne Marshall
  2011-03-14 17:02         ` Laurent Bercot
  1 sibling, 0 replies; 14+ messages in thread
From: Wayne Marshall @ 2011-03-14 15:35 UTC (permalink / raw)
  To: Charlie Brady; +Cc: supervision

On Mon, 14 Mar 2011 11:03:45 -0400 (EDT)
Charlie Brady <charlieb-supervision@budge.apana.org.au> wrote:

> 
> On Mon, 14 Mar 2011, Robin Bowes wrote:
> 
> > On 14/03/11 14:02, Wayne Marshall wrote:
> > 
> > > Under normal (default) configurations, whenever perpd(8)
> > > terminates it is restarted by either perpboot(8), or
> > > init(8) with a "respawn" configuration in inittab(5).
> > > perpd(8) then restarts all services marked for activation
> > > in /etc/perp.
> > 
> > So, if I have a service that is normally running, ie. starts
> > at boot, but I have taken it down manually for whatever
> > reason, and perpd dies, then my service will also be
> > re-started?
> 
> And presumably the converse will apply as well. This is a
> problem with runit (and daemontools) - if a service has a
> 'down' file, but has been later started, a dying runsv (e.g.
> if killed by the OoM killer, or by a service which kills its
> process group) will be replaced by runsvdir, but the service
> will stay down.
> 

This is not so much a "problem" of design, but rather of
administrative clarity.

Use "flag.down" only when you don't want a service to start
immediately with perpd, but do want it activated and available to
perpctl administration.

As an example, I use a wpa_supplicant service definition on my
laptop.  It is defined with "flag.down", because I don't care for
a wireless connection in all circumstances.  Other network
scripts may then call:

  perpctl up wpa_supplicant

or

  perpctl down wpa_supplicant

as necessary.

Otherwise -- and generally for any truly persistent process
service -- administrators will avoid using the "flag.down"
mechanism in favor of the easy, in-place service
activation/deactivation mechanism that perpd provides with the
service directory sticky bit.

Wayne



* Re: [announce] perp-2.03: persistent process supervision
  2011-03-14 14:34       ` Wayne Marshall
@ 2011-03-14 16:47         ` Laurent Bercot
  2011-03-14 17:39           ` Wayne Marshall
  0 siblings, 1 reply; 14+ messages in thread
From: Laurent Bercot @ 2011-03-14 16:47 UTC (permalink / raw)
  To: supervision

> First, perpd(8) will not die (TM).

 Of course it will not - not in normal circumstances.
 Neither will svscan, or runsvdir, or s6-svscan.
 I trust your programming ability in that matter as much as mine - this
is not a concern at all.

 The concern is that you don't always have the say. There's this playful
thing called the Linux OOM killer. I hear the heuristics have been fixed
in recent kernel releases, but for a long time, the OOM killer had the
amusing habit of shooting processes at random, and very much failing to
locate the process that is actually responsible for the memory outage.
 There are still a whole lot of broken OOM killers out there.

 Of course, this is not a normal condition, and under careful administration
it never happens. But the point is, when you are designing a supervision
tool, you should assume that you can get a random SIGKILL (Headshot. Do
not pass Go. Do not call your cleanup routines.) at any time.

 Because if a supervision tool can't recover from an OOM event and keep
vital services running until the sysadmin finishes his coffee and can
manually repair things, then what is it good for ?

 That is why I asked my question. In other supervision schemes, tasks are
de-centralized, so if one process randomly dies, it generally does not have
much impact on the rest of the system. (If runsvdir dies, it's annoying,
but things keep working until the admin can come clean things up.)
 perpd, however, looks like a neural hub, centralizing a lot of info into
its memory. IOW, a SPOF, and you can be sure that the next broken system
tool will love to play Doom with it.

 Is your supervision chain SIGKILL-resistant ?

-- 
 Laurent



* Re: [announce] perp-2.03: persistent process supervision
  2011-03-14 15:03       ` Charlie Brady
  2011-03-14 15:35         ` Wayne Marshall
@ 2011-03-14 17:02         ` Laurent Bercot
  2011-03-14 17:42           ` Charlie Brady
  1 sibling, 1 reply; 14+ messages in thread
From: Laurent Bercot @ 2011-03-14 17:02 UTC (permalink / raw)
  To: supervision

>> So, if I have a service that is normally running, ie. starts at boot,
>> but I have taken it down manually for whatever reason, and perpd dies,
>> then my service will also be re-started?
> 
> And presumably the converse will apply as well. This is a problem with 
> runit (and daemontools) - if a service has a 'down' file, but has been 
> later started, a dying runsv (e.g. if killed by the OoM killer, or by a 
> service which kills its process group) will be replaced by runsvdir, but 
> the service will stay down.

 We've already discussed this. The default state of a service is controlled
by the absence or presence of a 'down' file. The actual state of a service
can be changed either manually or via a script, but this state *cannot be
strongly guaranteed* if it does not match the default. This is an
unavoidable limit of daemontools-like supervision schemes; do not blame it
on perp's design.

-- 
 Laurent



* Re: [announce] perp-2.03: persistent process supervision
  2011-03-14 16:47         ` Laurent Bercot
@ 2011-03-14 17:39           ` Wayne Marshall
  2011-03-14 17:52             ` Paul Jarc
  2011-03-14 18:34             ` Laurent Bercot
  0 siblings, 2 replies; 14+ messages in thread
From: Wayne Marshall @ 2011-03-14 17:39 UTC (permalink / raw)
  To: Laurent Bercot; +Cc: supervision

On Mon, 14 Mar 2011 17:47:41 +0100
Laurent Bercot <ska-supervision@skarnet.org> wrote:

> > First, perpd(8) will not die (TM).
> 
>  Of course it will not - not in normal circumstances.
>  Neither will svscan, or runsvdir, or s6-svscan.
>  I trust your programming ability in that matter as much as
> mine - this is not a concern at all.
> 
>  The concern is that you don't always have the say. There's
> this playful thing called the Linux OOM killer. I hear the
> heuristics have been fixed in recent kernel releases, but for
> a long time, the OOM killer had the amusing habit of shooting
> processes at random, and very much failing to locate the
> process that is actually responsible for the memory outage.
> There are still a whole lot of broken OOM killers out there.

It is like worrying, what if init(8) should die?

> 
>  Of course, this is not a normal condition, and under careful
> administration it never happens. But the point is, when you
> are designing a supervision tool, you should assume that you
> can get a random SIGKILL (Headshot. Do not pass Go. Do not
> call your cleanup routines.) at any time.
>

If a system is delivering random SIGKILL, one should select
another system.  There is no peaceful, confident sleeping at
night otherwise, no matter what supervisory framework you choose.
 
>  That is why I asked my question. In other supervision
> schemes, tasks are de-centralized, so if one process randomly
> dies, it generally does not have much impact on the rest of
> the system. (If runsvdir dies, it's annoying, but things keep
> working until the admin can come clean things up.) perpd,
> however, looks like a neural hub, centralizing a lot of info
> into its memory. IOW, a SPOF, and you can be sure that the
> next broken system tool will love to play Doom with it.
>

If we talk in terms of daemontools, svscan(8) already keeps a
table of supervise(8) processes, and svscan itself functions as a
supervisor of those multiple supervise(8)s.  So it is not much of
a conceptual jump, nor extra info, to simply eliminate the
supervise(8) "middlemen", and have svscan supervise the services
directly.

This is all that perpd(8) does (as well as what init/minit/ninit
do, too.)

perpd does provide redundant supervision with perpboot/inittab
by default when installed with perp-setup(8).  Imagining any
extra security from additional layers of supervision is merely a
placebo, but you are certainly welcome to it if your base system
is so fundamentally flawed.  For example, you can run one perpd
instance per service if you like.  Or you can set up your
perpetrate(5) service definitions to exec services under
supervision of rundeux(8).

Of course you can always revert to perp-0.00, too, if you prefer.
It has all the same perp usability, but with a supervisory
architecture that may be more familiar to you.

Wayne
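
A rough sketch of the single-process idea described above: one process keeps a table of its children and restarts them directly, with no per-service intermediary. This is an illustration under stated assumptions, not perpd's or svscan's implementation. The `supervise` function, the `max_restarts` bound (added so the demo terminates), and the throwaway demo commands are all hypothetical.

```python
import os
import sys


def supervise(services, max_restarts=2):
    """Run each command and restart it when it exits, all from one process.

    services: dict mapping a service name to its argv list.
    Returns a dict mapping each name to how many times it was started.
    """
    starts = {name: 0 for name in services}
    table = {}  # pid -> service name; all supervision state is in memory

    def start(name):
        argv = services[name]
        pid = os.spawnv(os.P_NOWAIT, argv[0], argv)
        table[pid] = name
        starts[name] += 1

    for name in services:
        start(name)

    while table:
        pid, _status = os.wait()          # block until any child exits
        name = table.pop(pid, None)
        if name is not None and starts[name] <= max_restarts:
            start(name)                   # respawn directly, no middleman
    return starts


demo = {
    "a": [sys.executable, "-c", "pass"],
    "b": [sys.executable, "-c", "import sys; sys.exit(1)"],
}
counts = supervise(demo)
print(counts)
```

The `table` dictionary is the whole of the supervisor's knowledge about its children, which is the property the surrounding discussion turns on: if this one process is killed, the table goes with it.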



* Re: [announce] perp-2.03: persistent process supervision
  2011-03-14 17:02         ` Laurent Bercot
@ 2011-03-14 17:42           ` Charlie Brady
  0 siblings, 0 replies; 14+ messages in thread
From: Charlie Brady @ 2011-03-14 17:42 UTC (permalink / raw)
  To: Laurent Bercot; +Cc: supervision


On Mon, 14 Mar 2011, Laurent Bercot wrote:

> >> So, if I have a service that is normally running, ie. starts at boot,
> >> but I have taken it down manually for whatever reason, and perpd dies,
> >> then my service will also be re-started?
> > 
> > And presumably the converse will apply as well. This is a problem with 
> > runit (and daemontools) - if a service has a 'down' file, but has been 
> > later started, a dying runsv (e.g. if killed by the OoM killer, or by a 
> > service which kills its process group) will be replaced by runsvdir, but 
> > the service will stay down.
> 
>  We've already discussed this. The default state of a service is controlled
> by the absence or presence of a 'down' file. The actual state of a service
> can be changed either manually or via a script, but this state *cannot be
> strongly guaranteed* if it does not match the default. This is an
> unavoidable limit of daemontools-like supervision schemes; do not blame it
> on perp's design.

I don't. I was just pointing it out, as one case of what can happen when 
state is in memory.





* Re: [announce] perp-2.03: persistent process supervision
  2011-03-14 17:39           ` Wayne Marshall
@ 2011-03-14 17:52             ` Paul Jarc
  2011-03-14 18:43               ` Wayne Marshall
  2011-03-14 18:34             ` Laurent Bercot
  1 sibling, 1 reply; 14+ messages in thread
From: Paul Jarc @ 2011-03-14 17:52 UTC (permalink / raw)
  To: supervision

Wayne Marshall <wcm@b0llix.net> wrote:
> It is like worrying, what if init(8) should die?

If process 1 dies, the system halts, and we reboot it.  But perpd
doesn't run as process 1, right?  So if it did receive SIGKILL, for
whatever reason, it's not so obvious what would happen.

> Imagining any extra security from additional layers of supervision
> is merely a placebo, but you are certainly welcome to it if your
> base system is so fundamentally flawed.

No one has suggested adding layers.  The separation of duties in
daemontools and other systems doesn't determine the behavior when a
process dies; it just makes it easier for us to *know* what will
happen when a given process dies.


paul



* Re: [announce] perp-2.03: persistent process supervision
  2011-03-14 17:39           ` Wayne Marshall
  2011-03-14 17:52             ` Paul Jarc
@ 2011-03-14 18:34             ` Laurent Bercot
  1 sibling, 0 replies; 14+ messages in thread
From: Laurent Bercot @ 2011-03-14 18:34 UTC (permalink / raw)
  To: supervision

> It is like worrying, what if init(8) should die?

 No, not exactly, because init(8) is process 1.
 If perpd is meant to be run as process 1, then I have no more questions
- it just will not die, as you say. But if it is not, it is legitimate
to wonder about it dying.

 And btw, I do worry about process 1 dying, not from murder, which is not
possible, but from simple illness, i.e. bugs.
 That is why I do not trust Upstart, or MacOS X launchd, or even
sysvinit's init: those programs are too complex for anyone to be able
to guarantee that they can't die, and they don't leak memory, and they
don't have any other bug of the kind. Process 1, the ultimate long-lived
process on a system, should be *proven* to work, and complexity is
antagonistic to that.


> If a system is delivering random SIGKILL, one should select
> another system.  There is no peaceful, confident sleeping at
> night otherwise, no matter what supervisory framework you choose.

 But the point of supervision is to make up for deficiencies in the
real world !
 I don't need the daemons I write to be supervised, because I know how
to write daemons, and they just Do Not Die (TM). Who needs automatic
respawning when there is no bug in your programs and they just work ?
 In a perfect world, none of the work we're doing here would be necessary !
Unfortunately, we're not living in a perfect world, and stuff happens.
We're just building additional safeguards to ensure that even when the
improbable happens, our systems keep working.

 If a SIGKILL hitting perpd is just too improbable for you and you do not
want to cover that case ("perp offers no guarantee against acts of God,
malevolent or stupid root account holders, or buggy Linux OOMs"), that's
perfectly fine, and perp will still be basically usable about 100% of the
time. But, well, I like to turn the paranoia to the max and be able to
say "it still works". :)


> If we talk in terms of daemontools, svscan(8) already keeps a
> table of supervise(8) processes, and svscan itself functions as a
> supervisor of those multiple supervise(8)s.  So it is not much of
> a conceptual jump, nor extra info, to simply eliminate the
> supervise(8) "middlemen", and have svscan supervise the services
> directly.

 Oh, I am not attacking perp's design at all. I welcome alternatives
in the world of supervision suites. My own take on the matter, s6 (to
be released as soon as the doc is written, which means... someday), is
a daemontools-like design, and we were lacking an init-like design.
Variety is good, and I don't think perpd's design is a fundamental flaw
- I have reasons for liking daemontools' design better, but they're
mostly maintainability- and aesthetics-related. If every Unix admin
in the world used perp, or runit, I would be a happy man. Anything
we have here is so much better than what mainstream offers.


> perpd does provide redundant supervision with perpboot/inittab
> by default when installed with perp-setup(8).  Imagining any
> extra security from additional layers of supervision is merely a
> placebo

 It's not about adding layers, it's about dividing responsibilities.
I'll elaborate on this later, right now I have a bus to catch.

-- 
 Laurent



* Re: [announce] perp-2.03: persistent process supervision
  2011-03-14 17:52             ` Paul Jarc
@ 2011-03-14 18:43               ` Wayne Marshall
  0 siblings, 0 replies; 14+ messages in thread
From: Wayne Marshall @ 2011-03-14 18:43 UTC (permalink / raw)
  To: Paul Jarc; +Cc: supervision

On Mon, 14 Mar 2011 13:52:58 -0400
prj@po.cwru.edu (Paul Jarc) wrote:

> Wayne Marshall <wcm@b0llix.net> wrote:
> > It is like worrying, what if init(8) should die?
> 
> If process 1 dies, the system halts, and we reboot it.  But
> perpd doesn't run as process 1, right?  So if it did receive
> SIGKILL, for whatever reason, it's not so obvious what would
> happen.
>

It is as deterministic as if svscan is SIGKILLed: the system is
unstable.

Wayne




Thread overview: 14+ messages
2011-03-14 10:39 [announce] perp-2.03: persistent process supervision Wayne Marshall
2011-03-14 13:17 ` Laurent Bercot
2011-03-14 14:02   ` Wayne Marshall
2011-03-14 14:23     ` Robin Bowes
2011-03-14 14:34       ` Wayne Marshall
2011-03-14 16:47         ` Laurent Bercot
2011-03-14 17:39           ` Wayne Marshall
2011-03-14 17:52             ` Paul Jarc
2011-03-14 18:43               ` Wayne Marshall
2011-03-14 18:34             ` Laurent Bercot
2011-03-14 15:03       ` Charlie Brady
2011-03-14 15:35         ` Wayne Marshall
2011-03-14 17:02         ` Laurent Bercot
2011-03-14 17:42           ` Charlie Brady
