supervision - discussion about system services, daemon supervision, init, runlevel management, and tools such as s6 and runit
* External health Check Process
@ 2020-10-22 12:28 Oliver Schad
  2020-10-22 15:34 ` Laurent Bercot
  0 siblings, 1 reply; 8+ messages in thread
From: Oliver Schad @ 2020-10-22 12:28 UTC (permalink / raw)
  To: supervision


Hi everybody,

We have cases where processes are still running but no longer work.
This is a common problem with runtime environments like Java or Go,
where memory management can run into trouble and internal routines
stop working. It is a really common problem in that area (heap too
small, garbage collection running too frequently, ...).

I know you can model a service in s6 that watches another service and
kills it, so in fact the problem is solved outside of s6. But I wanted
to ask for a feature that offers a simple way to model that within s6.
Usually it's good enough to call an external command with a timeout and
watch its exit code.

Yes, that means polling, but in a datacenter a polling health check is
not a big energy problem.

Is supporting an external health check something you can imagine for
the future?

Best Regards
Oli

-- 
Automatic-Server AG •••••
Oliver Schad
Geschäftsführer
Turnerstrasse 2
9000 St. Gallen | Schweiz

www.automatic-server.com | oliver.schad@automatic-server.com
Tel: +41 71 511 31 11 | Mobile: +41 76 330 03 47



* Re: External health Check Process
  2020-10-22 12:28 External health Check Process Oliver Schad
@ 2020-10-22 15:34 ` Laurent Bercot
  2020-10-22 15:46   ` Casper Ti. Vector
  2020-10-23  0:03   ` Steve Litt
  0 siblings, 2 replies; 8+ messages in thread
From: Laurent Bercot @ 2020-10-22 15:34 UTC (permalink / raw)
  To: Oliver Schad, supervision

>I know you can model a service in s6 that watches another service and
>kills it, so in fact the problem is solved outside of s6. But I wanted
>to ask for a feature that offers a simple way to model that within s6.
>Usually it's good enough to call an external command with a timeout and
>watch its exit code.

  Hi Oliver,

  The s6-idiomatic way of doing it would be, as you say, to have a
separate service that calls an external command (the health checker,
which is daemon-specific) with a timeout and watches the exit code.
It is trivial to do in shell, which is why I haven't written any
particular binary for that.
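
For illustration, the health checker described above could be sketched
roughly as follows. This is only a minimal sketch, not anything shipped
with s6: the service directory, the checker path, the timeout and the
polling interval are all hypothetical placeholders, and `timeout` is
assumed to be the coreutils (or busybox) utility. The only s6 command
used is `s6-svc -t`, which sends SIGTERM to the supervised daemon so that
its supervisor restarts it.

```shell
#!/bin/sh
# Sketch of a health-checker longrun: poll an external check command with
# a timeout and have the daemon restarted when the check fails.
# Every path and value below is a hypothetical placeholder.

SERVICE_DIR=${SERVICE_DIR:-/run/service/mydaemon}      # service directory of the watched daemon
CHECK_CMD=${CHECK_CMD:-/usr/local/bin/mydaemon-check}  # daemon-specific health checker
CHECK_TIMEOUT=${CHECK_TIMEOUT:-5}                      # seconds before a check counts as hung
CHECK_INTERVAL=${CHECK_INTERVAL:-30}                   # seconds between two polls

# Run the checker once. A non-zero exit code and a timeout both count as
# "unhealthy". `timeout` is assumed to come from coreutils or busybox.
check_once() {
    timeout "$CHECK_TIMEOUT" "$CHECK_CMD"
}

# Reaction to a failed check: SIGTERM via the supervisor, which then
# restarts the daemon as it would after any other exit.
on_unhealthy() {
    s6-svc -t "$SERVICE_DIR"
}

# One polling step: check, and react only on failure.
poll_step() {
    check_once || on_unhealthy
}

# The long-running loop a run script would end with.
poll_forever() {
    while sleep "$CHECK_INTERVAL"; do
        poll_step
    done
}
```

A run script for the checker service would source or inline these
functions and finish by calling `poll_forever`. Whether SIGTERM is the
right reaction (rather than, say, SIGKILL for a wedged runtime) depends
on the daemon.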

  I could add a program that does it for you so you don't have to write
a 3-line shell script, and a command that creates an s6 service directory
(or even an s6-rc source definition directory) that watches another
service using the aforementioned program; it would not be hard.
However, I am concerned about scope creep, and a common criticism I
hear from distros is that s6 is "too big". That is unfair considering
that integrated init systems providing the same level of functionality
are 5x-10x bigger, but it is really a way of saying that there are a lot
of exposed binaries with miscellaneous functionality and it's difficult
to wrap one's head around it. So I'm trying not to add to the problem.
The direction I'm going these days is more towards integration and
high-level management than towards adding building blocks to help with
various tasks, so if something is doable with a bit of scripting, then
I'd rather let users do it that way.

  I'm pretty sure that people in the community already have run script
models for healthchecker services, if they could contribute them it
would be awesome ;)

--
  Laurent



* Re: External health Check Process
  2020-10-22 15:34 ` Laurent Bercot
@ 2020-10-22 15:46   ` Casper Ti. Vector
  2020-10-23  0:03   ` Steve Litt
  1 sibling, 0 replies; 8+ messages in thread
From: Casper Ti. Vector @ 2020-10-22 15:46 UTC (permalink / raw)
  To: supervision

On Thu, Oct 22, 2020 at 03:34:37PM +0000, Laurent Bercot wrote:
>  I'm pretty sure that people in the community already have run script
> models for healthchecker services, if they could contribute them it
> would be awesome ;)

Just in case anyone finds this a useful prototype:
<https://gitea.com/CasperVector/slew/src/branch/master/lib/rate.rc>
Mainly used here as of now:
<https://gitea.com/CasperVector/slew/src/branch/master/base/sshhole.>

-- 
My current OpenPGP key:
RSA4096/0x227E8CAAB7AA186C (expires: 2022.09.20)
7077 7781 B859 5166 AE07 0286 227E 8CAA B7AA 186C



* Re: External health Check Process
  2020-10-22 15:34 ` Laurent Bercot
  2020-10-22 15:46   ` Casper Ti. Vector
@ 2020-10-23  0:03   ` Steve Litt
  2020-10-23  7:27     ` Oliver Schad
  1 sibling, 1 reply; 8+ messages in thread
From: Steve Litt @ 2020-10-23  0:03 UTC (permalink / raw)
  To: supervision

On Thu, 22 Oct 2020 15:34:37 +0000
"Laurent Bercot" <ska-supervision@skarnet.org> wrote:


>   Hi Oliver,
> 
>   The s6-idiomatic way of doing it would be, as you say, to have a
> separate service that calls an external command (the health checker,
> which is daemon-specific) with a timeout and watches the exit code.
> It is trivial to do in shell, which is why I haven't written any
> particular binary for that.
> 
>   I could add a program that does it for you so you don't have to
> write a 3-line shell script, and a command that creates a s6 service
> directory (or even a s6-rc source definition directory) that watches
> another service using the aforementioned program, it would not be
> hard. However, I am concerned about scope creep, and a common
> criticism I hear from distros is that s6 is "too big" - which is
> unfair considering that integrated init systems providing the same
> level of functionality are 5x-10x bigger, but is really a way of
> saying that there are a lot of exposed binaries with miscellaneous
> functionality and it's difficult to wrap one's head around it. 

Laurent, I agree with you. My main attraction to daemontools, runit and
s6 is they're simple and understandable. There's almost nothing I can't
do with them if I get creative with shellscripts. I understand your
insistence on PID1 supervising the real supervisor: That's worth the
added complexity. I understand your desire to order process
instantiation at boot and to intermix run-once and long-run processes.
I think that's worth the added complexity, and in fact this is one of
the few things I missed in daemontools and runit.

But most of the other suggestions are, in my opinion, just answers to
systemd weenies' "but s6 doesn't have _____" arguments, and don't add
nearly enough functionality or convenience to justify the complexity,
or just the plain size added to the user manual.

The OP already stated there's a way to do it currently. Why complexify
s6 to do something already doable?

SteveT

Steve Litt 
Autumn 2020 featured book: Thriving in Tough Times
http://www.troubleshooters.com/thrive


* Re: External health Check Process
  2020-10-23  0:03   ` Steve Litt
@ 2020-10-23  7:27     ` Oliver Schad
  2020-10-23  9:15       ` Steve Litt
  2020-10-23 13:44       ` Laurent Bercot
  0 siblings, 2 replies; 8+ messages in thread
From: Oliver Schad @ 2020-10-23  7:27 UTC (permalink / raw)
  To: supervision


On Thu, 22 Oct 2020 20:03:17 -0400
Steve Litt <slitt@troubleshooters.com> wrote:

> But most of the other suggestions are, in my opinion, just answers to
> systemd weenies' "but s6 doesn't have _____" arguments, and don't add
> nearly enough functionality or convenience to justify the complexity,
> or just the plain size added to the user manual.
> 
> The OP already stated there's a way to do it currently. Why complexify
> s6 to do something already doable?

I just miss the elegance of the solution: I personally want to model
one service with one s6 service. For me that would mean thinking about
a wrapper around s6 to get it. Maybe now I get the slew thing.

And it's OK to need a wrapper to get usability, but the website should
advertise more clearly that you SHOULD use that wrapper (and to me this
wrapper should be part of the s6 project).

Best Regards
Oli

-- 
Automatic-Server AG •••••
Oliver Schad
Geschäftsführer
Turnerstrasse 2
9000 St. Gallen | Schweiz

www.automatic-server.com | oliver.schad@automatic-server.com
Tel: +41 71 511 31 11 | Mobile: +41 76 330 03 47



* Re: External health Check Process
  2020-10-23  7:27     ` Oliver Schad
@ 2020-10-23  9:15       ` Steve Litt
  2020-10-23 13:44       ` Laurent Bercot
  1 sibling, 0 replies; 8+ messages in thread
From: Steve Litt @ 2020-10-23  9:15 UTC (permalink / raw)
  To: supervision

On Fri, 23 Oct 2020 09:27:53 +0200
Oliver Schad <oliver.schad@automatic-server.com> wrote:

> On Thu, 22 Oct 2020 20:03:17 -0400
> Steve Litt <slitt@troubleshooters.com> wrote:
> 
> > But most of the other suggestions are, in my opinion, just answers
> > to systemd weenies' "but s6 doesn't have _____" arguments, and don't
> > add nearly enough functionality or convenience to justify the
> > complexity, or just the plain size added to the user manual.
> > 
> > The OP already stated there's a way to do it currently. Why
> > complexify s6 to do something already doable?  
> 
> I just miss the elegance of the solution: 

I get that. But there's a pretty significant cost. Every new feature
added to a piece of software makes it harder to understand, creates new
nooks and crannies for bugs to hide out in, and increases the number of
interactions very significantly. To see interactions at their worst,
see my systemd cartoon:

http://troubleshooters.com/linux/systemd/lol_systemd.htm

I'm not saying s6 is anywhere near that yet. But in my opinion, every
feature complexifies the software even more than the last one, and
every feature should be evaluated similar to a new purchase of a
possession:

1) Where am I going to keep it? How much will it clutter the house?

2) What will I not buy to free up money to buy this thing?

> I personally want to model
> one service with one s6 service. 

I'm not sure what you mean by "model". I thought this was about
checking the health of each service. Anyway, I understand that you
personally want to match the healthchecks one to one with the services,
and that would be nice, but not if it adds complexity.

[snip the rest of the email, which I didn't understand at all]

I'm on probably 25 software mailing lists, and have this discussion on
every one of them. Somebody wants some feature. I write back that you
can already do that by doing <whatever>. They write back saying my idea
is a kludge. I write back and say I like a nice, simple program that
can be written and maintained by one person, and features tend to wreck
that. All sorts of people want their pet features, and those features
are usually unimportant to the suggester (for instance, when there's
already a way to do it with existing software) and *absolutely*
unimportant to everyone else. Features clutter up software, and should
be done only if they're very important to a large swath of users.

With healthchecks, it would be trivial for you to create a shellscript
called healthcheck in every service directory that required a
healthcheck, then have a program that loops around all the service
directories, runs the healthcheck shellscript, and if unhealthy,
performs actions listed in the healthcheck subscript. If you do this
for a while, you'll slowly evolve the thing into a more and more
convenient form, until others use it. I mean, you'd need to roll it
into a tarball and write a bit of documentation, but nothing like
changing the whole program.
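
The loop described above might look something like the sketch below.
It is only one possible reading of the idea: the scan directory, the
`healthcheck` file name inside each service directory and the timeout
are assumptions, `timeout` is assumed to be the coreutils or busybox
utility, and the "action" is simplified to a plain restart through
`s6-svc -t` instead of per-service actions.

```shell
#!/bin/sh
# Sketch: one pass over all service directories, running each optional
# `healthcheck` script and restarting the services whose check fails.
# SCAN_DIR, the `healthcheck` name and the timeout are assumptions.

SCAN_DIR=${SCAN_DIR:-/run/service}   # hypothetical scan directory
CHECK_TIMEOUT=${CHECK_TIMEOUT:-5}    # seconds allowed per healthcheck

# Action for an unhealthy service: SIGTERM via the supervisor, which
# then restarts the daemon. Replace with whatever the service needs.
restart_service() {
    s6-svc -t "$1"
}

# Visit every service directory; skip those without an executable
# healthcheck script; a non-zero exit or a timeout counts as unhealthy.
check_all() {
    for dir in "$SCAN_DIR"/*/; do
        dir=${dir%/}
        [ -x "$dir/healthcheck" ] || continue
        timeout "$CHECK_TIMEOUT" "$dir/healthcheck" || restart_service "$dir"
    done
}
```

Calling `check_all` from a `while sleep ...` loop in its own supervised
service would turn this into a single checker for the whole scan
directory, at the cost of running the checks one after another.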

The real beauty of this approach is that, as more and more people use
your system and more and more people contribute feedback, sooner or
later it reaches a state where it would be much easier to add it as a
feature of the whole program, with an interface people like.

 SteveT

Steve Litt 
Autumn 2020 featured book: Thriving in Tough Times
http://www.troubleshooters.com/thrive


* Re: External health Check Process
  2020-10-23  7:27     ` Oliver Schad
  2020-10-23  9:15       ` Steve Litt
@ 2020-10-23 13:44       ` Laurent Bercot
  2020-10-23 17:03         ` Steve Litt
  1 sibling, 1 reply; 8+ messages in thread
From: Laurent Bercot @ 2020-10-23 13:44 UTC (permalink / raw)
  To: Oliver Schad, supervision


>I just miss the elegance of the solution: I personally want to model
>one service with one s6 service. For me that would mean thinking about
>a wrapper around s6 to get it. Maybe now I get the slew thing.

  The thing is, s6 is a *process supervision* suite, so one s6 "service"
is really one long-running process. When you want health checks, you
have two long-running processes: your daemon, and your health checker.
So two s6 "services" is really the most elegant, most idiomatic and
most natural solution.

  What you could have, on the other hand, is an s6-rc bundle that
contains both your daemon and your health checker: so you would be able
to handle both the daemon and the health checker (2 longruns) with a
single s6-rc/svctl command, using the name of the bundle.
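
As a rough illustration of the bundle idea, the sketch below generates
s6-rc source definitions for a daemon, its health checker and a bundle
grouping the two. The service names and run-script bodies are
hypothetical, and the file names (`type`, `run`, `dependencies`,
`contents`) reflect the s6-rc-compile source format as understood at the
time of this thread; newer s6-rc releases may prefer `dependencies.d/`
and `contents.d/` directories, so check the documentation of the
installed version.

```shell
#!/bin/sh
# Sketch: write s6-rc source definitions for <daemon>, <daemon>-health
# and a bundle <daemon>-all containing both. All names and run-script
# bodies are hypothetical placeholders.

make_sources() {
    src=$1      # source directory to populate
    daemon=$2   # name of the daemon longrun

    mkdir -p "$src/$daemon" "$src/${daemon}-health" "$src/${daemon}-all"

    # The daemon itself: an ordinary longrun.
    echo longrun > "$src/$daemon/type"
    printf '#!/bin/sh\nexec /usr/local/bin/%s\n' "$daemon" > "$src/$daemon/run"

    # The health checker: a second longrun that depends on the daemon,
    # so it is only started once the daemon is up.
    echo longrun > "$src/${daemon}-health/type"
    echo "$daemon" > "$src/${daemon}-health/dependencies"
    printf '#!/bin/sh\nexec /usr/local/bin/%s-healthloop\n' "$daemon" \
        > "$src/${daemon}-health/run"

    chmod +x "$src/$daemon/run" "$src/${daemon}-health/run"

    # The bundle: a single name that stands for both longruns.
    echo bundle > "$src/${daemon}-all/type"
    printf '%s\n%s-health\n' "$daemon" "$daemon" > "$src/${daemon}-all/contents"
}
```

After something like `make_sources ./source mydaemon`, compiling the
source directory with `s6-rc-compile` and installing the resulting
database, a command along the lines of `s6-rc -u change mydaemon-all`
would bring up the daemon and its checker together.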

  It's probably something that I can add to the next version of s6-rc:
a command or an option to automatically add a health checker service to
a longrun that is declared in the database, so you wouldn't have to
write the health checker longrun manually. How does that sound?


>And it's OK to need a wrapper to get usability, but the website should
>advertise more clearly that you SHOULD use that wrapper (and to me this
>wrapper should be part of the s6 project).

  This is indeed a UI problem and I'm still working on it. ;)

--
  Laurent



* Re: External health Check Process
  2020-10-23 13:44       ` Laurent Bercot
@ 2020-10-23 17:03         ` Steve Litt
  0 siblings, 0 replies; 8+ messages in thread
From: Steve Litt @ 2020-10-23 17:03 UTC (permalink / raw)
  To: supervision

On Fri, 23 Oct 2020 13:44:55 +0000
"Laurent Bercot" <ska-supervision@skarnet.org> wrote:

> >I just miss the elegance of the solution: I personally want to model
> >one service with one s6 service. For me that would mean thinking
> >about a wrapper around s6 to get it. Maybe now I get the slew thing.
> 
>   The thing is, s6 is a *process supervision* suite, so one s6
> "service" is really one long-running process. When you want health
> checks, you have two long-running processes: your daemon, and your
> health checker. So two s6 "services" is really the most elegant, most
> idiomatic and most natural solution.
> 
>   What you could have, on the other hand, is an s6-rc bundle that
> contains both your daemon and your health checker: so you would be
> able to handle both the daemon and the health checker (2 longruns)
> with a single s6-rc/svctl command, using the name of the bundle.
> 
>   It's probably something that I can add to the next version of s6-rc:
> a command or an option to automatically add a health checker service
> to a longrun that is declared in the database, so you wouldn't have to
> write the health checker longrun manually. How does that sound?

I'd poll s6 users, and if fewer than half eagerly want this new
feature, I'd leave well enough alone.
  
SteveT

Steve Litt 
Autumn 2020 featured book: Thriving in Tough Times
http://www.troubleshooters.com/thrive

