supervision - discussion about system services, daemon supervision, init, runlevel management, and tools such as s6 and runit
From: "Laurent Bercot" <ska-supervision@skarnet.org>
To: "Xavier Stonestreet" <xstonestreet@gmail.com>,
	supervision@list.skarnet.org
Subject: Re: s6-rc: timeout questions
Date: Wed, 18 Nov 2020 19:06:09 +0000
Message-ID: <emf88efb9a-6e7a-4057-8541-03a0cda4480d@elzian>
In-Reply-To: <CAK-rGM+FKVsA_ok5-C4CdCrzjxPK4Or7C2P3ZF-yGmgLSerhAQ@mail.gmail.com>

>Could you elaborate a little more about the state transition failures
>of oneshots caused by timeouts?
>
>Let's say for example the oneshot's up script times out, so the
>transition fails. From s6-rc's point of view the oneshot is still
>down. What actually happens to the process running the up script? Is
>it left running in the background? If yes, is it correct to assume
>that since s6-rc considers it down, another invocation of the s6-rc -u
>change command on the same oneshot will spawn another instance of the
>up script? If not, is it killed, and how?

  It is correct to assume that another instance will be spawned, yes.
It was a difficult decision to make, and I'm still not sure it is the
right one. There are advantages and drawbacks to both approaches, but
at the end of the day it all comes down to: what set of actions will
leave the system in the *least* unknown state?

  s6-rc's design assumes that timeouts, if they exist, are properly
calibrated; if a service times out, then it's not that the timeout is
too short, it's that something is really going wrong. So it considers
the transition failed. Now what should it do about the existing
process? Kill it or not?
  If the process is allowed to live on, it may succeed, in which case
s6-rc's view of the service will be wrong, but 1. it doesn't matter
because services should always be written to be idempotent, and 2. it
means that the timeout was badly calibrated in the first place. Or it
may fail, and s6-rc's view will be correct.
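
  To illustrate what "idempotent" means here, a minimal sketch in Python
of an up action that can safely be run a second time. The helper name
ensure_line and the file name modules.conf are hypothetical, chosen only
for the example; real up scripts are whatever the service author writes.
The sketch covers sequential re-runs only; two instances that overlap in
time would additionally need a lock.

```python
import os
import tempfile


def ensure_line(path, line):
    """Append `line` to `path` unless it is already there.

    Returns True if the file was changed, False if nothing had to be done.
    Running it again after a success leaves the file untouched.
    """
    try:
        with open(path) as f:
            if line in f.read().splitlines():
                return False  # desired state already reached
    except FileNotFoundError:
        pass  # file does not exist yet: fall through and create it
    with open(path, "a") as f:
        f.write(line + "\n")
    return True


# Hypothetical target file in a scratch directory.
workdir = tempfile.mkdtemp()
target = os.path.join(workdir, "modules.conf")

first = ensure_line(target, "loop")   # first run performs the change
second = ensure_line(target, "loop")  # second run is a no-op

with open(target) as f:
    content = f.read()
```
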
  If the process is killed, chances are that it will add to the problem
instead of solving it. For instance, if the process is hanging in D
state (uninterruptible sleep), killing it won't do anything except make
the system more unstable. If the process is doing some complex operation
and not properly sequencing its operations, sending it a signal may
trigger a bug. And so on.
  In the end I judged that sending a signal would potentially cause more
harm than good, but I don't think the opposite approach would be wrong
either.
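
  The policy described above can be modelled in a few lines of Python.
This is only a sketch of the behaviour, not s6-rc's implementation: the
function name run_transition and the "up"/"down" return values are
invented for the example. On timeout the child is left alone, the
transition is reported as failed, and a later attempt spawns a second
instance.

```python
import subprocess
import sys


def run_transition(argv, timeout_s):
    """Start an up command and wait at most timeout_s seconds for it.

    Returns (state, proc). state is "up" if the command exited 0 within
    the timeout, "down" otherwise. On timeout the child is NOT signalled:
    it keeps running in the background.
    """
    proc = subprocess.Popen(argv)
    try:
        code = proc.wait(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return "down", proc  # transition failed, process left alive
    return ("up" if code == 0 else "down"), proc


# Stand-in for a slow up script: sleeps for half a second.
slow = [sys.executable, "-c", "import time; time.sleep(0.5)"]

# First attempt: the timeout is shorter than the script's run time.
state, first = run_transition(slow, timeout_s=0.1)
first_still_running = first.poll() is None

# Second attempt: a new instance is spawned while the first may still run.
state2, second = run_transition(slow, timeout_s=10.0)

first.wait()  # reap the first instance so the example exits cleanly
```
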


>Test 2:
>s1 is down
>s2 is down
>s6-rc -u change s2
>s6-rc: fatal: timed out
>s6-svlisten1: fatal: timed out
>
>Timeout failure. Unexpected. I thought timeout-up and timeout-down
>applied to each atomic service individually, not to the entire
>dependency chain to bring it up or down.

  Yes, it should be behaving as you say, and I suspect you have
uncovered a bug - not in the timeout management for a dependency chain,
but in the management of s6-rc's *global* timeout, which is the one
that is triggering here. I suspect I'm taking incorrect shortcuts wrt
timeout management, and will take a look. Thanks!
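
  For reference, the intended semantics can be modelled as follows. This
is a sketch under stated assumptions, not s6-rc's code: the function
bring_up, the durations and the timeout values are made up for the
example, times are in milliseconds, 0 means "no timeout", and services
are brought up strictly one after the other. Each per-service timeout
restarts when that service's transition begins, whereas the global
timeout covers the whole change.

```python
import math


def bring_up(chain, durations, timeout_up, global_timeout=0):
    """Model bringing up `chain` (services in dependency order).

    durations:      how long each up script takes (ms)
    timeout_up:     per-service timeout (ms, 0 or missing = none)
    global_timeout: deadline for the whole change (ms, 0 = none)

    Returns (services_brought_up, error_or_None).
    """
    elapsed = 0
    brought_up = []
    for name in chain:
        limit = timeout_up.get(name, 0)
        # The per-service clock starts when this service's transition starts.
        service_deadline = elapsed + limit if limit else math.inf
        # The global clock started when the whole change started.
        global_deadline = global_timeout if global_timeout else math.inf
        finish = elapsed + durations[name]
        if finish > min(service_deadline, global_deadline):
            culprit = name if service_deadline <= global_deadline else "global"
            return brought_up, culprit + ": timed out"
        elapsed = finish
        brought_up.append(name)
    return brought_up, None


# Hypothetical numbers: s2 depends on s1, each takes 3 s, each allows 5 s.
chain = ["s1", "s2"]
durations = {"s1": 3000, "s2": 3000}
timeout_up = {"s1": 5000, "s2": 5000}

# Per-service timeouts only: both transitions fit within their own limit.
per_service_only = bring_up(chain, durations, timeout_up)

# A 5 s global timeout is exceeded by the chain as a whole (6 s).
with_global = bring_up(chain, durations, timeout_up, global_timeout=5000)
```

  With only per-service timeouts the whole chain comes up, even though
it takes longer in total than any single timeout; a timeout reported for
the entire chain can only come from the global deadline.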

--
  Laurent



Thread overview: 5+ messages
2020-11-17 15:53 Xavier Stonestreet
2020-11-17 21:53 ` Laurent Bercot
2020-11-18 17:49   ` Xavier Stonestreet
2020-11-18 19:06     ` Laurent Bercot [this message]
2020-11-20 10:30       ` Xavier Stonestreet
