supervision - discussion about system services, daemon supervision, init, runlevel management, and tools such as s6 and runit
 help / color / mirror / Atom feed
* s6-permafailon not acting as expected
@ 2020-11-16 18:01 Xavier Stonestreet
  2020-11-16 22:05 ` Laurent Bercot
  0 siblings, 1 reply; 8+ messages in thread
From: Xavier Stonestreet @ 2020-11-16 18:01 UTC (permalink / raw)
  To: supervision

Hello,

I'm seeing unexpected results with s6-permafailon.
I have this line in my finish script:

s6-permafailon 120 2 1-255,SIGBUS,SIGSEGV exit 0

If I kill the service twice in a row with s6-svc -t, s6-permafailon
triggers a permanent failure with the message:

s6-permafailon: info: PERMANENT FAILURE triggered after 2 events
involving exit code 0 in the last 120 seconds.

The service did die with exit code 0, but s6-permafail is wrong, since
exit code 0 is not specified in the filter.

s6-svdt reports:
2020-11-16 16:00:17.451308493 exitcode 0
2020-11-16 16:00:28.495276767 exitcode 0

Looking at the source code in s6-permafailon.c, it seems to me that
the codes[32] bitarray is not initialized with zeroes before calling
bitarray_set(n). But not being intimately familiar with the code I'm
probably looking at it incorrectly. And I'm not instrumented nor
knowledgeable enough to run a debugger.

Could you shed some light on this unexpected behavior and tell me what
I'm doing wrong, as the case may be.

Great supervision and service management toolbox by the way - I love it! :)

Thanks,
X.

s6 2.9.2.0
skalibs 2.9.3.0
execline 2.6.1.1
built from source, linked with musl libc on Linux

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: s6-permafailon not acting as expected
  2020-11-16 18:01 s6-permafailon not acting as expected Xavier Stonestreet
@ 2020-11-16 22:05 ` Laurent Bercot
  2020-11-17 15:32   ` Xavier Stonestreet
  0 siblings, 1 reply; 8+ messages in thread
From: Laurent Bercot @ 2020-11-16 22:05 UTC (permalink / raw)
  To: Xavier Stonestreet, supervision

>The service did die with exit code 0, but s6-permafail is wrong, since
>exit code 0 is not specified in the filter.
>
>Looking at the source code in s6-permafailon.c, it seems to me that
>the codes[32] bitarray is not initialized with zeroes before calling
>bitarray_set(n).

  You're probably right that it is the issue! (codes was a static
variable in the first iteration; static variables do not need to be
0-initialized. But then I reworked the code, made codes an automatic
variable, and failed to initialize it to zero. And that is why relying
on magic features is a bad thing, and I should stop doing this out of
laziness.)

  I fixed it in the latest git of s6; if you prefer numbered releases,
I'm working on a bigger release (with some interface changes and a
major version bump) that should happen before the end of the year, but
not right now ^^'

  Please try the fixed version. If you don't want to use the git head,
you can apply the very simple diff:
  
https://git.skarnet.org/cgi-bin/cgit.cgi/s6/diff/src/supervision/s6-permafailon.c

  If that version doesn't solve your problem, then I'll investigate
further.

  Thanks for the report!

--
  Laurent


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: s6-permafailon not acting as expected
  2020-11-16 22:05 ` Laurent Bercot
@ 2020-11-17 15:32   ` Xavier Stonestreet
  2020-11-17 18:43     ` Xavier Stonestreet
  0 siblings, 1 reply; 8+ messages in thread
From: Xavier Stonestreet @ 2020-11-17 15:32 UTC (permalink / raw)
  To: Laurent Bercot; +Cc: supervision

>   Please try the fixed version. If you don't want to use the git head,
> you can apply the very simple diff:

Thanks for your quick reply and for the patch. It's working as it
should. I'm glad I was lucky to stumble upon this randomly occurring
bug.

> I'm working on a bigger release (with some interface changes and a
> major version bump) that should happen before the end of the year, but
> not right now ^^'

Sounds good. So what kind of interface changes should we expect? I
mean will they require updating the service definitions, run/finish
scripts, that sort of thing?

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: s6-permafailon not acting as expected
  2020-11-17 15:32   ` Xavier Stonestreet
@ 2020-11-17 18:43     ` Xavier Stonestreet
  2020-11-17 20:06       ` Xavier Stonestreet
  2020-11-17 21:42       ` Laurent Bercot
  0 siblings, 2 replies; 8+ messages in thread
From: Xavier Stonestreet @ 2020-11-17 18:43 UTC (permalink / raw)
  To: supervision

Hold on, I've now encountered the same issue for another service, but
with signals :)

In servicedir's finish:
s6-permafailon 120 2 1-255,SIGBUS,SIGSEGV exit 0

svc -t servicedir
svc -t servicedir

2020-11-17 19:20:24.465424531  s6-permafailon: info: PERMANENT FAILURE
triggered after 2 events involving signal 15 in the last 120 seconds

s6-svdt servicedir | s6-tai64nlocal
2020-11-17 19:19:30.449842687 signal SIGTERM
2020-11-17 19:20:24.419594225 signal SIGTERM

Back to s6-permafailon.c: the sigset_t sigs is not initialized
either... Looks like it needs a call to sigemptyset().

Hope this helps :)

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: s6-permafailon not acting as expected
  2020-11-17 18:43     ` Xavier Stonestreet
@ 2020-11-17 20:06       ` Xavier Stonestreet
  2020-11-17 21:42       ` Laurent Bercot
  1 sibling, 0 replies; 8+ messages in thread
From: Xavier Stonestreet @ 2020-11-17 20:06 UTC (permalink / raw)
  To: supervision

Upon further investigation, it turns out my events filter is
incorrect, because exit codes greater than 128 imply that the service
has been killed by a signal (128 + signal number), so the exit code
range 1-255 actually means exit codes 1 to 127 or any signal.

The correct specification for my example is 1-127,SIGBUS,SIGSEGV.

I believe a call to sigemptyset() in s6-permafailon.c may still be
warranted however, because the POSIX spec states the results of
sigismember() are otherwise undefined.

On Tue, Nov 17, 2020 at 7:43 PM Xavier Stonestreet
<xstonestreet@gmail.com> wrote:
>
> Hold on, I've now encountered the same issue for another service, but
> with signals :)
>
> In servicedir's finish:
> s6-permafailon 120 2 1-255,SIGBUS,SIGSEGV exit 0
>
> svc -t servicedir
> svc -t servicedir
>
> 2020-11-17 19:20:24.465424531  s6-permafailon: info: PERMANENT FAILURE
> triggered after 2 events involving signal 15 in the last 120 seconds
>
> s6-svdt servicedir | s6-tai64nlocal
> 2020-11-17 19:19:30.449842687 signal SIGTERM
> 2020-11-17 19:20:24.419594225 signal SIGTERM
>
> Back to s6-permafailon.c: the sigset_t sigs is not initialized
> either... Looks like it needs a call to sigemptyset().
>
> Hope this helps :)

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: s6-permafailon not acting as expected
  2020-11-17 18:43     ` Xavier Stonestreet
  2020-11-17 20:06       ` Xavier Stonestreet
@ 2020-11-17 21:42       ` Laurent Bercot
  2020-11-18 17:22         ` Xavier Stonestreet
  1 sibling, 1 reply; 8+ messages in thread
From: Laurent Bercot @ 2020-11-17 21:42 UTC (permalink / raw)
  To: Xavier Stonestreet, supervision

>Back to s6-permafailon.c: the sigset_t sigs is not initialized
>either... Looks like it needs a call to sigemptyset().

  Gah! -_- Fixed. Thanks.

--
  Laurent


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: s6-permafailon not acting as expected
  2020-11-17 21:42       ` Laurent Bercot
@ 2020-11-18 17:22         ` Xavier Stonestreet
  2020-11-18 18:35           ` Laurent Bercot
  0 siblings, 1 reply; 8+ messages in thread
From: Xavier Stonestreet @ 2020-11-18 17:22 UTC (permalink / raw)
  To: supervision

In case you missed it, I'm reposting my question. If you're too busy
to respond, no worries, I'll understand. Thanks.

> I'm working on a bigger release (with some interface changes and a
> major version bump) that should happen before the end of the year, but
> not right now ^^'

Sounds good. So what kind of interface changes should we expect? I
mean will they require updating the service definitions, run/finish
scripts, that sort of thing?

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: s6-permafailon not acting as expected
  2020-11-18 17:22         ` Xavier Stonestreet
@ 2020-11-18 18:35           ` Laurent Bercot
  0 siblings, 0 replies; 8+ messages in thread
From: Laurent Bercot @ 2020-11-18 18:35 UTC (permalink / raw)
  To: Xavier Stonestreet, supervision

>Sounds good. So what kind of interface changes should we expect? I
>mean will they require updating the service definitions, run/finish
>scripts, that sort of thing?

  (Sorry, I missed it indeed.)

  Nothing that drastic; service definitions will not change. They are the
core of s6, and such a change would warrant a lifetime version bump,
I think.

  No, the next release will be about a long overdue cleanup of s6-svscan
and s6-svscanctl invocations: command-line options will change, and
default semantics of signals received by s6-svscan will also change.

  The point is to remove some clutter that is either:
  - legacy behaviours from daemontools's svscan that have not shown they
were useful
  - ad-hoc stuff that was added to support early iterations of running
s6-svscan as pid 1, and that isn't used anymore (in part thanks to the
existence of s6-linux-init 1.x)
  - options that experience has shown to be traps (i.e. ideas that seem
good at first but really encourage bad patterns)

and make the s6-svscan/ctl system less confusing to new users by only
keeping a few relevant and useful configuration switches.
  There will also be QoL improvements to s6-svscan for people who like
to nest supervision trees; also, no promises, but I may come up with
some way to officially support user trees, which are a relatively
common request.

--
  Laurent


^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2020-11-18 18:35 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-11-16 18:01 s6-permafailon not acting as expected Xavier Stonestreet
2020-11-16 22:05 ` Laurent Bercot
2020-11-17 15:32   ` Xavier Stonestreet
2020-11-17 18:43     ` Xavier Stonestreet
2020-11-17 20:06       ` Xavier Stonestreet
2020-11-17 21:42       ` Laurent Bercot
2020-11-18 17:22         ` Xavier Stonestreet
2020-11-18 18:35           ` Laurent Bercot

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).