Hello, I'm seeing unexpected results with s6-permafailon. I have this line in my finish script: s6-permafailon 120 2 1-255,SIGBUS,SIGSEGV exit 0 If I kill the service twice in a row with s6-svc -t, s6-permafailon triggers a permanent failure with the message: s6-permafailon: info: PERMANENT FAILURE triggered after 2 events involving exit code 0 in the last 120 seconds. The service did die with exit code 0, but s6-permafail is wrong, since exit code 0 is not specified in the filter. s6-svdt reports: 2020-11-16 16:00:17.451308493 exitcode 0 2020-11-16 16:00:28.495276767 exitcode 0 Looking at the source code in s6-permafailon.c, it seems to me that the codes[32] bitarray is not initialized with zeroes before calling bitarray_set(n). But not being intimately familiar with the code I'm probably looking at it incorrectly. And I'm not instrumented nor knowledgeable enough to run a debugger. Could you shed some light on this unexpected behavior and tell me what I'm doing wrong, as the case may be. Great supervision and service management toolbox by the way - I love it! :) Thanks, X. s6 2.9.2.0 skalibs 2.9.3.0 execline 2.6.1.1 built from source, linked with musl libc on Linux
>The service did die with exit code 0, but s6-permafail is wrong, since >exit code 0 is not specified in the filter. > >Looking at the source code in s6-permafailon.c, it seems to me that >the codes[32] bitarray is not initialized with zeroes before calling >bitarray_set(n). You're probably right that it is the issue! (codes was a static variable in the first iteration; static variables do not need to be 0-initialized. But then I reworked the code, made codes an automatic variable, and failed to initialize it to zero. And that is why relying on magic features is a bad thing, and I should stop doing this out of laziness.) I fixed it in the latest git of s6; if you prefer numbered releases, I'm working on a bigger release (with some interface changes and a major version bump) that should happen before the end of the year, but not right now ^^' Please try the fixed version. If you don't want to use the git head, you can apply the very simple diff: https://git.skarnet.org/cgi-bin/cgit.cgi/s6/diff/src/supervision/s6-permafailon.c If that version doesn't solve your problem, then I'll investigate further. Thanks for the report! -- Laurent
> Please try the fixed version. If you don't want to use the git head, > you can apply the very simple diff: Thanks for your quick reply and for the patch. It's working as it should. I'm glad I was lucky to stumble upon this randomly occurring bug. > I'm working on a bigger release (with some interface changes and a > major version bump) that should happen before the end of the year, but > not right now ^^' Sounds good. So what kind of interface changes should we expect? I mean will they require updating the service definitions, run/finish scripts, that sort of thing?
Hold on, I've now encountered the same issue for another service, but with signals :) In servicedir's finish: s6-permafailon 120 2 1-255,SIGBUS,SIGSEGV exit 0 svc -t servicedir svc -t servicedir 2020-11-17 19:20:24.465424531 s6-permafailon: info: PERMANENT FAILURE triggered after 2 events involving signal 15 in the last 120 seconds s6-svdt servicedir | s6-tai64nlocal 2020-11-17 19:19:30.449842687 signal SIGTERM 2020-11-17 19:20:24.419594225 signal SIGTERM Back to s6-permafailon.c: the sigset_t sigs is not initialized either... Looks like it needs a call to sigemptyset(). Hope this helps :)
Upon further investigation, it turns out my events filter is
incorrect, because exit codes greater than 128 imply that the service
has been killed by a signal (128 + signal number), so the exit code
range 1-255 actually means exit codes 1 to 127 or any signal.
The correct specification for my example is 1-127,SIGBUS,SIGSEGV.
I believe a call to sigemptyset() in s6-permafailon.c may still be
warranted however, because the POSIX spec states the results of
sigismember() are otherwise undefined.
On Tue, Nov 17, 2020 at 7:43 PM Xavier Stonestreet
<xstonestreet@gmail.com> wrote:
>
> Hold on, I've now encountered the same issue for another service, but
> with signals :)
>
> In servicedir's finish:
> s6-permafailon 120 2 1-255,SIGBUS,SIGSEGV exit 0
>
> svc -t servicedir
> svc -t servicedir
>
> 2020-11-17 19:20:24.465424531 s6-permafailon: info: PERMANENT FAILURE
> triggered after 2 events involving signal 15 in the last 120 seconds
>
> s6-svdt servicedir | s6-tai64nlocal
> 2020-11-17 19:19:30.449842687 signal SIGTERM
> 2020-11-17 19:20:24.419594225 signal SIGTERM
>
> Back to s6-permafailon.c: the sigset_t sigs is not initialized
> either... Looks like it needs a call to sigemptyset().
>
> Hope this helps :)
>Back to s6-permafailon.c: the sigset_t sigs is not initialized
>either... Looks like it needs a call to sigemptyset().
Gah! -_- Fixed. Thanks.
--
Laurent
In case you missed it, I'm reposting my question. If you're too busy
to respond, no worries, I'll understand. Thanks.
> I'm working on a bigger release (with some interface changes and a
> major version bump) that should happen before the end of the year, but
> not right now ^^'
Sounds good. So what kind of interface changes should we expect? I
mean will they require updating the service definitions, run/finish
scripts, that sort of thing?
>Sounds good. So what kind of interface changes should we expect? I
>mean will they require updating the service definitions, run/finish
>scripts, that sort of thing?
(Sorry, I missed it indeed.)
Nothing that drastic; service definitions will not change. They are the
core of s6, and such a change would warrant a lifetime version bump,
I think.
No, the next release will be about a long overdue cleanup of s6-svscan
and s6-svscanctl invocations: command-line options will change, and
default semantics of signals received by s6-svscan will also change.
The point is to remove some clutter that is either:
- legacy behaviours from daemontools's svscan that have not shown they
were useful
- ad-hoc stuff that was added to support early iterations of running
s6-svscan as pid 1, and that isn't used anymore (in part thanks to the
existence of s6-linux-init 1.x)
- options that experience has shown to be traps (i.e. ideas that seem
good at first but really encourage bad patterns)
and make the s6-svscan/ctl system less confusing to new users by only
keeping a few relevant and useful configuration switches.
There will also be QoL improvements to s6-svscan for people who like
to nest supervision trees; also, no promises, but I may come up with
some way to officially support user trees, which are a relatively
common request.
--
Laurent