supervision - discussion about system services, daemon supervision, init, runlevel management, and tools such as s6 and runit
* Generic interrupt command?
@ 2019-02-02  2:36 Steve Litt
  2019-02-02  9:07 ` Laurent Bercot
  0 siblings, 1 reply; 13+ messages in thread
From: Steve Litt @ 2019-02-02  2:36 UTC (permalink / raw)
  To: supervision

Hi all,

I think a cool addition to runit program sv and s6's s6-svc would be a
command to send an arbitrary signal to the daemon being supervised.
Let's say a -z was added as an arg to s6-svc or a "genericinterrupt" was
added as an arg to sv. Now you could say:

sv genericinterrupt SIGIO myspecialdaemon

s6-svc -z SIGIO myspecialdaemon

The supervisor already knows the PID of what's being supervised, so it
would be an easy way to get an arbitrary signal into a daemon, for
those daemons that have non-standard signal usage.

SteveT

-- 

Steve Litt 
January 2019 featured book: Troubleshooting: Just the Facts
http://www.troubleshooters.com/tjust



* Re: Generic interrupt command?
  2019-02-02  2:36 Generic interrupt command? Steve Litt
@ 2019-02-02  9:07 ` Laurent Bercot
  2019-02-02 19:30   ` Steve Litt
  0 siblings, 1 reply; 13+ messages in thread
From: Laurent Bercot @ 2019-02-02  9:07 UTC (permalink / raw)
  To: supervision

>I think a cool addition to runit program sv and s6's s6-svc would be a
>command to send an arbitrary signal to the daemon being supervised.

Yes, that would be a nice feature. I've been thinking about it for
some time.
Unfortunately, that's not at all suited to the way the control
program communicates with the supervisor, and adding this feature,
as simple as it seems, would require significant work.

There is probably a (dirty, hackish) way to make it work with
normal signals (<128). But there's absolutely no way to ever make it
work with real-time signals or anything with a signal number over 128
without rewriting the supervisor state machine and making it more
complex and more brittle. Which is an instant nope from me.

Restricting the feature to normal signals would probably be enough,
but even then, I'm not comfortable with the level of hackiness it
would require. I don't think the feature is worth it; I'd rather add
more signals to the explicitly supported list.

--
Laurent




* Re: Generic interrupt command?
  2019-02-02  9:07 ` Laurent Bercot
@ 2019-02-02 19:30   ` Steve Litt
  2019-02-02 21:08     ` Colin Booth
  0 siblings, 1 reply; 13+ messages in thread
From: Steve Litt @ 2019-02-02 19:30 UTC (permalink / raw)
  To: supervision

On Sat, 02 Feb 2019 09:07:31 +0000
"Laurent Bercot" <ska-supervision@skarnet.org> wrote:

> >I think a cool addition to runit program sv and s6's s6-svc would be
> >a command to send an arbitrary signal to the daemon being
> >supervised.  
> 
> Yes, that would be a nice feature. I've been thinking about it for
> some time.
> Unfortunately, that's not at all suited to the way the control
> program communicates with the supervisor, and adding this feature,
> as simple as it seems, would require significant work.
> 
> There is probably a (dirty, hackish) way to make it work with
> normal signals (<128). But there's absolutely no way to ever make it
> work with real-time signals or anything with a signal number over 128
> without rewriting the supervisor state machine and making it more
> complex and more brittle. Which is an instant nope from me.

Yes. If I liked complex and brittle, I'd just use systemd.

I wasn't aware there were interrupts higher than 128. When I perform
kill -L on my machine, the biggest number is 64.

> 
> Restricting the feature to normal signals would probably be enough,
> but even then, I'm not comfortable with the level of hackiness it
> would require. 

No problem, watch this (done in runit because I have no running s6 right
now):

====================================================
kill -s SIGKILL `sv status agetty-tty6 | \
  sed -e"s/.*(pid\s*//"    -e"s/).*//"`
====================================================

So I can already get what I was asking for. What would make life a
little more convenient would be if sv had a "pid" command that would be
just like the "status" command except it prints only the PID. Then the
preceding command simplifies to:

====================================================
kill -s SIGKILL `sv pid agetty-tty6`
====================================================
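For reference, here is a minimal sketch of how such a "pid" helper could be
approximated in a shell script today. The function names parse_sv_pid and
sv_pid are hypothetical, and the sketch assumes that sv status prints a line
of the form "run: NAME: (pid N) Ns" for a running service. It is not part of
runit.

```shell
# Hypothetical helper: read `sv status` output on stdin and print the PID
# of the main service if (and only if) the line reports it as running.
parse_sv_pid() {
  # [^(]* stops at the first "(pid", so a trailing "run: log: (pid N)"
  # section for a log service is ignored.
  sed -n 's/^run: [^(]*(pid \([0-9][0-9]*\)).*/\1/p'
}

# Hypothetical wrapper: print the PID of a runit service, or fail if it
# is not running (or if sv itself fails).
sv_pid() {
  pid=$(sv status "$1" 2>/dev/null | parse_sv_pid)
  [ -n "$pid" ] && printf '%s\n' "$pid"
}
```

A script could then run: pid=$(sv_pid agetty-tty6) && kill -s KILL "$pid"
so that nothing is signalled when the service is down.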

I don't know if the s6-svc command already has the equivalent of a
hypothetical sv "pid" command, but if it doesn't, I imagine it would be
easy to put in and very helpful to those forging shellscripts.

By adding this little addition to s6-svc (and hopefully sv if Gerrit
can scrape together the time), no hackiness would be added to s6 or
runit: Any hackiness would be in the shellscript created by the
programmer using s6 or runit. 

SteveT
--
Steve Litt 
January 2019 featured book: Troubleshooting: Just the Facts
http://www.troubleshooters.com/tjust



* Re: Generic interrupt command?
  2019-02-02 19:30   ` Steve Litt
@ 2019-02-02 21:08     ` Colin Booth
  2019-02-02 21:40       ` Steve Litt
  2019-02-02 22:31       ` Jonathan de Boyne Pollard
  0 siblings, 2 replies; 13+ messages in thread
From: Colin Booth @ 2019-02-02 21:08 UTC (permalink / raw)
  To: supervision

On Sat, Feb 02, 2019 at 02:30:14PM -0500, Steve Litt wrote:
> On Sat, 02 Feb 2019 09:07:31 +0000
> "Laurent Bercot" <ska-supervision@skarnet.org> wrote:
> 
> ====================================================
> kill -s SIGKILL `sv pid agetty-tty6`
> ====================================================
> 
> I don't know if the s6-svc command already has the equivalent of a
> hypothetical sv "pid" command, but if it doesn't, I imagine it would be
> easy to put in and very helpful to those forging shellscripts.
> 
> By adding this little addition to s6-svc (and hopefully sv if Gerrit
> can scrape together the time), no hackiness would be added to s6 or
> runit: Any hackiness would be in the shellscript created by the
> programmer using s6 or runit. 
> 
As documented here: https://www.skarnet.org/software/s6/s6-svstat.html

s6-svstat -p /path/to/service | xargs kill SIGNAL

-- 
Colin Booth



* Re: Generic interrupt command?
  2019-02-02 21:08     ` Colin Booth
@ 2019-02-02 21:40       ` Steve Litt
  2019-02-05  3:09         ` John O'Meara
  2019-02-02 22:31       ` Jonathan de Boyne Pollard
  1 sibling, 1 reply; 13+ messages in thread
From: Steve Litt @ 2019-02-02 21:40 UTC (permalink / raw)
  To: supervision

On Sat, 2 Feb 2019 21:08:10 +0000
Colin Booth <colin@heliocat.net> wrote:

> On Sat, Feb 02, 2019 at 02:30:14PM -0500, Steve Litt wrote:
> > On Sat, 02 Feb 2019 09:07:31 +0000
> > "Laurent Bercot" <ska-supervision@skarnet.org> wrote:
> > 
> > ====================================================
> > kill -s SIGKILL `sv pid agetty-tty6`
> > ====================================================
> > 
> > I don't know if the s6-svc command already has the equivalent of a
> > hypothetical sv "pid" command, but if it doesn't, I imagine it
> > would be easy to put in and very helpful to those forging
> > shellscripts.
> > 
> > By adding this little addition to s6-svc (and hopefully sv if Gerrit
> > can scrape together the time), no hackiness would be added to s6 or
> > runit: Any hackiness would be in the shellscript created by the
> > programmer using s6 or runit. 
> >   
> As documented here: https://www.skarnet.org/software/s6/s6-svstat.html
> 
> s6-svstat -p /path/to/service | xargs kill SIGNAL
> 

Cool. That's all that's needed.

SteveT
-- 
Steve Litt 
January 2019 featured book: Troubleshooting: Just the Facts
http://www.troubleshooters.com/tjust



* Re: Generic interrupt command?
  2019-02-02 21:08     ` Colin Booth
  2019-02-02 21:40       ` Steve Litt
@ 2019-02-02 22:31       ` Jonathan de Boyne Pollard
  1 sibling, 0 replies; 13+ messages in thread
From: Jonathan de Boyne Pollard @ 2019-02-02 22:31 UTC (permalink / raw)
  To: Supervision

Colin Booth:

> As documented here: https://www.skarnet.org/software/s6/s6-svstat.html
>
> s6-svstat -p /path/to/service | xargs kill SIGNAL
>
You can thank Jos Backus for similar functionality in the nosh toolset 
since 2013, with program-readable output that one can use existing tools 
to pull arbitrary fields from.

    % svshow --json /var/sv/* 2>/dev/null | jq '."/var/sv/bcron-sched".MainPID'
    1861
    % svshow --json /var/sv/* 2>/dev/null | jq '."/var/sv/bcron-sched".RunTimestamp'
    4611686019976326000
    % svshow --json /var/sv/* 2>/dev/null | jq '."/var/sv/bcron-sched".DaemontoolsEncoreState'
    "running"
    % svshow --json /var/sv/* 2>/dev/null | jq '."/var/sv/bcron-sched"."Wanted-By"'
    [
       "/etc/service-bundles/targets/server"
    ]
    %



* Re: Generic interrupt command?
  2019-02-02 21:40       ` Steve Litt
@ 2019-02-05  3:09         ` John O'Meara
  2019-02-05  4:15           ` Roger Pate
  2019-02-05  7:20           ` Laurent Bercot
  0 siblings, 2 replies; 13+ messages in thread
From: John O'Meara @ 2019-02-05  3:09 UTC (permalink / raw)
  To: Steve Litt; +Cc: supervision

On Sat, Feb 2, 2019, 4:40 PM Steve Litt <slitt@troubleshooters.com> wrote:

> On Sat, 2 Feb 2019 21:08:10 +0000
> Colin Booth <colin@heliocat.net> wrote:
>
> > s6-svstat -p /path/to/service | xargs kill SIGNAL
> >
>
> Cool. That's all that's needed.
>
> SteveT
> --
>

Be careful, though. If the service is down, kill will use -1 for the PID,
and will probably signal everything in your system except PID 1.

-- 
John O'Meara


* Re: Generic interrupt command?
  2019-02-05  3:09         ` John O'Meara
@ 2019-02-05  4:15           ` Roger Pate
  2019-02-05  7:20           ` Laurent Bercot
  1 sibling, 0 replies; 13+ messages in thread
From: Roger Pate @ 2019-02-05  4:15 UTC (permalink / raw)
  To: John O'Meara; +Cc: Steve Litt, supervision

On Mon, Feb 4, 2019 at 10:09 PM John O'Meara <john.fr.omeara@gmail.com> wrote:
> On Sat, Feb 2, 2019, 4:40 PM Steve Litt <slitt@troubleshooters.com> wrote:
>> On Sat, 2 Feb 2019 21:08:10 +0000 Colin Booth <colin@heliocat.net> wrote:
>>> s6-svstat -p /path/to/service | xargs kill SIGNAL
>>
>> Cool. That's all that's needed.
>
> Be careful, though. If the service is down, kill will use -1 for the PID,
> and will probably signal everything in your system except PID 1.

pid="$(s6-svstat -p /path/to/service)" && kill SIGNAL "$pid"
# avoid gratuitous xargs?



* Re: Generic interrupt command?
  2019-02-05  3:09         ` John O'Meara
  2019-02-05  4:15           ` Roger Pate
@ 2019-02-05  7:20           ` Laurent Bercot
  2019-02-05 14:16             ` John O'Meara
  1 sibling, 1 reply; 13+ messages in thread
From: Laurent Bercot @ 2019-02-05  7:20 UTC (permalink / raw)
  To: supervision

>Be careful, though. If the service is down, kill will use -1 for the PID,
>and will probably signal everything in your system except PID 1.

  That's a good point. Should s6-svstat use 0 as the "service is down"
pid value instead, to avoid this ?

--
  Laurent




* Re: Generic interrupt command?
  2019-02-05  7:20           ` Laurent Bercot
@ 2019-02-05 14:16             ` John O'Meara
  2019-02-05 19:30               ` Laurent Bercot
  0 siblings, 1 reply; 13+ messages in thread
From: John O'Meara @ 2019-02-05 14:16 UTC (permalink / raw)
  To: Laurent Bercot; +Cc: supervision

On Tue, Feb 5, 2019, 2:20 AM Laurent Bercot <ska-supervision@skarnet.org>
wrote:

> >Be careful, though. If the service is down, kill will use -1 for the PID,
> >and will probably signal everything in your system except PID 1.
>
>   That's a good point. Should s6-svstat use 0 as the "service is down"
> pid value instead, to avoid this ?
>

0 behaves better for this use case, but can still produce unexpected
behavior.

The construction "echo 0 | xargs kill -STOP" for example leaves behind a
paused background task that needs to be cleaned by hand.

The construction "kill -STOP $(echo 0)" hangs the terminal until someone
resumes the user's shell.

Most other "kill -whatever $(echo 0)" results in the shell exiting and the
user having to log back in.

So, 0 is a lot better than -1, but still not great.

Not outputting anything causes kill (on my system at least) to exit non 0
and give some diagnostic ("`' not a pid or valid pid spec", "you need to
specify whom to kill", or the usage message). That's nice, but would
probably break other scripting that expects a value, especially for
s6-svstat showing multiple fields.

I can't think of a safe and simple way to do this. For example, we could
suggest people do something like this (based on Roger Pate's post):

   pid=$(s6-svstat -p /my/service) && [ "$pid" -ne -1 ] && kill -SIGNAL $pid

but that's a lot of typing and requires that people see and remember the
suggestion, so not quite simple :-/
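If the typing is the main objection, the check could be wrapped once in a
small function. A rough sketch follows; is_live_pid and signal_service are
made-up names, and it assumes only that s6-svstat -p prints either the PID or
some non-PID marker (such as -1) for the given service directory.

```shell
# Hypothetical helper: succeed only for a strictly positive decimal integer.
# Rejects "", "-1", "0", "NA" and anything else that is not a plain PID.
is_live_pid() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;
  esac
  [ "$1" -gt 0 ]
}

# Hypothetical wrapper. Usage: signal_service TERM /path/to/service
signal_service() {
  pid=$(s6-svstat -p "$2") || return 1
  if ! is_live_pid "$pid"; then
    echo "signal_service: $2 has no running process" >&2
    return 1
  fi
  kill -s "$1" "$pid"
}
```

This keeps the guard in one place, but it does nothing about the window
between reading the PID and sending the signal.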

-- 
John O'Meara


* Re: Generic interrupt command?
  2019-02-05 14:16             ` John O'Meara
@ 2019-02-05 19:30               ` Laurent Bercot
  2019-02-10  4:14                 ` John O'Meara
  0 siblings, 1 reply; 13+ messages in thread
From: Laurent Bercot @ 2019-02-05 19:30 UTC (permalink / raw)
  To: supervision

>Not outputting anything causes kill (on my system at least) to exit non 
>0

  Not outputting anything isn't an option, for the case where -o pid is
used in addition to other fields. The field number and order must be
respected.

  It's probably best to use some OOB indicator. How about NA, which I
already use for non-numeric fields? It makes kill correctly choke.
Would it be better to use NA in all the numeric fields, too?

--
  Laurent




* Re: Generic interrupt command?
  2019-02-05 19:30               ` Laurent Bercot
@ 2019-02-10  4:14                 ` John O'Meara
  2019-02-10 11:41                   ` Laurent Bercot
  0 siblings, 1 reply; 13+ messages in thread
From: John O'Meara @ 2019-02-10  4:14 UTC (permalink / raw)
  To: Laurent Bercot; +Cc: supervision

On Tue, Feb 5, 2019, 2:30 PM Laurent Bercot <ska-supervision@skarnet.org>
wrote:

> >Not outputting anything causes kill (on my system at least) to exit non
> >0
>
>   Not outputting anything isn't an option, for the case where -o pid is
> used in addition to other fields. The field number and order must be
> respected.
>

Agreed; I didn't mean to suggest that as an option, I just wanted to be
thorough with testing.

>   It's probably best to use some OOB indicator. How about NA, which I
> already use for non-numeric fields? It makes kill correctly choke.
> Would it be better to use NA in all the numeric fields, too?
>

That's a tough call. On the one hand, it makes simple constructs safer. On
the other, it adds complexity to interpreting the data programmatically
(the test / [ program errors for integer comparisons with text, and using
scanf() to pull in the values for libc-style programs wouldn't be so simple
anymore).

Maybe adding a flag like -n as an output modifier to keep the relevant
output numeric when wanted? Then the default could be NA. Of course, that
adds complexity to the svstat program, its interface and documentation. It
is also incompatible with existing programs that may be using svstat
already, requiring the flag for new versions of svstat and not the flag for
old versions.

Also, while thinking about this, I wonder about the risk of signaling the
wrong program. When svc does it via supervise, it can know the right program
gets the signal because it handles the cleaning of the child PID. In a script,
there is a chance the child has exited and been replaced between the time
the PID was queried by svstat and the time the kill command gets executed.
I don't know how likely a new program might get the old PID in that time,
thus receiving the signal intended for the original child. I think it is a
low number, but not zero. This line of thinking unfortunately brings us
back to the original post in this thread :-(

-- 
John O'Meara


* Re: Generic interrupt command?
  2019-02-10  4:14                 ` John O'Meara
@ 2019-02-10 11:41                   ` Laurent Bercot
  0 siblings, 0 replies; 13+ messages in thread
From: Laurent Bercot @ 2019-02-10 11:41 UTC (permalink / raw)
  To: supervision

>That's a tough call. On the one hand, it makes simple constructs safer. 
>On the other, it adds complexity to interpreting the data 
>programmatically (the test / [ program errors for integer comparisons 
>with text, and using scanf() to pull in the values for libc style 
>programs wouldn't be so simple anymore).

  That was my thought process originally, but if it makes it riskier
or more annoying for programs to use the result of s6-svstat,
especially in scripts which are its likely users, I'm willing to
change that.


>Also, while thinking about this, I wonder about the risk of signaling the 
>wrong program. When svc does it via supervise, it can know the right 
>program gets the signal because it handles the cleaning of the child 
>PID. In a script, there is a chance the child has exited and been 
>replaced between the time the PID was queried by svstat and the time 
>the kill command gets executed. I don't know how likely a new program 
>might get the old PID in that time, thus receiving the signal intended 
>for the original child.

  Well that is one of the reasons for using a supervisor in the first
place. Only the parent of a process can reliably send signals to it.
Any time you're trying to signal a program and you're not a parent,
you are subject to that risk condition. The only 100% safe way is
using s6-svc, there's no changing that.
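  To make that concrete, here is a small sketch that maps a signal name to
the matching s6-svc option, so that a script can route supported signals
through the supervisor instead of kill. The function names are invented for
this example, and the option letters are the ones I recall from the s6-svc
documentation, so check them against your installed version.

```shell
# Hypothetical helper: print the s6-svc option for a signal name
# (with or without the SIG prefix), or fail if it is not in the list.
s6_svc_flag() {
  case "${1#SIG}" in
    ALRM)  printf '%s\n' -a ;;
    ABRT)  printf '%s\n' -b ;;
    QUIT)  printf '%s\n' -q ;;
    HUP)   printf '%s\n' -h ;;
    KILL)  printf '%s\n' -k ;;
    TERM)  printf '%s\n' -t ;;
    INT)   printf '%s\n' -i ;;
    USR1)  printf '%s\n' -1 ;;
    USR2)  printf '%s\n' -2 ;;
    STOP)  printf '%s\n' -p ;;
    CONT)  printf '%s\n' -c ;;
    WINCH) printf '%s\n' -y ;;
    *)     echo "s6_svc_flag: no s6-svc option for $1" >&2; return 1 ;;
  esac
}

# Hypothetical wrapper. Usage: signal_supervised HUP /path/to/service
signal_supervised() {
  flag=$(s6_svc_flag "$1") || return 1
  s6-svc "$flag" "$2"
}
```

  A signal such as SIGIO is not in the list, so the wrapper refuses it
rather than falling back to the racy kill approach.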

  So far the only real need to customize a signal has been for the
signal that brings a service down, which is now achieved via
./down-signal. I haven't been told of any real use case where
sending a non-supported signal, without intending to terminate the
service, was necessary.
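  As a sketch of that mechanism: the down-signal file lives in the service
directory and holds the name or number of the signal to send when the
service is brought down. The directory below is a throwaway created with
mktemp purely for illustration, and set_down_signal is a hypothetical helper.

```shell
# Hypothetical helper. Usage: set_down_signal /path/to/service SIGINT
# Writes the signal name (or number) followed by a newline.
set_down_signal() {
  printf '%s\n' "$2" > "$1/down-signal"
}

# Stand-in for a real service directory, for demonstration only.
svcdir=$(mktemp -d)
set_down_signal "$svcdir" SIGINT
cat "$svcdir/down-signal"
```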

--
  Laurent


