How to kill runsv, no matter what?

supervision - discussion about system services, daemon supervision, init, runlevel management, and tools such as s6 and runit

* How to kill runsv, no matter what?
@ 2007-02-21 20:14 Daniel Clark
  2007-02-21 21:04 ` Daniel Clark
  0 siblings, 1 reply; 14+ messages in thread
From: Daniel Clark @ 2007-02-21 20:14 UTC (permalink / raw)
  To: supervision

I'm integrating runit support into bcfg2 [1], both as something bcfg2
can control, and as an encap [2] package [3]. I'm replacing
daemontools, as djb's annoying redistribution policies wouldn't allow
me to distribute Xen images or LiveCDs containing my patched copy (I
patched daemontools because there hasn't been a release in a very long
time, but there have been bugs).

I'm trying to get to a state where I can add and remove the runit
package without leaving any state behind (I'm using it for runsvdir,
not as an init replacement). When the package is removed, all of the
runit services should stop, and state about what services were started
is saved somewhere; on reinstall, that state should be reintroduced,
and any runit services should be restarted.

In theory this should be pretty trivial (assuming I am RTFMing
correctly); I think something like this in the removal stage:

test -d /usr/local/var/service/.disabled \
    || mkdir /usr/local/var/service/.disabled
mv /usr/local/var/service/* /usr/local/var/service/.disabled/ 2>/dev/null \
    || printf "No services to disable.\n"
printf "Waiting 7 seconds for runsv processes to die...\n"
sleep 7
# ... (Code that stops runsvdir) ...
for service in `ls /usr/local/etc/sv`; do
    test -d /usr/local/etc/sv/$service/supervise \
        && rm -rf /usr/local/etc/sv/$service/supervise
    test -d /usr/local/etc/sv/$service/log/supervise \
        && rm -rf /usr/local/etc/sv/$service/log/supervise
done

However in practice there are some services that continue to have a
"runsv" process even after I remove them from the directory "runsvdir"
is monitoring and wait >5 seconds. Below is an example of such a
service that refuses to die. With daemontools I had a script called
svrm that did this (below), but the same idiom doesn't seem to work
with runit/runsvdir. Am I doing something wrong, or is this a bug?

----------------------------------------------------------------------

root@pawn:/usr/local/etc/sv# cat bcfg2-client/run
#!/bin/sh
exec 2>&1
printf "*** exec /usr/local/bin/chpst -e /usr/local/etc/default/bcfg2-client/env ./bcfg2-client.sh ...\n"
exec /usr/local/bin/chpst -e /usr/local/etc/default/bcfg2-client/env \
    ./bcfg2-client.sh

----------------------------------------------------------------------

root@pawn:/usr/local/etc/sv# cat bcfg2-client/bcfg2-client.sh
#!/bin/sh

# note: variables provided from environment with chpst -e:
#       /usr/local/etc/default/bcfg2-client/env/OPTIONS
#       /usr/local/etc/default/bcfg2-client/env/RUN_INTERVAL_SECONDS

ENVDIR="/usr/local/etc/default/bcfg2-client/env"

# make sure we have options
if [ ! -f ${ENVDIR}/OPTIONS ]; then
    printf "WARNING: ${ENVDIR}/OPTIONS\n"
    printf "WARNING: does not exist. Using default of \"-q -v -d -n\"\n"
    OPTIONS="-q -v -d -n"
fi

# make sure we have a sleep variable
if [ "${RUN_INTERVAL_SECONDS}x" = "x" ]; then
    printf "WARNING: ${ENVDIR}/RUN_INTERVAL_SECONDS\n"
    printf "WARNING: does not exist or has no value.\n"
    printf "WARNING: Using default of 3600 seconds between runs.\n"
    RUN_INTERVAL_SECONDS=3600
fi

# loop forever
while :
do
    printf "*** starting /usr/local/bin/bcfg2 ${OPTIONS} ...\n"
    /usr/local/bin/bcfg2 ${OPTIONS}
    printf "*** sleeping ${RUN_INTERVAL_SECONDS} seconds ...\n"
    sleep ${RUN_INTERVAL_SECONDS}
done

exit 0

----------------------------------------------------------------------

<include_file name="bin/svrm" mode="0755"><![CDATA[
#!/bin/sh
# Remove a daemontools service
PATH=/command:$PATH
export PATH
if [ "${1}x" = "x" -o "${2}x" != "x" ]; then
    printf "Usage: svrm [SERVICE]\n"
    exit 1
fi
SERVICE="`basename ${1}`"
if [ ! -h "/service/$SERVICE" -a ! -f "/service/$SERVICE" ]; then
    printf "Service \"${SERVICE}\" not installed. Installed services:\n"
    svstat /service/*
    exit 1
else
    cd /service/$SERVICE
    REALDIR=`pwd -P`
    rm /service/$SERVICE
    svc -dx . log
    sleep 1
    test -f ${REALDIR}/supervise/status && rm ${REALDIR}/supervise/status
    test -d ${REALDIR}/supervise && rm -rf ${REALDIR}/supervise
    test -f ${REALDIR}/log/supervise/status \
        && rm ${REALDIR}/log/supervise/status
    test -d ${REALDIR}/log/supervise && rm -rf ${REALDIR}/log/supervise
fi
exit 0
]]></include_file>

----------------------------------------------------------------------

[1] http://www.bcfg2.org
[2] http://www.encap.org
[3] http://www.bcfg2.org/browser/trunk/bcfg2/encap/src/encap-profiles/runit-1.7.2.ep

Thanks for any help,
-- 
Daniel Clark # http://dclark.us # http://opensysadmin.com



* Re: How to kill runsv, no matter what?
  2007-02-21 20:14 How to kill runsv, no matter what? Daniel Clark
@ 2007-02-21 21:04 ` Daniel Clark
  2007-02-23  3:51   ` Daniel Clark
  0 siblings, 1 reply; 14+ messages in thread
From: Daniel Clark @ 2007-02-21 21:04 UTC (permalink / raw)
  To: supervision

On 2/21/07, Daniel Clark <dclark@pobox.com> wrote:
> I'm trying to get to a state where I can add and remove the runit
> package without leaving any state behind (I'm using it for runsvdir,
> not as an init replacement). When the package is removed, all of the
> runit services should stop, and state about what services were started
> is saved somewhere; on reinstall, that state should be reintroduced,
> and any runit services should be restarted.
>
> In theory this should be pretty trivial (assuming I am RTFMing
> correctly); I think something like this in the removal stage:
>
> test -d /usr/local/var/service/.disabled \
>     || mkdir /usr/local/var/service/.disabled
> mv /usr/local/var/service/* /usr/local/var/service/.disabled/ 2>/dev/null \
>     || printf "No services to disable.\n"
> printf "Waiting 7 seconds for runsv processes to die...\n"
> sleep 7
> # ... (Code that stops runsvdir) ...
> for service in `ls /usr/local/etc/sv`; do
>     test -d /usr/local/etc/sv/$service/supervise \
>         && rm -rf /usr/local/etc/sv/$service/supervise
>     test -d /usr/local/etc/sv/$service/log/supervise \
>         && rm -rf /usr/local/etc/sv/$service/log/supervise
> done

I happened upon an earlier mailing list thread, "sv exit doesn't seem
to work properly" [1], and changed the code to stop runsvdir and then
do a "sv exit" on each service [2]; however, it didn't help at all.

To simplify, the basic question is why "sv exit" doesn't stop runsv
and runsv's associated processes for this particular service,
bcfg2-client; if that can be fixed, then the rest of the problem is
also solved.

The run code for bcfg2-client was in my previous email, or you can see
it here [3] and the script that it calls is here [4].

[1] Re: sv exit doesn't seem to work properly
http://article.gmane.org/gmane.comp.sysutils.supervision.general/1259

[2] runit-1.7.2.ep: preremove script
http://www.bcfg2.org/browser/branches/feature/runit/encap/src/encap-profiles/runit-1.7.2.ep#L182

[3] bcfg2-client run script
http://www.bcfg2.org/browser/branches/feature/runit/encap/src/encap-profiles/bcfg2-0.9.2.ep#L409

[4] Script that the bcfg2-client run script kicks off
http://www.bcfg2.org/browser/branches/feature/runit/encap/src/encap-profiles/bcfg2-0.9.2.ep#L373



* Re: How to kill runsv, no matter what?
  2007-02-21 21:04 ` Daniel Clark
@ 2007-02-23  3:51   ` Daniel Clark
  2007-02-23 12:02     ` Laurent Bercot
  2007-02-23 14:05     ` Gerrit Pape
  0 siblings, 2 replies; 14+ messages in thread
From: Daniel Clark @ 2007-02-23  3:51 UTC (permalink / raw)
  To: supervision

[-- Attachment #1: Type: text/plain, Size: 1656 bytes --]

I made a simple test case that should make this bug (or my error in
using the software) easy to reproduce. I'm attaching it since it is so
tiny; it is also available from
http://opensysadmin.com/bugs/runit/test1-service.tar.bz2

Below is a transcript of using it to demonstrate the problem:

root@cmlab:/tmp# tar xfj test1-service.tar.bz2
root@cmlab:/tmp# cd test1-service/
root@cmlab:/tmp/test1-service# ./runsvdir-here
^C
root@cmlab:/tmp/test1-service# ps auxw | grep [s]v
root     19882  0.0  0.0   2516   348 ?        Ss   22:28   0:00 runsv test1-service
root     19883  0.0  0.0   2656   368 ?        S    22:28   0:00 /usr/local/bin/svlogd -tt ./logs
root     19884  0.0  0.0  10060  1408 ?        S    22:28   0:00 /bin/sh ./test1-sv.sh
root@cmlab:/tmp/test1-service# sv exit /tmp/test1-service/var-service/test1-service
root@cmlab:/tmp/test1-service# sleep 7
root@cmlab:/tmp/test1-service# ps auxw | grep [s]v
root     19882  0.0  0.0   2516   348 ?        Ss   22:28   0:00 runsv test1-service
root     19883  0.0  0.0   2656   368 ?        S    22:28   0:00 /usr/local/bin/svlogd -tt ./logs
root@cmlab:/tmp/test1-service# rm var-service/test1-service
root@cmlab:/tmp/test1-service# sleep 7
root@cmlab:/tmp/test1-service# ps auxw | grep [s]v
root     19882  0.0  0.0   2516   348 ?        Ss   22:28   0:00 runsv test1-service
root     19883  0.0  0.0   2656   368 ?        S    22:28   0:00 /usr/local/bin/svlogd -tt ./logs

(I would think runsv and svlogd should not be showing up here, because
runsvdir is no longer running, sv exit has been called, and the run
directory has been removed, with >5 second pauses between the removal
and the ps)

[-- Attachment #2: test1-service.tar.bz2 --]
[-- Type: application/x-bzip2, Size: 619 bytes --]


* Re: How to kill runsv, no matter what?
  2007-02-23  3:51   ` Daniel Clark
@ 2007-02-23 12:02     ` Laurent Bercot
  2007-02-23 14:05     ` Gerrit Pape
  1 sibling, 0 replies; 14+ messages in thread
From: Laurent Bercot @ 2007-02-23 12:02 UTC (permalink / raw)
  To: supervision

> I made a simple test case that should make this bug (or my error in
> using the software) easy to reproduce. I'm attaching it since it is so
> tiny; it is also available from
> http://opensysadmin.com/bugs/runit/test1-service.tar.bz2

 Please try not to send binaries to the list... if it's so tiny, then
some attached text files could do - and if it's not, well, you did the
right thing anyway (i.e. make the tarball available on the Web) so
there's no point in sending the binary to the list...

 Thank you. ;)

-- 
 Laurent



* Re: How to kill runsv, no matter what?
  2007-02-23  3:51   ` Daniel Clark
  2007-02-23 12:02     ` Laurent Bercot
@ 2007-02-23 14:05     ` Gerrit Pape
  2007-02-23 14:24       ` Alex Efros
  2007-02-23 17:32       ` Daniel Clark
  1 sibling, 2 replies; 14+ messages in thread
From: Gerrit Pape @ 2007-02-23 14:05 UTC (permalink / raw)
  To: supervision

On Thu, Feb 22, 2007 at 10:51:50PM -0500, Daniel Clark wrote:
> I made a simple test case that should make this bug (or my error in
> using the software) easy to reproduce. I'm attaching it since it is so
> tiny; it is also available from
> http://opensysadmin.com/bugs/runit/test1-service.tar.bz2
> 
> Below is a transcript of using it to demonstrate the problem:
> 
> root@cmlab:/tmp# tar xfj test1-service.tar.bz2
> root@cmlab:/tmp# cd test1-service/
> root@cmlab:/tmp/test1-service# ./runsvdir-here
> ^C
> root@cmlab:/tmp/test1-service# ps auxw | grep [s]v
> root     19882  0.0  0.0   2516   348 ?        Ss   22:28   0:00 runsv test1-service
> root     19883  0.0  0.0   2656   368 ?        S    22:28   0:00 /usr/local/bin/svlogd -tt ./logs
> root     19884  0.0  0.0  10060  1408 ?        S    22:28   0:00 /bin/sh ./test1-sv.sh
> root@cmlab:/tmp/test1-service# sv exit /tmp/test1-service/var-service/test1-service
> root@cmlab:/tmp/test1-service# sleep 7
> root@cmlab:/tmp/test1-service# ps auxw | grep [s]v
> root     19882  0.0  0.0   2516   348 ?        Ss   22:28   0:00 runsv test1-service
> root     19883  0.0  0.0   2656   368 ?        S    22:28   0:00 /usr/local/bin/svlogd -tt ./logs
> root@cmlab:/tmp/test1-service# rm var-service/test1-service
> root@cmlab:/tmp/test1-service# sleep 7
> root@cmlab:/tmp/test1-service# ps auxw | grep [s]v
> root     19882  0.0  0.0   2516   348 ?        Ss   22:28   0:00 runsv test1-service
> root     19883  0.0  0.0   2656   368 ?        S    22:28   0:00 /usr/local/bin/svlogd -tt ./logs
> 
> (I would think runsv and svlogd should not be showing up here, because
> runsvdir is no longer running, sv exit has been called, and the run
> directory has been removed, with >5 second pauses between the removal
> and the ps)

When asked to exit, the runsv supervisor makes sure that all logs are
written to the log service before terminating; it first sends TERM to
the main service, then waits for it to terminate, and finally waits for
the log service to terminate, before runsv exits itself.

In the case of your example service, the main run script execs into a
shell script that starts a 'sleep' subprocess.  When runsv is told to
exit, it sends the service (the ./test1-sv.sh shell script) a TERM
signal; the shell script terminates (fine), but leaves the 'sleep'
subprocess behind.  The log service's run script execs into an svlogd
process, and svlogd terminates as soon as it sees end-of-file on the
pipe connected to its standard input.  Because the 'sleep' subprocess
is still running with its output connected to that pipe, and so to
svlogd's standard input, svlogd keeps waiting; there might well still
be data on the pipe to be written to the logs.  Once the 'sleep'
subprocess exits, runsv should exit too.
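
A minimal sketch of one way a looping service script could avoid leaving
its 'sleep' behind, assuming a POSIX sh. The names service_loop,
stop_child and do_work are hypothetical and not part of the test case
in this thread; the idea is to run 'sleep' in the background, 'wait'
for it, and kill it from a TERM trap:

```shell
#!/bin/sh
# Sketch of a looping service script that does not leave its 'sleep' behind.
# service_loop, stop_child and do_work are hypothetical names.

INTERVAL="${RUN_INTERVAL_SECONDS:-3600}"
child=""

do_work() {
    # Placeholder for the real job (e.g. the bcfg2 run in the original script).
    printf '*** doing work ...\n'
}

stop_child() {
    # Kill the backgrounded sleep so nothing keeps the log pipe open.
    if [ -n "$child" ]; then
        kill "$child" 2>/dev/null
    fi
    exit 0
}

service_loop() {
    trap stop_child TERM
    while :; do
        do_work
        sleep "$INTERVAL" &
        child=$!
        # 'wait' is interrupted by a trapped signal; the trap would be
        # deferred until completion if 'sleep' ran in the foreground.
        wait "$child"
    done
}

# A real service script would end by calling: service_loop
```

With this shape, a TERM that arrives during the sleep ends both the
shell and the 'sleep', so the pipe to the logger is closed. A TERM that
arrives while do_work is running is only handled once do_work returns.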

HTH, Gerrit.


* Re: How to kill runsv, no matter what?
  2007-02-23 14:05     ` Gerrit Pape
@ 2007-02-23 14:24       ` Alex Efros
  2007-02-23 17:40         ` Daniel Clark
  2007-02-23 17:32       ` Daniel Clark
  1 sibling, 1 reply; 14+ messages in thread
From: Alex Efros @ 2007-02-23 14:24 UTC (permalink / raw)
  To: supervision

Hi!

On Fri, Feb 23, 2007 at 02:05:03PM +0000, Gerrit Pape wrote:
> to exit, it sends the service (the ./test1-sv.sh shell script) a TERM
> signal, the shell script terminates (fine), but is leaving behind the

There is another, similar issue: if a service runs an interactive bash
(getty-like services), it will not stop either.

    # sv t getty1

sends SIGTERM, but an interactive bash needs SIGHUP or SIGKILL instead
of SIGTERM. Moreover, if you run mc, it runs its own bash, which must
also be killed to restart the getty service... and the same is true for
things like su. To solve this I created the script
/usr/local/bin/term-getty-service:

---cut---
#!/bin/bash
bashs() { while [[ -n "$1" ]]; do pgrep -P $1 bash; bashs $(pgrep -P $1); shift; done; }
bashs="$( bashs $(<supervise/pid) )"
[[ -n "$bashs" ]] && kill -HUP $bashs
exit 1 # non-zero: runsv must still send TERM to getty, in case no user is logged in on this console
---cut---

You should then create a symlink to it from the service's ./control/t:

    # ln -s /usr/local/bin/term-getty-service \
	/var/service/getty-tty1/control/t

-- 
			WBR, Alex.



* Re: How to kill runsv, no matter what?
  2007-02-23 14:05     ` Gerrit Pape
  2007-02-23 14:24       ` Alex Efros
@ 2007-02-23 17:32       ` Daniel Clark
  2007-02-23 17:39         ` Paul Jarc
  1 sibling, 1 reply; 14+ messages in thread
From: Daniel Clark @ 2007-02-23 17:32 UTC (permalink / raw)
  To: supervision

On 2/23/07, Gerrit Pape <pape@smarden.org> wrote:
> On Thu, Feb 22, 2007 at 10:51:50PM -0500, Daniel Clark wrote:
> > I made a simple test case that should make this bug (or my error in
> > using the software) easy to reproduce. I'm attaching it since it is so
> > tiny; it is also available from
> > http://opensysadmin.com/bugs/runit/test1-service.tar.bz2
> >
> When asked to exit, the runsv supervisor makes sure that all logs are
> written to the log service before terminating; it first sends TERM to
> the main service, then waits for it to terminate, and finally waits for
> the log service to terminate, before runsv exits itself.
>
> In the case of your example service, the main run script execs into a
> shell script that starts a 'sleep' subprocess.  When runsv is told to
> exit, it sends the service (the ./test1-sv.sh shell script) a TERM
> signal; the shell script terminates (fine), but leaves the 'sleep'
> subprocess behind.  The log service's run script execs into an svlogd
> process, and svlogd terminates as soon as it sees end-of-file on the
> pipe connected to its standard input.  Because the 'sleep' subprocess
> is still running with its output connected to that pipe, and so to
> svlogd's standard input, svlogd keeps waiting; there might well still
> be data on the pipe to be written to the logs.  Once the 'sleep'
> subprocess exits, runsv should exit too.

Ah, that makes a lot of sense. However I'm not seeing how this
behavior can mesh with package management systems. e.g.:

(a) I install a "runit" package, which starts up a runsvdir process.
(b) I link some services into my runsvdir /var/service directory; in
many cases I can't really control whether those processes start child
processes. Let's say there is a service like my example service among
them (in practice, I'm guessing there is probably some way I can get
my shell script to capture TERM and kill the 'sleep' process before
exiting itself).
(c) I remove the "runit" package. Since I am no longer going to have
"runit" installed, I think it follows that all "runit" processes, such
as svlogd, need to be gracefully shut down, no matter what their
state.
(d) Runit is removed, but there are some svlogd processes still
around, and therefore also still some files tracking runit state in my
/etc/sv directory.
(e) I install runit again.
(f) I want to re-enable my service, so I again link the service into
my /var/service directory. However, since there is still an svlogd
process running (or I killed it manually), there is still lingering
state information in /etc/sv, so runit is confused and complains.

So I guess my question is, is there any way to handle the
install-remove-install case cleanly with runit?
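
The state-keeping half of that cycle (remember which services were
enabled on removal, re-enable them on reinstall) can be sketched as
below. This is only an illustration under assumptions: it reuses the
.disabled directory idea from the first message, the function names are
hypothetical, and it does not deal with leftover runsv/svlogd processes
at all.

```shell
#!/bin/sh
# Sketch: park service links on package removal, put them back on reinstall.
# disable_services and restore_services are hypothetical names.

SERVICE_DIR="${SERVICE_DIR:-/usr/local/var/service}"

disable_services() {
    mkdir -p "$SERVICE_DIR/.disabled" || return 1
    for link in "$SERVICE_DIR"/*; do
        # Skip the literal pattern left behind by an empty glob.
        [ -e "$link" ] || [ -h "$link" ] || continue
        mv "$link" "$SERVICE_DIR/.disabled/" || return 1
    done
}

restore_services() {
    # Nothing was parked, so there is nothing to restore.
    [ -d "$SERVICE_DIR/.disabled" ] || return 0
    for link in "$SERVICE_DIR/.disabled"/*; do
        [ -e "$link" ] || [ -h "$link" ] || continue
        mv "$link" "$SERVICE_DIR/" || return 1
    done
    rmdir "$SERVICE_DIR/.disabled"
}
```

A pre-remove script would call disable_services before stopping
runsvdir, and a post-install script would call restore_services once
runsvdir is running again.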

In practice this may not be an issue, but I'm running into it all the
time in testing. The previously running svlogd causes failure in 2
ways: (a) the state in /etc/sv/servicename confuses runit, and (b) it
wants to write to the same log file as any new svlogd daemons that
start up.

Actually, wouldn't this also be a problem if I just wanted to
force-restart a service that spawns child processes? If the service is
restarted but the old logging daemon doesn't get force-killed, don't I
run into the same situation as with the install-remove-install (2
conflicting svlogd processes)?

Thanks,
-- 
Daniel Clark # http://dclark.us # http://opensysadmin.com


* Re: How to kill runsv, no matter what?
  2007-02-23 17:32       ` Daniel Clark
@ 2007-02-23 17:39         ` Paul Jarc
  2007-02-23 17:46           ` Daniel Clark
  0 siblings, 1 reply; 14+ messages in thread
From: Paul Jarc @ 2007-02-23 17:39 UTC (permalink / raw)
  To: Daniel Clark; +Cc: supervision

"Daniel Clark" <dclark@pobox.com> wrote:
> I can't really control if those processes start child processes in
> many cases

It's fine if they start child processes, but if they don't clean up
their children when exiting, that's a bug in those services.


paul



* Re: How to kill runsv, no matter what?
  2007-02-23 14:24       ` Alex Efros
@ 2007-02-23 17:40         ` Daniel Clark
  0 siblings, 0 replies; 14+ messages in thread
From: Daniel Clark @ 2007-02-23 17:40 UTC (permalink / raw)
  To: supervision

On 2/23/07, Alex Efros <powerman@powerman.asdfgroup.com> wrote:
> Hi!
>
> On Fri, Feb 23, 2007 at 02:05:03PM +0000, Gerrit Pape wrote:
> > to exit, it sends the service (the ./test1-sv.sh shell script) a TERM
> > signal, the shell script terminates (fine), but is leaving behind the
>
> There is another, similar issue: if a service runs an interactive bash
> (getty-like services), it will not stop either.
>
>     # sv t getty1
>
> sends SIGTERM, but an interactive bash needs SIGHUP or SIGKILL instead
> of SIGTERM. Moreover, if you run mc, it runs its own bash, which must
> also be killed to restart the getty service... and the same is true for
> things like su. To solve this I created the script
> /usr/local/bin/term-getty-service:
>
> ---cut---
> #!/bin/bash
> bashs() { while [[ -n "$1" ]]; do pgrep -P $1 bash; bashs $(pgrep -P $1); shift; done; }
> bashs="$( bashs $(<supervise/pid) )"
> [[ -n "$bashs" ]] && kill -HUP $bashs
> exit 1 # non-zero: runsv must still send TERM to getty, in case no user is logged in on this console
> ---cut---
>
> You should then create a symlink to it from the service's ./control/t:
>
>     # ln -s /usr/local/bin/term-getty-service \
>         /var/service/getty-tty1/control/t

That looks very inventive (and dense :-), but not very cross-platform,
which is the primary reason I am interested in runit (e.g. I want to
maintain non-vendor services on AIX, GNU/Linux, Solaris, *BSD etc. in
the same way -- many of these systems don't come standard with bash).
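
For what it's worth, the process-tree walk in that script does not
strictly need bash or pgrep. A rough sketch using only POSIX ps and awk
follows; list_descendants is a hypothetical name, and unlike the
original it lists every descendant rather than only the bash ones:

```shell
#!/bin/sh
# Sketch: list all descendants of a PID from "PID PPID" lines on stdin.
# list_descendants is a hypothetical helper name.

list_descendants() {
    awk -v root="$1" '
        { parent[$1] = $2 }          # remember each process and its parent
        END {
            wanted[root] = 1
            do {
                grew = 0
                for (pid in parent) {
                    if (!(pid in wanted) && (parent[pid] in wanted)) {
                        wanted[pid] = 1
                        print pid
                        grew = 1
                    }
                }
            } while (grew)           # repeat until no new descendant is found
        }'
}

# Possible use inside a ./control/t script (untested assumption):
#   pids=$(ps -e -o pid= -o ppid= | list_descendants "$(cat supervise/pid)")
#   [ -n "$pids" ] && kill -HUP $pids
#   exit 1
```

The order of the printed PIDs is not defined, which does not matter
when they are all handed to a single kill.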

Perhaps a "kill with extreme prejudice" type flag implemented in the
runit code itself is in order? I really like to have commands
available that are deterministic (e.g. if I tell sv to kill something
with this flag, it dies, don't pass go, don't collect $200)

-- 
Daniel Clark # http://dclark.us # http://opensysadmin.com



* Re: How to kill runsv, no matter what?
  2007-02-23 17:39         ` Paul Jarc
@ 2007-02-23 17:46           ` Daniel Clark
  2007-02-23 17:59             ` Paul Jarc
  0 siblings, 1 reply; 14+ messages in thread
From: Daniel Clark @ 2007-02-23 17:46 UTC (permalink / raw)
  To: supervision

On 2/23/07, Paul Jarc <prj@po.cwru.edu> wrote:
> "Daniel Clark" <dclark@pobox.com> wrote:
> > I can't really control if those processes start child processes in
> > many cases
>
> It's fine if they start child processes, but if they don't clean up
> their children when exiting, that's a bug in those services.

I don't know enough about services to know if that is correct - Alex
seems to have a counterexample - but the original daemontools seems to
work with services with this "bug", and both daemontools and (I think)
runit have a suite of tools to hack around issues with services that
aren't designed to work with the supervision model of service control
(e.g. the thing that forces processes to stay in the foreground).

Actually, perhaps that would be the best way to deal with this - some
small binary that can be used instead of exec in "run" scripts that
has the property of killing all of its child processes when it dies -
would something like that be feasible?

-- 
Daniel Clark # http://dclark.us # http://opensysadmin.com


* Re: How to kill runsv, no matter what?
  2007-02-23 17:46           ` Daniel Clark
@ 2007-02-23 17:59             ` Paul Jarc
  2007-02-23 18:25               ` Daniel Clark
  0 siblings, 1 reply; 14+ messages in thread
From: Paul Jarc @ 2007-02-23 17:59 UTC (permalink / raw)
  To: Daniel Clark; +Cc: supervision

"Daniel Clark" <dclark@pobox.com> wrote:
> the original daemontools seems to work with services with this "bug"

Yes, but only because it makes no attempt to ensure that log data is
written before the logger is shut down.

> and both daemontools and (I think) runit have a suite of tools to
> hack around issues with services that aren't designed to work with
> the supervision model of service control (e.g. the thing that forces
> processes to stay in the foreground).

That's true, but those are just the easy workarounds, and they're
imperfect.  (E.g., they don't relay signals.)  Reliably handling log
data and simultaneously working around services that leave stray child
processes is a hard problem, and the easiest solution known so far is
to fix each individual service.

> Actually, perhaps that would be the best way to deal with this - some
> small binary that can be used instead of exec in "run" scripts that
> has the property of killing all of its child processes when it dies -
> would something like that be feasible?

That could work for some cases (but, like pgrphack et al., it would be
sandwiched between exec and the real service, not used in place of
exec).  It would have to initially put itself in its own process
group, relay SIGTERM to every process in that process group, and relay
other signals to its immediate child.  But this won't help if the
service or its children put themselves in their own process group.
Also, SIGKILL and SIGSTOP can't be relayed, so you lose functionality
there too.  So fixing the service still remains an attractive option.
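
A rough shell sketch of such a wrapper, with the limitations described
above. Everything here is an assumption for illustration: the names are
hypothetical, the process group is expected to be created by chpst -P
in the run script, only TERM is relayed, and the service is started in
the background so its standard input is not passed through.

```shell
#!/bin/sh
# relay-term (hypothetical name): sketch of a TERM-relaying wrapper.
# Assumed use in a run script, with chpst -P creating the process group:
#   exec chpst -P ./relay-term ./real-service

relay_term() {
    # Ignore TERM ourselves before signalling anything else.
    trap '' TERM
    pgid=$(ps -o pgid= -p "$$" | tr -d ' ')
    if [ "$pgid" = "$$" ]; then
        # We lead our own process group: signal every process in it.
        kill -TERM 0
    else
        # Not a group leader: only signal the immediate child.
        kill -TERM "$child" 2>/dev/null
    fi
}

run_and_relay() {
    trap relay_term TERM
    "$@" &
    child=$!
    # 'wait' returns early when a trapped signal arrives, so keep waiting
    # until the immediate child is really gone.
    while kill -0 "$child" 2>/dev/null; do
        wait "$child"
    done
}

# A real wrapper script would end by calling: run_and_relay "$@"
```

As noted, this does nothing for children that move into their own
process group, and KILL and STOP cannot be trapped at all.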

paul


* Re: How to kill runsv, no matter what?
  2007-02-23 17:59             ` Paul Jarc
@ 2007-02-23 18:25               ` Daniel Clark
  2007-02-23 18:32                 ` Paul Jarc
  0 siblings, 1 reply; 14+ messages in thread
From: Daniel Clark @ 2007-02-23 18:25 UTC (permalink / raw)
  To: supervision

On 2/23/07, Paul Jarc <prj@po.cwru.edu> wrote:
> "Daniel Clark" <dclark@pobox.com> wrote:
> > the original daemontools seems to work with services with this "bug"
>
> Yes, but only because it makes no attempt to ensure that log data is
> written before the logger is shut down.
>
> > and both daemontools and (I think) runit have a suite of tools to
> > hack around issues with services that aren't designed to work with
> > the supervision model of service control (e.g. the thing that forces
> > processes to stay in the foreground).
>
> That's true, but those are just the easy workarounds, and they're
> imperfect.  (E.g., they don't relay signals.)  Reliably handling log
> data and simultaneously working around services that leave stray child
> processes is a hard problem, and the easiest solution known so far is
> to fix each individual service.

Okay, so let's assume we have a service that does not have this "bug",
but that is running and shouldn't be force killed (e.g. we want to
wait until sleep times out, or until some non-atomic process is
complete). Is there any way to block until that happens? When "sv
exit" returns with a 0 exit code and no text, I tend to think that it
was actually successful in killing all of the processes associated
with a service; ditto for using rm to remove a service link. I think
this is what the "principle of least surprise" would dictate as well.

I guess what I dislike most about the current behavior is that the
dangling runsv/svlogd processes seem to have no connection to anything
any more: you've removed the /var/service/servicename link (and
perhaps the /etc/sv/servicename directory as well), yet these
zombie-like background processes keep running, and there is no longer
any way (obvious to me, at least) to get information about them with
the runit tools. So if you want to make sure you can reinstall a
service cleanly, or remove and then reinstall runit, you have to grep
through the output of ps, which is exactly the kind of thing that the
supervision scheme was created to avoid.

-- 
Daniel Clark # http://dclark.us # http://opensysadmin.com


* Re: How to kill runsv, no matter what?
  2007-02-23 18:25               ` Daniel Clark
@ 2007-02-23 18:32                 ` Paul Jarc
  2007-02-28 23:24                   ` Daniel Clark
  0 siblings, 1 reply; 14+ messages in thread
From: Paul Jarc @ 2007-02-23 18:32 UTC (permalink / raw)
  To: Daniel Clark; +Cc: supervision

"Daniel Clark" <dclark@pobox.com> wrote:
> Okay, so let's assume we have a service that does not have this "bug",
> but that is running and shouldn't be force killed (e.g. we want to
> wait until sleep times out, or until some non-atomic process is
> complete). Is there any way to block until that happens?

sv -v
http://smarden.org/runit/sv.8.html
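
A sketch of how that could look in a package pre-remove step, assuming
the -v and -w options behave as described on the man page linked above
and that runsvdir has already been stopped (otherwise it would start
the runsv processes again). The function name, the SV override and the
30 second timeout are hypothetical:

```shell
#!/bin/sh
# Sketch: ask every supervised service to exit and wait for each one.
# stop_all_services, SV and WAIT_SECONDS are hypothetical names.

SV="${SV:-sv}"                                   # assumes sv is on PATH
SERVICE_DIR="${SERVICE_DIR:-/usr/local/var/service}"
WAIT_SECONDS="${WAIT_SECONDS:-30}"               # passed to sv -w

stop_all_services() {
    failed=0
    for svc in "$SERVICE_DIR"/*; do
        [ -d "$svc" ] || continue
        # With -v, sv waits for the command to take effect and exits
        # non-zero on error or timeout.
        if ! "$SV" -v -w "$WAIT_SECONDS" exit "$svc"; then
            printf 'WARNING: %s did not exit within %s seconds\n' \
                "$svc" "$WAIT_SECONDS" >&2
            failed=$((failed + 1))
        fi
    done
    [ "$failed" -eq 0 ]
}
```

A removal script could then refuse to continue, or fall back to
something more forceful, when stop_all_services returns non-zero.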


paul



* Re: How to kill runsv, no matter what?
  2007-02-23 18:32                 ` Paul Jarc
@ 2007-02-28 23:24                   ` Daniel Clark
  0 siblings, 0 replies; 14+ messages in thread
From: Daniel Clark @ 2007-02-28 23:24 UTC (permalink / raw)
  To: supervision

On 2/23/07, Paul Jarc <prj@po.cwru.edu> wrote:
> "Daniel Clark" <dclark@pobox.com> wrote:
> > Okay, so let's assume we have a service that does not have this "bug",
> > but that is running and shouldn't be force killed (e.g. we want to
> > wait until sleep times out, or until some non-atomic process is
> > complete). Is there any way to block until that happens?
>
> sv -v
> http://smarden.org/runit/sv.8.html

Thanks; I now have a package of runit that I can
install/uninstall/reinstall consistently without leaving anything
behind. It uses a combination of sv -v on package remove (to avoid the
problem) and a kill pipeline on install (not yet tested on any *nix
other than GNU/Linux). Sort of ugly, but it works.
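
The actual kill pipeline is not shown in this thread. Purely as an
illustration of the idea, a cleanup step of that kind might look like
the following sketch; the function names are hypothetical, and the
format of the "comm" column (bare name or full path) varies between
systems, which is why any leading path is stripped:

```shell
#!/bin/sh
# Sketch: find supervision processes left over from a previous install.
# leftover_pids and kill_leftovers are hypothetical names.

leftover_pids() {
    # Reads "PID COMMAND" lines on stdin; prints PIDs of runsv and svlogd.
    awk '{
        name = $2
        sub(/.*\//, "", name)        # strip any leading path
        if (name == "runsv" || name == "svlogd") print $1
    }'
}

kill_leftovers() {
    pids=$(ps -e -o pid= -o comm= | leftover_pids)
    [ -n "$pids" ] || return 0
    # Unquoted on purpose: one argument per PID.
    kill -TERM $pids
}
```

Note that this matches on process names only, so it would also hit
runsv or svlogd processes belonging to an unrelated runsvdir on the
same machine.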

If anyone else uses encap, the package is up at:

http://tinyurl.com/2nrdx7

It works for running runit's runsvdir under inittab or upstart control.

-- 
Daniel Clark # http://dclark.us # http://opensysadmin.com
