From: Mike Buland <mike@geekgene.com>
To: supervision@list.skarnet.org
Subject: Re: sv term handling with a slow child
Date: Wed, 16 Jan 2008 16:04:45 -0700
Message-ID: <200801161604.45554.mike@geekgene.com>
In-Reply-To: <200801161441.29193.rwoodrum@avvo.com>
Hi
I went ahead and ran a few tests, including your ruby script, and I can't
seem to reproduce the behaviour you describe.
On Linux (and other POSIX systems) every signal has a default action, and the
default action for TERM is to terminate the process. At least in my tests,
the ruby program is indeed terminated: the process ends, and runit sets the
status to 'd' (down). So the status says down, and the program really is gone.
When I wrote my own test in C:
----
#include <unistd.h>

/* No TERM handler: SIGTERM keeps its default action (terminate). */
int main(void)
{
    sleep( 50000 );
    return 0;
}
----
to test the behaviour of TERM, everything worked as expected. No TERM handler
is registered, so sending the program a TERM from the command line
(kill -15 $pid) terminates it. Then I tried ignoring TERM:
----
#include <signal.h>
#include <unistd.h>

int main(void)
{
    /* Ignore TERM entirely (SIGTERM is signal 15 on Linux). */
    signal( SIGTERM, SIG_IGN );
    sleep( 50000 );
    return 0;
}
----
And the program kept running. Testing both programs under runit gave the
expected results. The program with the default TERM action exited as soon as
runit sent it TERM, and the service status was updated accordingly. The second
program ignored TERM, and runit went into the "want down, got TERM" state.
On your system, are you 100% sure that the ruby test program you're using
isn't simply exiting when it gets TERM? I can't reproduce the behaviour you
describe: runit does set the status the way you describe, but the process
really does end.
--Mike
On Wednesday 16 January 2008 03:41:29 pm Ryan Woodrum wrote:
> Hello!
>
> I believe I have found a possible bug/oddity in the behavior of sv
> using runsv. I happened upon this particular scenario in a test
> environment, but was actually able to repro it in my production
> environment as well as in a primitive case. The issue involves slow
> children or children whose TERM handler isn't registered soon enough.
>
> Here's the setup:
> I create a simplistic base service configuration under which I will
> run a ruby application. The ruby app looks like so:
> slow_signal.rb
> ---
> sleep(10)
>
> puts "registering term handler..."
> trap("TERM") do
> puts "got term"
> exit
> end
>
> while(true) do
> puts "looping and sleeping..."
> sleep 2
> end
> ---
>
> I run this under my run svdir with:
> #!/bin/sh
> exec 2>&1
> exec /usr/bin/ruby /home/rwoodrum/tmp/slow_signal.rb
>
>
> The premise of the primitive ruby application is to emulate a slow-ish
> loading base of code that has a term handler registered early in the
> life of the process.
>
> If I invoke:
> /etc/init.d/slow_signal start
>
> followed within the 10 second sleep period by:
> /etc/init.d/slow_signal stop
>
> (/etc/init.d/slow_signal is a symlink to /usr/bin/sv)
>
> The process does not handle the signal, but its state is set to 'd'
> (down). In subsequent calls to control() within sv.c, it will no longer
> write to the pipe because it thinks there is no need. With no further
> writes to the pipe, another TERM will never get sent and so the
> process cannot be shut down via sv/runsv, at least not with TERM.
>
> It took me a while to learn how everything works and to track down
> just where this check happens. The source I worked against was
> the source available via the debian package v1.8.0 (`apt-get source runit`
> under debian sid). (I looked for a repo but did not find a public
> one.)
>
> I can think of two solutions. The first is not to set svstatus[17]
> unless you're sure the process actually went down, but this is more
> complicated (perhaps more correct?) than the second. Inside of control()
> in sv.c, a modification to always send a TERM can be made like so:
> -----
> 247c247,248
> < if (svstatus[17] == *a) return(0);
> ---
> > /* Write a TERM to the pipe even if we already have. Slow TERM handler perhaps? What about other cases?*/
> > if (svstatus[17] == *a && *a != 'd') return(0);
> -----
>
> In this case, we simply decide that if we want to issue a TERM via sv
> stop, down, etc., we go ahead and write to the pipe again, even if we
> think we don't need to. This way, we're not stuck in "want down,
> got TERM."
>
> So with an answer in hand... is this behavior by design? It seems
> that a particularly slow child shouldn't immunize itself from a TERM
> because of a slow load time or late signal handler registration.
>
> Thoughts appreciated! Thanks!
>
> -ryan woodrum
Thread overview: 6+ messages
2008-01-16 22:41 Ryan Woodrum
2008-01-16 23:04 ` Mike Buland [this message]
2008-01-17 0:25 ` Ryan Woodrum
2008-01-17 0:35 ` Ryan Woodrum
2008-01-17 8:25 ` Ryan Woodrum
2008-01-17 19:16 ` Mike Buland