From: Mike Buland <mike@geekgene.com>
To: supervision@list.skarnet.org
Subject: Re: sv term handling with a slow child
Date: Thu, 17 Jan 2008 12:16:25 -0700
Message-ID: <200801171216.25539.mike@geekgene.com>
In-Reply-To: <200801170025.27102.rwoodrum@avvo.com>
Whew, thanks for responding; I was curious about the outcome myself. A ruby
version comparison was the next thing I was going to suggest, and you beat
me to it. Well done finding that bug :)
--Mike
On Thursday 17 January 2008 01:25:27 am Ryan Woodrum wrote:
> Not to get into the habit of replying to myself, but....
>
> I found the problem with this and it actually appears to have nothing to do
> with runit. (Sorry, all!) While I was on the way home from work turning
> this over in my head, it occurred to me to indeed test the default handler
> by attempting to send a SIGTERM to a ruby script that was simply executing
> a sleep.
>
> I tried this on my home box running ruby v1.8.6 and it terminated as
> expected. I tried it in the environment where I was experiencing the
> problems (and where it is running ruby v1.8.5) and the process did not
> terminate. In retrospect I don't know why I didn't test this most basic of
> cases.
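
A minimal Ruby sketch of that basic test, assuming a Unix-like system where `fork` is available. The 60-second sleep and the variable names are illustrative and not taken from the original script. On an interpreter without the bug, the child should be reported as killed by SIGTERM; on an affected one, the wait would simply block until the sleep ends.

```ruby
# Child: a ruby process that does nothing but sleep, with no TERM handler of its own.
pid = fork { sleep 60 }

# Parent: send TERM, then collect the child's exit status.
Process.kill("TERM", pid)
_, status = Process.wait2(pid)

puts "signaled: #{status.signaled?}, termsig: #{status.termsig.inspect}"
```
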
>
> So the answer is that it is, in fact, a bug in ruby. Or was, rather. See
> this thread where Matsumoto chimed in in a seemingly related situation:
> http://www.ruby-forum.com/topic/85485
>
> An strace of a more recent version of ruby shows the term coming in and
> then being handled by default:
> -----
> ) = ? ERESTARTNOHAND (To be restarted)
> --- SIGTERM (Terminated) @ 0 (0) ---
> rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
> sigprocmask(SIG_SETMASK, [], NULL) = 0
> sigprocmask(SIG_BLOCK, NULL, []) = 0
> sigprocmask(SIG_BLOCK, NULL, []) = 0
> rt_sigaction(SIGINT, {SIG_DFL}, {0xb7f56960, [], 0}, 8) = 0
> rt_sigaction(SIGTERM, {SIG_DFL}, {0xb7f56960, [], 0}, 8) = 0
> kill(9481, SIGTERM) = 0
> --- SIGTERM (Terminated) @ 0 (0) ---
> +++ killed by SIGTERM +++
>
>
> The older version doesn't seem to be doing this. Is the call to sigreturn
> indicative of the default handler not doing anything...?
> -----
> ) = ? ERESTARTNOHAND (To be restarted)
> --- SIGTERM (Terminated) @ 0 (0) ---
> sigreturn() = ? (mask now [])
> select(0, NULL, NULL, NULL, {13, 616000}) = 0 (Timeout)
> time(NULL) = 1200558046
> sigprocmask(SIG_BLOCK, NULL, []) = 0
> sigprocmask(SIG_BLOCK, NULL, []) = 0
> rt_sigaction(SIGINT, {SIG_DFL}, {0xb7f890d0, [], 0}, 8) = 0
> exit_group(0) = ?
>
>
> I don't believe I understand it 100% yet, but regardless, it is not a runit
> problem.
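
On the sigreturn question: a `sigreturn()` entry generally means that a handler installed by the process itself ran and returned. Under the default disposition for SIGTERM the kernel terminates the process without entering any handler, so that call would not appear. The sketch below is only an analogy in Ruby, assuming a Unix-like system with `fork`: a child whose TERM handler returns without exiting finishes its sleep and exits with status 0, the same shape as the older trace. The pipe used for synchronisation and all names are illustrative.

```ruby
ready_r, ready_w = IO.pipe

pid = fork do
  ready_r.close
  trap("TERM") do
    # Handler returns without exiting, so the process carries on.
  end
  ready_w.write("x") # tell the parent the handler is installed
  ready_w.close
  sleep 0.5
  exit 0
end

ready_w.close
ready_r.read(1) # block until the child has installed its handler
Process.kill("TERM", pid)
_, status = Process.wait2(pid)

puts "signaled: #{status.signaled?}, exit status: #{status.exitstatus.inspect}"
```
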
>
> -ryan woodrum
>
> On Wednesday 16 January 2008 03:04:45 pm Mike Buland wrote:
> > Hi
> >
> > I went ahead and ran a few tests, including your ruby script. I
> > apparently can't reproduce the behaviour you describe.
> >
> > On linux (and POSIX systems) there is a default signal handler for many
> > of the signals. The terminate signal normally ends the process. At
> > least in my tests the ruby program is indeed terminated, the process
> > ends, and the status in runit is set to 'd' or down. It is set to down,
> > but the program is gone.
> >
> > When I wrote my own test in C:
> > ----
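> > #include <unistd.h> /* sleep() */
> >
> > int main(void)
> > {
> >     /* No handler is installed, so SIGTERM takes its default action. */
> >     sleep( 50000 );
> >     return 0;
> > }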
> > ----
> >
> > to test the behaviour of TERM everything works as expected. No term
> > signal handler is registered, sending the program a term on the command
> > line (kill -15 $pid) terminates the program. Then I tried ignoring term:
> >
> > ----
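> > #include <signal.h> /* signal(), SIGTERM, SIG_IGN */
> > #include <unistd.h> /* sleep() */
> >
> > int main(void)
> > {
> >     /* Ignore SIGTERM (signal 15) so that TERM no longer ends the process. */
> >     signal( SIGTERM, SIG_IGN );
> >     sleep( 50000 );
> >     return 0;
> > }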
> > ----
> >
> > And the program kept running. Testing both of these programs with runit
> > gave the expected results. The program using the default signal handler
> > exited as soon as runit sent it term, and the status of the service was
> > set accordingly; for the second program, term was ignored and runit went
> > into "want down, got TERM" state.
> >
> > On your system, are you 100% sure that the ruby test program you're using
> > isn't just exiting appropriately? I can't find anything that mimics the
> > described behaviour. I.e., runit is behaving the way you describe, but
> > the process does end.
> >
> > --Mike
> >
> > On Wednesday 16 January 2008 03:41:29 pm Ryan Woodrum wrote:
> > > Hello!
> > >
> > > I believe I have found a possible bug/oddity in the behavior of sv
> > > using runsv. I happened upon this particular scenario in a test
> > > environment, but was actually able to repro it in my production
> > > environment as well as in a primitive case. The issue involves slow
> > > children or children whose TERM handler isn't registered soon enough.
> > >
> > > Here's the setup:
> > > I create a simplistic base service configuration under which I will
> > > run a ruby application. The ruby app looks like so:
> > > slow_signal.rb
> > > ---
> > > sleep(10)
> > >
> > > puts "registering term handler..."
> > > trap("TERM") do
> > > puts "got term"
> > > exit
> > > end
> > >
> > > while(true) do
> > > puts "looping and sleeping..."
> > > sleep 2
> > > end
> > > ---
> > >
> > > I run this under my run svdir with:
> > > #!/bin/sh
> > > exec 2>&1
> > > exec /usr/bin/ruby /home/rwoodrum/tmp/slow_signal.rb
> > >
> > >
> > > The premise of the primitive ruby application is to emulate a slow-ish
> > > loading base of code that has a term handler registered early in the
> > > life of the process.
> > >
> > > If I invoke:
> > > /etc/init.d/slow_signal start
> > >
> > > followed within the 10 second sleep period by:
> > > /etc/init.d/slow_signal stop
> > >
> > > (/etc/init.d/slow_signal is a symlink to /usr/bin/sv)
> > >
> > > The process does not handle the signal but its state is set to 'd';
> > > down. In subsequent calls to control() within sv.c, it will no longer
> > > write to the pipe because it thinks there is no need. With no further
> > > writes to the pipe, another TERM will never get sent and so the
> > > process cannot be shut down via sv/runsv, at least not with TERM.
> > >
> > > It took me a while to learn how everything worked and to track down
> > > just where this check was happening. The source I worked against was
> > > the source available via the debian package v1.8.0 (`apt-get source
> > > runit` under debian sid). (I looked for a repo but did not find a
> > > public one.)
> > >
> > > I can think of two solutions. The first is not to set svstatus[17]
> > > unless you're sure the process actually went down, but this is more
> > > complicated (perhaps more correct?) than the second: inside of
> > > control() in sv.c, a modification to always send a TERM can be made
> > > like so:
> > > -----
> > > 247c247,248
> > > < if (svstatus[17] == *a) return(0);
> > > ---
> > > > /* Write a TERM to the pipe even if we already have. Slow TERM
> > > > handler perhaps? What about other cases?*/
> > > > if (svstatus[17] == *a && *a != 'd') return(0);
> > > -----
> > >
> > > In this case, we simply decide that, if we want to issue a TERM via sv
> > > stop, down, etc., we will go ahead and write to the pipe again, even
> > > if we think we don't need to. This way, we're not stuck in "want down,
> > > got TERM."
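
The effect of the proposed change on the check can be modelled outside of sv. This is a Ruby model of the two conditions in the diff, not runit code: `recorded` stands in for `svstatus[17]`, `command` for the string passed to `control()`, and both method names are made up for illustration.

```ruby
# Each method returns true when control() would return early
# without writing to the pipe.

# Original condition: skip whenever the recorded state matches the command.
def skip_write_original?(recorded, command)
  recorded == command[0]
end

# Patched condition: never skip when the command is 'd' (down).
def skip_write_patched?(recorded, command)
  recorded == command[0] && command[0] != "d"
end

[["d", "d"], ["u", "u"], ["u", "d"]].each do |recorded, command|
  puts "recorded=#{recorded} command=#{command} " \
       "original skips: #{skip_write_original?(recorded, command)}, " \
       "patched skips: #{skip_write_patched?(recorded, command)}"
end
```

When the state is already recorded as 'd', the original check returns early while the patched one writes again; in the other cases shown the two agree.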
> > >
> > > So with an answer in hand... is this behavior by design? It seems
> > > that a particularly slow child shouldn't immunize itself from a TERM
> > > because of a slow load time or late signal handler registration.
> > >
> > > Thoughts appreciated! Thanks!
> > >
> > > -ryan woodrum