supervision - discussion about system services, daemon supervision, init, runlevel management, and tools such as s6 and runit
From: Ryan Woodrum <rwoodrum@avvo.com>
To: supervision@list.skarnet.org
Subject: Re: sv term handling with a slow child
Date: Thu, 17 Jan 2008 00:25:27 -0800
Message-ID: <200801170025.27102.rwoodrum@avvo.com>
In-Reply-To: <200801161604.45554.mike@geekgene.com>

Not to get into the habit of replying to myself, but....

I found the problem, and it appears to have nothing to do with runit.
(Sorry, all!)  On the way home from work, turning this over in my head,
it occurred to me to test the default handler directly by sending a
SIGTERM to a ruby script that was simply executing a sleep.

I tried this on my home box running ruby v1.8.6 and it terminated as expected.  
I tried it in the environment where I was experiencing the problems (and 
where it is running ruby v1.8.5) and the process did not terminate.  In 
retrospect I don't know why I didn't test this most basic of cases.
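
For anyone who wants to repeat that basic check, here is a minimal sketch
of it.  It is not the script I actually ran: the helper name
term_sleeping_child and the "ready" handshake are made up for
illustration, and it assumes a ruby new enough to have Process.spawn.
It starts a second interpreter that only sleeps, sends it TERM, and
reports how the child ended.

```ruby
require "rbconfig"

# Start a second interpreter that only sleeps and installs no TERM handler,
# send it TERM, and return the Process::Status describing how it ended.
# (Hypothetical helper, for illustration only.)
def term_sleeping_child
  reader, writer = IO.pipe
  child_src = '$stdout.sync = true; puts "ready"; sleep 60'
  pid = Process.spawn(RbConfig.ruby, "-e", child_src, out: writer)
  writer.close
  reader.gets                  # block until the child is past startup
  Process.kill("TERM", pid)
  _, status = Process.wait2(pid)
  reader.close
  status
end

status = term_sleeping_child
if status.signaled?
  puts "child was killed by signal #{status.termsig}"
else
  puts "child exited normally with status #{status.exitstatus}"
end
```

On an interpreter that handles TERM the way 1.8.6 did for me, I would
expect the first branch; the behaviour I saw on 1.8.5 would show up as
the child sleeping on instead of dying.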

So the answer is that it is, in fact, a bug in ruby.  Or was, rather.  See 
this thread, where Matsumoto chimed in on a seemingly related situation:
http://www.ruby-forum.com/topic/85485

An strace of a more recent version of ruby shows the TERM coming in and then 
being handled by the default action:
-----
)    = ? ERESTARTNOHAND (To be restarted)
--- SIGTERM (Terminated) @ 0 (0) ---
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
sigprocmask(SIG_SETMASK, [], NULL)      = 0
sigprocmask(SIG_BLOCK, NULL, [])        = 0
sigprocmask(SIG_BLOCK, NULL, [])        = 0
rt_sigaction(SIGINT, {SIG_DFL}, {0xb7f56960, [], 0}, 8) = 0
rt_sigaction(SIGTERM, {SIG_DFL}, {0xb7f56960, [], 0}, 8) = 0
kill(9481, SIGTERM)                     = 0
--- SIGTERM (Terminated) @ 0 (0) ---
+++ killed by SIGTERM +++


The older version doesn't seem to be doing this.  Is the call to sigreturn 
indicative of the default handler not doing anything...?
-----
)    = ? ERESTARTNOHAND (To be restarted)
--- SIGTERM (Terminated) @ 0 (0) ---
sigreturn()                             = ? (mask now [])
select(0, NULL, NULL, NULL, {13, 616000}) = 0 (Timeout)
time(NULL)                              = 1200558046
sigprocmask(SIG_BLOCK, NULL, [])        = 0
sigprocmask(SIG_BLOCK, NULL, [])        = 0
rt_sigaction(SIGINT, {SIG_DFL}, {0xb7f890d0, [], 0}, 8) = 0
exit_group(0)                           = ?


I don't believe I understand it 100% yet, but regardless, it is not a runit 
problem.
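
One way to see the difference between the two endings above ("killed by
SIGTERM" versus exit_group(0)) from the parent's side is to look at the
child's wait status.  The sketch below is only an approximation under
stated assumptions: the helper names term_child and describe and both
child sources are hypothetical, and the second child ignores TERM on
purpose, which merely stands in for an interpreter that receives TERM
but carries on sleeping.

```ruby
require "rbconfig"

# Child that leaves TERM at ruby's default behaviour.
DEFAULT_SRC = '$stdout.sync = true; puts "ready"; sleep 60'
# Child that deliberately ignores TERM and finishes a short sleep.
IGNORE_SRC  = 'trap("TERM", "IGNORE"); $stdout.sync = true; puts "ready"; sleep 2'

# Run `source` in a fresh interpreter, wait for its "ready" line, send TERM,
# and return the Process::Status once the child is gone.
def term_child(source)
  reader, writer = IO.pipe
  pid = Process.spawn(RbConfig.ruby, "-e", source, out: writer)
  writer.close
  reader.gets                  # the child is now past startup
  Process.kill("TERM", pid)
  _, status = Process.wait2(pid)
  reader.close
  status
end

# Turn a wait status into a short description.
def describe(status)
  if status.signaled?
    "killed by signal #{status.termsig}"
  else
    "exited with status #{status.exitstatus}"
  end
end

puts "default child:  #{describe(term_child(DEFAULT_SRC))}"
puts "ignoring child: #{describe(term_child(IGNORE_SRC))}"
```

A supervisor sees the same distinction through wait(): the first child
dies from the signal, while the second only goes away once its sleep is
over.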

-ryan woodrum


On Wednesday 16 January 2008 03:04:45 pm Mike Buland wrote:
> Hi
>
> I went ahead and ran a few tests, including your ruby script.  I
> apparently can't reproduce the behaviour you describe.
>
> On Linux (and POSIX systems) there is a default signal handler for many of
> the signals.  The terminate signal normally ends the process.  At least in
> my tests the ruby program is indeed terminated, the process ends, and the
> status in runit is set to 'd' or down.  It is set to down, but the program
> is gone.
>
> When I wrote my own test in C:
> ----
> #include <unistd.h>	/* sleep() */
>
> int main(void)
> {
> 	/* No TERM handler is installed, so the default action applies. */
> 	sleep( 50000 );
> 	return 0;
> }
> ----
>
> to test the behaviour of TERM everything works as expected.  No term signal
> handler is registered, sending the program a term on the command line
> (kill -15 $pid) terminates the program.  Then I tried ignoring term:
>
> ----
> #include <stdio.h>
> #include <stdlib.h>
> #include <signal.h>
> #include <unistd.h>	/* sleep() */
>
> int main(void)
> {
> 	signal( SIGTERM, SIG_IGN );	/* SIGTERM is signal 15 */
> 	sleep( 50000 );
> 	return 0;
> }
> ----
>
> And the program kept running.  Testing both of these programs with runit
> gave the expected results.  The program using the default signal handler
> exited as soon as runit sent it TERM, and the status of the service was set
> accordingly.  For the second program, TERM was ignored and runit went
> into the "want down, got TERM" state.
>
> On your system, are you 100% sure that the ruby test program you're using
> isn't just exiting appropriately?  I can't find anything that mimics the
> described behaviour, i.e. runit is behaving the way you describe, but the
> process does end.
>
> --Mike
>
> On Wednesday 16 January 2008 03:41:29 pm Ryan Woodrum wrote:
> > Hello!
> >
> > I believe I have found a possible bug/oddity in the behavior of sv
> > using runsv.  I happened upon this particular scenario in a test
> > environment, but was actually able to repro it in my production
> > environment as well as in a primitive case.  The issue involves slow
> > children or children whose TERM handler isn't registered soon enough.
> >
> > Here's the setup:
> > I create a simplistic base service configuration under which I will
> > run a ruby application.  The ruby app looks like so:
> > slow_signal.rb
> > ---
> > sleep(10)
> >
> > puts "registering term handler..."
> > trap("TERM") do
> >   puts "got term"
> >   exit
> > end
> >
> > while(true) do
> >   puts "looping and sleeping..."
> >   sleep 2
> > end
> > ---
> >
> > I run this under my run svdir with:
> > #!/bin/sh
> > exec 2>&1
> > exec /usr/bin/ruby /home/rwoodrum/tmp/slow_signal.rb
> >
> >
> > The premise of the primitive ruby application is to emulate a slow-ish
> > loading base of code that has a term handler registered early in the
> > life of the process.
> >
> > If I invoke:
> > /etc/init.d/slow_signal start
> >
> > followed within the 10 second sleep period by:
> > /etc/init.d/slow_signal stop
> >
> > (/etc/init.d/slow_signal is a symlink to /usr/bin/sv)
> >
> > The process does not handle the signal but its state is set to 'd';
> > down.  In subsequent calls to control() within sv.c, it will no longer
> > write to the pipe because it thinks there is no need.  With no further
> > writes to the pipe, another TERM will never get sent and so the
> > process cannot be shut down via sv/runsv, at least not with TERM.
> >
> > It took me a while to learn how everything worked and to track down
> > just where this check was happening.  The source I worked against was
> > the source available via the debian package v1.8.0 (`apt-get source
> > runit` under debian sid).  (I looked for a repo but did not find a public
> > one.)
> >
> > I can think of two solutions.  One is not to set svstatus[17] unless
> > you're sure the process actually went down, but this is more
> > complicated (perhaps more correct?) than the second: inside control()
> > in sv.c, a modification to always send a TERM can be made like so:
> > -----
> > 247c247,248
> > <   if (svstatus[17] == *a) return(0);
> > ---
> > >   /* Write a TERM to the pipe even if we already have.  Slow TERM
> > >   handler perhaps?  What about other cases?*/
> > >   if (svstatus[17] == *a && *a != 'd') return(0);
> > -----
> >
> > In this case, we simply decide that, if we want to issue a TERM via sv
> > stop, down etc., we will go ahead and write again to the pipe, even
> > if we think we don't need to.  This way, we're not stuck in "want down,
> > got TERM."
> >
> > So with an answer in hand... is this behavior by design?  It seems
> > that a particularly slow child shouldn't immunize itself from a TERM
> > because of a slow load time or late signal handler registration.
> >
> > Thoughts appreciated!  Thanks!
> >
> > -ryan woodrum

