supervision - discussion about system services, daemon supervision, init, runlevel management, and tools such as s6 and runit
From: Mike Buland <mike@geekgene.com>
To: supervision@list.skarnet.org
Subject: Re: sv term handling with a slow child
Date: Thu, 17 Jan 2008 12:16:25 -0700
Message-ID: <200801171216.25539.mike@geekgene.com>
In-Reply-To: <200801170025.27102.rwoodrum@avvo.com>


Whew, thanks for responding, I was curious about the outcome myself.  That was 
the next thing that I was going to suggest (ruby version comparison), and you 
beat me to it.  Well done finding that bug :)

--Mike

On Thursday 17 January 2008 01:25:27 am Ryan Woodrum wrote:
> Not to get into the habit of replying to myself, but....
>
> I found the problem with this and it actually appears to have nothing to do
> with runit. (Sorry, all!)   While I was on the way home from work turning
> this over in my head, it occurred to me to indeed test the default handler
> by attempting to send a sig_term to a ruby script that was simply executing
> a sleep.
>
> I tried this on my home box running ruby v1.8.6 and it terminated as
> expected. I tried it in the environment where I was experiencing the
> problems (and where it is running ruby v1.8.5) and the process did not
> terminate.  In retrospect I don't know why I didn't test this most basic of
> cases.
>
> So the answer is that it is, in fact, a bug in ruby.  Or was, rather.  See
> this thread where Matsumoto chimed in in a seemingly related situation:
> http://www.ruby-forum.com/topic/85485
>
> An strace of a more recent version of ruby shows the term coming in and
> then being handled by default:
> -----
> )    = ? ERESTARTNOHAND (To be restarted)
> --- SIGTERM (Terminated) @ 0 (0) ---
> rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
> sigprocmask(SIG_SETMASK, [], NULL)      = 0
> sigprocmask(SIG_BLOCK, NULL, [])        = 0
> sigprocmask(SIG_BLOCK, NULL, [])        = 0
> rt_sigaction(SIGINT, {SIG_DFL}, {0xb7f56960, [], 0}, 8) = 0
> rt_sigaction(SIGTERM, {SIG_DFL}, {0xb7f56960, [], 0}, 8) = 0
> kill(9481, SIGTERM)                     = 0
> --- SIGTERM (Terminated) @ 0 (0) ---
> +++ killed by SIGTERM +++
>
>
> The older version doesn't seem to be doing this.  Is the call to sigreturn
> indicative of the default handler not doing anything...?
> -----
> )    = ? ERESTARTNOHAND (To be restarted)
> --- SIGTERM (Terminated) @ 0 (0) ---
> sigreturn()                             = ? (mask now [])
> select(0, NULL, NULL, NULL, {13, 616000}
>
>
> ) = 0 (Timeout)
> time(NULL)                              = 1200558046
> sigprocmask(SIG_BLOCK, NULL, [])        = 0
> sigprocmask(SIG_BLOCK, NULL, [])        = 0
> rt_sigaction(SIGINT, {SIG_DFL}, {0xb7f890d0, [], 0}, 8) = 0
> exit_group(0)                           = ?
>
>
> I don't believe I understand it 100% yet, but regardless, it is not a runit
> problem.
>
> -ryan woodrum
>
> On Wednesday 16 January 2008 03:04:45 pm Mike Buland wrote:
> > Hi
> >
> > I went ahead and ran a few tests, including your ruby script.  I
> > apparently can't reproduce the behaviour you describe.
> >
> > On Linux (and other POSIX systems) there is a default signal handler for
> > many of the signals.  The terminate signal normally ends the process.  At
> > least in my tests the ruby program is indeed terminated, the process
> > ends, and the status in runit is set to 'd' or down.  It is set to down,
> > and the program is gone.
> >
> > When I wrote my own test in C:
> > ----
> > #include <unistd.h>  /* sleep() */
> >
> > int main(void)
> > {
> > 	sleep( 50000 );
> > 	return 0;
> > }
> > ----
> >
> > to test the behaviour of TERM everything works as expected.  No term
> > signal handler is registered, so sending the program a term on the command
> > line (kill -15 $pid) terminates the program.  Then I tried ignoring term:
> >
> > ----
> > #include <signal.h>  /* signal(), SIGTERM, SIG_IGN */
> > #include <unistd.h>  /* sleep() */
> >
> > int main(void)
> > {
> > 	signal( SIGTERM, SIG_IGN );  /* ignore TERM (signal 15) */
> > 	sleep( 50000 );
> > 	return 0;
> > }
> > ----
> >
> > And the program kept running.  Testing both of these programs with runit
> > gave the expected results.  The program using the default signal handler
> > exited as soon as runit sent it term, and the status of the service was
> > set accordingly.  For the second program, term was ignored and runit went
> > into "want down, got TERM" state.
> >
> > On your system, are you 100% sure that the ruby test program you're using
> > isn't just exiting appropriately?  I can't find anything that mimics the
> > described behaviour.  That is, runit is behaving the way you describe, but
> > the process does end.
> >
> > --Mike
> >
> > On Wednesday 16 January 2008 03:41:29 pm Ryan Woodrum wrote:
> > > Hello!
> > >
> > > I believe I have found a possible bug/oddity in the behavior of sv
> > > using runsv.  I happened upon this particular scenario in a test
> > > environment, but was actually able to repro it in my production
> > > environment as well as in a primitive case.  The issue involves slow
> > > children or children whose TERM handler isn't registered soon enough.
> > >
> > > Here's the setup:
> > > I create a simplistic base service configuration under which I will
> > > run a ruby application.  The ruby app looks like so:
> > > slow_signal.rb
> > > ---
> > > sleep(10)
> > >
> > > puts "registering term handler..."
> > > trap("TERM") do
> > >   puts "got term"
> > >   exit
> > > end
> > >
> > > while(true) do
> > >   puts "looping and sleeping..."
> > >   sleep 2
> > > end
> > > ---
> > >
> > > I run this under my run svdir with:
> > > #!/bin/sh
> > > exec 2>&1
> > > exec /usr/bin/ruby /home/rwoodrum/tmp/slow_signal.rb
> > >
> > >
> > > > The premise of the primitive ruby application is to emulate a slow-ish
> > > > loading base of code whose term handler is not registered early in the
> > > > life of the process.
> > >
> > > If I invoke:
> > > /etc/init.d/slow_signal start
> > >
> > > followed within the 10 second sleep period by:
> > > /etc/init.d/slow_signal stop
> > >
> > > (/etc/init.d/slow_signal is a symlink to /usr/bin/sv)
> > >
> > > The process does not handle the signal but its state is set to 'd';
> > > down.  In subsequent calls to control() within sv.c, it will no longer
> > > write to the pipe because it thinks there is no need.  With no further
> > > writes to the pipe, another TERM will never get sent and so the
> > > process cannot be shut down via sv/runsv, at least not with TERM.
> > >
> > > > It took me a while to learn how everything worked and to track down
> > > just where this check was happening.  The source I worked against was
> > > the source available via the debian package v1.8.0 (`apt-get source
> > > runit` under debian sid).  (I looked for a repo but did not find a
> > > public one.)
> > >
> > > > I can think of two solutions.  The first is not to set svstatus[17]
> > > > unless you're sure the process actually went down, but this is more
> > > > complicated (perhaps more correct?) than the second.  Inside of
> > > > control() in sv.c, a modification to always send a TERM can be made
> > > > like so:
> > > -----
> > > 247c247,248
> > > <   if (svstatus[17] == *a) return(0);
> > > ---
> > >
> > > >   /* Write a TERM to the pipe even if we already have.  Slow TERM
> > > >   handler perhaps?  What about other cases?*/
> > > >   if (svstatus[17] == *a && *a != 'd') return(0);
> > >
> > > -----
> > >
> > > In this case, we simply decide that, if we want to issue a TERM via sv
> > > stop, down etc., we will go ahead and write again to the pipe.  Even
> > > if we think we don't need to.  This way, we're not stuck in "want down,
> > > got TERM."
> > >
> > > So with an answer in hand... is this behavior by design?  It seems
> > > that a particularly slow child shouldn't immunize itself from a TERM
> > > because of a slow load time or late signal handler registration.
> > >
> > > Thoughts appreciated!  Thanks!
> > >
> > > -ryan woodrum
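
The three cases discussed in this thread (TERM handler registered in time,
TERM arriving before the handler is registered, and TERM ignored) can be
checked without runit at all.  Below is a minimal sketch, assuming a POSIX
system where Ruby's fork is available.  The helper term_outcome and its mode
names are made up for illustration, and the 5 second sleep only stands in
for the slow start-up in slow_signal.rb.

```ruby
# Sketch only: term_outcome and its mode names are hypothetical helpers.
# Forks a child, sends it TERM, and reports how the child reacted.
def term_outcome(mode)
  ready_r, ready_w = IO.pipe
  pid = fork do
    ready_r.close
    case mode
    when :early_handler
      trap("TERM") { exit!(0) }   # handler in place before TERM arrives
      ready_w.syswrite("x")
    when :late_handler
      ready_w.syswrite("x")       # parent sends TERM during the sleep below
      sleep 5                     # stands in for slow start-up code
      trap("TERM") { exit!(0) }   # only reached if TERM did not end us
    when :ignore
      trap("TERM", "IGNORE")      # like signal(SIGTERM, SIG_IGN) in C
      ready_w.syswrite("x")
    end
    sleep 5
    exit!(1)
  end
  ready_w.close
  ready_r.read(1)                 # block until the child says it is ready
  ready_r.close
  Process.kill("TERM", pid)

  # Poll for up to about one second for the child to go away.
  status = nil
  20.times do
    reaped = Process.wait2(pid, Process::WNOHANG)
    if reaped
      status = reaped[1]
      break
    end
    sleep 0.05
  end

  if status.nil?
    Process.kill("KILL", pid)     # TERM had no effect; clean up the child
    Process.wait(pid)
    :survived
  elsif status.signaled? && status.termsig == Signal.list["TERM"]
    :killed_by_term               # the default action ended the process
  elsif status.exited? && status.exitstatus == 0
    :handled                      # the trap block ran and called exit!
  else
    :other
  end
end

[:early_handler, :late_handler, :ignore].each do |mode|
  puts "#{mode}: #{term_outcome(mode)}"
end
```

On a Ruby without the 1.8.5 problem I would expect the late-handler child
to be killed by TERM's default action, which matches the strace from the
newer version above.  A :survived result for that mode would point at the
interpreter rather than at sv or runsv.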


