supervision - discussion about system services, daemon supervision, init, runlevel management, and tools such as s6 and runit
From: Ryan Woodrum <rwoodrum@avvo.com>
To: supervision@list.skarnet.org
Subject: Re: sv term handling with a slow child
Date: Wed, 16 Jan 2008 16:35:40 -0800
Message-ID: <200801161635.40876.rwoodrum@avvo.com>
In-Reply-To: <200801161604.45554.mike@geekgene.com>

For clarity, I should add to my first reply a well-behaved example, in which
the process does indeed exit when stopped after the 10-second sleep:

ops1test:/home/rwoodrum/tmp# /etc/init.d/slow_signal start \
> && ps ax | grep slow \
> && sleep 12 \
> && /etc/init.d/slow_signal stop \
> && ps ax | grep slow
ok: run: slow_signal: (pid 31434) 58s
30229 ?        Ss     0:00 runsv slow_signal
31434 ?        S      0:00 /usr/bin/ruby /home/rwoodrum/tmp/slow_signal.rb
31446 ttyp0    S+     0:00 grep slow
ok: down: slow_signal: 0s, normally up
30229 ?        Ss     0:00 runsv slow_signal
31456 ttyp0    S+     0:00 grep slow
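
The timing difference between this run and the early-stop case quoted below
can also be checked without runit. What follows is a minimal sketch, not part
of the original test: it forks two children and sends TERM to one before any
handler exists and to the other only after the handler is registered. The
helper name run_child and the :early/:late labels are made up for the example,
and it only shows how the local Ruby treats TERM by default, not what runsv
does.

```ruby
# Sketch: send TERM to a child before and after it registers a handler.
# run_child, :early and :late are hypothetical names for this example.

def run_child(signal_when)
  reader, writer = IO.pipe
  pid = fork do
    reader.close
    if signal_when == :late
      # Register the handler first, then tell the parent it is in place.
      trap("TERM") { exit!(0) }
      writer.write("r")
      writer.close
    end
    sleep 30   # slow start-up (:early) or normal work (:late)
    exit!(1)   # only reached if no TERM arrived within 30 seconds
  end
  writer.close
  reader.read(1) if signal_when == :late   # block until the handler exists
  reader.close
  Process.kill("TERM", pid)
  _, status = Process.wait2(pid)
  status
end

early = run_child(:early)   # TERM sent with no handler registered
late  = run_child(:late)    # TERM sent after trap("TERM") is in place

puts "early TERM: #{early.inspect}"
puts "late TERM:  #{late.inspect}"
```

The late child should report a normal exit with status 0, since its handler
runs. If the early child also terminates promptly, then default TERM handling
ends the process even when the handler has not been registered yet, which is
what Mike describes below.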


On Wednesday 16 January 2008 03:04:45 pm Mike Buland wrote:
> Hi
>
> I went ahead and ran a few tests, including your ruby script.  I apparently
> can't reproduce the behaviour you describe.
>
> On Linux (and POSIX systems) there is a default signal handler for many of
> the signals.  The terminate signal normally ends the process.  At least in
> my tests the ruby program is indeed terminated, the process ends, and the
> status in runit is set to 'd' or down.  It is set to down, but the program
> is gone.
>
> When I wrote my own test in C:
> ----
> #include <unistd.h>	/* sleep() */
>
> int main(void)
> {
> 	/* No TERM handler is installed, so the default action applies. */
> 	sleep( 50000 );
> 	return 0;
> }
> ----
>
> to test the behaviour of TERM, everything works as expected.  No TERM signal
> handler is registered, so sending the program a TERM on the command line
> (kill -15 $pid) terminates it.  Then I tried ignoring TERM:
>
> ----
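> #include <signal.h>	/* signal(), SIGTERM, SIG_IGN */
> #include <unistd.h>	/* sleep() */
>
> int main(void)
> {
> 	/* Ignore TERM (signal 15) for the life of the process. */
> 	signal( SIGTERM, SIG_IGN );
> 	sleep( 50000 );
> 	return 0;
> }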
> ----
>
> And the program kept running.  Testing both of these programs with runit
> gave the expected results.  The program using the default signal handler
> exited as soon as runit sent it TERM, and the status of the service was set
> accordingly.  For the second program, TERM was ignored and runit went
> into the "want down, got TERM" state.
>
> On your system, are you 100% sure that the ruby test program you're using
> isn't just exiting appropriately?  I can't find anything that mimics the
> described behaviour, i.e. runit is behaving the way you describe, but the
> process does end.
>
> --Mike
>
> On Wednesday 16 January 2008 03:41:29 pm Ryan Woodrum wrote:
> > Hello!
> >
> > I believe I have found a possible bug/oddity in the behavior of sv
> > using runsv.  I happened upon this particular scenario in a test
> > environment, but was actually able to repro it in my production
> > environment as well as in a primitive case.  The issue involves slow
> > children or children whose TERM handler isn't registered soon enough.
> >
> > Here's the setup:
> > I create a simplistic base service configuration under which I will
> > run a ruby application.  The ruby app looks like so:
> > slow_signal.rb
> > ---
> > sleep(10)
> >
> > puts "registering term handler..."
> > trap("TERM") do
> >   puts "got term"
> >   exit
> > end
> >
> > while(true) do
> >   puts "looping and sleeping..."
> >   sleep 2
> > end
> > ---
> >
> > I run this from the run script in my svdir with:
> > #!/bin/sh
> > exec 2>&1
> > exec /usr/bin/ruby /home/rwoodrum/tmp/slow_signal.rb
> >
> >
> > The premise of the primitive ruby application is to emulate a slow-ish
> > loading base of code that has a term handler registered early in the
> > life of the process.
> >
> > If I invoke:
> > /etc/init.d/slow_signal start
> >
> > followed within the 10 second sleep period by:
> > /etc/init.d/slow_signal stop
> >
> > (/etc/init.d/slow_signal is a symlink to /usr/bin/sv)
> >
> > The process does not handle the signal but its state is set to 'd';
> > down.  In subsequent calls to control() within sv.c, it will no longer
> > write to the pipe because it thinks there is no need.  With no further
> > writes to the pipe, another TERM will never get sent and so the
> > process cannot be shut down via sv/runsv, at least not with TERM.
> >
> > It took me a while to learn how everything worked and to track down
> > just where this check was happening.  The source I worked against was
> > the source available via the debian package v1.8.0 (`apt-get source
> > runit` under debian sid).  (I looked for a repo but did not find a public
> > one.)
> >
> > I can think of two solutions.  One is not to set svstatus[17] unless
> > you're sure the process actually went down, but this is more complicated
> > (perhaps more correct?) than the second solution.  Inside of control() in
> > sv.c, a modification to always send a TERM can be made like so:
> > -----
> > 247c247,248
> > <   if (svstatus[17] == *a) return(0);
> > ---
> > >   /* Write a TERM to the pipe even if we already have.  Slow TERM
> > >   handler perhaps?  What about other cases?*/
> > >   if (svstatus[17] == *a && *a != 'd') return(0);
> > -----
> >
> > In this case, we simply decide that, if we want to issue a TERM via sv
> > stop, down etc., we will go ahead and write again to the pipe, even
> > if we think we don't need to.  This way, we're not stuck in "want down,
> > got TERM."
> >
> > So with an answer in hand... is this behavior by design?  It seems
> > that a particularly slow child shouldn't immunize itself from a TERM
> > because of a slow load time or late signal handler registration.
> >
> > Thoughts appreciated!  Thanks!
> >
> > -ryan woodrum

