supervision - discussion about system services, daemon supervision, init, runlevel management, and tools such as s6 and runit
From: Steve Litt <slitt@troubleshooters.com>
To: supervision@list.skarnet.org
Subject: Re: runit: why ignore SIGCONT for stages?
Date: Tue, 23 Nov 2021 13:28:30 -0500
Message-ID: <20211123132830.26b5a62f@mydesk.domain.cxm>
In-Reply-To: <87tug3vzex.fsf@vuxu.org>

Leah Neukirchen said on Tue, 23 Nov 2021 13:17:58 +0100

>Hello,
>
>During debugging a ksh issue (https://github.com/ksh93/ksh/issues/301),
>we noticed that many processes on a Void Linux system booted with runit
>are ignoring SIGCONT.  This seems to be due to runit(8) before execing
>into the stages:
>
>      sig_unblock(sig_cont);
>      sig_ignore(sig_cont);
>...
>      strerr_warn3(INFO, "enter stage: ", stage[st], 0);
>      execve(*prog, (char *const *)prog, envp);
>
>This code has been there since 2001.  Can someone explain why?
>Ignoring SIGCONT seems to be a no-op, and the default handler seems to
>create no problems for other init systems.

Hi Leah,

For one thing, are you sure you're sending the SIGCONT to the correct
process? As far as I know, runit provides no way to retrieve the PID of
a daemon, so how do you send the signal?

Also, how do you know whether the daemon is stopped or paused? Is it
possible that the SIGCONT *is* working correctly on the daemon?
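
On Linux, both questions about the daemon's state can be answered from
/proc once you have a PID. The following is a minimal sketch, not
anything runit ships: the function names proc_state and ignores_sigcont
are made up for illustration, and it assumes SIGCONT is signal 18, which
holds on x86 and ARM Linux but not on every architecture.

```shell
#!/bin/sh
# Hypothetical helpers (not part of runit); Linux-only, they read /proc.

# Print the one-letter process state, e.g. S (sleeping) or T (stopped).
proc_state() {
    sed -n 's/^State:[[:space:]]*\([A-Za-z]\).*/\1/p' "/proc/$1/status" 2>/dev/null
}

# Exit 0 if the process ignores SIGCONT, 1 if it does not,
# 2 if the status file could not be read.
ignores_sigcont() {
    mask=$(sed -n 's/^SigIgn:[[:space:]]*//p' "/proc/$1/status" 2>/dev/null) || return 2
    [ -n "$mask" ] || return 2
    # SigIgn is a hex bitmask; signal N is bit N-1.
    # ASSUMPTION: SIGCONT is 18 (true on x86/ARM Linux), hence bit 17.
    [ $(( (0x$mask >> 17) & 1 )) -eq 1 ]
}
```

For example, ( trap '' CONT; exec sleep 60 ) & starts a process that
ignores SIGCONT, because ignored signals stay ignored across exec, and
ignores_sigcont $! should then succeed. A proc_state of T means the
process is currently stopped.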

Assuming the preceding two questions indicate you're sending the
right signal to the right daemon, and the daemon really isn't
responding, I have an idea why runit might be built this way...

I've neither looked at that part of the source code, nor done any
experimentation, so what I'm about to say is a pure guess.

My guess is that runit's intent was to have SIGSTOP and SIGCONT done
solely by the sv command, as described by the sv man page:

# Stop mydaemon in its tracks
sv pause mydaemon

# Make mydaemon pick up where it left off
# Note the syntax diverges from the sv man page
sv cont mydaemon

This actually makes some sense, because to directly send a signal to
the daemon, you'd need its PID, and daemontools/runit/s6 don't write a
PID file, as far as I know.

My proposed explanation has some logical inconsistencies:

1) If SIGCONT is really shut off in the daemon, then sv can't send
   the daemon a SIGCONT any more than anyone else.

2) If runsv requires a specially crafted program (one that sends a
   SIGCONT to a daemon), then why is that better than systemd?
   I suppose it would be easy enough to #ifdef RUNIT or something, for
   the sole purpose of sending signals to the daemon.

I don't have time to research this right now, but if I were to research
it, I'd build a dummy daemon that did nothing but write a file on /tmp
every second, writing an incrementing integer and the time. Then run a
shellscript something like the following:

echo "Before pause ======================="
sv status mydaemon
sv pause mydaemon
echo "After pause ======================="
sv status mydaemon
sleep 30
echo "After sleep ======================="
sv status mydaemon
sv cont mydaemon
echo "After cont ======================="
sv status mydaemon
echo "Done ======================="

If the integer picks up where it left off, even though the time skips
30 seconds, that's proof you were stopped. If the last sv status says
"run" instead of "pause", it's proof that the process continued.

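The dummy daemon described above could be sketched roughly like this.
It is only a sketch under assumptions: tick_loop and report_gaps are
names I made up, the log path is whatever you pass in, and date +%s
(seconds since the epoch) is assumed to be available, which it is with
GNU and BSD date.

```shell
#!/bin/sh
# Hypothetical dummy daemon plus a log checker; neither is part of runit.

# Append "<counter> <epoch-seconds>" to a file once per interval.
# Usage: tick_loop FILE [COUNT] [INTERVAL]
# COUNT 0 (the default) means run forever; INTERVAL defaults to 1 second.
tick_loop() {
    out=$1
    count=${2:-0}
    interval=${3:-1}
    i=0
    while [ "$count" -eq 0 ] || [ "$i" -lt "$count" ]; do
        i=$((i + 1))
        printf '%s %s\n' "$i" "$(date +%s)" >> "$out"
        sleep "$interval"
    done
}

# List places where the counter is consecutive but the clock jumped by
# more than MAX seconds, which is what a pause followed by a continue
# should look like.
# Usage: report_gaps FILE [MAX]
report_gaps() {
    awk -v max="${2:-5}" '
        NR > 1 && $1 == prev_n + 1 && $2 - prev_t > max {
            print prev_n " -> " $1 ": " ($2 - prev_t) "s gap"
        }
        { prev_n = $1; prev_t = $2 }
    ' "$1"
}
```

The run script for mydaemon would source this and call
tick_loop /tmp/mydaemon.log. After the test, report_gaps
/tmp/mydaemon.log 5 lists every spot where the integer picked up where
it left off but more than 5 seconds went by.
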
If your desire for SIGCONT to work is just so you can pause and continue
the daemon from the command prompt, you can just use sv pause and
sv cont instead. If you have a program that actually needs to pause and
continue the daemon, and this program needs to be portable, I guess you
have to detect that the runit supervisor is running and ran your daemon,
by doing an sv status mydaemon, and if that returns 0 then
system("sv pause mydaemon") or system("sv cont mydaemon"). Otherwise
send the signal manually.
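
The detection logic in the previous paragraph might look something like
the following in shell. This is a sketch, not a tested recipe: the
function name pause_or_cont is made up, it takes "sv status returned 0"
as the sign that runit supervises the service (as guessed above), and
it assumes the caller already knows the daemon's PID for the fallback.

```shell
#!/bin/sh
# Hypothetical helper: pause or continue a daemon, going through sv
# when runit supervises it and sending the signal directly otherwise.
# Usage: pause_or_cont pause|cont SERVICE PID
pause_or_cont() {
    action=$1
    service=$2
    pid=$3
    case $action in
        pause) sig=STOP ;;
        cont)  sig=CONT ;;
        *)     return 2 ;;   # unknown action
    esac
    # ASSUMPTION: sv exists and "sv status" exits 0 only when the
    # service is under runit supervision.
    if command -v sv >/dev/null 2>&1 && sv status "$service" >/dev/null 2>&1; then
        sv "$action" "$service"
    else
        kill -s "$sig" "$pid"
    fi
}
```

So pause_or_cont pause mydaemon 1234 would run sv pause mydaemon under
runit, and kill -s STOP 1234 everywhere else.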

Please keep us in the loop. I've used runit for 6 years, and until now,
I thought it was flawless except for the theoretical disadvantage of
PID 1 not supervising anything, and the theoretical disadvantage of not
being able to start the daemons in a particular order, and the
theoretical disadvantage of polling.

Thanks,

SteveT

Steve Litt 
Spring 2021 featured book: Troubleshooting Techniques of the Successful
Technologist http://www.troubleshooters.com/techniques

