Date: Tue, 23 Nov 2021 13:28:30 -0500
From: Steve Litt
To: supervision@list.skarnet.org
Subject: Re: runit: why ignore SIGCONT for stages?
Message-ID: <20211123132830.26b5a62f@mydesk.domain.cxm>
In-Reply-To: <87tug3vzex.fsf@vuxu.org>
References: <87tug3vzex.fsf@vuxu.org>

Leah Neukirchen said on Tue, 23 Nov 2021 13:17:58 +0100

>Hello,
>
>During debugging a ksh issue (https://github.com/ksh93/ksh/issues/301),
>we noticed that many processes on a Void Linux system booted with runit
>are ignoring SIGCONT.  This seems to be due to runit(8) before execing
>into the stages:
>
>      sig_unblock(sig_cont);
>      sig_ignore(sig_cont);
>...
>      strerr_warn3(INFO, "enter stage: ", stage[st], 0);
>      execve(*prog, (char *const *)prog, envp);
>
>This code has been there since 2001.  Can someone explain why?
>Ignoring SIGCONT seems to be a no-op, and the default handler seems to
>create no problems for other init systems.

Hi Leah,

For one thing, are you sure you're sending the SIGCONT to the correct
process? As far as I know, runit provides no way to retrieve the PID of
a daemon, so how do you send the signal? Also, how do you know whether
the daemon is stopped or paused? Is it possible that the SIGCONT *is*
working correctly on the daemon?

Assuming the preceding two questions indicate you're sending the right
signal to the right daemon, and the daemon really isn't responding, I
have an idea why runit might be built this way...

I've neither looked at that part of the source code, nor done any
experimentation, so what I'm about to say is pure guess.

My guess is that runit's intent was to have SIGSTOP and SIGCONT done
solely by the sv command, as described by the sv man page:

# Stop mydaemon in its tracks
sv pause mydaemon
# Make mydaemon pick up where it left off
# Note the syntax diverges from the sv man page
sv cont mydaemon

This actually makes some sense, because to directly send a signal to
the daemon, you'd need its PID, and daemontools/runit/s6 don't write a
PID file, as far as I know.

My proposed explanation has some logical inconsistencies:

1) If SIGCONT is really shut off in the daemon, then sv can't send the
   daemon a SIGCONT any more than anyone else.

2) If runsv is required for a specially crafted program to run (one
   that sends a SIGCONT to a daemon), then why is it better than
   systemd? I suppose it would be easy enough to #ifdef RUNIT or
   something, for the sole purpose of sending signals to the daemon.

I don't have time to research this right now, but if I were to research
it, I'd build a dummy daemon that did nothing but write a file on /tmp
every second, writing an incrementing integer and the time. Then run a
shellscript something like the following:

#!/bin/sh
echo "Before pause ======================="
sv status mydaemon
sv pause mydaemon
echo "After pause ======================="
sv status mydaemon
sleep 30
echo "After sleep ======================="
sv status mydaemon
sv cont mydaemon
echo "After cont ======================="
sv status mydaemon
echo "Done ======================="

If the integer picks up where it left off, even though the time skips
30 seconds, that's proof you were stopped. If the last sv status no
longer reports the service as paused, it's proof that the process
continued.

If your desire for SIGCONT to work is just so you can pause and resume
it from the command prompt, you can just use sv pause and sv cont
instead.
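
For the dummy daemon described above, here is a minimal sketch in
POSIX sh. Nothing in it comes from runit itself: the output path
/tmp/mydaemon.log and the function names tick and run_daemon are
made up for illustration, and the one-second interval is only
approximate.

```shell
#!/bin/sh
# Hypothetical dummy daemon for the pause/cont experiment.
# Assumption: output goes to /tmp/mydaemon.log unless OUT is set.
OUT="${OUT:-/tmp/mydaemon.log}"

# Append one line of the form "<counter> <HH:MM:SS>" to $OUT.
tick() {
    printf '%s %s\n' "$1" "$(date '+%H:%M:%S')" >> "$OUT"
}

# Write one line per second.
# $1 = number of lines to write; 0 (the default) means loop forever.
run_daemon() {
    max="${1:-0}"
    i=0
    while [ "$max" -eq 0 ] || [ "$i" -lt "$max" ]; do
        i=$((i + 1))
        tick "$i"
        sleep 1
    done
}
```

To use it as a service, the run script would end with a bare
run_daemon call so that it loops forever. While the process is
paused, the counter should stop advancing; after a cont, the next
line should carry the next integer and a timestamp that has jumped
by the length of the pause.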

If you have a program that actually needs to stop and continue the
daemon, and this program needs to be portable, I guess you have to
detect that the runit supervisor is running and ran your daemon, by
running sv status mydaemon. If that returns 0, call
system("sv pause mydaemon") or system("sv cont mydaemon"). Otherwise
send the signal manually.

Please keep us in the loop. I've used runit for 6 years, and until now,
I thought it was flawless except for the theoretical disadvantage of
PID 1 not supervising anything, and the theoretical disadvantage of not
being able to start the daemons in a particular order, and the
theoretical disadvantage of polling.

Thanks,

SteveT

Steve Litt
Spring 2021 featured book: Troubleshooting Techniques of the Successful
Technologist
http://www.troubleshooters.com/techniques