Date: Thu, 21 Oct 2021 11:20:05 +0200
From: Petr Malat
To: supervision@list.skarnet.org
Subject: Re: Service watchdog

Hi!

> > Yes, in my usecase this would be used at the place where sd_notify()
> > is used if the service runs under systemd. Then periodically executed
> > watchdog could check the service makes progress and react if it
> > doesn't.
>
> If a single notification step is enough for you, i.e. the service
> goes from a "preparing" state to a "ready" state and remains ready
> until the process dies, then what you want is implemented in the s6
> process supervisor: https://skarnet.org/software/s6/notifywhenup.html
>
> Then you can synchronously wait for service readiness
> (s6-svwait $service) or, if you have a watchdog service, periodically
> poll for readiness (s6-svstat -r $service).
>
> But that's only valid if your service can only change states once
> (from "not ready" to "ready"). If you need anything more complex, s6
> won't support it intrinsically.

No, I need to monitor that the service is alive: my watchdog script
would check whether the status message is older than a defined
threshold, in which case it would kill the service (the rest would be
handled in the finish script).
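
A minimal sketch of such a watchdog check, under stated assumptions: the
service refreshes a status file whose modification time stands in for the
"age of the status message", and the watchdog kills the service through
runit's `sv kill` so that runsv runs the finish script and restarts it.
The status file, the function names and the choice to treat a missing
file as stale are all hypothetical; nothing like this ships with runit.

```python
import os
import subprocess
import time


def is_stale(status_path, max_age_s, now=None):
    """Return True if status_path is missing or older than max_age_s seconds."""
    now = time.time() if now is None else now
    try:
        mtime = os.stat(status_path).st_mtime
    except FileNotFoundError:
        # Assumption: a service that never wrote its status counts as stale.
        return True
    return (now - mtime) > max_age_s


def check_once(service_dir, status_path, max_age_s, run=subprocess.run):
    """Kill the supervised service if its status is stale.

    Returns True if a kill was requested, False otherwise.
    """
    if not is_stale(status_path, max_age_s):
        return False
    # `sv kill` sends SIGKILL to the service; runsv then runs ./finish
    # and, if the service is wanted up, starts ./run again.
    run(["sv", "kill", service_dir], check=False)
    return True
```

Calling `check_once()` periodically (from a loop with `time.sleep()`, or
from a separate supervised service) gives the polling watchdog described
above. Because the check runs outside runsv, it is not serialized with
the run/finish steps.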

> The reason why there isn't more advanced support for this in any
> supervision suite (save systemd but even there it's pretty minimal)
> is that service states other than "not ready yet" and "ready" are
> very much service-dependent and it's impossible for a generic process
> supervisor to support enough states for every possible existing service.
> Daemons that need complex states usually come with their own
> monitoring software that handles their specific states, with integrated
> health checks etc.
>
> So my advice would be:
> - if what you need is just readiness notification, switch to s6.
>   It's very similar to runit and I think you'll find it has other
>   benefits as well. The drawback, obviously, is that it's not in busybox
>   and the required effort to switch may not be worth it.
> - if you need anything more complex, you can stick to runit, but you
>   will kinda need to write your own monitor for your daemon, because
>   that's what everyone does.
>
> Depending on the details of the monitoring you need, the monitoring
> software can be implemented as another service (e.g. to receive
> heartbeats from your daemon), or as a polling client (e.g. to do
> periodic health checks). Both approaches are valid.

That's what I thought of as well, but keeping this completely outside of
runsv leaves a race window in which the watchdog can kill a service
that has just restarted itself. This could be avoided if the check were
serialized with the other steps (run/finish execution) within runsv. So
far the futile restart of the service doesn't seem to cause me any
problems, so I'm not much bothered by it.

> Don't hack on runit, especially the control pipe thing. It will not
> end well.
> (runit's control pipe feature is super dangerous, because it allows a
> service to hijack the control flow of its supervisor, which endangers
> the supervisor's safety. That's why s6 does not implement it; it
> provides similar - albeit slightly less powerful - control features
> via ways that never give the service any power over the supervisor.)

The main reason I wanted to use the service pipe for it was the
possibility of seeing the service status in the process tree, which
would be a nice benefit.

BR,
  Petr