From mboxrd@z Thu Jan  1 00:00:00 1970
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on inbox.vuxu.org
X-Spam-Level: 
X-Spam-Status: No, score=-1.0 required=5.0 tests=MAILING_LIST_MULTI,
	NICE_REPLY_A autolearn=ham autolearn_force=no version=3.4.4
Received: (qmail 25140 invoked from network); 23 Oct 2021 15:48:38 -0000
Received: from alyss.skarnet.org (95.142.172.232)
  by inbox.vuxu.org with ESMTPUTF8; 23 Oct 2021 15:48:38 -0000
Received: (qmail 8240 invoked by uid 89); 23 Oct 2021 15:49:00 -0000
Mailing-List: contact supervision-help@list.skarnet.org; run by ezmlm
Sender: <supervision@list.skarnet.org>
Precedence: bulk
List-Post: <mailto:supervision@list.skarnet.org>
List-Help: <mailto:supervision-help@list.skarnet.org>
List-Unsubscribe: <mailto:supervision-unsubscribe@list.skarnet.org>
List-Subscribe: <mailto:supervision-subscribe@list.skarnet.org>
List-Id: <supervision.list.skarnet.org>
Received: (qmail 8233 invoked from network); 23 Oct 2021 15:49:00 -0000
X-Injected-Via-Gmane: http://gmane.org/
To: supervision@list.skarnet.org
From: Ben Franksen <ben.franksen@online.de>
Subject: Re: logging services with shell interaction
Date: Sat, 23 Oct 2021 17:48:23 +0200
Message-ID: <sl1as7$loi$1@ciao.gmane.io>
References: <skm1ef$nq9$1@ciao.gmane.io>
 <emba464994-2a6f-4713-af9e-68f4e93b785a@elzian> <skohum$a4l$1@ciao.gmane.io>
 <YXBZeTL4uBncSDl/@CasperVector>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101
 Thunderbird/78.13.0
In-Reply-To: <YXBZeTL4uBncSDl/@CasperVector>
Content-Language: en-US

Hi Casper

On 20.10.21 at 20:01, Casper Ti. Vector wrote:
> On Wed, Oct 20, 2021 at 09:53:58AM +0200, Ben Franksen wrote:
>> Interesting, I didn't know about recordio, will take a look.
> 
> Hello from a fellow sufferer from EPICS.  (If you see a paper on some
> synchrotron-related journal in a few months that mentions "automation
> of automation", it will be from me, albeit not using a pseudonym.
> Another shameless plug: <https://github.com/CasperVector/ADXspress3>.)

Interesting, I didn't know you were from the accelerator community!

> As has been said by Laurent, in the presence of a supervision system
> with reliable logging and proper rotation, what `procServ' mainly does
> can be done better by something like `socat' which wraps something like
> `recordio', which in turn wraps the actual service process (EPICS IOC).

Yeah, that's what I was thinking, too.

> The devil is in the details: most importantly, when the service is to
> be stopped, the ideal situation is that the actual service process gets
> killed, leading to the graceful exit of `recordio' and then `socat'.
> 
> So the two wrapping programs need to propagate the killing signal, and
> then exit after waiting for the subprocess; since `procServ' defaults
> to kill the subprocess using SIGKILL, `recordio' also needs to translate
> the signal if this is to be emulated.  `socat' does this correctly when
> the `sighup'/`sigint'/`sigquit' options are given for `exec' addresses,
> but its manual page does not state about SIGTERM.  `recordio' does not
> seem to propagate (let alone translate) the signal; additionally, its
> output format (which is after all mainly used for debugging) feels too
> low-level to me, and perhaps needs to be adjusted.

I agree. BTW, another detail is the special handling of certain control 
characters by procServ: ^X to restart the child, ^T to toggle 
auto-restart, and the possibility to disable some others like ^C and 
especially ^D. This is not only convenient but also avoids accidental 
restarts (people are used to ^D meaning "exit the shell").
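
To make the signal requirement from the quoted paragraph concrete, here 
is a minimal sketch of a wrapper that translates SIGTERM into a 
configurable signal for its child and then waits for it. The name 
run_translating is hypothetical and invented for this illustration; it 
is not part of procServ, socat or recordio, and the sketch ignores the 
short race between starting the child and installing the trap.

```shell
#!/bin/sh
# Sketch only: run_translating is a hypothetical helper, not an existing tool.
# Usage: run_translating SIGNAL PROG [ARGS...]
run_translating() {
    killsig=$1
    shift
    term_seen=0
    "$@" &
    child=$!
    # On SIGTERM, send the configured signal to the child instead.
    trap 'term_seen=1; kill -s "$killsig" "$child" 2>/dev/null || :' TERM
    status=0
    wait "$child" || status=$?
    if [ "$term_seen" -eq 1 ]; then
        # A trapped signal makes wait return early, so wait once more to
        # collect the child's real exit status. A result of 127 means the
        # first wait had already collected it.
        again=0
        wait "$child" || again=$?
        [ "$again" -eq 127 ] || status=$again
    fi
    trap - TERM
    return "$status"
}

# Demo: the wrapper receives SIGTERM, the child receives SIGKILL.
run_translating KILL sleep 30 &
wrapper=$!
sleep 1
kill -s TERM "$wrapper"
rc=0
wait "$wrapper" || rc=$?
echo "wrapper exit status: $rc"
```

The wrapper exits with the status the shell reports for its child, so a 
supervisor one level up can still see how the service ended.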

> At the facility where I am from, we use CentOS 7 and unsupervised
> procServ (triple shame for a systemd opponent, s6 enthusiast and
> minimalist :(), because we have not yet been bitten by log rotation
> problems.  It also takes quite an amount of code to implement the
> dynamic management of user supervision trees for IOCs, in addition
> to the adjustments needed for `recordio'.  To make the situation even
> worse, we are also using procServControl; anyway, I still hope we can
> get rid of procServ entirely someday.

Our approach is a hybrid of several components. Since 
the OS is Debian, we use systemd service units, one for each IOC. Each 
unit executes `/usr/bin/unshare -u sethostname %i runuser -u ioc -- 
softIOC-run %i`, which fakes the host name to trick EPICS' Channel Access 
"Security" into the proper behavior, and then drops privileges. 
softIOC-run is the script of which I posted a simplified version, with 
the pipeline between procServ and multilog. Despite the disadvantages 
explained by Laurent, so far this works pretty well (I have never 
observed multilog crash or otherwise misbehave). Finally, the 
configuration for all IOCs (name, host they run on, path to the 
startup script) resides in a small database, and there are scripts to 
install everything automatically, including enabling and 
disabling the service units.

When I started developing this scheme I thought that systemd was a great 
leap forward from /etc/init.d scripts. I still think so, but I quickly 
became frustrated with its monolithic approach. Despite thousands of 
configuration options, it always seemed like the one I needed was 
missing. I spent days and days debugging service units that should have 
worked according to the docs but did not, for reasons I wasn't always 
able to figure out. Nowadays my standing assumption about systemd is 
that nothing you didn't thoroughly test should be expected to work, 
regardless of what the docs claim.

In contrast, I found that small specialized tools, each using the 
chain-loading technique to modify one particular aspect of a program, 
much more reliably produce exactly the desired effect and nothing more. 
The fine-grained control this gives you over the order of these effects 
(like, first fake the host name, then drop privileges) is something that 
a monolith with an unstructured flat configuration language cannot give 
you. The syntactic simplicity of systemd's configuration language is 
certainly appealing, especially for non-programmers, but it easily 
lets you forget the extreme complexity of its semantics. I cannot help 
but see the machine executing it as an idiosyncratic monster with lots 
of poorly handled corner cases.
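
For readers who have not seen chain loading, here is a minimal sketch of 
the idea. The two "tools" below (set-umask and set-cwd) are hypothetical 
stand-ins invented for this illustration; they play the role that 
unshare, sethostname and runuser play in the command line above, but 
need no privileges to try out. Each one changes a single aspect of the 
process state and then execs the rest of its command line.

```shell
#!/bin/sh
# Sketch only: set-umask and set-cwd are hypothetical one-line tools,
# written here as inline scripts so the example is self-contained.
# Each one consumes its own argument, then execs the remaining words.
set_umask='umask "$1" || exit 111; shift; exec "$@"'
set_cwd='cd "$1" || exit 111; shift; exec "$@"'

# The final program just reports the state it ended up with.
report='echo "umask=$(umask) cwd=$(pwd)"'

# Effects are applied strictly left to right: first the umask, then the
# working directory, and only then does the final program run.
result=$(sh -c "$set_umask" set-umask 027 \
         sh -c "$set_cwd" set-cwd / \
         sh -c "$report")
echo "$result"
```

Swapping two links of the chain swaps the order of their effects, which 
is exactly the kind of control a flat list of options does not express.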

I would like to experiment with alternatives like s6/s6-rc, but that 
means using one of the small distros that support it, and I am sure such 
a proposal would not be well received.

Cheers
Ben
-- 
I would rather have questions that cannot be answered, than answers that
cannot be questioned.  -- Richard Feynman