supervision - discussion about system services, daemon supervision, init, runlevel management, and tools such as s6 and runit
* logging services with shell interaction
@ 2021-10-19  8:59 Ben Franksen
  2021-10-19 23:27 ` Laurent Bercot
  0 siblings, 1 reply; 7+ messages in thread
From: Ben Franksen @ 2021-10-19  8:59 UTC
  To: supervision

Hi Everyone

we have a fair number of services which allow (and occasionally require) 
user interaction via a (built-in) shell. All the shell interaction is 
supposed to be logged, in addition to all the messages that are issued 
spontaneously by the process. So we cannot directly use a logger 
attached to the stdout/stderr of the process.

procServ is a process supervisor adapted to such situations. It allows 
an external process (conserver in our case) to attach to the service's 
shell via a TCP or UNIX domain socket. procServ supports logging 
everything it sees (input and output) to a file or stdout.

In the past we had recurring problems with processes that spew out an 
extreme amount of messages, quickly filling up our local disks. Since 
logrotate runs via cron it is not possible to reliably guarantee that 
this doesn't happen. Thus, inspired by process supervision suites a la 
daemontools, we are now using a small shell wrapper script that pipes 
the output of the process into the multilog tool from the daemontools 
package.

Here is the script, slightly simplified. Most of the parameters are 
passed via environment.

```
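#!/bin/sh
# softIOC-run IOC: run one IOC under procServ and pipe everything procServ
# logs (IOC output plus console I/O) into multilog. TIMEFMT, RUNDIR,
# BOOTDIR, STCMD, LOGSIZE, LOGNUM and LOGDIR come from the environment.
IOC=$1

/usr/bin/procServ -f -L- --logstamp --timefmt="$TIMEFMT" \
  -q -n "$IOC" --ignore='^D^C^]' -P "unix:$RUNDIR/$IOC" -c "$BOOTDIR" "./$STCMD" \
  | /usr/bin/multilog "s$LOGSIZE" "n$LOGNUM" "$LOGDIR/$IOC"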
```
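As a rough sizing aid (a sketch, not part of the original script): each multilog directory keeps a bounded number of old files (`n`) plus `current`, each at most about `s` bytes, so LOGSIZE × (LOGNUM + 1) bytes per IOC is a conservative upper bound on disk use, ignoring multilog's small lock/state files. The helper name `log_budget` is hypothetical.

```sh
# Hypothetical helper: conservative upper bound, in bytes, on the disk
# space used by NIOCS multilog directories written with "s$LOGSIZE"
# "n$LOGNUM" (old files plus `current`; lock/state files ignored).
log_budget() {  # usage: log_budget LOGSIZE LOGNUM NIOCS
  echo $(( $1 * ($2 + 1) * $3 ))
}
```

For example, `log_budget 1000000 10 50` prints 550000000, i.e. about 550 MB for 50 IOCs, no matter how much a runaway IOC prints.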

So far this seems to do the job, but I have two questions:

1. Is there anything "bad" about this approach? Most supervision tools 
have this sort of thing as a built-in feature and I suspect there may be 
a reason for that other than mere convenience.

2. Do any of the existing process supervision tools support what 
procServ gives us wrt interactive shell access from outside?

Cheers
Ben
-- 
I would rather have questions that cannot be answered, than answers that
cannot be questioned.  -- Richard Feynman




* Re: logging services with shell interaction
  2021-10-19  8:59 logging services with shell interaction Ben Franksen
@ 2021-10-19 23:27 ` Laurent Bercot
  2021-10-20  7:53   ` Ben Franksen
  0 siblings, 1 reply; 7+ messages in thread
From: Laurent Bercot @ 2021-10-19 23:27 UTC
  To: Ben Franksen, supervision

>we have a fair number of services which allow (and occasionally require) user interaction via a (built-in) shell. All the shell interaction is supposed to be logged, in addition to all the messages that are issued spontaneously by the process. So we cannot directly use a logger attached to the stdout/stderr of the process.

  I don't understand the consequence relationship here.

  - If you control your services / builtin shells, the services could
have an option to log the IO of their shells to stderr, as well as
their own messages.
  - Even if you cannot make the services log the shell IO, you can add
a small data dumper in front of the service's shell, which transmits
full-duplex everything it gets but also writes it to its own stdout or
stderr; if that stdout/err is the same pipe as the stdout/err of your
service, then all the IO from the shell will be logged to the same place
(and log lines won't be mixed unless they're more than PIPE_BUF bytes
long, which shouldn't happen in practice). So with that solution you
could definitely make your services log to multilog.
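A minimal sketch of such a data dumper in POSIX sh, assuming `tee`, `mkfifo`, `mktemp` and an awk with `fflush()` are available; the name `iolog` and the `< `/`> ` prefixes are illustrative (loosely in the spirit of recordio), not an existing tool. The data path goes through `tee`, so prompts without a trailing newline still reach the user immediately; only the logged copy is line-oriented.

```sh
# Hypothetical "iolog": run "$@", passing stdin and stdout through
# unchanged (via tee) while a copy of each direction is written to
# stderr, line by line, prefixed "< " (input) or "> " (output).
iolog() {  # usage: iolog prog [args...]
  d=$(mktemp -d) || return 1
  mkfifo "$d/in" "$d/out" || return 1

  # Loggers: read the copies from the FIFOs, prefix each line, and
  # write it to stderr (awk's stdout is redirected there).
  awk '{ print "< " $0; fflush() }' < "$d/in" >&2 &
  awk '{ print "> " $0; fflush() }' < "$d/out" >&2 &

  # Data path: stdin -> tee -> prog -> tee -> stdout.
  tee "$d/in" | "$@" | tee "$d/out"

  wait            # let both loggers drain before cleaning up
  rm -rf "$d"
}
```

Run with `exec 2>&1` in front, so that the service's own stderr and the logged shell I/O share one pipe to the logger; each logged line is a single short write, which is what keeps lines from interleaving.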


>procServ is a process supervisor adapted to such situations. It allows an external process (conserver in our case) to attach to the service's shell via a TCP or UNIX domain socket. procServ supports logging everything it sees (input and output) to a file or stdout.

  That works too.


>IOC=$1
>
>/usr/bin/procServ -f -L- --logstamp --timefmt="$TIMEFMT" \
>  -q -n %i --ignore=^D^C^] -P "unix:$RUNDIR/$IOC" -c "$BOOTDIR" "./$STCMD" \
>  | /usr/bin/multilog "s$LOGSIZE" "n$LOGNUM" "$LOGDIR/$IOC"
>```
>
>So far this seems to do the job, but I have two questions:
>
>1. Is there anything "bad" about this approach? Most supervision tools have this sort of thing as a built-in feature and I suspect there may be a reason for that other than mere convenience.

  It's not *bad*, it's just not as airtight as supervision suites make
it. The reasons why it's a built-in feature in daemontools/runit/s6/others
are:
  - it allows the logger process to be supervised as well
  - it maintains open the pipe to the logger, so service and logger can
be restarted independently at will, without risk of losing logs.

  As is, you can't send signals to multilog (useful if you want to force
a rotation) without knowing its pid. And if multilog dies, procServ gets
a broken pipe (SIGPIPE), so it (and your service) will probably be forced
to restart, and you lose the data it was trying to write.
  A supervision architecture with integrated logging protects from this.
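The integrated-logging setup described above can be sketched as a generator for a daemontools-style service directory (daemontools, runit or s6), where the supervisor rather than a shell pipeline owns the pipe between procServ and multilog. The helper name `mk_ioc_svc`, the layout and the multilog size/count values are illustrative assumptions; the procServ options are the ones from the script in the first message, and its environment variables are assumed to be provided to `./run` (e.g. via envdir).

```sh
# Hypothetical generator: the supervisor keeps the pipe between ./run
# and ./log/run open, so the IOC and its logger restart independently.
# $RUNDIR, $BOOTDIR, $STCMD and $TIMEFMT are expanded when ./run starts.
mk_ioc_svc() {  # usage: mk_ioc_svc SVDIR IOC
  svdir=$1 ioc=$2
  mkdir -p "$svdir/log" || return 1

  cat > "$svdir/run" <<EOF
#!/bin/sh
# merge stderr into stdout, which the supervisor pipes to ./log/run
exec 2>&1
exec /usr/bin/procServ -f -L- --logstamp --timefmt="\$TIMEFMT" \\
  -q -n "$ioc" --ignore='^D^C^]' -P "unix:\$RUNDIR/$ioc" -c "\$BOOTDIR" "./\$STCMD"
EOF

  cat > "$svdir/log/run" <<EOF
#!/bin/sh
# placeholder sizes: 10 files of at most 1 MB in ./log/main
exec /usr/bin/multilog s1000000 n10 ./main
EOF

  chmod +x "$svdir/run" "$svdir/log/run"
}
```

Once such a directory is known to the supervisor, `svc -a DIR/log` (daemontools) or `s6-svc -a DIR/log` (s6) sends multilog SIGALRM, which makes it rotate `current` immediately, with no pid lookup.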


>2. Do any of the existing process supervision tools support what procServ gives us wrt interactive shell access from outside?

  Not that I know of, because that need is pretty specific to your
service architecture.
  However, unless there are more details you have omitted, I still
believe you could obtain the same functionality with a daemontools/etc.
infrastructure and a program recording the IO from/to the shell. Since
you don't seem opposed to using old djb programs, you could probably
even directly reuse "recordio" from ucspi-tcp for this. :)

--
  Laurent



* Re: logging services with shell interaction
  2021-10-19 23:27 ` Laurent Bercot
@ 2021-10-20  7:53   ` Ben Franksen
  2021-10-20 18:01     ` Casper Ti. Vector
  0 siblings, 1 reply; 7+ messages in thread
From: Ben Franksen @ 2021-10-20  7:53 UTC
  To: supervision

Am 20.10.21 um 01:27 schrieb Laurent Bercot:
>> we have a fair number of services which allow (and occasionally 
>> require) user interaction via a (built-in) shell. All the shell 
>> interaction is supposed to be logged, in addition to all the messages 
>> that are issued spontaneously by the process. So we cannot directly 
>> use a logger attached to the stdout/stderr of the process.
> 
>   I don't understand the consequence relationship here.
>
>   - If you control your services / builtin shells, the services could
> have an option to log the IO of their shells to stderr, as well as
> their own messages.

We do have control over them, theoretically, but adding this 
functionality seems impractical. This is a complex piece of software, 
built from multiple components maintained by different parties. There is 
some sort of common framework for issuing messages but none of the 
components strictly adhere to it. In other words, they use things like 
printf all over the place. The only way I see to reliably get all the IO 
for logging is to delegate this to an external process.

>   - Even if you cannot make the services log the shell IO, you can add
> a small data dumper in front of the service's shell, which transmits
> full-duplex everything it gets but also writes it to its own stdout or
> stderr; if that stdout/err is the same pipe as the stdout/err of your
> service, then all the IO from the shell will be logged to the same place
> (and log lines won't be mixed unless they're more than PIPE_BUF bytes
> long, which shouldn't happen in practice). So with that solution you
> could definitely make your services log to multilog.

Yes, that would be possible. More or less what procServ does minus the 
supervision aspect.

>> ```
>> IOC=$1
>>
>> /usr/bin/procServ -f -L- --logstamp --timefmt="$TIMEFMT" \
>>  -q -n %i --ignore=^D^C^] -P "unix:$RUNDIR/$IOC" -c "$BOOTDIR" "./$STCMD" \
>>  | /usr/bin/multilog "s$LOGSIZE" "n$LOGNUM" "$LOGDIR/$IOC"
>> ```
>>
>> So far this seems to do the job, but I have two questions:
>>
>> 1. Is there anything "bad" about this approach? Most supervision tools 
>> have this sort of thing as a built-in feature and I suspect there may 
>> be a reason for that other than mere convenience.
> 
>   It's not *bad*, it's just not as airtight as supervision suites make
> it. The reasons why it's a built-in feature in daemontools/runit/s6/others
> are:
>   - it allows the logger process to be supervised as well
>   - it maintains open the pipe to the logger, so service and logger can
> be restarted independently at will, without risk of losing logs.
> 
>   As is, you can't send signals to multilog (useful if you want to force
> a rotation) without knowing its pid. And if multilog dies, procServ gets
> a broken pipe (SIGPIPE), so it (and your service) will probably be forced
> to restart, and you lose the data it was trying to write.
>   A supervision architecture with integrated logging protects from this.

Thanks, this answers my question perfectly.

>> 2. Do any of the existing process supervision tools support what 
>> procServ gives us wrt interactive shell access from outside?
> 
>   Not that I know of, because that need is pretty specific to your
> service architecture.

It sure is.

>   However, unless there are more details you have omitted, I still
> believe you could obtain the same functionality with a daemontools/etc.
> infrastructure and a program recording the IO from/to the shell. Since
> you don't seem opposed to using old djb programs, you could probably
> even directly reuse "recordio" from ucspi-tcp for this. :)

Interesting, I didn't know about recordio, will take a look.

Again, thanks a lot for the detailed response!

Cheers
Ben




* Re: logging services with shell interaction
  2021-10-20  7:53   ` Ben Franksen
@ 2021-10-20 18:01     ` Casper Ti. Vector
  2021-10-23 15:48       ` Ben Franksen
  0 siblings, 1 reply; 7+ messages in thread
From: Casper Ti. Vector @ 2021-10-20 18:01 UTC
  To: supervision

On Wed, Oct 20, 2021 at 09:53:58AM +0200, Ben Franksen wrote:
> Interesting, I didn't know about recordio, will take a look.

Hello from a fellow sufferer from EPICS.  (If you see a paper in some
synchrotron-related journal in a few months that mentions "automation
of automation", it will be from me, albeit not using a pseudonym.
Another shameless plug: <https://github.com/CasperVector/ADXspress3>.)

As has been said by Laurent, in the presence of a supervision system
with reliable logging and proper rotation, what `procServ' mainly does
can be done better by something like `socat' which wraps something like
`recordio', which in turn wraps the actual service process (EPICS IOC).
The devil is in the details: most importantly, when the service is to
be stopped, the ideal situation is that the actual service process gets
killed, leading to the graceful exit of `recordio' and then `socat'.

So the two wrapping programs need to propagate the killing signal, and
then exit after waiting for the subprocess; since `procServ' kills the
subprocess with SIGKILL by default, `recordio' also needs to translate
the signal if this is to be emulated.  `socat' does this correctly when
the `sighup'/`sigint'/`sigquit' options are given for `exec' addresses,
but its manual page says nothing about SIGTERM.  `recordio' does not
seem to propagate (let alone translate) the signal; additionally, its
output format (which is, after all, mainly used for debugging) feels too
low-level to me, and perhaps needs to be adjusted.
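The signal handling described above can be sketched in POSIX sh: propagate the stop signal, translate it to SIGKILL as procServ does by default, and exit only after the child has been reaped. `term_as_kill` is a hypothetical name, and this covers only the signal part; a real socat/recordio replacement would also have to relay and record the I/O.

```sh
# Hypothetical wrapper: run "$@" as a child; when the wrapper receives
# SIGTERM, SIGINT or SIGHUP, send the child SIGKILL, reap it, and
# return the child's exit status.
term_as_kill() {  # usage: term_as_kill prog [args...]
  got_sig=
  # Async lists get stdin from /dev/null unless redirected explicitly,
  # so hand the child our stdin through fd 3.
  exec 3<&0
  "$@" <&3 3<&- &
  child=$!
  exec 3<&-
  trap 'got_sig=1; kill -KILL "$child" 2>/dev/null' TERM INT HUP

  wait "$child"
  status=$?
  # A trapped signal makes wait return early (status > 128) while the
  # child may still be alive; the trap has sent SIGKILL, so wait again.
  if [ -n "$got_sig" ]; then
    wait "$child"
    status=$?
  fi
  trap - TERM INT HUP
  return "$status"
}
```

When the supervisor sends SIGTERM to the wrapper, the child is killed with SIGKILL and the wrapper returns 137 (128 + 9), so the supervisor sees the child's fate rather than the wrapper's.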

At the facility where I am from, we use CentOS 7 and unsupervised
procServ (triple shame for a systemd opponent, s6 enthusiast and
minimalist :(), because we have not yet been bitten by log rotation
problems.  It also takes a fair amount of code to implement the
dynamic management of user supervision trees for IOCs, in addition
to the adjustments needed for `recordio'.  To make the situation even
worse, we are also using procServControl; anyway, I still hope we can
get rid of procServ entirely someday.

-- 
My current OpenPGP key:
RSA4096/0x227E8CAAB7AA186C (expires: 2022.09.20)
7077 7781 B859 5166 AE07 0286 227E 8CAA B7AA 186C



* Re: logging services with shell interaction
  2021-10-20 18:01     ` Casper Ti. Vector
@ 2021-10-23 15:48       ` Ben Franksen
  2021-10-23 16:40         ` Casper Ti. Vector
  0 siblings, 1 reply; 7+ messages in thread
From: Ben Franksen @ 2021-10-23 15:48 UTC
  To: supervision

Hi Casper

Am 20.10.21 um 20:01 schrieb Casper Ti. Vector:
> On Wed, Oct 20, 2021 at 09:53:58AM +0200, Ben Franksen wrote:
>> Interesting, I didn't know about recordio, will take a look.
> 
> Hello from a fellow sufferer from EPICS.  (If you see a paper in some
> synchrotron-related journal in a few months that mentions "automation
> of automation", it will be from me, albeit not using a pseudonym.
> Another shameless plug: <https://github.com/CasperVector/ADXspress3>.)

Interesting, I didn't know you are from the accelerator community!

> As has been said by Laurent, in the presence of a supervision system
> with reliable logging and proper rotation, what `procServ' mainly does
> can be done better by something like `socat' which wraps something like
> `recordio', which in turn wraps the actual service process (EPICS IOC).

Yeah, that's what I was thinking, too.

> The devil is in the details: most importantly, when the service is to
> be stopped, the ideal situation is that the actual service process gets
> killed, leading to the graceful exit of `recordio' and then `socat'.
> 
> So the two wrapping programs need to propagate the killing signal, and
> then exit after waiting for the subprocess; since `procServ' kills the
> subprocess with SIGKILL by default, `recordio' also needs to translate
> the signal if this is to be emulated.  `socat' does this correctly when
> the `sighup'/`sigint'/`sigquit' options are given for `exec' addresses,
> but its manual page says nothing about SIGTERM.  `recordio' does not
> seem to propagate (let alone translate) the signal; additionally, its
> output format (which is, after all, mainly used for debugging) feels too
> low-level to me, and perhaps needs to be adjusted.

I agree. BTW, another detail is the special handling of certain control
characters by procServ: ^X to restart the child, ^T to toggle
auto-restart, and the option to disable others such as ^C and
especially ^D. This is not only convenient but also avoids accidental
restarts (people are used to ^D meaning "exit the shell").

> At the facility where I am from, we use CentOS 7 and unsupervised
> procServ (triple shame for a systemd opponent, s6 enthusiast and
> minimalist :(), because we have not yet been bitten by log rotation
> problems.  It also takes a fair amount of code to implement the
> dynamic management of user supervision trees for IOCs, in addition
> to the adjustments needed for `recordio'.  To make the situation even
> worse, we are also using procServControl; anyway, I still hope we can
> get rid of procServ entirely someday.

Our approach uses a somewhat hybrid mixture of several components. Since
the OS is Debian, we use systemd service units, one for each IOC. Each
unit executes `/usr/bin/unshare -u sethostname %i runuser -u ioc --
softIOC-run %i`, which fakes the host name to trick EPICS' Channel Access
"Security" into the proper behavior and then drops privileges.
softIOC-run is the script of which I posted a simplified version, with
the pipeline between procServ and multilog. Despite the disadvantages
explained by Laurent, so far this works pretty well (I have never yet
observed multilog to crash or otherwise misbehave). Finally, the
configuration for all IOCs (name, which host each runs on, path to the
startup script) resides in a small database, and there are scripts to
install everything automatically, including enabling and disabling the
service units.

When I started developing this scheme I thought that systemd was a great 
leap forward from /etc/init.d scripts. I still think so, but I quickly 
became frustrated with its monolithic approach. Despite 1000s of 
configuration options, it always seemed like the one I needed was 
missing. I spent days and days debugging service units that should have
worked according to the docs but did not, for reasons I wasn't always 
able to figure out. Nowadays my standing assumption about systemd is 
that nothing you didn't thoroughly test should be expected to work, 
regardless of what the docs claim.

In contrast, I found that small specialized tools that use the 
chain-loading technique to modify a particular aspect of a program much 
more reliably produce exactly the desired effect and nothing more. The 
fine-grained control this gives you over the order of these effects 
(like, first fake the host name, then drop privileges) is something that 
a monolith with an unstructured flat configuration language cannot give 
you. The syntactic simplicity of systemd's configuration language is 
certainly appealing, especially for non-programmers, but this easily 
lets you forget the extreme complexity of its semantics. I cannot help 
but see the machine executing it as an idiosyncratic monster with lots 
of poorly handled corner cases.

I would like to experiment with alternatives like s6/s6-rc but that 
means using one of the small distros that support it and I am sure such 
a proposal would not be well received.

Cheers
Ben




* Re: logging services with shell interaction
  2021-10-23 15:48       ` Ben Franksen
@ 2021-10-23 16:40         ` Casper Ti. Vector
  2021-10-24 20:36           ` Ben Franksen
  0 siblings, 1 reply; 7+ messages in thread
From: Casper Ti. Vector @ 2021-10-23 16:40 UTC
  To: supervision

On Sat, Oct 23, 2021 at 05:48:23PM +0200, Ben Franksen wrote:
> Interesting, I didn't know you are from the accelerator community!

(Actually I have only been in this field for 2.5 years...)

> I agree. BTW, another detail is the special handling of certain control
> characters by procServ: ^X to restart the child, ^T to toggle auto-restart,
> and the option to disable others such as ^C and especially ^D. This is
> not only convenient but also avoids accidental restarts (people are used
> to ^D meaning "exit the shell").

These functionalities would need to be (and perhaps would better be)
implemented outside of the `socat'/`recordio' pair, as separate commands
(like `s6-svc -k ...' or `touch .../down') or wrappers.  `socat' simply
exits upon ^D/^C by default, so the IOC would not be hurt; I find this
enough to prevent most user errors, so further filtering of control
characters seems unnecessary.
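Such separate commands could look like the following sketch for an s6 service directory. The `iocctl` name and its action words are hypothetical; the s6-svc flags used (-k SIGKILL, -u want up, -O run once at most, -d down) are believed to match the s6-svc documentation but should be checked against the installed s6 version. `S6_SVC` lets the binary be overridden, e.g. for testing.

```sh
# Hypothetical front end: procServ-like console actions as separate
# commands on an s6 service directory.
iocctl() {  # usage: iocctl ACTION SVCDIR
  svc=${S6_SVC:-s6-svc}
  case $1 in
    restart)         "$svc" -k "$2" ;;  # SIGKILL; s6-supervise restarts it if wanted up
    autorestart-on)  "$svc" -u "$2" ;;  # want up: restart whenever it dies
    autorestart-off) "$svc" -O "$2" ;;  # once at most: no restart when it dies
    stop)            "$svc" -d "$2" ;;  # want down: SIGTERM and keep it down
    *) echo "usage: iocctl restart|autorestart-on|autorestart-off|stop SVCDIR" >&2
       return 2 ;;
  esac
}
```

Keeping these actions outside the console path means no control character typed into the IOC shell can restart or stop it by accident.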

> Our approach uses a somewhat hybrid mixture of several components. Since the
> OS is Debian, we use systemd service units, one for each IOC. Each unit
> executes `/usr/bin/unshare -u sethostname %i runuser -u ioc -- softIOC-run
> %i`, which fakes the host name to trick EPICS' Channel Access "Security" into
> the proper behavior and then drops privileges. softIOC-run is the script of
> which I posted a simplified version, with the pipeline between procServ and
> multilog. Despite the disadvantages explained by Laurent, so far this works
> pretty well (I have never yet observed multilog to crash or otherwise
> misbehave). Finally, the configuration for all IOCs (name, which host each
> runs on, path to the startup script) resides in a small database, and there
> are scripts to install everything automatically, including enabling and
> disabling the service units.

Frankly, I find the above a little over-complicated, even discounting the
part about CA security, which we do not deal with yet.  I think you may
find our paper (once it is published; it is to be submitted next week)
interesting for simplifying IOC management.




* Re: logging services with shell interaction
  2021-10-23 16:40         ` Casper Ti. Vector
@ 2021-10-24 20:36           ` Ben Franksen
  0 siblings, 0 replies; 7+ messages in thread
From: Ben Franksen @ 2021-10-24 20:36 UTC
  To: supervision

Am 23.10.21 um 18:40 schrieb Casper Ti. Vector:
> On Sat, Oct 23, 2021 at 05:48:23PM +0200, Ben Franksen wrote:
>> I agree. BTW, another detail is the special handling of certain control
>> characters by procServ: ^X to restart the child, ^T to toggle auto-restart,
>> and the option to disable others such as ^C and especially ^D. This is
>> not only convenient but also avoids accidental restarts (people are used
>> to ^D meaning "exit the shell").
> 
> These functionalities would need to be (and perhaps would better be)
> implemented outside of the `socat'/`recordio' pair, as separate commands
> (like `s6-svc -k ...' or `touch .../down') or wrappers.  `socat' simply
> exits upon ^D/^C by default, so the IOC would not be hurt; I find this
> enough to prevent most user errors, so further filtering of control
> characters seems unnecessary.

Sure, there may be other solutions, it's just another one of those 
details that need to be taken care of somehow.

>> Our approach uses a somewhat hybrid mixture of several components. Since the
>> OS is Debian, we use systemd service units, one for each IOC. Each unit
>> executes `/usr/bin/unshare -u sethostname %i runuser -u ioc -- softIOC-run
>> %i`, which fakes the host name to trick EPICS' Channel Access "Security" into
>> the proper behavior and then drops privileges. softIOC-run is the script of
>> which I posted a simplified version, with the pipeline between procServ and
>> multilog. Despite the disadvantages explained by Laurent, so far this works
>> pretty well (I have never yet observed multilog to crash or otherwise
>> misbehave). Finally, the configuration for all IOCs (name, which host each
>> runs on, path to the startup script) resides in a small database, and there
>> are scripts to install everything automatically, including enabling and
>> disabling the service units.
> 
> Frankly, I find the above a little over-complicated, even discounting the
> part about CA security, which we do not deal with yet.  I think you may
> find our paper (once it is published; it is to be submitted next week)
> interesting for simplifying IOC management.

I am looking forward to it. You may want to post a link when it's done, 
here or on the EPICS mailing list.

Cheers
Ben





Thread overview: 7+ messages
2021-10-19  8:59 logging services with shell interaction Ben Franksen
2021-10-19 23:27 ` Laurent Bercot
2021-10-20  7:53   ` Ben Franksen
2021-10-20 18:01     ` Casper Ti. Vector
2021-10-23 15:48       ` Ben Franksen
2021-10-23 16:40         ` Casper Ti. Vector
2021-10-24 20:36           ` Ben Franksen
