supervision - discussion about system services, daemon supervision, init, runlevel management, and tools such as s6 and runit
* logging services with shell interaction
@ 2021-10-19  8:59 Ben Franksen
  2021-10-19 23:27 ` Laurent Bercot
  0 siblings, 1 reply; 11+ messages in thread
From: Ben Franksen @ 2021-10-19  8:59 UTC (permalink / raw)
  To: supervision

Hi Everyone

we have a fair number of services which allow (and occasionally require) 
user interaction via a (built-in) shell. All the shell interaction is 
supposed to be logged, in addition to all the messages that are issued 
spontaneously by the process. So we cannot directly use a logger 
attached to the stdout/stderr of the process.

procServ is a process supervisor adapted to such situations. It allows 
an external process (conserver in our case) to attach to the service's 
shell via a TCP or UNIX domain socket. procServ supports logging 
everything it sees (input and output) to a file or stdout.

In the past we had recurring problems with processes that spew out an 
extreme amount of messages, quickly filling up our local disks. Since 
logrotate runs via cron it is not possible to reliably guarantee that 
this doesn't happen. Thus, inspired by process supervision suites a la 
daemontools, we are now using a small shell wrapper script that pipes 
the output of the process into the multilog tool from the daemontools 
package.

Here is the script, slightly simplified. Most of the parameters are 
passed via environment.

```
IOC=$1

/usr/bin/procServ -f -L- --logstamp --timefmt="$TIMEFMT" \
  -q -n %i --ignore=^D^C^] -P "unix:$RUNDIR/$IOC" -c "$BOOTDIR" "./$STCMD" \
  | /usr/bin/multilog "s$LOGSIZE" "n$LOGNUM" "$LOGDIR/$IOC"
```
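
For reference, the disk-space bound comes straight from the two multilog arguments: multilog starts a new file once `current` reaches the `s` size and keeps at most `n` files per log directory, so each IOC's log directory stays at roughly `s` times `n` bytes. A minimal sketch of that arithmetic, with made-up example values (`log_budget` is a hypothetical helper, not part of the script above):

```sh
#!/bin/sh
# Hypothetical helper: approximate upper bound, in bytes, on the disk space
# one multilog directory can occupy (file size limit times file count).
log_budget() {
    logsize=$1   # value given to multilog as s$LOGSIZE (bytes per file)
    lognum=$2    # value given to multilog as n$LOGNUM (files kept)
    echo $(( logsize * lognum ))
}

# Example values only: 10 files of 1 MB each.
log_budget 1000000 10
```

With these example values the budget is 10000000 bytes per IOC, however much the process prints.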

So far this seems to do the job, but I have two questions:

1. Is there anything "bad" about this approach? Most supervision tools 
have this sort of thing as a built-in feature and I suspect there may be 
a reason for that other than mere convenience.

2. Do any of the existing process supervision tools support what 
procServ gives us wrt interactive shell access from outside?

Cheers
Ben
-- 
I would rather have questions that cannot be answered, than answers that
cannot be questioned.  -- Richard Feynman




* Re: logging services with shell interaction
  2021-10-19  8:59 logging services with shell interaction Ben Franksen
@ 2021-10-19 23:27 ` Laurent Bercot
  2021-10-20  7:53   ` Ben Franksen
  0 siblings, 1 reply; 11+ messages in thread
From: Laurent Bercot @ 2021-10-19 23:27 UTC (permalink / raw)
  To: Ben Franksen, supervision

>we have a fair number of services which allow (and occasionally require) user interaction via a (built-in) shell. All the shell interaction is supposed to be logged, in addition to all the messages that are issued spontaneously by the process. So we cannot directly use a logger attached to the stdout/stderr of the process.

  I don't understand the consequence relationship here.

  - If you control your services / builtin shells, the services could
have an option to log the IO of their shells to stderr, as well as
their own messages.
  - Even if you cannot make the services log the shell IO, you can add
a small data dumper in front of the service's shell, which transmits
full-duplex everything it gets but also writes it to its own stdout or
stderr; if that stdout/err is the same pipe as the stdout/err of your
service, then all the IO from the shell will be logged to the same place
(and log lines won't be mixed unless they're more than PIPE_BUF bytes
long, which shouldn't happen in practice). So with that solution you
could definitely make your services log to multilog.


>procServ is a process supervisor adapted to such situations. It allows an external process (conserver in our case) to attach to the service's shell via a TCP or UNIX domain socket. procServ supports logging everything it sees (input and output) to a file or stdout.

  That works too.


>```
>IOC=$1
>
>/usr/bin/procServ -f -L- --logstamp --timefmt="$TIMEFMT" \
>  -q -n %i --ignore=^D^C^] -P "unix:$RUNDIR/$IOC" -c "$BOOTDIR" "./$STCMD" \
>  | /usr/bin/multilog "s$LOGSIZE" "n$LOGNUM" "$LOGDIR/$IOC"
>```
>
>So far this seems to do the job, but I have two questions:
>
>1. Is there anything "bad" about this approach? Most supervision tools have this sort of thing as a built-in feature and I suspect there may be a reason for that other than mere convenience.

  It's not *bad*, it's just not as airtight as supervision suites make
it. The reasons why it's a built-in feature in daemontools/runit/s6/others
are:
  - it allows the logger process to be supervised as well
  - it keeps the pipe to the logger open, so service and logger can
be restarted independently at will, without risk of losing logs.

  As is, you can't send signals to multilog (useful if you want to force
a rotation) without knowing its pid. And if multilog dies, procServ gets
a broken pipe, so it (and your service) is probably forced to restart,
and you lose the data that it wanted to write.
  A supervision architecture with integrated logging protects from this.
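
  As a rough illustration of what that looks like with daemontools: the
service and its logger each get their own run script, and the supervisor
holds the pipe between them. The sketch below only generates the two
scripts. `make_servicedir` and the directory names are made up for the
example; the procServ and multilog command lines are copied from the
script above, with the IOC name substituted for `%i` (an assumption on my
part), and the variables are assumed to come from the environment as
before.

```sh
#!/bin/sh
# Hypothetical generator for a daemontools-style service directory:
#   BASE/NAME/run      starts the service (procServ wrapping the IOC)
#   BASE/NAME/log/run  starts the logger (multilog)
# The supervisor, not the shell, creates and keeps the pipe between them.
make_servicedir() {
    ioc=$1
    base=$2
    mkdir -p "$base/$ioc/log" || return 1

    # Service side: procServ logs to stdout (-L-); stderr is merged into it.
    cat > "$base/$ioc/run" <<EOF
#!/bin/sh
exec 2>&1
exec /usr/bin/procServ -f -L- --logstamp --timefmt="\$TIMEFMT" \\
  -q -n "$ioc" --ignore=^D^C^] -P "unix:\$RUNDIR/$ioc" -c "\$BOOTDIR" "./\$STCMD"
EOF

    # Logger side: same multilog arguments as in the pipeline version.
    cat > "$base/$ioc/log/run" <<EOF
#!/bin/sh
exec /usr/bin/multilog "s\$LOGSIZE" "n\$LOGNUM" "\$LOGDIR/$ioc"
EOF

    chmod +x "$base/$ioc/run" "$base/$ioc/log/run" || return 1
    # daemontools' svscan only starts the log/ supervisor for sticky directories.
    chmod +t "$base/$ioc"
}

# Example: build a servicedir for an IOC called "ioc1" in a scratch directory.
scratch=$(mktemp -d) && make_servicedir ioc1 "$scratch" && ls -R "$scratch/ioc1"
```

  With such a layout, a rotation can be forced with `svc -a` on the log
directory (multilog rotates on SIGALRM), and either side can be restarted
without the other one noticing.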


>2. Do any of the existing process supervision tools support what procServ gives us wrt interactive shell access from outside?

  Not that I know of, because that need is pretty specific to your
service architecture.
  However, unless there are more details you have omitted, I still
believe you could obtain the same functionality with a daemontools/etc.
infrastructure and a program recording the IO from/to the shell. Since
you don't seem opposed to using old djb programs, you could probably
even directly reuse "recordio" from ucspi-tcp for this. :)

--
  Laurent



* Re: logging services with shell interaction
  2021-10-19 23:27 ` Laurent Bercot
@ 2021-10-20  7:53   ` Ben Franksen
  2021-10-20 18:01     ` Casper Ti. Vector
  0 siblings, 1 reply; 11+ messages in thread
From: Ben Franksen @ 2021-10-20  7:53 UTC (permalink / raw)
  To: supervision

Am 20.10.21 um 01:27 schrieb Laurent Bercot:
>> we have a fair number of services which allow (and occasionally 
>> require) user interaction via a (built-in) shell. All the shell 
>> interaction is supposed to be logged, in addition to all the messages 
>> that are issued spontaneously by the process. So we cannot directly 
>> use a logger attached to the stdout/stderr of the process.
> 
>   I don't understand the consequence relationship here.
> 
>   - If you control your services / builtin shells, the services could
> have an option to log the IO of their shells to stderr, as well as
> their own messages.

We do have control over them, theoretically, but adding this 
functionality seems impractical. This is a complex piece of software, 
built from multiple components maintained by different parties. There is 
some sort of common framework for issuing messages but none of the 
components strictly adhere to it. In other words, they use things like 
printf all over the place. The only way I see to reliably get all the IO 
for logging is to delegate this to an external process.

>   - Even if you cannot make the services log the shell IO, you can add
> a small data dumper in front of the service's shell, which transmits
> full-duplex everything it gets but also writes it to its own stdout or
> stderr; if that stdout/err is the same pipe as the stdout/err of your
> service, then all the IO from the shell will be logged to the same place
> (and log lines won't be mixed unless they're more than PIPE_BUF bytes
> long, which shouldn't happen in practice). So with that solution you
> could definitely make your services log to multilog.

Yes, that would be possible. More or less what procServ does minus the 
supervision aspect.

>> ```
>> IOC=$1
>>
>> /usr/bin/procServ -f -L- --logstamp --timefmt="$TIMEFMT" \
>>  -q -n %i --ignore=^D^C^] -P "unix:$RUNDIR/$IOC" -c "$BOOTDIR" "./$STCMD" \
>>  | /usr/bin/multilog "s$LOGSIZE" "n$LOGNUM" "$LOGDIR/$IOC"
>> ```
>>
>> So far this seems to do the job, but I have two questions:
>>
>> 1. Is there anything "bad" about this approach? Most supervision tools 
>> have this sort of thing as a built-in feature and I suspect there may 
>> be a reason for that other than mere convenience.
> 
>   It's not *bad*, it's just not as airtight as supervision suites make
> it. The reasons why it's a built-in feature in daemontools/runit/s6/others
> are:
>   - it allows the logger process to be supervised as well
>   - it keeps the pipe to the logger open, so service and logger can
> be restarted independently at will, without risk of losing logs.
> 
>   As is, you can't send signals to multilog (useful if you want to force
> a rotation) without knowing its pid. And if multilog dies, procServ gets
> a broken pipe, so it (and your service) is probably forced to restart,
> and you lose the data that it wanted to write.
>   A supervision architecture with integrated logging protects from this.

Thanks, this answers my question perfectly.

>> 2. Do any of the existing process supervision tools support what 
>> procServ gives us wrt interactive shell access from outside?
> 
>   Not that I know of, because that need is pretty specific to your
> service architecture.

It sure is.

>   However, unless there are more details you have omitted, I still
> believe you could obtain the same functionality with a daemontools/etc.
> infrastructure and a program recording the IO from/to the shell. Since
> you don't seem opposed to using old djb programs, you could probably
> even directly reuse "recordio" from ucspi-tcp for this. :)

Interesting, I didn't know about recordio, will take a look.

Again, thanks a lot for the detailed response!

Cheers
Ben
-- 
I would rather have questions that cannot be answered, than answers that
cannot be questioned.  -- Richard Feynman




* Re: logging services with shell interaction
  2021-10-20  7:53   ` Ben Franksen
@ 2021-10-20 18:01     ` Casper Ti. Vector
  2021-10-23 15:48       ` Ben Franksen
  2023-06-22 17:16       ` Casper Ti. Vector
  0 siblings, 2 replies; 11+ messages in thread
From: Casper Ti. Vector @ 2021-10-20 18:01 UTC (permalink / raw)
  To: supervision

On Wed, Oct 20, 2021 at 09:53:58AM +0200, Ben Franksen wrote:
> Interesting, I didn't know about recordio, will take a look.

Hello from a fellow sufferer from EPICS.  (If you see a paper in some
synchrotron-related journal in a few months that mentions "automation
of automation", it will be from me, albeit not using a pseudonym.
Another shameless plug: <https://github.com/CasperVector/ADXspress3>.)

As has been said by Laurent, in the presence of a supervision system
with reliable logging and proper rotation, what `procServ' mainly does
can be done better by something like `socat' which wraps something like
`recordio', which in turn wraps the actual service process (EPICS IOC).
The devil is in the details: most importantly, when the service is to
be stopped, the ideal situation is that the actual service process gets
killed, leading to the graceful exit of `recordio' and then `socat'.

So the two wrapping programs need to propagate the killing signal, and
then exit after waiting for the subprocess; since `procServ' defaults
to killing the subprocess using SIGKILL, `recordio' also needs to translate
the signal if this is to be emulated.  `socat' does this correctly when
the `sighup'/`sigint'/`sigquit' options are given for `exec' addresses,
but its manual page says nothing about SIGTERM.  `recordio' does not
seem to propagate (let alone translate) the signal; additionally, its
output format (which is after all mainly used for debugging) feels too
low-level to me, and perhaps needs to be adjusted.

At the facility where I am from, we use CentOS 7 and unsupervised
procServ (triple shame for a systemd opponent, s6 enthusiast and
minimalist :(), because we have not yet been bitten by log rotation
problems.  It also takes quite an amount of code to implement the
dynamic management of user supervision trees for IOCs, in addition
to the adjustments needed for `recordio'.  To make the situation even
worse, we are also using procServControl; anyway, I still hope we can
get rid of procServ entirely someday.

-- 
My current OpenPGP key:
RSA4096/0x227E8CAAB7AA186C (expires: 2022.09.20)
7077 7781 B859 5166 AE07 0286 227E 8CAA B7AA 186C



* Re: logging services with shell interaction
  2021-10-20 18:01     ` Casper Ti. Vector
@ 2021-10-23 15:48       ` Ben Franksen
  2021-10-23 16:40         ` Casper Ti. Vector
  2023-06-22 17:16       ` Casper Ti. Vector
  1 sibling, 1 reply; 11+ messages in thread
From: Ben Franksen @ 2021-10-23 15:48 UTC (permalink / raw)
  To: supervision

Hi Casper

Am 20.10.21 um 20:01 schrieb Casper Ti. Vector:
> On Wed, Oct 20, 2021 at 09:53:58AM +0200, Ben Franksen wrote:
>> Interesting, I didn't know about recordio, will take a look.
> 
> Hello from a fellow sufferer from EPICS.  (If you see a paper in some
> synchrotron-related journal in a few months that mentions "automation
> of automation", it will be from me, albeit not using a pseudonym.
> Another shameless plug: <https://github.com/CasperVector/ADXspress3>.)

Interesting, I didn't know you are from the accelerator community!

> As has been said by Laurent, in the presence of a supervision system
> with reliable logging and proper rotation, what `procServ' mainly does
> can be done better by something like `socat' which wraps something like
> `recordio', which in turn wraps the actual service process (EPICS IOC).

Yeah, that's what I was thinking, too.

> The devil is in the details: most importantly, when the service is to
> be stopped, the ideal situation is that the actual service process gets
> killed, leading to the graceful exit of `recordio' and then `socat'.
> 
> So the two wrapping programs need to propagate the killing signal, and
> then exit after waiting for the subprocess; since `procServ' defaults
> to killing the subprocess using SIGKILL, `recordio' also needs to translate
> the signal if this is to be emulated.  `socat' does this correctly when
> the `sighup'/`sigint'/`sigquit' options are given for `exec' addresses,
> but its manual page says nothing about SIGTERM.  `recordio' does not
> seem to propagate (let alone translate) the signal; additionally, its
> output format (which is after all mainly used for debugging) feels too
> low-level to me, and perhaps needs to be adjusted.

I agree. BTW, another detail is the special handling of certain control 
characters by procServ: ^X to restart the child, ^T to toggle 
auto-restart, and the possibility to disable some others like ^C and 
especially ^D; which is not only convenient but also avoids accidental 
restarts (people are used to ^D meaning "exit the shell").

> At the facility where I am from, we use CentOS 7 and unsupervised
> procServ (triple shame for a systemd opponent, s6 enthusiast and
> minimalist :(), because we have not yet been bitten by log rotation
> problems.  It also takes quite an amount of code to implement the
> dynamic management of user supervision trees for IOCs, in addition
> to the adjustments needed for `recordio'.  To make the situation even
> worse, we are also using procServControl; anyway, I still hope we can
> get rid of procServ entirely someday.

Our approach uses a somewhat hybrid mixture of several components. Since 
the OS is Debian we use systemd service units, one for each IOC. They 
are executing `/usr/bin/unshare -u sethostname %i runuser -u ioc -- 
softIOC-run %i` which fakes the host name to trick EPICS' Channel Access 
"Security" into the proper behavior, and then drops privileges. 
softIOC-run is the script of which I posted a simplified version, with 
the pipeline between procServ and multilog. Despite the disadvantages 
explained by Laurent, so far this works pretty well (I have never yet 
observed multilog to crash or otherwise misbehave). Finally, the 
configuration for all IOCs (name, which host they run on, path to the 
startup script) resides in a small database and there are scripts to 
automatically install everything, including automatic enabling and 
disabling of the service units.

When I started developing this scheme I thought that systemd was a great 
leap forward from /etc/init.d scripts. I still think so, but I quickly 
became frustrated with its monolithic approach. Despite 1000s of 
configuration options, it always seemed like the one I needed was 
missing. I spent days and days debugging service units that should have 
worked according to the docs but did not, for reasons I wasn't always 
able to figure out. Nowadays my standing assumption about systemd is 
that nothing you didn't thoroughly test should be expected to work, 
regardless of what the docs claim.

In contrast, I found that small specialized tools that use the 
chain-loading technique to modify a particular aspect of a program much 
more reliably produce exactly the desired effect and nothing more. The 
fine-grained control this gives you over the order of these effects 
(like, first fake the host name, then drop privileges) is something that 
a monolith with an unstructured flat configuration language cannot give 
you. The syntactic simplicity of systemd's configuration language is 
certainly appealing, especially for non-programmers, but this easily 
lets you forget the extreme complexity of its semantics. I cannot help 
but see the machine executing it as an idiosyncratic monster with lots 
of poorly handled corner cases.

I would like to experiment with alternatives like s6/s6-rc but that 
means using one of the small distros that support it and I am sure such 
a proposal would not be well received.

Cheers
Ben
-- 
I would rather have questions that cannot be answered, than answers that
cannot be questioned.  -- Richard Feynman




* Re: logging services with shell interaction
  2021-10-23 15:48       ` Ben Franksen
@ 2021-10-23 16:40         ` Casper Ti. Vector
  2021-10-24 20:36           ` Ben Franksen
  0 siblings, 1 reply; 11+ messages in thread
From: Casper Ti. Vector @ 2021-10-23 16:40 UTC (permalink / raw)
  To: supervision

On Sat, Oct 23, 2021 at 05:48:23PM +0200, Ben Franksen wrote:
> Interesting, I didn't know you are from the accelerator community!

(Actually I have only been in this field for 2.5 years...)

> I agree. BTW, another detail is the special handling of certain control
> characters by procServ: ^X to restart the child, ^T to toggle auto-restart,
> and the possibility to disable some others like ^C and especially ^D; which
> is not only convenient but also avoids accidental restarts (people are used
> to ^D meaning "exit the shell").

These functionalities would need to be (and would perhaps better be)
done outside of the `socat'/`recordio' pair, as separate commands
(like `s6-svc -k ...' or `touch .../down') or wrappers.  `socat' simply
exits upon ^D/^C by default, so the IOC would not be hurt; I find this
enough to prevent most user errors, therefore more filtering of control
characters seems unnecessary.

> Our approach uses a somewhat hybrid mixture of several components. Since the
> OS is Debian we use systemd service units, one for each IOC. They are
> executing `/usr/bin/unshare -u sethostname %i runuser -u ioc -- softIOC-run
> %i` which fakes the host name to trick EPICS' Channel Access "Security" into
> the proper behavior, and then drops privileges. softIOC-run is the script of
> which I posted a simplified version, with the pipeline between procServ and
> multilog. Despite the disadvantages explained by Laurent, so far this works
> pretty well (I have never yet observed multilog to crash or otherwise
> misbehave). Finally, the configuration for all IOCs (name, which host
> they run on, path to the startup script) resides in a small database and
> there are scripts to automatically install everything, including automatic
> enabling and disabling of the service units.

Frankly I find the above a little over-complicated, even discounting the
part about CA security, which we do not deal with yet.  I think you might
find our paper (after publication; it is to be submitted next week)
interesting with regard to simplifying IOC management.

-- 
My current OpenPGP key:
RSA4096/0x227E8CAAB7AA186C (expires: 2022.09.20)
7077 7781 B859 5166 AE07 0286 227E 8CAA B7AA 186C



* Re: logging services with shell interaction
  2021-10-23 16:40         ` Casper Ti. Vector
@ 2021-10-24 20:36           ` Ben Franksen
  2022-04-22 10:40             ` Casper Ti. Vector
  0 siblings, 1 reply; 11+ messages in thread
From: Ben Franksen @ 2021-10-24 20:36 UTC (permalink / raw)
  To: supervision

Am 23.10.21 um 18:40 schrieb Casper Ti. Vector:
> On Sat, Oct 23, 2021 at 05:48:23PM +0200, Ben Franksen wrote:
>> I agree. BTW, another detail is the special handling of certain control
>> characters by procServ: ^X to restart the child, ^T to toggle auto-restart,
>> and the possibility to disable some others like ^C and especially ^D; which
>> is not only convenient but also avoids accidental restarts (people are used
>> to ^D meaning "exit the shell").
> 
> These functionalities would need to be (and would perhaps better be)
> done outside of the `socat'/`recordio' pair, as separate commands
> (like `s6-svc -k ...' or `touch .../down') or wrappers.  `socat' simply
> exits upon ^D/^C by default, so the IOC would not be hurt; I find this
> enough to prevent most user errors, therefore more filtering of control
> characters seems unnecessary.

Sure, there may be other solutions, it's just another one of those 
details that need to be taken care of somehow.

>> Our approach uses a somewhat hybrid mixture of several components. Since the
>> OS is Debian we use systemd service units, one for each IOC. They are
>> executing `/usr/bin/unshare -u sethostname %i runuser -u ioc -- softIOC-run
>> %i` which fakes the host name to trick EPICS' Channel Access "Security" into
>> the proper behavior, and then drops privileges. softIOC-run is the script of
>> which I posted a simplified version, with the pipeline between procServ and
>> multilog. Despite the disadvantages explained by Laurent, so far this works
>> pretty well (I have never yet observed multilog to crash or otherwise
>> misbehave). Finally, the configuration for all IOCs (name, which host
>> they run on, path to the startup script) resides in a small database and
>> there are scripts to automatically install everything, including automatic
>> enabling and disabling of the service units.
> 
> Frankly I find the above a little over-complicated, even discounting the
> part about CA security, which we do not deal with yet.  I think you might
> find our paper (after publication; it is to be submitted next week)
> interesting with regard to simplifying IOC management.

I am looking forward to it. You may want to post a link when it's done, 
here or on the EPICS mailing list.

Cheers
Ben
-- 
I would rather have questions that cannot be answered, than answers that
cannot be questioned.  -- Richard Feynman




* Re: logging services with shell interaction
  2021-10-24 20:36           ` Ben Franksen
@ 2022-04-22 10:40             ` Casper Ti. Vector
  0 siblings, 0 replies; 11+ messages in thread
From: Casper Ti. Vector @ 2022-04-22 10:40 UTC (permalink / raw)
  To: supervision

On Sun, Oct 24, 2021 at 10:36:06PM +0200, Ben Franksen wrote:
> I am looking forward to it. You may want to post a link when it's
> done, here or on the EPICS mailing list.

The paper has been published in JSR, and is now available at
<https://journals.iucr.org/s/issues/2022/03/00/gy5033/index.html>
(arxiv:2204.08434).  Half of the paper is spent on (forgive my
bluntness) updating certain EPICS-related practices from 1990s to
~2010, which may be quite underwhelming to readers of this mailing
list.  However, I do find one aspect of the paper potentially useful,
and not only so in connection to EPICS: systematic efforts to minimise
the complexity in configuring a perhaps large system composed of often
specialised hardware and software.

Inspired by theories like the Kolmogorov complexity, we can ask: to
what limit can we reduce the amount of code and manual operations in
building a ready system (X-ray beamline, computer cluster, ...) from
commodity hardware, so that the total workload is minimised?  I would
like to note that the idea was not born in a vacuum, as similar ideas
can be seen, for instance, from Guix SD's whole-system configuration
mechanism.  My idea actually originated independently from something
like <https://forums.gentoo.org/viewtopic-p-8369250.html#8369250>
(dating back to ~2011), but I guess there must be many other people
who have developed similar ideas.

BTW, as the review process was too slow, the codebase of our packaging
system (see <https://github.com/CasperVector/ihep-pkg-ose> for a fully
open-source edition) has evolved quite a bit after submission of the
original manuscript.  One change of perhaps general interest is a small
inheritance system for RPM specs, obviously motivated by its Gentoo
counterpart; I am mildly confident that the repository in its current
state is capable of being an easy-to-use yet maintainable workalike of
the NSLS-II repository.  Another paper by us, also published in JSR, is
at <https://journals.iucr.org/s/issues/2022/03/00/yn5087/index.html>
(arxiv:2203.17236); readers of this list may find it of some interest
in implementing GUIs based on AutoCAD-like "command injection", and may
find the idea of an EPG immediately familiar after the discussion above.

-- 
My current OpenPGP key:
RSA4096/0x227E8CAAB7AA186C (expires: 2022.09.20)
7077 7781 B859 5166 AE07 0286 227E 8CAA B7AA 186C



* Re: logging services with shell interaction
  2021-10-20 18:01     ` Casper Ti. Vector
  2021-10-23 15:48       ` Ben Franksen
@ 2023-06-22 17:16       ` Casper Ti. Vector
  2023-06-23 11:48         ` Ben Franksen
  1 sibling, 1 reply; 11+ messages in thread
From: Casper Ti. Vector @ 2023-06-22 17:16 UTC (permalink / raw)
  To: supervision

On Thu, Oct 21, 2021 at 02:01:29AM +0800, Casper Ti. Vector wrote:
> As has been said by Laurent, in the presence of a supervision system
> with reliable logging and proper rotation, what `procServ' mainly does
> can be done better by something like `socat' which wraps something like
> `recordio', which in turn wraps the actual service process (EPICS IOC).
> The devil is in the details: most importantly, when the service is to
> be stopped, the ideal situation is that the actual service process gets
> killed, leading to the graceful exit of `recordio' and then `socat'.

It turns out that socat does not do I/O fan-in/fan-out with multiple
clients; it also assumes the `exec:'-ed subprocess is constantly present
(i.e. it does not handle IOC restarting).  So I have written a dedicated
program, ipctee (see below for link to source code), that does this.
I have also written a program, iotrap, that after receiving a
terminating signal, first closes the stdin of its child in the hope
that the latter exits cleanly, and after a tunable delay forwards the
signal.  This way IOCs are allowed to really run their clean-up code,
instead of just being killed instantly by the signal.
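
The stop sequence can be approximated in plain shell, which may make the
idea easier to follow than the C source.  This is only a sketch of the
behaviour described above, not the actual iotrap code: `stop_gently' is a
made-up name, the write end of the child's stdin is assumed to be held on
fd 3, and SIGTERM stands in for whatever signal was received.

```sh
#!/bin/sh
# Sketch (not iotrap itself): close the child's stdin, give it DELAY seconds
# to exit on its own, and only then send it SIGTERM.
# Assumes the caller holds the write end of the child's stdin on fd 3.
stop_gently() {
    pid=$1
    delay=$2
    exec 3>&-    # the child now sees EOF on its stdin
    # Timer: signals the child if it is still running after the delay.
    ( sleep "$delay"; kill -TERM "$pid" 2>/dev/null ) </dev/null >/dev/null 2>&1 &
    timer=$!
    wait "$pid"
    status=$?
    kill "$timer" 2>/dev/null    # child is gone; cancel the pending signal
    return "$status"
}

# Demo: "cat" stands in for an IOC shell that exits cleanly on EOF.
dir=$(mktemp -d) && mkfifo "$dir/stdin"
cat < "$dir/stdin" > /dev/null &
child=$!
exec 3> "$dir/stdin"    # keep the child's stdin open until we decide to stop
stop_gently "$child" 5
echo "child exit status: $?"
rm -r "$dir"
```

In the demo, cat exits on EOF well before the timer fires, so the reported
status should be 0; a child that ignores EOF would instead be terminated by
the signal after the delay.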

> So the two wrapping programs need to propagate the killing signal, and
> then exit after waiting for the subprocess; since `procServ' defaults
> to killing the subprocess using SIGKILL, `recordio' also needs to translate
> the signal if this is to be emulated.  `socat' does this correctly when
> the `sighup'/`sigint'/`sigquit' options are given for `exec' addresses,
> but its manual page says nothing about SIGTERM.  `recordio' does not
> seem to propagate (let alone translate) the signal; additionally, its
> output format (which is after all mainly used for debugging) feels too
> low-level to me, and perhaps needs to be adjusted.

Closer inspection of recordio revealed that it was designed in a smarter
way: after forking, the parent exec()s into the intended program, and
the child is what actually does the work of I/O forwarding.  This way
recordio (the child) does not need to forward signals.  Based on it,
I have written a program, recordln, that performs more line-oriented
recording: line fragments (without the line terminator) that go through
the same fd consecutively are joined before being copied to stderr.

> At the facility where I am from, we use CentOS 7 and unsupervised
> procServ (triple shame for a systemd opponent, s6 enthusiast and
> minimalist :(), because we have not yet been bitten by log rotation
> problems.  It also takes quite an amount of code to implement the
> dynamic management of user supervision trees for IOCs, in addition
> to the adjustments needed for `recordio'.  To make the situation even
> worse, we are also using procServControl; anyway, I still hope we can
> get rid of procServ entirely someday.

Source code for the programs above is available (licence: CC0) at
<https://cpaste.org/?fa30831511a456b7=#ECwUd1YaVQBLUokynQbRYZq5wvBvXXeXo3bQoeL2rL4L>
These programs can be tested with (in three different terminals):
$ ipctee /tmp/in.sock /tmp/out.sock
$ socat unix-connect:/tmp/in.sock exec:'recordln iotrap /bin/sh',sigint,sigquit
$ socat unix-connect:/tmp/out.sock -
Please feel free to tell me in case you find any defect in the code.
The dynamic management of IOC servicedirs is being developed, and will
be tested internally here before a paper gets submitted somewhere.

-- 
My current OpenPGP key:
RSA4096/0x227E8CAAB7AA186C (expires: 2024.09.30)
7077 7781 B859 5166 AE07 0286 227E 8CAA B7AA 186C



* Re: logging services with shell interaction
  2023-06-22 17:16       ` Casper Ti. Vector
@ 2023-06-23 11:48         ` Ben Franksen
  2023-06-23 12:30           ` Casper Ti. Vector
  0 siblings, 1 reply; 11+ messages in thread
From: Ben Franksen @ 2023-06-23 11:48 UTC (permalink / raw)
  To: supervision

Hi Casper

thanks for the heads-up; this is certainly an interesting project, but 
for me to start playing with it only makes sense if and when it has 
matured to the point where there is a minimum of documentation 
(-h/--help or something like that) and ideally some sort of revision 
control, too. I may be (barely) able to debug such low-level C code if I 
notice it misbehaving but to reverse-engineer what it is supposed to do 
is beyond my abilities.

Cheers
Ben

Am 22.06.23 um 19:16 schrieb Casper Ti. Vector:
> On Thu, Oct 21, 2021 at 02:01:29AM +0800, Casper Ti. Vector wrote:
>> As has been said by Laurent, in the presence of a supervision system
>> with reliable logging and proper rotation, what `procServ' mainly does
>> can be done better by something like `socat' which wraps something like
>> `recordio', which in turn wraps the actual service process (EPICS IOC).
>> The devil is in the details: most importantly, when the service is to
>> be stopped, the ideal situation is that the actual service process gets
>> killed, leading to the graceful exit of `recordio' and then `socat'.
> 
> It turns out that socat does not do I/O fan-in/fan-out with multiple
> clients; it also assumes the `exec:'-ed subprocess is constantly present
> (i.e. it does not handle IOC restarting).  So I have written a dedicated
> program, ipctee (see below for link to source code), that does this.
> I have also written a program, iotrap, that after receiving a
> terminating signal, first closes the stdin of its child in the hope
> that the latter exits cleanly, and after a tunable delay forwards the
> signal.  This way IOCs are allowed to really run their clean-up code,
> instead of just being killed instantly by the signal.
> 
>> So the two wrapping programs need to propagate the killing signal, and
>> then exit after waiting for the subprocess; since `procServ' defaults
>> to kill the subprocess using SIGKILL, `recordio' also needs to translate
>> the signal if this is to be emulated.  `socat' does this correctly when
>> the `sighup'/`sigint'/`sigquit' options are given for `exec' addresses,
>> but its manual page does not state about SIGTERM.  `recordio' does not
>> seem to propagate (let alone translate) the signal; additionally, its
>> output format (which is after all mainly used for debugging) feels too
>> low-level to me, and perhaps needs to be adjusted.
> 
> Closer inspection of recordio revealed that it was designed in a smarter
> way: after forking, the parent exec()s into the intended program, and
> the children is what actually does the work of I/O forwarding.  This way
> recordio (the children) does not need to forward signals.  Based on it,
> I have written a program, recordln, that performs more line-oriented
> recording: line fragments (without the line terminator) that go through
> the same fd consecutively are joined before being copied to stderr.
> 
>> At the facility where I am from, we use CentOS 7 and unsupervised
>> procServ (triple shame for a systemd opponent, s6 enthusiast and
>> minimalist :(), because we have not yet been bitten by log rotation
>> problems.  It also takes quite an amount of code to implement the
>> dynamic management of user supervision trees for IOCs, in addition
>> to the adjustments needed for `recordio'.  To make the situation even
>> worse, we are also using procServControl; anyway, I still hope we can
>> get rid of procServ entirely someday.
> 
> Source code for the programs above are available (licence: CC0) at
> <https://cpaste.org/?fa30831511a456b7=#ECwUd1YaVQBLUokynQbRYZq5wvBvXXeXo3bQoeL2rL4L>
> These programs can be tested with (in three different terminals):
> $ ipctee /tmp/in.sock /tmp/out.sock
> $ socat unix-connect:/tmp/in.sock exec:'recordln iotrap /bin/sh',sigint,sigquit
> $ socat unix-connect:/tmp/out.sock -
> Please feel free to tell me in case you find any defect in the code.
> The dynamic management of IOC servicedirs is being developed, and will
> be tested internally here before a paper gets submitted somewhere.
> 

-- 
I would rather have questions that cannot be answered, than answers that
cannot be questioned.  -- Richard Feynman




* Re: logging services with shell interaction
  2023-06-23 11:48         ` Ben Franksen
@ 2023-06-23 12:30           ` Casper Ti. Vector
  0 siblings, 0 replies; 11+ messages in thread
From: Casper Ti. Vector @ 2023-06-23 12:30 UTC (permalink / raw)
  To: supervision

On Fri, Jun 23, 2023 at 01:48:53PM +0200, Ben Franksen wrote:
> thanks for the heads-up; this is certainly an interesting project, but for
> me to start playing with it only makes sense if and when it has matured to
> the point where there is a minimum of documentation (-h/--help or something
> like that) and ideally some sort of revision control, too. I may be (barely)
> able to debug such low-level C code if I notice it misbehaving but to
> reverse-engineer what it is supposed to do is beyond my abilities.

The source code pasted above is indeed of a preview nature; more formal
documentation will probably be written during the internal testing
here.  recordln works much like recordio and its command line usage is
identical; it is just more line-oriented.  The command line usage of
iotrap and ipctee is printed when `-h' is given; below is a brief
summary of them.

iotrap works like the trap program from execline, but the signals
currently cannot be customised.  When iotrap receives a terminating
signal, the spawned subprocess is sent an EOF; if the subprocess has
not exited when the timeout (tunable with the `-t' option) expires,
the received signal is forwarded to it.
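
The "EOF first, signal later" idea can be sketched as follows.  This is
only an illustration in Python, not the actual iotrap source (which is
C); the names stop_gracefully and demo are hypothetical, and the
forwarding of iotrap's own stdin to the subprocess as well as the
installation of the signal handlers are left out.

```python
import signal
import subprocess
import sys


def stop_gracefully(child, signum, timeout):
    """Send EOF to the child; forward signum only if it outlives the timeout.

    Returns True if the child exited on EOF alone, False if the signal
    had to be forwarded.  (Hypothetical helper, not part of iotrap.)
    """
    if child.stdin is not None and not child.stdin.closed:
        child.stdin.close()           # the child now sees EOF on its stdin
    try:
        child.wait(timeout=timeout)   # time for the child's clean-up code
        return True
    except subprocess.TimeoutExpired:
        child.send_signal(signum)     # fall back to the received signal
        child.wait()
        return False


def demo():
    """Stop a child that exits as soon as its stdin reaches EOF."""
    child = subprocess.Popen(
        [sys.executable, "-c", "import sys; sys.stdin.read()"],
        stdin=subprocess.PIPE,
    )
    return stop_gracefully(child, signal.SIGTERM, timeout=5.0)
```

A subprocess that ignores EOF would instead be terminated by the
forwarded signal once the timeout has passed.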

ipctee listens on a pair of sockets, one for input and one for output;
the former accepts at most one connection and the latter multiple
connections.  (The input socket is for the IOC; the output socket is
for "procServ clients".)  The bytes written by each connected client
are forwarded to all other clients connected at the time.

(You may realise ipctee is essentially a "chat server" for all
connected clients, and it may seem that ipctee can be further
simplified by eliminating the input socket and treating all clients
as fully equal.  However, for certain use cases, we may disallow writes
from the output socket; this is why there is a `-r' option for the
"readonly" mode, and why the program is called "ipctee" not "ipcchat".)
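
The forwarding rule described above can be sketched as follows.  Again
this is a Python illustration rather than the real ipctee code: the
class Tee and its methods are hypothetical names, the listening and
accepting on the two sockets is omitted, and the sketch assumes that in
"readonly" mode the bytes written by output-side clients are simply
discarded.

```python
import socket


class Tee:
    """Relay state: one input-side peer (the IOC) and any number of viewers."""

    def __init__(self, readonly=False):
        self.readonly = readonly
        self.ioc = None      # the single connection from the input socket
        self.viewers = []    # connections accepted on the output socket

    def attach_ioc(self, conn):
        """Register the input-side peer; refuse a second one."""
        if self.ioc is not None:
            return False
        self.ioc = conn
        return True

    def attach_viewer(self, conn):
        """Register one more output-side client."""
        self.viewers.append(conn)

    def recipients(self, sender):
        """Return the peers that should receive bytes written by sender."""
        if self.readonly and sender is not self.ioc:
            return []        # assumed meaning of "readonly": drop viewer input
        peers = ([self.ioc] if self.ioc is not None else []) + self.viewers
        return [p for p in peers if p is not sender]

    def forward(self, sender, data):
        """Copy data from sender to every other connected peer."""
        for peer in self.recipients(sender):
            peer.sendall(data)
```

With the input socket removed and `readonly' always off, recipients()
would reduce to "everybody but the sender", which is the plain chat
server mentioned above.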

-- 
My current OpenPGP key:
RSA4096/0x227E8CAAB7AA186C (expires: 2024.09.30)
7077 7781 B859 5166 AE07 0286 227E 8CAA B7AA 186C



Thread overview: 11+ messages
2021-10-19  8:59 logging services with shell interaction Ben Franksen
2021-10-19 23:27 ` Laurent Bercot
2021-10-20  7:53   ` Ben Franksen
2021-10-20 18:01     ` Casper Ti. Vector
2021-10-23 15:48       ` Ben Franksen
2021-10-23 16:40         ` Casper Ti. Vector
2021-10-24 20:36           ` Ben Franksen
2022-04-22 10:40             ` Casper Ti. Vector
2023-06-22 17:16       ` Casper Ti. Vector
2023-06-23 11:48         ` Ben Franksen
2023-06-23 12:30           ` Casper Ti. Vector
