supervision - discussion about system services, daemon supervision, init, runlevel management, and tools such as s6 and runit
* A program that can get exactly the log of a supervised process?
@ 2021-10-23 16:03 Casper Ti. Vector
  0 siblings, 0 replies; 7+ messages in thread
From: Casper Ti. Vector @ 2021-10-23 16:03 UTC (permalink / raw)
  To: supervision

Many daemons output some lines in their logs (here assumed to have been
redirected to logging pipes in some supervision framework) that can be
used as readiness markers.  So if we can implement the
following program, we will be able to trivially add readiness support to
these daemons without patching their source code, while reducing the
daemon-specific code (the pattern recogniser) to a minimum:

* The program is double-forked from the ./run script of a service, and
  also watches for the death of the daemon: in case the daemon dies
  before logging the readiness marker (and optionally, if a specified
  timeout is reached), this program dies as well, killing its subprocess
  (the pattern recogniser, discussed below).  To maximise portability
  across supervision frameworks, the death notification can be
  implemented with a pipe that can be written to by an ancillary
  program run from the ./finish script.

* The program somehow (directly or indirectly) gets a copy of the
  log from the reading side of the logging pipe, and feeds it into a
  specified pattern recogniser, which should exit 0 upon finding the
  readiness marker.  A special exit value of the subprocess can be
  treated as notification of a fatal error erroneously ignored by the
  daemon, leading to the program itself exiting with a special value
  that can be handled by an `s6-svc -t' invocation.  In order to avoid
  potential
  feedback loops, the subprocess should be killed (and waited for)
  before the program outputs any warning; the subprocess should be
  responsible for keeping itself quiet when running normally.
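For illustration, the pattern recogniser can be tiny; a minimal POSIX sh
sketch (the marker strings here are placeholders, not taken from any real
daemon):

```shell
#!/bin/sh
# Sketch of a pattern recogniser: read the teed log stream on stdin,
# return 0 on the readiness marker, 2 on a fatal-error line the daemon
# erroneously ignored, and 1 if the stream ends (daemon died) before
# any marker appears.  It stays quiet to avoid feedback loops through
# the logger.
recognise() {
  while IFS= read -r line; do
    case $line in
      *'Ready to accept connections'*) return 0 ;;
      *FATAL*) return 2 ;;
    esac
  done
  return 1
}
```

The three return values map onto readiness, the fatal-error special exit
handled via `s6-svc -t', and premature death of the daemon, respectively.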

Any idea on how the log "teeing" may be done cleanly (and portably
if possible; something akin to `tail -f' seems unsuitable because of
potential log rotation), and perhaps any flaw or redundancy in the
design above?

-- 
My current OpenPGP key:
RSA4096/0x227E8CAAB7AA186C (expires: 2022.09.20)
7077 7781 B859 5166 AE07 0286 227E 8CAA B7AA 186C


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: A program that can get exactly the log of a supervised process?
  2021-10-25 12:37     ` Laurent Bercot
  2021-10-25 12:57       ` Rio Liu
@ 2021-10-25 16:00       ` Casper Ti. Vector
  1 sibling, 0 replies; 7+ messages in thread
From: Casper Ti. Vector @ 2021-10-25 16:00 UTC (permalink / raw)
  To: supervision

On Mon, Oct 25, 2021 at 12:37:41PM +0000, Laurent Bercot wrote:
> Another question is how to piggyback loggrep into the notification
> mechanism: if loggrep is tied to the logger and not to the service,
> it doesn't have native access to the notification pipe. That means a
> specific mechanism is needed to give it cross-service access.

As has been sketched in previous mails, make loggrep a double-forking
chainloader, where the grandparent blocks (before closing the
notification fd and then chaining) until the grandchild is ready to
accept logs (after connecting to the logger and spawning the "grep";
the latter perhaps being an awk script in most cases)?
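As a rough illustration of the synchronisation (plain sh instead of
execline for self-containedness, and a fifo standing in for the teeing
pipe):

```shell
#!/bin/sh
# Sketch of the double-forking chainloader's synchronisation: the
# grandparent blocks on the fifo until the grandchild reports it is
# ready to accept logs, and only then goes on to chain.
fifo=$(mktemp -u) && mkfifo "$fifo"
( (
    # grandchild: here it would connect to the logger and spawn the
    # "grep", then signal that it is ready to accept logs
    echo ready > "$fifo"
  ) & ) &
read status < "$fifo"   # grandparent blocks here until the signal
rm -f "$fifo"
# ...now close the notification fd and exec into the next program
echo "chaining, grandchild reported: $status"
```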

> But it's probably the least bad way to do it, [...]

I fully agree.  This is the crux of the problem.

-- 
My current OpenPGP key:
RSA4096/0x227E8CAAB7AA186C (expires: 2022.09.20)
7077 7781 B859 5166 AE07 0286 227E 8CAA B7AA 186C



* Re: A program that can get exactly the log of a supervised process?
  2021-10-25 12:57       ` Rio Liu
@ 2021-10-25 13:26         ` Laurent Bercot
  0 siblings, 0 replies; 7+ messages in thread
From: Laurent Bercot @ 2021-10-25 13:26 UTC (permalink / raw)
  To: Rio Liu, supervision


>Why not have the grepper listen on the log file directly? You'll need to have a timestamp in the log and know where the log is, but those can be known at the time of writing the service script.

  There's no such thing as "the log file". There's the log backendS,
which can be one or more automatically rotated log files in a directory,
status files, network backends, etc.
  The only way to access the log stream in a reliable and generic way
is to read it on the way to the backends. You have no control over
what happens later on.

--
  Laurent



* Re: A program that can get exactly the log of a supervised process?
  2021-10-25 12:37     ` Laurent Bercot
@ 2021-10-25 12:57       ` Rio Liu
  2021-10-25 13:26         ` Laurent Bercot
  2021-10-25 16:00       ` Casper Ti. Vector
  1 sibling, 1 reply; 7+ messages in thread
From: Rio Liu @ 2021-10-25 12:57 UTC (permalink / raw)
  To: supervision


Why not have the grepper listen on the log file directly? You'll need to have a timestamp in the log and know where the log is, but those can be known at the time of writing the service script.

--------Original Message--------
On Oct 25, 2021, 08:37, Laurent Bercot wrote:

>>Well, I do realise the lifespan issue of the loggrep program, which is
>>why I asked the question in the first place. But I really never thought
>>of directly inserting loggrep into the logging chain as a new node;
>>instead, what I have thought is making loggrep a program "attachable" to
>>the logger. That is, the logger is extended with a listener which at
>>least one client can connect to, and which upon connection tees the log
>>to the client. I do not know whether you have similar ideas.
>
> Well in theory you could have something like skabus-dyntee
> ( https://skarnet.org/software/skabus/skabus-dyntee.html ) as your
> logger, and have your "real" logger run as a skabus-dyntee client.
> Then you could add a loggrep as a second skabus-dyntee client, and it
> would just disappear when it has finished its work.
>
> It would be a little unreliable as is, because skabus-dyntee doesn't
> care if it has no client at all, so if the real logger dies, it won't
> block the log stream until the real logger has restarted, so you may
> lose logs. But apart from that, it would work.
>
> A really reliable solution would be having a skabus-dyntee equivalent
> that has one permanent output and blocks as long as that output isn't
> there. As you say, it would be a logger extended with a listener.
>
> Another question is how to piggyback loggrep into the notification
> mechanism: if loggrep is tied to the logger and not to the service,
> it doesn't have native access to the notification pipe. That means a
> specific mechanism is needed to give it cross-service access.
>
> That's definitely a lot of code and a lot of architectural
> convolutions to accommodate what is ultimately a daemon misdesign.
> But it's probably the least bad way to do it, so I might think about it
> more and add something like that to s6 in the distant future.
>
> --
> Laurent


* Re: A program that can get exactly the log of a supervised process?
       [not found]   ` <YXUJKWltk+OwYhtf@caspervector>
@ 2021-10-25 12:37     ` Laurent Bercot
  2021-10-25 12:57       ` Rio Liu
  2021-10-25 16:00       ` Casper Ti. Vector
  0 siblings, 2 replies; 7+ messages in thread
From: Laurent Bercot @ 2021-10-25 12:37 UTC (permalink / raw)
  To: Casper Ti. Vector, supervision

>Well, I do realise the lifespan issue of the loggrep program, which is
>why I asked the question in the first place.  But I really never thought
>of directly inserting loggrep into the logging chain as a new node;
>instead, what I have thought is making loggrep a program "attachable" to
>the logger.  That is, the logger is extended with a listener which at
>least one client can connect to, and which upon connection tees the log
>to the client.  I do not know whether you have similar ideas.

  Well in theory you could have something like skabus-dyntee
( https://skarnet.org/software/skabus/skabus-dyntee.html ) as your
logger, and have your "real" logger run as a skabus-dyntee client.
Then you could add a loggrep as a second skabus-dyntee client, and it
would just disappear when it has finished its work.

  It would be a little unreliable as is, because skabus-dyntee doesn't
care if it has no client at all, so if the real logger dies, it won't
block the log stream until the real logger has restarted, so you may
lose logs. But apart from that, it would work.

  A really reliable solution would be having a skabus-dyntee equivalent
that has one permanent output and blocks as long as that output isn't
there. As you say, it would be a logger extended with a listener.

  Another question is how to piggyback loggrep into the notification
mechanism: if loggrep is tied to the logger and not to the service,
it doesn't have native access to the notification pipe. That means a
specific mechanism is needed to give it cross-service access.

  That's definitely a lot of code and a lot of architectural
convolutions to accommodate what is ultimately a daemon misdesign.
But it's probably the least bad way to do it, so I might think about it
more and add something like that to s6 in the distant future.

--
  Laurent



* Re: A program that can get exactly the log of a supervised process?
  2021-10-24  6:20 ` Laurent Bercot
@ 2021-10-24  7:20   ` Casper Ti. Vector
       [not found]   ` <YXUJKWltk+OwYhtf@caspervector>
  1 sibling, 0 replies; 7+ messages in thread
From: Casper Ti. Vector @ 2021-10-24  7:20 UTC (permalink / raw)
  To: supervision

On Sun, Oct 24, 2021 at 06:20:53AM +0000, Laurent Bercot wrote:
>  So basically, either loggrep is a simple tee-like program but you
> weaken the supervision properties of the service, or the functionality
> needs to be embedded in the supervision architecture, with loggrep
> being a consumer for service and a producer for logger (which is
> easy with s6-rc but not so much with pure s6) and loggrep always
> watching the state of service (which is not so easy with s6-rc, where
> you don't know the full path to another service directory).

Well, I do realise the lifespan issue of the loggrep program, which is
why I asked the question in the first place.  But I really never thought
of directly inserting loggrep into the logging chain as a new node;
instead, what I have thought is making loggrep a program "attachable" to
the logger.  That is, the logger is extended with a listener which at
least one client can connect to, and which upon connection tees the log
to the client.  I do not know whether you have similar ideas.

Additionally, I have just noticed a race condition in the original design
in my question.  If loggrep is double-forked by an external program
(execline or so), the daemon may become ready before loggrep is ready to
receive logs; a better way is to let loggrep do the double fork itself,
after the teeing pipe is ready.  The downside is that errors deemed
tolerable by the daemon but fatal by the user would then need to be
explicitly handled by loggrep; but this is probably already over-design,
so perhaps it just does not matter...

-- 
My current OpenPGP key:
RSA4096/0x227E8CAAB7AA186C (expires: 2022.09.20)
7077 7781 B859 5166 AE07 0286 227E 8CAA B7AA 186C



* Re: A program that can get exactly the log of a supervised process?
       [not found] <YXQyNQbpPZBKYLXC@caspervector>
@ 2021-10-24  6:20 ` Laurent Bercot
  2021-10-24  7:20   ` Casper Ti. Vector
       [not found]   ` <YXUJKWltk+OwYhtf@caspervector>
  0 siblings, 2 replies; 7+ messages in thread
From: Laurent Bercot @ 2021-10-24  6:20 UTC (permalink / raw)
  To: supervision

>Any idea on how the log "teeing" may be done cleanly (and portably
>if possible; something akin to `tail -f' seems unsuitable because of
>potential log rotation), and perhaps any flaw or redundancy in the
>design above?

  The obstacle I have always bumped against when trying to do similar
things is that the teeing program always has to remain there, even
after it has done its job (it has read the readiness line and informed
the supervisor). And so, instead of "service | logger", your data
flow permanently becomes "service | loggrep | logger" before
readiness, and "service | cat | logger" after readiness (the best
loggrep can do is exec into cat, or reimplement its functionality,
once it has read its readiness line).
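The pass-through-then-cat behaviour described above might look like this
in plain sh (the `Ready' marker and the `ready.flag' notification
stand-in are placeholders for the real marker and notification
mechanism):

```shell
#!/bin/sh
# Sketch of loggrep's data path: pass every log line through unchanged,
# and once the readiness marker is seen, notify and degrade to a plain
# cat for the rest of the stream.
loggrep() {
  while IFS= read -r line; do
    printf '%s\n' "$line"
    case $line in
      *Ready*)
        : > ready.flag   # stand-in for the actual readiness notification
        exec cat         # become a pure pass-through from here on
        ;;
    esac
  done
}
```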

  That wouldn't be a huge performance problem, especially if "cat" can
do zero-copy data, but it is definitely a reliability problem:
  - loggrep must die when the service dies, so a new loggrep can be run
when the service runs again. So loggrep cannot be run as a separate
supervised service in the same service pipeline. (If loggrep were to
restart independently from service, it would need to check whether
service is ready, and run as 'cat' if it is. This is doable, but more
complex.)
  - That means that either the pipe between service and loggrep cannot
be held, or loggrep must have an additional notification that service
died. This is, again, doable, but more complex.
  - If loggrep isn't supervised, and the pipe isn't being held, then
killing loggrep will incur a broken pipe, which means a service restart
with a lost line of logs, which supervision aims to avoid.

  So basically, either loggrep is a simple tee-like program but you
weaken the supervision properties of the service, or the functionality
needs to be embedded in the supervision architecture, with loggrep
being a consumer for service and a producer for logger (which is
easy with s6-rc but not so much with pure s6) and loggrep always
watching the state of service (which is not so easy with s6-rc, where
you don't know the full path to another service directory).

  In short: practical issues. It's impossible to do that in a clean,
satisfying way.

  And it entirely makes sense that it's so difficult, because the very
idea is to use the *data flow* to inform the *control flow*, and that
is inherently dangerous and not planned for in supervision 
architectures.
Making your control flow depend on your data flow is not a good pattern
at all, and I really wish daemons would stop doing that.

  I'm sorry I don't have a better answer.

--
  Laurent



end of thread, other threads:[~2021-10-25 16:00 UTC | newest]

Thread overview: 7+ messages
-- links below jump to the message on this page --
2021-10-23 16:03 A program that can get exactly the log of a supervised process? Casper Ti. Vector
     [not found] <YXQyNQbpPZBKYLXC@caspervector>
2021-10-24  6:20 ` Laurent Bercot
2021-10-24  7:20   ` Casper Ti. Vector
     [not found]   ` <YXUJKWltk+OwYhtf@caspervector>
2021-10-25 12:37     ` Laurent Bercot
2021-10-25 12:57       ` Rio Liu
2021-10-25 13:26         ` Laurent Bercot
2021-10-25 16:00       ` Casper Ti. Vector
