supervision - discussion about system services, daemon supervision, init, runlevel management, and tools such as s6 and runit
* s6-rc : Anomalies or normal behaviour
@ 2020-10-03 22:30 Dewayne Geraghty
  2020-10-04  1:58 ` Dewayne Geraghty
  2020-10-04  2:14 ` Laurent Bercot
  0 siblings, 2 replies; 6+ messages in thread
From: Dewayne Geraghty @ 2020-10-03 22:30 UTC (permalink / raw)
  To: supervision

Is this correct behaviour or are these just anomalies?
1. Use of backtick variable assignment on FreeBSD doesn't appear to work correctly
2. Use of emptyenv results in a remnant "defunct" process
3. Should a bundle's contents file also list the dependencies of its
contents, so that a down change to the bundle brings all of the service's
components down?


1. I expected to see the date in seconds since the epoch, but the result is
the variable name
# execlineb -Pc 'backtick D { date "+%s" } echo $D'
$D

Note: this isn't how I intend to use backtick, but I try to use the
simplest case to understand how things work

---
2. When I use emptyenv within an execlineb script, I have a "defunct"
zombie process
89685  3  S<       0:00.01   |-- s6-supervise base:time-srv
 3020  -  S<s      0:00.03   | `-- /usr/local/sbin/ntpd -c /etc/ntp.conf -N -g -u ntpd --nofork
 3601  -  Z<       0:00.00   |   `-- <defunct>

The time server script is
#!/usr/local/bin/execlineb -P
emptyenv
multidefine -d " " "base time ntpd /usr/local/sbin/ntpd" { JAIL SERVICE
USER PROGRAM }
background { echo Starting service $SERVICE using $PROGRAM on $JAIL
under user $USER }
fdmove 2 1
redirfd -w 1 /m/base:time/fifo
$PROGRAM -c /etc/ntp.conf -N -g -u $USER --nofork

Removing emptyenv prevents the zombie from being created.  Is this normal?

---
3. Is it normal/standard/good practice to include a dependency in a
bundle?  For example, I have a "time" bundle whose contents are
time-srv.  time-srv starts the ntpd service, and has time-log as a
dependency.

Using "s6-rc -u change time", everything behaves as documented, ie
starts "time" which starts time-log, then time-srv.  However

# s6-rc -v 9 -d change base:time
s6-rc: info: bringing selected services down
s6-rc: info: processing service base:time-srv: stopping
s6-rc: info: service base:time-srv stopped successfully
# Starting logging service time for base with user s6log folder
/var/log/time

and the time-log continues running.

Admittedly
# s6-svstat /s/scan/base:time-srv ; s6-svstat /s/scan/base:time-log
down (exitcode 0) 6 seconds, ready 6 seconds  # This is time-srv
up (pid 85131) 6 seconds                      # This is time-log, so it has been restarted

To obtain the desired/expected behaviour and bring time-log down, must it
also be added to the bundle's contents?

These observations were made using FreeBSD 12.2-STABLE on amd64.

Apologies for still asking newbie questions, but I'm trying to embed s6
here, which means I need to understand it properly.
Regards, Dewayne.


* Re: s6-rc : Anomalies or normal behaviour
  2020-10-03 22:30 s6-rc : Anomalies or normal behaviour Dewayne Geraghty
@ 2020-10-04  1:58 ` Dewayne Geraghty
  2020-10-04  2:20   ` Laurent Bercot
  2020-10-04  2:14 ` Laurent Bercot
  1 sibling, 1 reply; 6+ messages in thread
From: Dewayne Geraghty @ 2020-10-04  1:58 UTC (permalink / raw)
  To: supervision

Apologies: item 2 of my earlier email pointed to emptyenv as the cause of
the zombie processes on FreeBSD 12.2S, but they are actually due to background.

# execlineb -Pc 'background { echo hello } pipeline { ps -axw } grep
defunct'
hello
30144  0  Z+       0:00.00 <defunct>

By contrast, the following test, using foreground with emptyenv, leaves no zombie:
# execlineb -Pc 'emptyenv foreground { echo hello } pipeline { /bin/ps
-axw } /usr/bin/grep defunct'
hello
#

Software revision level (as available in the FreeBSD ports system)
execline-2.6.0.1
s6-2.9.1.0
s6-rc-0.5.1.2
skalibs-2.9.2.1

Further detail:
# execlineb -Pc 'emptyenv background { echo hello } pipeline { /bin/ps
-axwwdo pid,ppid,stat,command } /usr/bin/grep -B1  "defunct"'
hello
71212 70760 Ss   | | `-- -csh (csh)
16885 71212 S+   | |   `-- /usr/bin/grep -B1 defunct
17052 16885 Z+   | |     |-- <defunct>

I've also placed a ktrace and kdump of
execlineb -Pc 'ktrace -f /tmp/bgnd.kt /usr/local/bin/background {
/bin/ps } echo a'
here
http://www.heuristicsystems.com/s6/


* Re: s6-rc : Anomalies or normal behaviour
  2020-10-03 22:30 s6-rc : Anomalies or normal behaviour Dewayne Geraghty
  2020-10-04  1:58 ` Dewayne Geraghty
@ 2020-10-04  2:14 ` Laurent Bercot
  2020-10-06  3:57   ` Dewayne Geraghty
  1 sibling, 1 reply; 6+ messages in thread
From: Laurent Bercot @ 2020-10-04  2:14 UTC (permalink / raw)
  To: Dewayne Geraghty, supervision

>1. I expected to see the date in seconds since time epoch, but result is
>variable name
># execlineb -Pc 'backtick D { date "+%s" } echo $D'
>$D

  Normal behaviour, since there's no shell to interpret $D as the
contents of variable D. Try using "importas D D" before the echo:
it will read the value of D and substitute $D with this value, so
echo will print the value. Yeah, execline is annoying like that, it's
just a habit to take.
  Also, you generally want "backtick -n", to chomp the newline at
the end of your input.
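
A minimal sketch of the suggestion above, assuming execline is installed and execlineb is on the PATH (the guard simply skips the command otherwise). backtick stores the output of date in the environment variable D, and importas then substitutes the literal word $D in the rest of the command line. The variable name epoch is only for illustration.

```shell
# Assumption: execlineb is on the PATH; if not, nothing is run.
if command -v execlineb >/dev/null 2>&1; then
    # backtick -n chomps the trailing newline of date's output;
    # importas D D replaces the word $D with the value of the
    # environment variable D before echo is executed.
    epoch=$(execlineb -Pc 'backtick -n D { date "+%s" } importas D D echo $D')
    echo "$epoch"
fi
```

With execline available this should print the current time in seconds since the epoch instead of the literal string $D.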


>---
>2. When I use emptyenv within an execlineb script, I have a "defunct"
>zombie process
>89685  3  S<       0:00.01   |-- s6-supervise base:time-srv
>  3020  -  S<s      0:00.03   | `-- /usr/local/sbin/ntpd -c /etc/ntp.conf
>-N -g -u ntpd --nofork
>  3601  -  Z<       0:00.00   |   `-- <defunct>
>
>The time server script is
>#!/usr/local/bin/execlineb -P
>emptyenv
>multidefine -d " " "base time ntpd /usr/local/sbin/ntpd" { JAIL SERVICE
>USER PROGRAM }
>background { echo Starting service $SERVICE using $PROGRAM on $JAIL
>under user $USER }
>fdmove 2 1
>redirfd -w 1 /m/base:time/fifo
>$PROGRAM -c /etc/ntp.conf -N -g -u $USER --nofork
>
>removing emptyenv, prevents the zombie from being created.  Is this normal?

  The zombie is the echo program in your background block, since it's a
direct child of your run script and there's nothing that reaps it
after it's forked (fdmove, redirfd, ntpd - those programs don't expect
to inherit a child). So the zombie is expected. To prevent that, use
"background -d", which will doublefork your echo program, so it will
be reparented to pid 1 which will reap it properly.

  The anomaly is that you *don't* have that zombie without emptyenv;
my first guess is that there's something in your environment that changes
the behaviour of ntpd and makes it reap the zombie somehow.


>---
>3. Is it normal/standard/good practice to include a dependency in a
>bundle.  For example, I have a "time" bundle whose contents are
>time-srv.  time-srv starts the ntpd service, and has as a dependency
>time-log.
>
>Using "s6-rc -u change time", everything behaves as documented, ie
>starts "time" which starts time-log, then time-srv.  However
>
># s6-rc -v 9 -d change base:time
>s6-rc: info: bringing selected services down
>s6-rc: info: processing service base:time-srv: stopping
>s6-rc: info: service base:time-srv stopped successfully
># Starting logging service time for base with user s6log folder
>/var/log/time
>
>and the time-log continues running.

  If you only have time-srv in your 'time' bundle, then time-srv and
time are equivalent. Telling s6-rc to bring down time will do the
exact same thing as telling it to bring down time-srv. time-log is
not impacted. So the behaviour is expected.

  If you want "s6-rc -d change time" to also bring down time-log, then
yes, you should add time-log to the time bundle. Then 'time' will
address both time-srv and time-log.
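
As a rough sketch of what such a bundle could look like in the source directory, assuming the s6-rc 0.5.x source format in which a bundle directory holds a type file and a contents file (the scratch directory is hypothetical, and the service names follow the reply above rather than the base: prefixed names in the logs):

```shell
# Hypothetical scratch source directory; a real one is whatever
# directory is later given to s6-rc-compile.
src=$(mktemp -d)
mkdir "$src/time"

echo bundle > "$src/time/type"
# One service name per line: "time" now addresses both services.
printf '%s\n' time-srv time-log > "$src/time/contents"

cat "$src/time/contents"
```

After recompiling the database, bringing "time" down would then address both listed services.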


>down (exitcode 0) 6 seconds, ready 6 seconds  # This is time-srv
>up (pid 85131) 6 seconds                      # This is time-log,so it
>has been restarted

  If you're using a manually created named pipe to transmit data
from time-srv to time-log, that pipe will close when time-srv exits,
and your logger will get EOF and probably exit, which is why it
stopped; but time-log's supervisor has received no instruction that
it should stop, so it will restart it. This is also expected.
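
The end-of-file effect described above can be reproduced with plain shell tools, without s6 at all. This is only an illustration: the FIFO path is a scratch file, echo stands in for the producer and cat for the logger.

```shell
# Scratch FIFO standing in for the hand-made named pipe.
dir=$(mktemp -d)
mkfifo "$dir/fifo"

# "Producer": writes one line and exits, which closes the write end.
echo "ntpd says hello" > "$dir/fifo" &

# "Logger": cat returns as soon as the last writer has closed the FIFO,
# because its next read reports end-of-file.
received=$(cat "$dir/fifo")
logger_status=$?
wait

echo "logger read: $received (exit status $logger_status)"
rm -r "$dir"
```

A supervised logger that exits this way is simply started again by its supervisor, which matches the restart seen in the s6-svstat output.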

  The simplest way of achieving the behaviour you want is s6-rc's
integrated pipeline feature. Get rid of your named pipe and of your
stdout (for time-srv) and stdin (for time-log) redirections; get rid
of your time bundle definition. Then declare time-log as a consumer
for time-srv and time-srv as a producer for time-log. In the
time-log source definition directory, write 'time' into the
pipeline-name file. Then recompile your database.

  This will automatically create a pipe between time-srv and time-log;
the pipe will be held open so it won't close even if one of the
processes exits; and it will automatically create a 'time' bundle
that contains both time-srv and time-log.
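
The steps above might be laid out roughly as follows. This is a sketch under assumptions: the scratch source directory is hypothetical, the file names are those of the s6-rc 0.5.x source format, and the run scripts (minus their FIFO redirections) are left out.

```shell
# Hypothetical scratch source directory; adjust to the real one.
src=$(mktemp -d)
mkdir "$src/time-srv" "$src/time-log"

echo longrun  > "$src/time-srv/type"
echo time-log > "$src/time-srv/producer-for"   # time-srv's stdout feeds time-log

echo longrun  > "$src/time-log/type"
echo time-srv > "$src/time-log/consumer-for"   # time-log's stdin comes from time-srv
echo time     > "$src/time-log/pipeline-name"  # name of the automatically created bundle

# Each directory still needs its run script. Afterwards the database is
# recompiled with s6-rc-compile (and, on a live system, switched to with
# s6-rc-update); neither step is run here.
ls "$src/time-srv" "$src/time-log"
```

Note that pipeline-name goes into the definition directory of the last consumer, here time-log.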

  You're on the right track. :)

--
  Laurent



* Re: s6-rc : Anomalies or normal behaviour
  2020-10-04  1:58 ` Dewayne Geraghty
@ 2020-10-04  2:20   ` Laurent Bercot
  0 siblings, 0 replies; 6+ messages in thread
From: Laurent Bercot @ 2020-10-04  2:20 UTC (permalink / raw)
  To: Dewayne Geraghty, supervision

>Apologies, my earlier email, item 2, pointed to emptyenv as the cause of
>zombie processes on FreeBSD 12.2S, actually it is due to background.

  Ah, then everything is working as intended and there's no anomaly.
  background spawns a process as a direct child, so if the parent execs
into a long-lived program that never reaps bastards (children it doesn't
know it has), then the zombie will hang around.
  "background -d" was made for this situation, and will avoid the
zombie.
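
Applied to the run script from the first message, the change is a single flag. The sketch below writes the adjusted script to a scratch directory (a hypothetical location; the real script lives in the service's definition directory), and everything except the -d is copied from the original.

```shell
# Hypothetical scratch location for the run script.
svcdir=$(mktemp -d)
cat > "$svcdir/run" <<'EOF'
#!/usr/local/bin/execlineb -P
emptyenv
multidefine -d " " "base time ntpd /usr/local/sbin/ntpd" { JAIL SERVICE USER PROGRAM }
# -d double-forks the block, so echo is not a direct child of what becomes ntpd
background -d { echo Starting service $SERVICE using $PROGRAM on $JAIL under user $USER }
fdmove 2 1
redirfd -w 1 /m/base:time/fifo
$PROGRAM -c /etc/ntp.conf -N -g -u $USER --nofork
EOF
chmod 755 "$svcdir/run"
```

The quoted heredoc delimiter keeps the shell from expanding $SERVICE and friends, so they reach execline untouched.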

--
  Laurent



* Re: s6-rc : Anomalies or normal behaviour
  2020-10-04  2:14 ` Laurent Bercot
@ 2020-10-06  3:57   ` Dewayne Geraghty
  2020-10-06 10:29     ` Laurent Bercot
  0 siblings, 1 reply; 6+ messages in thread
From: Dewayne Geraghty @ 2020-10-06  3:57 UTC (permalink / raw)
  To: supervision

On 4/10/2020 1:14 pm, Laurent Bercot wrote:
>> 1. I expected to see the date in seconds since time epoch, but result is
>> variable name
>> # execlineb -Pc 'backtick D { date "+%s" } echo $D'
>> $D
> 
>  Normal behaviour, since there's no shell to interpret $D as the
> contents of variable D. Try using "importas D D" before the echo:
> it will read the value of D and substitute $D with this value, so
> echo will print the value. Yeah, execline is annoying like that, it's
> just a habit to take.
>  Also, you generally want "backtick -n", to chomp the newline at
> the end of your input.
> 
> 
>> ---
>> 2. When I use emptyenv within an execlineb script, I have a "defunct"
>> zombie process
>> 89685  3  S<       0:00.01   |-- s6-supervise base:time-srv
>>  3020  -  S<s      0:00.03   | `-- /usr/local/sbin/ntpd -c /etc/ntp.conf
>> -N -g -u ntpd --nofork
>>  3601  -  Z<       0:00.00   |   `-- <defunct>
>>
>> The time server script is
>> #!/usr/local/bin/execlineb -P
>> emptyenv
>> multidefine -d " " "base time ntpd /usr/local/sbin/ntpd" { JAIL SERVICE
>> USER PROGRAM }
>> background { echo Starting service $SERVICE using $PROGRAM on $JAIL
>> under user $USER }
>> fdmove 2 1
>> redirfd -w 1 /m/base:time/fifo
>> $PROGRAM -c /etc/ntp.conf -N -g -u $USER --nofork
>>
>> removing emptyenv, prevents the zombie from being created.  Is this
>> normal?
> 
>  The zombie is the echo program in your background block, since it's a
> direct child of your run script and there's nothing that reaps it
> after it's forked (fdmove, redirfd, ntpd - those programs don't expect
> to inherit a child). So the zombie is expected. To prevent that, use
> "background -d", which will doublefork your echo program, so it will
> be reparented to pid 1 which will reap it properly.
> 
EDIT My error, the problem was background, and -d fixes this.
>  The anomaly is that you *don't* have that zombie without emptyenv;
> my first guess is that there's something in your environment that changes
> the behaviour of ntpd and makes it reap the zombie somehow.
> 
> 
>> ---
>> 3. Is it normal/standard/good practice to include a dependency in a
>> bundle.  For example, I have a "time" bundle whose contents are
>> time-srv.  time-srv starts the ntpd service, and has as a dependency
>> time-log.
>>
>> Using "s6-rc -u change time", everything behaves as documented, ie
>> starts "time" which starts time-log, then time-srv.  However
>>
>> # s6-rc -v 9 -d change base:time
>> s6-rc: info: bringing selected services down
>> s6-rc: info: processing service base:time-srv: stopping
>> s6-rc: info: service base:time-srv stopped successfully
>> # Starting logging service time for base with user s6log folder
>> /var/log/time
>>
>> and the time-log continues running.
> 
>  If you only have time-srv in your 'time' bundle, then time-srv and
> time are equivalent. Telling s6-rc to bring down time will do the
> exact same thing as telling it to bring down time-srv. time-log is
> not impacted. So the behaviour is expected.
> 
>  If you want "s6-rc -d change time" to also bring down time-log, then
> yes, you should add time-log to the time bundle. Then 'time' will
> address both time-srv and time-log.
> 
> 
>> down (exitcode 0) 6 seconds, ready 6 seconds  # This is time-srv
>> up (pid 85131) 6 seconds                      # This is time-log,so it
>> has been restarted
> 
>  If you're using a manually created named pipe to transmit data
> from time-srv to time-log, that pipe will close when time-srv exits,
> and your logger will get EOF and probably exit, which is why it
> stopped; but time-log's supervisor has received no instruction that
> it should stop, so it will restart it. This is also expected.
> 
>  The simplest way of achieving the behaviour you want is s6-rc's
> integrated pipeline feature. Get rid of your named pipe and of your
> stdout (for time-srv) and stdin (for time-log) redirections; get rid
> of your time bundle definition. Then declare time-log as a consumer
> for time-srv and time-srv as a producer for time-log. In the
> time-log source definition directory, write 'time' into the
> pipeline-name file. Then recompile your database.
> 
>  This will automatically create a pipe between time-srv and time-log;
> the pipe will be held open so it won't close even if one of the
> processes exits; and it will automatically create a 'time' bundle
> that contains both time-srv and time-log.
> 
>  You're on the right track. :)
> 
> -- 
>  Laurent
> 
> 
Laurent,

Thank you very much.  Using your advice (re 1 & 2) I've redeployed our
testing platform and everything works as expected :)

Re 3: implementing the producer-for/consumer-for pair, we've gone from the
following (the application server is in jail b3 and the log server in jail
b2; see Ref. 1):

# cat b3:named-setup2/up
#!/usr/local/bin/execlineb -P
define D /m/b3/fifo/named
foreground { if -n { test -p $D } foreground { /usr/bin/mkfifo $D } }
foreground { /usr/sbin/chown s6log:named $D }
foreground { /bin/chmod 720 $D }

# cat b3:named2/run
#!/usr/local/bin/execlineb -P
fdmove 2 1
redirfd -w 1 /m/b3/fifo/named
/usr/sbin/jexec b3 /usr/local/sbin/named -f -n 1 -U 1 -u bind -c
/usr/local/etc/namedb/named.conf

# cat b3:named-log2/run
#!/usr/local/bin/execlineb -P
emptyenv
redirfd -r 0 /m/b3/fifo/named
/usr/sbin/jexec -U s6log b2 /usr/local/bin/s6-log -b n14 r7000 s100000
S3000000 !"/usr/bin/xz -7q" /var/log/named
#Read as: run in jail b2 as user s6log the s6-log program

to:
# cat b3:named3/run
#!/usr/local/bin/execlineb -P
/usr/sbin/jexec b3 /usr/local/sbin/named -f -n 1 -U 1 -u bind -c
/usr/local/etc/namedb/named.conf

# cat b3:named-log3/run
#!/usr/local/bin/execlineb -P
emptyenv
/usr/sbin/jexec -U s6log b2 /usr/local/bin/s6-log -b n14 r7000 s100000
S3000000 !"/usr/bin/xz -7q" /var/log/named

A significant reduction in complexity.  However, and this is the reason for
my delay in replying, magic happened!  I was now transmitting data which
crossed jail barriers (from b3 "named" to b2 "named logging").  I needed
to consult with one of the FreeBSD developers to ensure that a security
hole wasn't occurring. :)

It appears (and I'm assuming) that s6 uses the pseudo-terminal subsystem to
communicate, in the specific case below via pts/3:
# procstat -f 96796 95651 92390 | grep -E "text|pts"
96796 named             text v r r-------   -       - -   /jails/b3/usr/local/sbin/named
96796 named                2 v c rw------  71   10677 -   /dev/pts/3

95651 s6-log            text v r r-------   -       - -   /jails/b2/usr/local/bin/s6-log
95651 s6-log               1 v c rw------  71   10677 -   /dev/pts/3
95651 s6-log               2 v c rw------  71   10677 -   /dev/pts/3

92390 s6-fdholderd      text v r r-------   -       - -   /usr/local/bin/s6-fdholderd
92390 s6-fdholderd         2 v c rw------  71   10677 -   /dev/pts/3

I don't know if this is a good thing, so further investigation is required.
But for now s6 continues to make magic happen.

Kind regards, Dewayne.

References:
1. https://www.freebsd.org/cgi/man.cgi?query=jail&apropos=0&sektion=0&manpath=FreeBSD+12.1-RELEASE&arch=default&format=html
2. https://www.freebsd.org/cgi/man.cgi?query=nullfs&apropos=0&sektion=0&manpath=FreeBSD+12.1-RELEASE&arch=default&format=html


* Re: s6-rc : Anomalies or normal behaviour
  2020-10-06  3:57   ` Dewayne Geraghty
@ 2020-10-06 10:29     ` Laurent Bercot
  0 siblings, 0 replies; 6+ messages in thread
From: Laurent Bercot @ 2020-10-06 10:29 UTC (permalink / raw)
  To: Dewayne Geraghty, supervision


  Glad it's working for you!


>A significant reduction in complexity.  However, and the reason for my
>delay in replying.  Magic happened!  I was now transmitting data which
>crossed jail barriers (from b3 "named" to b2 "named logging").  I needed
>to consult with one of the FreeBSD developers to ensure that a security
>hole wasn't occurring. :)

  Well, that's also what you were doing with your former
b3:named2 and b3:named-log2, except you were transmitting the data via
a named pipe created in your run script explicitly instead of an
anonymous pipe created by s6-rc implicitly. The integrated pipe
feature does not touch your security model at all; if you were to
consult with a FreeBSD developer, you needed to do it before making
the change. :)


>It appears (and I'm assuming) that s6 uses pseudo terminal sub-system to
>communicate. In this specific case below, per pts/3

  No, s6 does not use pseudo-terminals at all; all it does is let
processes inherit fds from their parent. In your case, /dev/pts/3 seems
to be s6-svscan's stdout and stderr; if you don't want to have
pseudo-terminals, you should check the script that launches your
supervision tree, and redirect s6-svscan's outputs accordingly.
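
The inheritance mechanism can be seen with plain shell tools. In this sketch (scratch log file, nothing s6-specific is run) the redirection is done once in a parent, and the child writes to it without knowing anything about the file:

```shell
log=$(mktemp)

# Redirect stdout and stderr once, in the parent. Every process started
# below it inherits those descriptors without doing anything itself.
(
    exec >"$log" 2>&1
    sh -c 'echo "child stdout"; echo "child stderr" >&2'
)

cat "$log"

# In a launch script the same idea would look something like this
# (hypothetical log path, not run here):
#   exec s6-svscan /s/scan >/var/log/s6-svscan.log 2>&1
```

Where s6-svscan's output should actually go is a site decision; the point is only that whatever it is started with is what its whole tree inherits.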

--
  Laurent


