supervision - discussion about system services, daemon supervision, init, runlevel management, and tools such as s6 and runit
* Query on s6-log and s6-supervise
@ 2021-06-08  9:13 Arjun D R
  2021-06-08 11:04 ` Laurent Bercot
  0 siblings, 1 reply; 9+ messages in thread
From: Arjun D R @ 2021-06-08  9:13 UTC (permalink / raw)
  To: supervision

Hi Team,

I would like to hear from you on a few queries. Please help.

   1. Why do we need separate supervisors for producer and consumer
   long-running services? Is it possible to have one supervisor for both
   producer and consumer, since the consumer service does not need to run
   when the producer is down? I understand that an s6 supervisor is meant
   to monitor only one service, but why not monitor a couple of services
   when that is logically valid, if I am not wrong?
   2. Is it possible to have a single supervisor for a bundle of services,
   i.e. one supervisor for a bundle consisting of a few services?
   3. How many instances of s6-supervise can generally run? We are running
   into a problem where we have 129 instances of s6-supervise, which leads
   to higher memory consumption. We are migrating from systemd to the s6
   init system because of its light weight, but we have a lot of s6-log
   and s6-supervise instances, which results in higher memory usage
   compared to systemd. Is it fine to have this many s6-supervise
   instances?  ps_mem data:
      5.5 MiB       s6-log (46) ,  14.3 MiB       s6-supervise (129)



Thanks,
Arjun


* Re: Query on s6-log and s6-supervise
  2021-06-08  9:13 Query on s6-log and s6-supervise Arjun D R
@ 2021-06-08 11:04 ` Laurent Bercot
  2021-06-09  2:19   ` Dewayne Geraghty
  0 siblings, 1 reply; 9+ messages in thread
From: Laurent Bercot @ 2021-06-08 11:04 UTC (permalink / raw)
  To: Arjun D R, supervision

>    1. Why do we need separate supervisors for producer and consumer
>    long-running services? Is it possible to have one supervisor for both
>    producer and consumer, since the consumer service does not need to run
>    when the producer is down? I understand that an s6 supervisor is meant
>    to monitor only one service, but why not monitor a couple of services
>    when that is logically valid, if I am not wrong?

  Hi Arjun,

  The logic of the supervisor is already complex enough when it has
to monitor one process. It would be quadratically more complex if it
had to monitor two. In all likelihood, the first impact of such a
change would be more bugs, because the logic would be a lot more
difficult to understand and maintain.

  The amount of memory used by the s6 logic itself would not change
(or would *increase* somewhat) if the code were organized in a
different way in order to reduce the number of processes, and you
would see an overall decrease in code quality.

  Worsening the design to offset operational costs is not a good
trade-off - it is not "logically valid", as you put it. I would not
do it even if the high amount of memory consumed by your processes
was due to s6 itself.

  But it is not the case: your operational costs are due to something
else. See below.


>
>    2. Is it possible to have a single supervisor for a bundle of services,
>    i.e. one supervisor for a bundle consisting of a few services?

  Again, there would be no engineering benefit to that. You would likely
see operational benefits, yes, but s6 is the wrong place to try and get
those benefits, because it is not the cause of your operational costs.


>    3. How many instances of s6-supervise can generally run? We are running
>    into a problem where we have 129 instances of s6-supervise, which leads
>    to higher memory consumption. We are migrating from systemd to the s6
>    init system because of its light weight, but we have a lot of s6-log
>    and s6-supervise instances, which results in higher memory usage
>    compared to systemd. Is it fine to have this many s6-supervise
>    instances?  ps_mem data:
>       5.5 MiB       s6-log (46) ,  14.3 MiB       s6-supervise (129)

  It is normally totally fine to have this many s6-supervise
instances (and s6-log instances), and it is the intended usage.
The skarnet.org server only has 256 MB of RAM, and currently sports 93
instances of s6-supervise (and 44 instances of s6-log) without any
trouble. It could triple that amount without breaking a sweat.

  The real problem here is that your instances appear to use so much
memory: *that* is not normal.
Every s6-supervise process should use at most 4 pages (16k) of private
dirty memory, so for 129 processes I would expect the memory usage to
be around 2.1 MB. Your reported total shows 7 times as much, which
sounds totally out of bounds to me, and even accounting for normal
operational overhead, a factor of 7 is *completely bonkers*.

  There are two possible explanations here:
  - Either ps_mem is not accurately tallying the memory used by a given
set of processes;
  - Or you are using a libc with an incredible amount of overhead, and
your libc (and in particular, I suspect, dynamic linking management in
your libc) is the culprit for the insane amount of memory that the
s6-supervise processes seem to be eating.

  The easiest way to understand what's going on is to find an
s6-supervise process's pid, and to perform
# cat /proc/$pid/smaps_rollup

  That will tell you what's going on for the chosen s6-supervise process
(they're all similar, so the number for the other s6-supervise processes
won't be far off). In particular, look at the Private_Dirty line: that
is the "real" amount of uncompressible memory used by that process.
  It should be around 16k, tops. Anything over that is overhead from
your libc.
  If the value is not too much over 16k, then ps_mem is simply lying to
you and there is nothing to worry about, except that you should use
another tool to tally memory usage.
  But if the value is much higher, then it is time to diagnose deeper:

# cat /proc/$pid/smaps

  That will show you all the mappings performed by your libc, and
the amount of memory that each of these mappings uses. Again, the
most important lines are the Private_Dirty ones - these are the
values that add up for every s6-supervise instance.
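
  (If you want to tally that for every instance at once, a one-liner
such as the following does it - just a sketch, assuming pgrep and awk
are available on your system:

# for pid in $(pgrep -x s6-supervise); do awk -v p=$pid '/^Private_Dirty:/ {s+=$2} END {print p": "s" kB"}' /proc/$pid/smaps; done

  All the instances should report roughly the same figure; multiply it
by the number of instances to get the real private memory bill.)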

  My hunch is that you will see *a lot* of mappings, each using
4k or 8k, or even in some cases 12k, of Private_Dirty memory.
If it is the case, unfortunately there is nothing I can do about it,
because that overhead is entirely caused by your libc.

  However, there is something *you* can do about it:

  - If "ldd /bin/s6-supervise" gives you a line mentioning libs6.so
or libskarnet.so, try recompiling s6 with --enable-allstatic. This
will link against the static version of libs6 and libskarnet, which
will alleviate the costs of dynamic linking. (The price is that the
*text* of s6-supervise will be a little bigger, but it doesn't matter:
text is Shared_Clean, the cost is only incurred once).

  That alone should decrease your memory usage by a lot.
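
  For reference, the rebuild is just the usual skarnet sequence - a
sketch, with a placeholder source path; if skalibs is installed in a
nonstandard place you may also need the --with-include and --with-lib
configure options:

# cd /path/to/s6-source
# ./configure --enable-allstatic
# make && make install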

  If that is still not enough, then it means your libc is trash. Sorry,
there is no other word. If you are on Linux and using glibc (which,
indeed, is trash), you can try building skalibs+execline+s6 against
the musl libc (https://musl.libc.org/); and not only will that allow
you to use s6 as intended, with hundreds of s6-supervise instances
without having to worry about memory usage - because musl has very
little overhead - but your s6 binaries will also be smaller and
faster.

  I hope this will help you, and I hope this unfortunate report can
serve as an illustration of *why* it is important to minimize overhead
at every level of a system, especially at lower levels and particularly
in the libc.

--
  Laurent



* Re: Query on s6-log and s6-supervise
  2021-06-08 11:04 ` Laurent Bercot
@ 2021-06-09  2:19   ` Dewayne Geraghty
  2021-06-09  3:30     ` Arjun D R
       [not found]     ` <CAHJ2E=n4+bfO39LYfGaTpaqPGtPHUSy32++4t4n+PjdZz+S=Cw@mail.gmail.com>
  0 siblings, 2 replies; 9+ messages in thread
From: Dewayne Geraghty @ 2021-06-09  2:19 UTC (permalink / raw)
  To: supervision

Thanks Laurent, that's really interesting.  By comparison, my FBSD
system uses:

# ps -axw -o pid,vsz,rss,time,comm | grep s6
       virt KB  resident cpu total
38724   10904   1600     0:00.02 s6-log
41848   10788   1552     0:00.03 s6-log
42138   10848   1576     0:00.01 s6-log
42222   10888   1596     0:00.02 s6-log
45878   10784   1516     0:00.00 s6-svscan
54453   10792   1544     0:00.00 s6-supervise
... lots ...
67937   10792   1540     0:00.00 s6-supervise
76442   10724   1484     0:00.01 s6-ipcserverd
76455   11364   1600     0:00.01 s6-fdholderd
84229   10896    712     0:00.01 s6-log

Processes pull in both ld-elf and libc.so; from procstat -v:
start           end             path
0x1021000	0x122a000	/usr/local/bin/s6-supervise
0x801229000	0x80124f000	/libexec/ld-elf.so.1
0x801272000	0x80144c000	/lib/libc.so.7

Yes - libc is ... large.

Arjun, if you want to reduce the number of s6-log processes, perhaps
consider piping them to a file which s6-log reads from.  For example,
we maintain various web servers; the accesses are unique and of
interest to customers, but they don't (really) care about the errors,
so we aggregate those with one s6-log. Works very well  :)


* Re: Query on s6-log and s6-supervise
  2021-06-09  2:19   ` Dewayne Geraghty
@ 2021-06-09  3:30     ` Arjun D R
  2021-06-09  8:32       ` Colin Booth
  2021-06-09 11:48       ` Laurent Bercot
       [not found]     ` <CAHJ2E=n4+bfO39LYfGaTpaqPGtPHUSy32++4t4n+PjdZz+S=Cw@mail.gmail.com>
  1 sibling, 2 replies; 9+ messages in thread
From: Arjun D R @ 2021-06-09  3:30 UTC (permalink / raw)
  To: Dewayne Geraghty; +Cc: supervision

Thanks, Laurent, for the detailed explanation. That really helps.

I have checked the Private_Dirty memory in "smaps" of an s6-supervise
process and I don't see any mapping consuming more than 8 kB. Just
posting it here for reference.

grep Private_Dirty /proc/991/smaps
Private_Dirty:         0 kB
Private_Dirty:         4 kB
Private_Dirty:         4 kB
Private_Dirty:         8 kB
Private_Dirty:         0 kB
Private_Dirty:         0 kB
Private_Dirty:         4 kB
Private_Dirty:         4 kB
Private_Dirty:         4 kB
Private_Dirty:         0 kB
Private_Dirty:         0 kB
Private_Dirty:         4 kB
Private_Dirty:         4 kB
Private_Dirty:         0 kB
Private_Dirty:         0 kB
Private_Dirty:         8 kB
Private_Dirty:         8 kB
Private_Dirty:         8 kB
Private_Dirty:         0 kB
Private_Dirty:         0 kB
Private_Dirty:         8 kB
Private_Dirty:         4 kB
Private_Dirty:         4 kB
Private_Dirty:         0 kB
Private_Dirty:         8 kB
Private_Dirty:         4 kB
Private_Dirty:         4 kB
Private_Dirty:         4 kB
Private_Dirty:         0 kB
Private_Dirty:         0 kB

cat /proc/991/smaps
00010000-00014000 r-xp 00000000 07:00 174        /bin/s6-supervise
00023000-00024000 r--p 00003000 07:00 174        /bin/s6-supervise
00024000-00025000 rw-p 00004000 07:00 174        /bin/s6-supervise
00025000-00046000 rw-p 00000000 00:00 0          [heap]
b6e1c000-b6e2d000 r-xp 00000000 07:00 3652       /lib/libpthread-2.31.so
b6e2d000-b6e3c000 ---p 00011000 07:00 3652       /lib/libpthread-2.31.so
b6e3c000-b6e3d000 r--p 00010000 07:00 3652       /lib/libpthread-2.31.so
b6e3d000-b6e3e000 rw-p 00011000 07:00 3652       /lib/libpthread-2.31.so
b6e3e000-b6e40000 rw-p 00000000 00:00 0
b6e40000-b6e45000 r-xp 00000000 07:00 3656       /lib/librt-2.31.so
b6e45000-b6e54000 ---p 00005000 07:00 3656       /lib/librt-2.31.so
b6e54000-b6e55000 r--p 00004000 07:00 3656       /lib/librt-2.31.so
b6e55000-b6e56000 rw-p 00005000 07:00 3656       /lib/librt-2.31.so
b6e56000-b6f19000 r-xp 00000000 07:00 3613       /lib/libc-2.31.so
b6f19000-b6f28000 ---p 000c3000 07:00 3613       /lib/libc-2.31.so
b6f28000-b6f2a000 r--p 000c2000 07:00 3613       /lib/libc-2.31.so
b6f2a000-b6f2c000 rw-p 000c4000 07:00 3613       /lib/libc-2.31.so
b6f2c000-b6f2e000 rw-p 00000000 00:00 0
b6f2e000-b6f4d000 r-xp 00000000 07:00 3665       /lib/libskarnet.so.2.9.2.1
b6f4d000-b6f5c000 ---p 0001f000 07:00 3665       /lib/libskarnet.so.2.9.2.1
b6f5c000-b6f5e000 r--p 0001e000 07:00 3665       /lib/libskarnet.so.2.9.2.1
b6f5e000-b6f5f000 rw-p 00020000 07:00 3665       /lib/libskarnet.so.2.9.2.1
b6f5f000-b6f6b000 rw-p 00000000 00:00 0
b6f6b000-b6f81000 r-xp 00000000 07:00 3605       /lib/ld-2.31.so
b6f87000-b6f89000 rw-p 00000000 00:00 0
b6f91000-b6f92000 r--p 00016000 07:00 3605       /lib/ld-2.31.so
b6f92000-b6f93000 rw-p 00017000 07:00 3605       /lib/ld-2.31.so
beaf8000-beb19000 rw-p 00000000 00:00 0          [stack]
Size:                132 kB
Rss:                   4 kB
Pss:                   4 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         4 kB
Referenced:            4 kB
Anonymous:             4 kB
AnonHugePages:         0 kB
Swap:                  0 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Locked:                0 kB
VmFlags: rd wr mr mw me gd ac
becd5000-becd6000 r-xp 00000000 00:00 0          [sigpage]
ffff0000-ffff1000 r-xp 00000000 00:00 0          [vectors]
Sorry, I am not able to post the whole data because of the mail size limit.

On my Linux system,
ps -axw -o pid,vsz,rss,time,comm | grep s6
    1   1732  1128 00:00:06 s6-svscan
  900   1736   452 00:00:00 s6-supervise
  901   1736   480 00:00:00 s6-supervise
  902   1736   444 00:00:00 s6-supervise
  903   1736   444 00:00:00 s6-supervise
  907   1744   496 00:00:00 s6-log
.....

And I don't think ps_mem is lying; I compared it with smem as well.
Clear data on ps_mem:

 Private  +   Shared  =  RAM used       Program

  4.8 MiB + 786.0 KiB =   5.5 MiB       s6-log (46)
 12.2 MiB +   2.1 MiB =  14.3 MiB       s6-supervise (129)

smem:

  PID User     Command                         Swap      USS      PSS     RSS
 1020 root     s6-supervise wpa_supplicant        0       96       98     996
 2001 root     s6-log -F wpa_supplicant.lo        0      104      106    1128

Same(almost) amount of PSS/RSS are used by other s6-supervise and s6-log
processes.

I have tried the "--enable-allstatic" flag and unfortunately I don't
see any improvement. If you were referring to shared memory, then yes,
we are good there: it is 2.1 MiB for 129 instances, but the private
memory is around 12.2 MiB. I am not sure whether this is a normal
value or not.

If possible, could you please share a reference smaps and ps_mem data
for s6-supervise? That would really help.

Dewayne, even if we pipe to a file, we will still have an
s6-supervise process for the log service. Maybe I didn't understand it
well; sorry about that. Please help me with that.

Thanks,
Arjun



* Re: Query on s6-log and s6-supervise
       [not found]     ` <CAHJ2E=n4+bfO39LYfGaTpaqPGtPHUSy32++4t4n+PjdZz+S=Cw@mail.gmail.com>
@ 2021-06-09  3:40       ` Dewayne Geraghty
  2021-06-09  5:01         ` Arjun D R
  0 siblings, 1 reply; 9+ messages in thread
From: Dewayne Geraghty @ 2021-06-09  3:40 UTC (permalink / raw)
  To: Arjun D R; +Cc: supervision

Apologies - what I'd implied is that we have multiple s6-supervise
processes running and their children pipe to one file which is read by
one s6-log process.

You can achieve this outcome by using s6-rc, where one consumer can
receive multiple inputs from producers - see the sketch just below.
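
As a rough sketch (the service names are made up, and this assumes an
s6-rc recent enough to support funnels, i.e. producer-for and
consumer-for files in the source definitions):

# echo weblog > web1/producer-for
# echo weblog > web2/producer-for
# printf 'web1\nweb2\n' > weblog/consumer-for

weblog is then an ordinary longrun whose run script simply execs into
s6-log; s6-rc-compile wires both producers' stdout into that single
logger, so you only pay for one s6-log and one s6-supervise on the
logging side.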

There is a special (but not unique) case where a program such as
apache has explicit log files (defined in apache's config file) to
record web-page accesses and errors on a per-server basis.  Because
all the supervised apache instances can write to one error logfile, I
instructed apache to write to a pipe.  Multiple supervised apache
instances use the one pipe (aka funnel), which is read by one s6-log,
thereby reducing the number of s6-log processes.  I could do the same
with the access logs and use the regex function of s6-log, but I tend
towards simplicity.
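
The logger side of such a funnel can be tiny. A sketch only (the fifo
path is made up, and opening the fifo read-write is the trick that
keeps s6-log from seeing EOF when every writer closes; POSIX leaves
read-write opens on a fifo unspecified, but it works at least on
Linux):

#!/bin/sh
# open stdin on the fifo read-write so the logger never sees EOF
exec 0<>/var/run/httpd-error.fifo
exec s6-log -b n10 s1000000 t /var/log/httpd/error

Each apache instance then just points its ErrorLog at that fifo path.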


* Re: Query on s6-log and s6-supervise
  2021-06-09  3:40       ` Dewayne Geraghty
@ 2021-06-09  5:01         ` Arjun D R
  0 siblings, 0 replies; 9+ messages in thread
From: Arjun D R @ 2021-06-09  5:01 UTC (permalink / raw)
  To: Dewayne Geraghty; +Cc: supervision

Dewayne,
Thanks for the details. We already have such an implementation
(multiple producers with one consumer), but our s6-log instance count
is still high. Many of our services require dedicated logger services.
We could reduce the number of dedicated loggers by creating a funnel
and using regexes to separate the logs, but that is indeed a risky and
complicated process. I am mainly interested in confirming the memory
usage of the s6-log and s6-supervise processes.

Thanks,
Arjun



* Re: Query on s6-log and s6-supervise
  2021-06-09  3:30     ` Arjun D R
@ 2021-06-09  8:32       ` Colin Booth
  2021-06-09 11:48       ` Laurent Bercot
  1 sibling, 0 replies; 9+ messages in thread
From: Colin Booth @ 2021-06-09  8:32 UTC (permalink / raw)
  To: Arjun D R; +Cc: Dewayne Geraghty, supervision

On Wed, Jun 09, 2021 at 09:00:38AM +0530, Arjun D R wrote:
> Thanks, Laurent, for the detailed explanation. That really helps.
> 
> I have checked the Private_Dirty memory in "smaps" of an s6-supervise
> process and I don't see any mapping consuming more than 8 kB. Just
> posting it here for reference.
> 
> grep Private_Dirty /proc/991/smaps
> Private_Dirty:         0 kB
> Private_Dirty:         4 kB
> Private_Dirty:         4 kB
> Private_Dirty:         8 kB
> Private_Dirty:         0 kB
> Private_Dirty:         0 kB
> Private_Dirty:         4 kB
> Private_Dirty:         4 kB
> Private_Dirty:         4 kB
> Private_Dirty:         0 kB
> Private_Dirty:         0 kB
> Private_Dirty:         4 kB
> Private_Dirty:         4 kB
> Private_Dirty:         0 kB
> Private_Dirty:         0 kB
> Private_Dirty:         8 kB
> Private_Dirty:         8 kB
> Private_Dirty:         8 kB
> Private_Dirty:         0 kB
> Private_Dirty:         0 kB
> Private_Dirty:         8 kB
> Private_Dirty:         4 kB
> Private_Dirty:         4 kB
> Private_Dirty:         0 kB
> Private_Dirty:         8 kB
> Private_Dirty:         4 kB
> Private_Dirty:         4 kB
> Private_Dirty:         4 kB
> Private_Dirty:         0 kB
> Private_Dirty:         0 kB
> 
... snip...
In a fully dynamic world a large number of dirty pages is expected, even
if each segment is only one or two pages. There's nothing particularly
surprising here when using a distro-provided s6 that's linked
dynamically.

> 
> I have tried the "--enable-allstatic" flag and unfortunately I don't
> see any improvement. If you were referring to shared memory, then yes,
> we are good there: it is 2.1 MiB for 129 instances, but the private
> memory is around 12.2 MiB. I am not sure whether this is a normal
> value or not.
> 
If you're building against glibc then allstatic probably won't help.
The default config options for s6 will build with static links against
all libraries but the libc, which is a pretty decent tradeoff between
efficiency and convenience; but to see super-low dirty memory you'll
need to use a libc that can actually be linked statically, such as
musl.

For your needs, using a musl toolchain and building a fully static s6
should get you significantly better memory usage.
-- 
Colin Booth


* Re: Query on s6-log and s6-supervise
  2021-06-09  3:30     ` Arjun D R
  2021-06-09  8:32       ` Colin Booth
@ 2021-06-09 11:48       ` Laurent Bercot
  2021-06-10  3:54         ` Arjun D R
  1 sibling, 1 reply; 9+ messages in thread
From: Laurent Bercot @ 2021-06-09 11:48 UTC (permalink / raw)
  To: Arjun D R, Dewayne Geraghty; +Cc: supervision

>I have checked the Private_Dirty memory in "smaps" of an s6-supervise
>process and I don't see any mapping consuming more than 8 kB. Just
>posting it here for reference.

  Indeed, each mapping is small, but you have *a lot* of them. The
sum of all the Private_Dirty values in your mappings, which should be
shown in smaps_rollup, is 96 kB. 24 pages! That is _huge_.

  In this list, the mappings that are really used by s6-supervise (i.e.
the incompressible amount of unshareable memory) are the following:

  - the /bin/s6-supervise section: this is static data, s6-supervise
needs a little, but it should not take more than one page.

  - the [heap] section: this is dynamically allocated memory, and for
s6-supervise it should not be bigger than 4 kB. s6-supervise does not
allocate dynamic memory itself, the presence of a heap section is due
to opendir() which needs dynamic buffers; the size of the buffer is
determined by the libc, and anything more than one page is wasteful.

( - anonymous mappings are really memory dynamically allocated for
internal  libc purposes; they do not show up in [heap] because they're
not obtained via malloc(). No function used by s6-supervise should
ever need those; any anonymous mapping you see is libc shenanigans
and counts as overhead. )

  - the [stack] section: this is difficult to control because the
amount of stack a process uses depends a lot on the compiler, the
compilation flags, etc. When built with -O2, s6-supervise should not
use more than 2-3 pages of stack. This includes a one-page buffer to
read from notification-fd; I can probably reduce the size of this
buffer and make sure the amount of needed stack pages never goes
above 2.

  So in total, the incompressible amount of private mappings is 4 to 5
pages (16 to 20 kB). All the other mappings are libc overhead.

  - the libpthread-2.31.so mapping uses 8 kB
  - the librt-2.31.so mapping uses 8 kB
  - the libc-2.31.so mapping uses 16 kB
  - the libskarnet.so mapping uses 12 kB
  - ld.so, the dynamic linker itself, uses 16 kB
  - there are 16 kB of anonymous mappings

  This is some serious waste; unfortunately, it's pretty much to be
expected from glibc, which suffers from decades of misdesign and
tunnel vision especially where dynamic linking is concerned. We are,
unfortunately, experiencing the consequences of technical debt.

  Linking against the static version of skalibs (--enable-allstatic)
should save you at least 12 kB (and probably 16) per instance of
s6-supervise. You should have noticed the improvement; your amount of
private memory should have dropped by at least 1.5MB when you switched
to --enable-allstatic.
  But I understand it is not enough.

  Unfortunately, once you have removed the libskarnet.so mappings,
it's basically down to the libc, and to achieve further improvements
I have no other suggestions than to change libcs.

>If possible, could you please share a reference smaps and ps_mem data
>for s6-supervise? That would really help.

  I don't use ps_mem, but here are the details of a s6-supervise process
on the skarnet.org server. s6 is linked statically against the musl
libc, which means:
  - the text segments are bigger (drawback of static linking)
  - there are fewer mappings (advantage of static linking, but even when
you're linking dynamically against musl it maps as little as it can)
  - the mappings have little libc overhead (advantage of musl)

# cat smaps_rollup

00400000-7ffd53096000 ---p 00000000 00:00 0  [rollup]
Rss:                  64 kB
Pss:                  36 kB
Pss_Anon:             20 kB
Pss_File:             16 kB
Pss_Shmem:             0 kB
Shared_Clean:         40 kB
Shared_Dirty:          0 kB
Private_Clean:         8 kB
Private_Dirty:        16 kB
Referenced:           64 kB
Anonymous:            20 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB

  You can see 40kB of shared, 16kB of Private_Dirty, and 8kB of
Private_Clean - apparently there's one Private_Clean page of static
data and one of stack; I have no idea what this corresponds to in the
code, I will need to investigate and see if it can be trimmed down.

# grep -E '[[:space:]](-|r)(-|w)(-|x)(-|p)[[:space:]]|^Private_Dirty:' smaps

00400000-00409000 r-xp 00000000 ca:00 659178  /command/s6-supervise
Private_Dirty:         0 kB
00609000-0060b000 rw-p 00009000 ca:00 659178  /command/s6-supervise
Private_Dirty:         4 kB
02462000-02463000 ---p 00000000 00:00 0  [heap]
Private_Dirty:         0 kB
02463000-02464000 rw-p 00000000 00:00 0  [heap]
Private_Dirty:         4 kB
7ffd53036000-7ffd53057000 rw-p 00000000 00:00 0  [stack]
Private_Dirty:         8 kB
7ffd53090000-7ffd53094000 r--p 00000000 00:00 0  [vvar]
Private_Dirty:         0 kB
7ffd53094000-7ffd53096000 r-xp 00000000 00:00 0  [vdso]
Private_Dirty:         0 kB

  One page of static data, one page of heap, two pages of stack (that
I should probably be able to get down to one). All the other mappings
are shared, except those weird two pages of Private_Clean that I don't
understand yet.
  As you can see, it is as close to incompressible as it gets. If I had
129 of these processes, without changing anything, they would use
something like: (16+8) * 129 + 40 = 3136 kB of RAM. Which is still
bigger than the theoretical minimum - I need to get rid of those two
Private_Clean pages - but much more acceptable than the 12.2 MB you get
from glibc.


  I was going to post this as is, but for completeness' sake and my
peace of mind, I fired up an Alpine Linux VM and checked /proc for
an s6-supervise process. Alpine Linux uses musl, but with dynamic
linking, and --disable-allstatic. The results are mixed:

  - 8 kB of static data (why is it more than the static case?)
  - 4 kB of heap
  - 8 kB of stack
    (So far so good, more or less.)
  - 16 kB for libskarnet.so (why is it more than glibc uses?)
  - 8 kB of anonymous mapping related to libskarnet.so
  - 8 kB for libc.so
  - 8 kB of anonymous mapping related to libc.so

  That's better than glibc, but is still 40kB of overhead compared to
a static build, plus 4 kB of static data that I don't understand.
Total is 60 kB, which would net 7.7MB + shared for 129 instances.
Linking libskarnet statically would likely save 24kB per instance, so
the total RAM for --enable-allstatic would be 4.6MB + shared. Which
is starting to sound close to acceptable.

  My takeaway from this is that dynamic linking, despite being essential
for distributions (for ease of upgrade, maintenance, and security
reasons), is definitely _not free_. It has a high fixed cost in RAM;
this is not noticeable when using few instances of large, bloated
processes - which is how a lot of software operates - but it is very
noticeable when using a lot of instances of small, efficient processes,
where the costs of dynamic linking overshadow the legit RAM use of said
processes.

  In other words: the way s6 works is a worst case for dynamic linking,
and especially dynamic linking with glibc. I'm sorry.

  If you want to attempt building static binaries of s6 with musl, you
can find musl toolchains at https://skarnet.org/toolchains/ or
at https://musl.cc/ . Please bear in mind you will need to build the
whole stack with the same toolchain (skalibs, execline, s6).
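
  Roughly, the build sequence would look like this - a sketch with
placeholder version numbers and toolchain prefix; the exact name of
the static-libc option may differ, so check each package's
./configure --help (and add --host if you are cross-compiling):

# export PATH=/opt/x86_64-linux-musl-cross/bin:$PATH CC=x86_64-linux-musl-gcc
# cd skalibs-x.y.z && ./configure --enable-static --disable-shared && make && make install
# cd ../execline-x.y.z && ./configure --enable-allstatic --enable-static-libc && make && make install
# cd ../s6-x.y.z && ./configure --enable-allstatic --enable-static-libc && make && make install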



  Dewayne:

>>  Thanks Laurent, that's really interesting.  By comparison, my FBSD
>>  system uses:
>>
>>  # ps -axw -o pid,vsz,rss,time,comm | grep s6

  Well that's the problem with ps: VSZ and RSS won't give you the real
information, because they include shared mappings in their numbers.
To get a reasonably accurate estimation of the marginal increase on
one additional process, you need to know what is shared and what is
private, and ps doesn't tell you that. There is probably a way to get
the information on FreeBSD, but I don't know what it is.

  Yes, the FreeBSD libc is relatively large, but it's pretty decent
compared to glibc. I suspect the marginal increase on one s6-supervise
process on FreeBSD is somewhere between what you get with musl and
what you get with glibc.

--
  Laurent



* Re: Query on s6-log and s6-supervise
  2021-06-09 11:48       ` Laurent Bercot
@ 2021-06-10  3:54         ` Arjun D R
  0 siblings, 0 replies; 9+ messages in thread
From: Arjun D R @ 2021-06-10  3:54 UTC (permalink / raw)
  To: Laurent Bercot; +Cc: Dewayne Geraghty, supervision

Thanks, Laurent and Colin, for the suggestions. I will try to build a
fully static s6 with a musl toolchain. Thanks for the detailed
analysis once again.
--
Arjun



end of thread

Thread overview: 9+ messages
2021-06-08  9:13 Query on s6-log and s6-supervise Arjun D R
2021-06-08 11:04 ` Laurent Bercot
2021-06-09  2:19   ` Dewayne Geraghty
2021-06-09  3:30     ` Arjun D R
2021-06-09  8:32       ` Colin Booth
2021-06-09 11:48       ` Laurent Bercot
2021-06-10  3:54         ` Arjun D R
     [not found]     ` <CAHJ2E=n4+bfO39LYfGaTpaqPGtPHUSy32++4t4n+PjdZz+S=Cw@mail.gmail.com>
2021-06-09  3:40       ` Dewayne Geraghty
2021-06-09  5:01         ` Arjun D R
