From: Arjun D R <drarjun95@gmail.com>
To: Laurent Bercot <ska-supervision@skarnet.org>
Cc: Dewayne Geraghty <dewayne@heuristicsystems.com.au>,
supervision@list.skarnet.org
Subject: Re: Query on s6-log and s6-supervise
Date: Thu, 10 Jun 2021 09:24:16 +0530
Message-ID: <CAHJ2E=m1nvdWxDx8ZG9EnVXXXLTirTMKXqAtUCXQntm7HTAkxQ@mail.gmail.com>
In-Reply-To: <em78ce2a1b-c0ce-429e-ad3d-88c25286b086@elzian>
Thanks Laurent and Colin for the suggestions. I will try to build a fully
static s6 with a musl toolchain. Thanks once again for the detailed
analysis.
--
Arjun
On Wed, Jun 9, 2021 at 5:18 PM Laurent Bercot <ska-supervision@skarnet.org>
wrote:
> >I have checked the Private_Dirty memory in "smaps" of an s6-supervise
> >process and I don't see any mapping consuming more than 8 kB. Just
> >posting it here for reference.
>
> Indeed, each mapping is small, but you have *a lot* of them. The
> sum of all the Private_Dirty in your mappings, that should be shown
> in smaps_rollup, is 96 kB. 24 pages! That is _huge_.
>
> In this list, the mappings that are really used by s6-supervise (i.e.
> the incompressible amount of unshareable memory) are the following:
>
> - the /bin/s6-supervise section: this is static data, s6-supervise
> needs a little, but it should not take more than one page.
>
> - the [heap] section: this is dynamically allocated memory, and for
> s6-supervise it should not be bigger than 4 kB. s6-supervise does not
> allocate dynamic memory itself, the presence of a heap section is due
> to opendir() which needs dynamic buffers; the size of the buffer is
> determined by the libc, and anything more than one page is wasteful.
>
> ( - anonymous mappings are really memory dynamically allocated for
> internal libc purposes; they do not show up in [heap] because they're
> not obtained via malloc(). No function used by s6-supervise should
> ever need those; any anonymous mapping you see is libc shenanigans
> and counts as overhead. )
>
> - the [stack] section: this is difficult to control because the
> amount of stack a process uses depends a lot on the compiler, the
> compilation flags, etc. When built with -O2, s6-supervise should not
> use more than 2-3 pages of stack. This includes a one-page buffer to
> read from notification-fd; I can probably reduce the size of this
> buffer and make sure the amount of needed stack pages never goes
> above 2.
>
> So in total, the incompressible amount of private mappings is 4 to 5
> pages (16 to 20 kB). All the other mappings are libc overhead.
>
> - the libpthread-2.31.so mapping uses 8 kB
> - the librt-2.31.so mapping uses 8 kB
> - the libc-2.31.so mapping uses 16 kB
> - the libskarnet.so mapping uses 12 kB
> - ld.so, the dynamic linker itself, uses 16 kB
> - there are 16 kB of anonymous mappings
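
As a quick cross-check of the figures above, the per-mapping overhead can be
added up and compared with the 96 kB total reported by smaps_rollup. This is
only a sketch: the dictionary keys are labels chosen here for illustration,
and a 4 kB page size is assumed.

```python
# Private_Dirty overhead per mapping, in kB, as listed in the message above.
# The keys are illustrative labels, not names produced by any tool.
GLIBC_OVERHEAD_KB = {
    "libpthread-2.31.so": 8,
    "librt-2.31.so": 8,
    "libc-2.31.so": 16,
    "libskarnet.so": 12,
    "ld.so": 16,
    "anonymous": 16,
}

PAGE_KB = 4  # assumption: 4 kB pages


def overhead_kb(mappings):
    """Sum the per-mapping overhead, in kB."""
    return sum(mappings.values())


def incompressible_range_kb(min_pages=4, max_pages=5, page_kb=PAGE_KB):
    """Return the (low, high) estimate of memory s6-supervise really needs."""
    return min_pages * page_kb, max_pages * page_kb


overhead = overhead_kb(GLIBC_OVERHEAD_KB)
low, high = incompressible_range_kb()
print(f"libc and linker overhead: {overhead} kB")
print(f"expected total: {overhead + low} to {overhead + high} kB")
```

The upper end of that range matches the 96 kB total, which suggests the list
accounts for all of the private dirty memory in that process.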
>
> This is some serious waste; unfortunately, it's pretty much to be
> expected from glibc, which suffers from decades of misdesign and
> tunnel vision especially where dynamic linking is concerned. We are,
> unfortunately, experiencing the consequences of technical debt.
>
> Linking against the static version of skalibs (--enable-allstatic)
> should save you at least 12 kB (and probably 16) per instance of
> s6-supervise. You should have noticed the improvement; your amount of
> private memory should have dropped by at least 1.5MB when you switched
> to --enable-allstatic.
> But I understand it is not enough.
>
> Unfortunately, once you have removed the libskarnet.so mappings,
> it's basically down to the libc, and to achieve further improvements
> I have no other suggestions than to change libcs.
>
> >If possible, can you please share with us reference smaps and ps_mem
> >data for s6-supervise? That would really help.
>
> I don't use ps_mem, but here are the details of a s6-supervise process
> on the skarnet.org server. s6 is linked statically against the musl
> libc, which means:
> - the text segments are bigger (drawback of static linking)
> - there are fewer mappings (advantage of static linking, but even when
> you're linking dynamically against musl it maps as little as it can)
> - the mappings have little libc overhead (advantage of musl)
>
> # cat smaps_rollup
>
> 00400000-7ffd53096000 ---p 00000000 00:00 0 [rollup]
> Rss: 64 kB
> Pss: 36 kB
> Pss_Anon: 20 kB
> Pss_File: 16 kB
> Pss_Shmem: 0 kB
> Shared_Clean: 40 kB
> Shared_Dirty: 0 kB
> Private_Clean: 8 kB
> Private_Dirty: 16 kB
> Referenced: 64 kB
> Anonymous: 20 kB
> LazyFree: 0 kB
> AnonHugePages: 0 kB
> ShmemPmdMapped: 0 kB
> FilePmdMapped: 0 kB
> Shared_Hugetlb: 0 kB
> Private_Hugetlb: 0 kB
> Swap: 0 kB
> SwapPss: 0 kB
> Locked: 0 kB
>
> You can see 40kB of shared, 16kB of Private_Dirty, and 8kB of
> Private_Clean - apparently there's one Private_Clean page of static
> data and one of stack; I have no idea what this corresponds to in the
> code, I will need to investigate and see if it can be trimmed down.
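
The shared/private split described above can be read out of smaps_rollup
mechanically. The sketch below parses a copy of the relevant lines from the
output quoted above; the function names are hypothetical helpers, not part
of s6 or any existing tool. For a live process, the same text could be read
from /proc/<pid>/smaps_rollup.

```python
import re

# A subset of the smaps_rollup lines quoted above (static musl build).
ROLLUP_TEXT = """\
Rss: 64 kB
Pss: 36 kB
Shared_Clean: 40 kB
Shared_Dirty: 0 kB
Private_Clean: 8 kB
Private_Dirty: 16 kB
"""

FIELD = re.compile(r"^(\w+):\s+(\d+)\s+kB$")


def parse_rollup(text):
    """Return {field name: size in kB} for every 'Name: N kB' line."""
    fields = {}
    for line in text.splitlines():
        match = FIELD.match(line.strip())
        if match:
            fields[match.group(1)] = int(match.group(2))
    return fields


def private_kb(fields):
    """Memory that is paid again for each additional process."""
    return fields["Private_Clean"] + fields["Private_Dirty"]


def shared_kb(fields):
    """Memory that can be shared with other processes."""
    return fields["Shared_Clean"] + fields["Shared_Dirty"]


fields = parse_rollup(ROLLUP_TEXT)
print(f"private: {private_kb(fields)} kB, shared: {shared_kb(fields)} kB")
```

In this output, private plus shared adds up to the reported Rss, which is a
useful sanity check when reading these numbers by hand.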
>
> # grep -E '[[:space:]](-|r)(-|w)(-|x)(-|p)[[:space:]]|^Private_Dirty:' smaps
>
> 00400000-00409000 r-xp 00000000 ca:00 659178 /command/s6-supervise
> Private_Dirty: 0 kB
> 00609000-0060b000 rw-p 00009000 ca:00 659178 /command/s6-supervise
> Private_Dirty: 4 kB
> 02462000-02463000 ---p 00000000 00:00 0 [heap]
> Private_Dirty: 0 kB
> 02463000-02464000 rw-p 00000000 00:00 0 [heap]
> Private_Dirty: 4 kB
> 7ffd53036000-7ffd53057000 rw-p 00000000 00:00 0 [stack]
> Private_Dirty: 8 kB
> 7ffd53090000-7ffd53094000 r--p 00000000 00:00 0 [vvar]
> Private_Dirty: 0 kB
> 7ffd53094000-7ffd53096000 r-xp 00000000 00:00 0 [vdso]
> Private_Dirty: 0 kB
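
For readers who prefer a script to grep, the listing above can be turned
into a per-mapping table. This is a minimal sketch that works on a copy of
the quoted output; the regular expressions and function name are
illustrative assumptions, and real smaps files contain many more fields per
mapping, which this parser simply skips.

```python
import re

# The filtered smaps output quoted above, copied verbatim.
SMAPS_EXCERPT = """\
00400000-00409000 r-xp 00000000 ca:00 659178 /command/s6-supervise
Private_Dirty: 0 kB
00609000-0060b000 rw-p 00009000 ca:00 659178 /command/s6-supervise
Private_Dirty: 4 kB
02462000-02463000 ---p 00000000 00:00 0 [heap]
Private_Dirty: 0 kB
02463000-02464000 rw-p 00000000 00:00 0 [heap]
Private_Dirty: 4 kB
7ffd53036000-7ffd53057000 rw-p 00000000 00:00 0 [stack]
Private_Dirty: 8 kB
7ffd53090000-7ffd53094000 r--p 00000000 00:00 0 [vvar]
Private_Dirty: 0 kB
7ffd53094000-7ffd53096000 r-xp 00000000 00:00 0 [vdso]
Private_Dirty: 0 kB
"""

# Mapping header: address range, permissions, offset, device, inode, name.
HEADER = re.compile(
    r"^[0-9a-f]+-[0-9a-f]+\s+([r-][w-][x-][ps])\s+\S+\s+\S+\s+\d+\s*(.*)$"
)
DIRTY = re.compile(r"^Private_Dirty:\s+(\d+)\s+kB$")


def private_dirty_by_mapping(text):
    """Return a list of (name, permissions, Private_Dirty in kB) tuples."""
    rows = []
    current = None
    for line in text.splitlines():
        line = line.strip()
        header = HEADER.match(line)
        if header:
            # Unnamed mappings are labelled "[anon]" for display only.
            current = (header.group(2) or "[anon]", header.group(1))
            continue
        dirty = DIRTY.match(line)
        if dirty and current is not None:
            rows.append((current[0], current[1], int(dirty.group(1))))
    return rows


rows = private_dirty_by_mapping(SMAPS_EXCERPT)
for name, perms, kb in rows:
    print(f"{kb:3d} kB  {perms}  {name}")
print("total:", sum(kb for _, _, kb in rows), "kB")
```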
>
> One page of static data, one page of heap, two pages of stack (that
> I should probably be able to get down to one). All the other mappings
> are shared, except those weird two pages of Private_Clean that I don't
> understand yet.
> As you can see, it is as close to incompressible as it gets. If I had
> 129 of these processes, without changing anything, they would use
> something like: (16+8) * 129 + 40 = 3136 kB of RAM. Which is still
> bigger than the theoretical minimum - I need to get rid of those two
> Private_Clean pages - but much more acceptable than the 12.2 MB you get
> from glibc.
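
The estimate above is simple enough to reproduce. The sketch below uses a
hypothetical helper, fleet_kb, that multiplies the private cost by the
number of processes and counts shared pages once. The glibc line assumes
the 96 kB per-process figure from earlier in the message, so it is only a
rough comparison with the measured 12.2 MB.

```python
def fleet_kb(private_kb_per_process, instances, shared_kb=0):
    """Estimate total RAM in kB: private pages per process, shared pages once."""
    return private_kb_per_process * instances + shared_kb


INSTANCES = 129

# Static musl build: 16 kB Private_Dirty + 8 kB Private_Clean, 40 kB shared.
static_musl = fleet_kb(16 + 8, INSTANCES, shared_kb=40)

# Same build if the two Private_Clean pages could be eliminated.
static_musl_minimum = fleet_kb(16, INSTANCES, shared_kb=40)

# glibc build: private memory only, assuming 96 kB Private_Dirty per process.
glibc_private = fleet_kb(96, INSTANCES)

print(f"static musl:          {static_musl} kB")
print(f"without Private_Clean: {static_musl_minimum} kB")
print(f"glibc (private only):  {glibc_private} kB")
```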
>
>
> I was going to post this as is, but for completeness' sake and my
> peace of mind, I fired up an Alpine Linux VM and checked /proc for
> an s6-supervise process. Alpine Linux uses musl, but with dynamic
> linking, and --disable-allstatic. The results are mixed:
>
> - 8 kB of static data (why is it more than the static case?)
> - 4 kB of heap
> - 8 kB of stack
> (So far so good, more or less.)
> - 16 kB for libskarnet.so (why is it more than glibc uses?)
> - 8 kB of anonymous mapping related to libskarnet.so
> - 8 kB for libc.so
> - 8 kB of anonymous mapping related to libc.so
>
> That's better than glibc, but is still 40kB of overhead compared to
> a static build, plus 4 kB of static data that I don't understand.
> Total is 60 kB, which would net 7.7MB + shared for 129 instances.
> Linking libskarnet statically would likely save 24kB per instance, so
> the total RAM for --enable-allstatic would be 4.6MB + shared. Which
> is starting to sound close to acceptable.
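
The Alpine figures can be tallied the same way. In this sketch the
dictionary keys are illustrative labels, and the 24 kB saving from linking
libskarnet statically is assumed to be the two libskarnet.so entries (16 kB
plus 8 kB); the message does not spell that out.

```python
# Per-process private memory on Alpine (dynamic musl), in kB, as listed above.
ALPINE_DYNAMIC_KB = {
    "static data": 8,
    "heap": 4,
    "stack": 8,
    "libskarnet.so": 16,
    "libskarnet.so anonymous": 8,
    "libc.so": 8,
    "libc.so anonymous": 8,
}

INSTANCES = 129


def per_process_kb(mappings, exclude_prefix=None):
    """Sum the mappings, optionally skipping those whose label starts with a prefix."""
    return sum(
        kb
        for name, kb in mappings.items()
        if exclude_prefix is None or not name.startswith(exclude_prefix)
    )


dynamic = per_process_kb(ALPINE_DYNAMIC_KB)
# Assumption: a static libskarnet removes both libskarnet.so entries.
static_skalibs = per_process_kb(ALPINE_DYNAMIC_KB, exclude_prefix="libskarnet.so")

print(f"dynamic:        {dynamic} kB each, {dynamic * INSTANCES} kB total")
print(f"static skalibs: {static_skalibs} kB each, {static_skalibs * INSTANCES} kB total")
```

Both totals exclude shared pages, matching the "+ shared" wording above.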
>
> My takeaway from this is that dynamic linking, despite being essential
> for distributions (for ease of upgrade, maintenance, and security
> reasons), is definitely _not free_. It has a high fixed cost in RAM;
> this is not noticeable when using few instances of large, bloated
> processes - which is how a lot of software operates - but it is very
> noticeable when using a lot of instances of small, efficient processes,
> where the costs of dynamic linking overshadow the legit RAM use of said
> processes.
>
> In other words: the way s6 works is a worst case for dynamic linking,
> and especially dynamic linking with glibc. I'm sorry.
>
> If you want to attempt building static binaries of s6 with musl, you
> can find musl toolchains at https://skarnet.org/toolchains/ or
> at https://musl.cc/ . Please bear in mind you will need to build the
> whole stack with the same toolchain (skalibs, execline, s6).
>
>
>
> Dewayne:
>
> >> Thanks Laurent, that's really interesting. By comparison, my FBSD
> >> system uses:
> >>
> >> # ps -axw -o pid,vsz,rss,time,comm | grep s6
>
> Well that's the problem with ps: VSZ and RSS won't give you the real
> information, because they include shared mappings in their numbers.
> To get a reasonably accurate estimation of the marginal increase on
> one additional process, you need to know what is shared and what is
> private, and ps doesn't tell you that. There is probably a way to get
> the information on FreeBSD, but I don't know what it is.
>
> Yes, the FreeBSD libc is relatively large, but it's pretty decent
> compared to glibc. I suspect the marginal increase on one s6-supervise
> process on FreeBSD is somewhere between what you get with musl and
> what you get with glibc.
>
> --
> Laurent
>
>