Thanks Laurent and Colin for the suggestions. I will try to build a fully
static s6 with a musl toolchain. Thanks for the detailed analysis once
again.

-- Arjun

On Wed, Jun 9, 2021 at 5:18 PM Laurent Bercot wrote:

> >I have checked the Private_Dirty memory in "smaps" of a s6-supervise
> >process and I don't see any consuming above 8 kB. Just posting it here
> >for reference.
>
> Indeed, each mapping is small, but you have *a lot* of them. The
> sum of all the Private_Dirty in your mappings, that should be shown
> in smaps_rollup, is 96 kB. 24 pages! That is _huge_.
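>
> (For reference, that total is the number a recent kernel reports as
> Private_Dirty in smaps_rollup; if your kernel does not expose that
> file, something along these lines computes the same sum - substitute
> the pid of the s6-supervise process you are looking at:
>
>   awk '/^Private_Dirty:/ { sum += $2 } END { print sum " kB" }' \
>     /proc/<pid>/smaps
>
> and looping that over $(pidof s6-supervise) gives you the figure for
> every instance.)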
>
> In this list, the mappings that are really used by s6-supervise (i.e.
> the incompressible amount of unshareable memory) are the following:
>
> - the /bin/s6-supervise section: this is static data, s6-supervise
> needs a little, but it should not take more than one page.
>
> - the [heap] section: this is dynamically allocated memory, and for
> s6-supervise it should not be bigger than 4 kB. s6-supervise does not
> allocate dynamic memory itself; the presence of a heap section is due
> to opendir() which needs dynamic buffers; the size of the buffer is
> determined by the libc, and anything more than one page is wasteful.
>
> ( - anonymous mappings are really memory dynamically allocated for
> internal libc purposes; they do not show up in [heap] because they're
> not obtained via malloc(). No function used by s6-supervise should
> ever need those; any anonymous mapping you see is libc shenanigans
> and counts as overhead. )
>
> - the [stack] section: this is difficult to control because the
> amount of stack a process uses depends a lot on the compiler, the
> compilation flags, etc. When built with -O2, s6-supervise should not
> use more than 2-3 pages of stack. This includes a one-page buffer to
> read from notification-fd; I can probably reduce the size of this
> buffer and make sure the amount of needed stack pages never goes
> above 2.
>
> So in total, the incompressible amount of private mappings is 4 to 5
> pages (16 to 20 kB). All the other mappings are libc overhead.
>
> - the libpthread-2.31.so mapping uses 8 kB
> - the librt-2.31.so mapping uses 8 kB
> - the libc-2.31.so mapping uses 16 kB
> - the libskarnet.so mapping uses 12 kB
> - ld.so, the dynamic linker itself, uses 16 kB
> - there are 16 kB of anonymous mappings
>
> This is some serious waste; unfortunately, it's pretty much to be
> expected from glibc, which suffers from decades of misdesign and
> tunnel vision, especially where dynamic linking is concerned. We are,
> unfortunately, experiencing the consequences of technical debt.
>
> Linking against the static version of skalibs (--enable-allstatic)
> should save you at least 12 kB (and probably 16) per instance of
> s6-supervise. You should have noticed the improvement; your amount of
> private memory should have dropped by at least 1.5 MB when you switched
> to --enable-allstatic.
> But I understand it is not enough.
>
> Unfortunately, once you have removed the libskarnet.so mappings,
> it's basically down to the libc, and to achieve further improvements
> I have no other suggestions than to change libcs.
>
> >If possible, can you please share us a reference smap and ps_mem data on
> >s6-supervise. That would really help.
>
> I don't use ps_mem, but here are the details of a s6-supervise process
> on the skarnet.org server. s6 is linked statically against the musl
> libc, which means:
> - the text segments are bigger (drawback of static linking)
> - there are fewer mappings (advantage of static linking, but even when
> you're linking dynamically against musl it maps as little as it can)
> - the mappings have little libc overhead (advantage of musl)
>
> # cat smaps_rollup
>
> 00400000-7ffd53096000 ---p 00000000 00:00 0      [rollup]
> Rss:                64 kB
> Pss:                36 kB
> Pss_Anon:           20 kB
> Pss_File:           16 kB
> Pss_Shmem:           0 kB
> Shared_Clean:       40 kB
> Shared_Dirty:        0 kB
> Private_Clean:       8 kB
> Private_Dirty:      16 kB
> Referenced:         64 kB
> Anonymous:          20 kB
> LazyFree:            0 kB
> AnonHugePages:       0 kB
> ShmemPmdMapped:      0 kB
> FilePmdMapped:       0 kB
> Shared_Hugetlb:      0 kB
> Private_Hugetlb:     0 kB
> Swap:                0 kB
> SwapPss:             0 kB
> Locked:              0 kB
>
> You can see 40 kB of shared, 16 kB of Private_Dirty, and 8 kB of
> Private_Clean - apparently there's one Private_Clean page of static
> data and one of stack; I have no idea what this corresponds to in the
> code, I will need to investigate and see if it can be trimmed down.
>
> # grep -E '[[:space:]](-|r)(-|w)(-|x)(-|p)[[:space:]]|^Private_Dirty:' smaps
>
> 00400000-00409000 r-xp 00000000 ca:00 659178     /command/s6-supervise
> Private_Dirty:       0 kB
> 00609000-0060b000 rw-p 00009000 ca:00 659178     /command/s6-supervise
> Private_Dirty:       4 kB
> 02462000-02463000 ---p 00000000 00:00 0          [heap]
> Private_Dirty:       0 kB
> 02463000-02464000 rw-p 00000000 00:00 0          [heap]
> Private_Dirty:       4 kB
> 7ffd53036000-7ffd53057000 rw-p 00000000 00:00 0  [stack]
> Private_Dirty:       8 kB
> 7ffd53090000-7ffd53094000 r--p 00000000 00:00 0  [vvar]
> Private_Dirty:       0 kB
> 7ffd53094000-7ffd53096000 r-xp 00000000 00:00 0  [vdso]
> Private_Dirty:       0 kB
>
> One page of static data, one page of heap, two pages of stack (that
> I should probably be able to get down to one). All the other mappings
> are shared, except those weird two pages of Private_Clean that I don't
> understand yet.
> As you can see, it is as close to incompressible as it gets. If I had
> 129 of these processes, without changing anything, they would use
> something like: (16+8) * 129 + 40 = 3136 kB of RAM. Which is still
> bigger than the theoretical minimum - I need to get rid of those two
> Private_Clean pages - but much more acceptable than the 12.2 MB you get
> from glibc.
>
>
> I was going to post this as is, but for completeness' sake and my
> peace of mind, I fired up an Alpine Linux VM and checked /proc for
> a s6-supervise process. Alpine Linux uses musl, but with dynamic
> linking, and --disable-allstatic. The results are mixed:
>
> - 8 kB of static data (why is it more than the static case?)
> - 4 kB of heap
> - 8 kB of stack
> (So far so good, more or less.)
> - 16 kB for libskarnet.so (why is it more than glibc uses?)
> - 8 kB of anonymous mapping related to libskarnet.so
> - 8 kB for libc.so
> - 8 kB of anonymous mapping related to libc.so
>
> That's better than glibc, but it's still 40 kB of overhead compared to
> a static build, plus 4 kB of static data that I don't understand.
> Total is 60 kB, which would net 7.7 MB + shared for 129 instances.
> Linking libskarnet statically would likely save 24 kB per instance, so
> the total RAM for --enable-allstatic would be 4.6 MB + shared. Which
> is starting to sound close to acceptable.
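>
> (To make the arithmetic explicit: the model behind these estimates is
> simply "shared pages counted once, private pages counted once per
> instance". In shell, with the numbers above in kB and N the number of
> s6-supervise processes, that is roughly:
>
>   N=129
>   echo "static musl:  $(( 40 + N * (16 + 8) )) kB"   # shared once + N * private
>   echo "dynamic musl: $(( N * 60 )) kB + shared"     # 60 kB private per instance
>
> which is where the 3136 kB and 7.7 MB figures come from.)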
>
> My takeaway from this is that dynamic linking, despite being essential
> for distributions (for ease of upgrade, maintenance, and security
> reasons), is definitely _not free_. It has a high fixed cost in RAM;
> this is not noticeable when using few instances of large, bloated
> processes - which is how a lot of software operates - but it is very
> noticeable when using a lot of instances of small, efficient processes,
> where the costs of dynamic linking overshadow the legit RAM use of said
> processes.
>
> In other words: the way s6 works is a worst case for dynamic linking,
> and especially dynamic linking with glibc. I'm sorry.
>
> If you want to attempt building static binaries of s6 with musl, you
> can find musl toolchains at https://skarnet.org/toolchains/ or
> at https://musl.cc/ . Please bear in mind you will need to build the
> whole stack with the same toolchain (skalibs, execline, s6).
>
>
> Dewayne:
>
> >> Thanks Laurent, that's really interesting. By comparison, my FBSD
> >> system uses:
> >>
> >> # ps -axw -o pid,vsz,rss,time,comm | grep s6
>
> Well that's the problem with ps: VSZ and RSS won't give you the real
> information, because they include shared mappings in their numbers.
> To get a reasonably accurate estimation of the marginal increase on
> one additional process, you need to know what is shared and what is
> private, and ps doesn't tell you that. There is probably a way to get
> the information on FreeBSD, but I don't know what it is.
>
> Yes, the FreeBSD libc is relatively large, but it's pretty decent
> compared to glibc. I suspect the marginal increase on one s6-supervise
> process on FreeBSD is somewhere between what you get with musl and
> what you get with glibc.
>
> --
> Laurent
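
P.S. For the record, here is roughly what I plan to try, using one of the
static musl toolchains mentioned above. The compiler name and ./configure
options below are only a sketch; I still need to check each package's
./configure --help for the exact flags:

  export CC=x86_64-linux-musl-gcc
  ( cd skalibs  && ./configure && make && make install )
  ( cd execline && ./configure --enable-allstatic && make && make install )
  ( cd s6       && ./configure --enable-allstatic && make && make install )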