Hi Team,

I would like to hear from you on a few queries. Please help.

1. Why do we need separate supervisors for producer and consumer long-run
services? Is it possible to have one supervisor for both producer and
consumer, since the consumer service does not need to run when the producer
is down anyway? I understand that an s6 supervisor is meant to monitor only
one service, but why not monitor a couple of services when it is logically
valid, if I am not wrong?

2. Is it possible to have a single supervisor for a bundle of services?
Like, one supervisor for a bundle (consisting of a few services)?

3. Generally, how many instances of s6-supervise can run? We are running
into a problem where we have 129 instances of s6-supervise, which leads to
higher memory consumption. We are migrating from systemd to the s6 init
system because it is lightweight, but we have a lot of s6-log and
s6-supervise instances, which results in higher memory usage compared to
systemd. Is it fine to have this many s6-supervise instances?

ps_mem data: 5.5 MiB s6-log (46), 14.3 MiB s6-supervise (129)

Thanks,
Arjun
> 1. Why do we need separate supervisors for producer and consumer long-run
> services? Is it possible to have one supervisor for both producer and
> consumer, since the consumer service does not need to run when the
> producer is down anyway? I understand that an s6 supervisor is meant to
> monitor only one service, but why not monitor a couple of services when
> it is logically valid, if I am not wrong?

Hi Arjun,

The logic of the supervisor is already complex enough when it has to
monitor one process. It would be quadratically more complex if it had to
monitor two. In all likelihood, the first impact of such a change would be
more bugs, because the logic would be a lot more difficult to understand
and maintain.

The amount of memory used by the s6 logic itself would not change (or
would *increase* somewhat) if the code were organized differently in order
to reduce the number of processes, and you would see an overall decrease
in code quality. Worsening the design to offset operational costs is not a
good trade-off - it is not "logically valid", as you put it. I would not
do it even if the high amount of memory consumed by your processes were
due to s6 itself.

But that is not the case: your operational costs are due to something
else. See below.

> 2. Is it possible to have a single supervisor for a bundle of services?
> Like, one supervisor for a bundle (consisting of a few services)?

Again, there would be no engineering benefit to that. You would likely see
operational benefits, yes, but s6 is the wrong place to try and get those
benefits, because it is not the cause of your operational costs.

> 3. Generally, how many instances of s6-supervise can run? We are running
> into a problem where we have 129 instances of s6-supervise, which leads
> to higher memory consumption. We are migrating from systemd to the s6
> init system because it is lightweight, but we have a lot of s6-log and
> s6-supervise instances, which results in higher memory usage compared to
> systemd. Is it fine to have this many s6-supervise instances?
> ps_mem data: 5.5 MiB s6-log (46), 14.3 MiB s6-supervise (129)

It is normally totally fine to have this many s6-supervise instances (and
s6-log instances); it is the intended usage. The skarnet.org server only
has 256 MB of RAM, and currently sports 93 instances of s6-supervise (and
44 instances of s6-log) without any trouble. It could triple that amount
without breaking a sweat.

The real problem here is that your instances appear to use so much memory:
*that* is not normal. Every s6-supervise process should use at most 4
pages (16 kB) of private dirty memory, so for 129 processes I would expect
the memory usage to be around 2.1 MB. Your reported total is 7 times as
much, which sounds totally out of bounds to me, and even accounting for
normal operational overhead, a factor of 7 is *completely bonkers*.

There are two possible explanations here:

- Either ps_mem is not accurately tallying the memory used by a given set
of processes;

- Or you are using a libc with an incredible amount of overhead, and your
libc (and in particular, I suspect, dynamic linking management in your
libc) is the culprit for the insane amount of memory that the s6-supervise
processes seem to be eating.
The easiest way to understand what's going on is to find the pid of an
s6-supervise process, and to perform

# cat /proc/$pid/smaps_rollup

That will tell you what's going on for the chosen s6-supervise process
(they're all similar, so the numbers for the other s6-supervise processes
won't be far off). In particular, look at the Private_Dirty line: that is
the "real" amount of incompressible memory used by that process. It should
be around 16 kB, tops. Anything over that is overhead from your libc.

If the value is not too much over 16 kB, then ps_mem is simply lying to
you and there is nothing to worry about, except that you should use
another tool to tally memory usage. But if the value is much higher, then
it is time to diagnose deeper:

# cat /proc/$pid/smaps

That will show you all the mappings performed by your libc, and the amount
of memory that each of these mappings uses. Again, the most important
lines are the Private_Dirty ones - these are the values that add up for
every s6-supervise instance.

My hunch is that you will see *a lot* of mappings, each using 4 kB or
8 kB, or even in some cases 12 kB, of Private_Dirty memory. If that is the
case, unfortunately there is nothing I can do about it, because that
overhead is entirely caused by your libc. However, there is something
*you* can do about it:

- If "ldd /bin/s6-supervise" gives you a line mentioning libs6.so or
libskarnet.so, try recompiling s6 with --enable-allstatic. This will link
against the static versions of libs6 and libskarnet, which will alleviate
the costs of dynamic linking. (The price is that the *text* of
s6-supervise will be a little bigger, but it doesn't matter: text is
Shared_Clean, so the cost is only incurred once.) That alone should
decrease your memory usage by a lot.

If that is still not enough, then it means your libc is trash. Sorry,
there is no other word. If you are on Linux and using glibc (which,
indeed, is trash), you can try building skalibs+execline+s6 against the
musl libc (https://musl.libc.org/); not only will that allow you to use s6
as intended, with hundreds of s6-supervise instances, without having to
worry about memory usage - because musl has very little overhead - but
your s6 binaries will also be smaller and faster.

I hope this will help you, and I hope this unfortunate report can serve as
an illustration of *why* it is important to minimize overhead at every
level of a system, especially at lower levels, and particularly in the
libc.

--
Laurent
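As a quick way of putting a number on all 129 instances at once, here is a
rough shell sketch (assuming a Linux /proc with smaps_rollup support and
the commonly available pidof and awk utilities) that sums the
Private_Dirty figure over every running s6-supervise process:

  # Sum Private_Dirty across all s6-supervise instances (sketch).
  for pid in $(pidof s6-supervise) ; do
    grep '^Private_Dirty:' "/proc/$pid/smaps_rollup"
  done | awk '{ total += $2 } END { print total " kB Private_Dirty in total" }'

If the total works out to roughly 4 pages per instance, the libc is
behaving; if it is several times that, the libc overhead described above
is the likely culprit.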
Thanks Laurent, that's really interesting. By comparison, my FBSD system
uses:

# ps -axw -o pid,vsz,rss,time,comm | grep s6
       virt KB  resident  cpu total
38724    10904      1600    0:00.02  s6-log
41848    10788      1552    0:00.03  s6-log
42138    10848      1576    0:00.01  s6-log
42222    10888      1596    0:00.02  s6-log
45878    10784      1516    0:00.00  s6-svscan
54453    10792      1544    0:00.00  s6-supervise
... lots ...
67937    10792      1540    0:00.00  s6-supervise
76442    10724      1484    0:00.01  s6-ipcserverd
76455    11364      1600    0:00.01  s6-fdholderd
84229    10896       712    0:00.01  s6-log

Processes pull in both ld-elf and libc.so, from procstat -v:
        start          end  path
    0x1021000    0x122a000  /usr/local/bin/s6-supervise
  0x801229000  0x80124f000  /libexec/ld-elf.so.1
  0x801272000  0x80144c000  /lib/libc.so.7

Yes - libc is ... large.

Arjun, if you want to reduce the number of s6-log processes, perhaps
consider piping them to a file which s6-log reads from. For example, we
maintain various web servers; the accesses are unique and of interest to
customers, but they don't (really) care about the errors, so we aggregate
those with one s6-log. Works very well :)
Thanks Laurent for the detailed explanation. That really helps.

I have checked the Private_Dirty memory in "smaps" of an s6-supervise
process and I don't see any mapping consuming above 8 kB. Just posting it
here for reference.

grep Private_Dirty /proc/991/smaps
Private_Dirty:         0 kB
Private_Dirty:         4 kB
Private_Dirty:         4 kB
Private_Dirty:         8 kB
Private_Dirty:         0 kB
Private_Dirty:         0 kB
Private_Dirty:         4 kB
Private_Dirty:         4 kB
Private_Dirty:         4 kB
Private_Dirty:         0 kB
Private_Dirty:         0 kB
Private_Dirty:         4 kB
Private_Dirty:         4 kB
Private_Dirty:         0 kB
Private_Dirty:         0 kB
Private_Dirty:         8 kB
Private_Dirty:         8 kB
Private_Dirty:         8 kB
Private_Dirty:         0 kB
Private_Dirty:         0 kB
Private_Dirty:         8 kB
Private_Dirty:         4 kB
Private_Dirty:         4 kB
Private_Dirty:         0 kB
Private_Dirty:         8 kB
Private_Dirty:         4 kB
Private_Dirty:         4 kB
Private_Dirty:         4 kB
Private_Dirty:         0 kB
Private_Dirty:         0 kB

cat /proc/991/smaps
00010000-00014000 r-xp 00000000 07:00 174   /bin/s6-supervise
00023000-00024000 r--p 00003000 07:00 174   /bin/s6-supervise
00024000-00025000 rw-p 00004000 07:00 174   /bin/s6-supervise
00025000-00046000 rw-p 00000000 00:00 0     [heap]
b6e1c000-b6e2d000 r-xp 00000000 07:00 3652  /lib/libpthread-2.31.so
b6e2d000-b6e3c000 ---p 00011000 07:00 3652  /lib/libpthread-2.31.so
b6e3c000-b6e3d000 r--p 00010000 07:00 3652  /lib/libpthread-2.31.so
b6e3d000-b6e3e000 rw-p 00011000 07:00 3652  /lib/libpthread-2.31.so
b6e3e000-b6e40000 rw-p 00000000 00:00 0
b6e40000-b6e45000 r-xp 00000000 07:00 3656  /lib/librt-2.31.so
b6e45000-b6e54000 ---p 00005000 07:00 3656  /lib/librt-2.31.so
b6e54000-b6e55000 r--p 00004000 07:00 3656  /lib/librt-2.31.so
b6e55000-b6e56000 rw-p 00005000 07:00 3656  /lib/librt-2.31.so
b6e56000-b6f19000 r-xp 00000000 07:00 3613  /lib/libc-2.31.so
b6f19000-b6f28000 ---p 000c3000 07:00 3613  /lib/libc-2.31.so
b6f28000-b6f2a000 r--p 000c2000 07:00 3613  /lib/libc-2.31.so
b6f2a000-b6f2c000 rw-p 000c4000 07:00 3613  /lib/libc-2.31.so
b6f2c000-b6f2e000 rw-p 00000000 00:00 0
b6f2e000-b6f4d000 r-xp 00000000 07:00 3665  /lib/libskarnet.so.2.9.2.1
b6f4d000-b6f5c000 ---p 0001f000 07:00 3665  /lib/libskarnet.so.2.9.2.1
b6f5c000-b6f5e000 r--p 0001e000 07:00 3665  /lib/libskarnet.so.2.9.2.1
b6f5e000-b6f5f000 rw-p 00020000 07:00 3665  /lib/libskarnet.so.2.9.2.1
b6f5f000-b6f6b000 rw-p 00000000 00:00 0
b6f6b000-b6f81000 r-xp 00000000 07:00 3605  /lib/ld-2.31.so
b6f87000-b6f89000 rw-p 00000000 00:00 0
b6f91000-b6f92000 r--p 00016000 07:00 3605  /lib/ld-2.31.so
b6f92000-b6f93000 rw-p 00017000 07:00 3605  /lib/ld-2.31.so
beaf8000-beb19000 rw-p 00000000 00:00 0     [stack]
Size:                132 kB
Rss:                   4 kB
Pss:                   4 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         4 kB
Referenced:            4 kB
Anonymous:             4 kB
AnonHugePages:         0 kB
Swap:                  0 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Locked:                0 kB
VmFlags: rd wr mr mw me gd ac
becd5000-becd6000 r-xp 00000000 00:00 0     [sigpage]
ffff0000-ffff1000 r-xp 00000000 00:00 0     [vectors]

Sorry, I am not able to post the whole data considering the mail size.

On my Linux system:

ps -axw -o pid,vsz,rss,time,comm | grep s6
    1  1732  1128  00:00:06  s6-svscan
  900  1736   452  00:00:00  s6-supervise
  901  1736   480  00:00:00  s6-supervise
  902  1736   444  00:00:00  s6-supervise
  903  1736   444  00:00:00  s6-supervise
  907  1744   496  00:00:00  s6-log
.....

And I don't think ps_mem is lying; I just compared it with smem as well.
Clear data from ps_mem:

 Private  +   Shared  =  RAM used   Program
 4.8 MiB  + 786.0 KiB =   5.5 MiB   s6-log (46)
12.2 MiB  +   2.1 MiB =  14.3 MiB   s6-supervise (129)

smem:

  PID User  Command                        Swap  USS  PSS   RSS
 1020 root  s6-supervise wpa_supplicant       0   96   98   996
 2001 root  s6-log -F wpa_supplicant.lo        0  104  106  1128

Almost the same amount of PSS/RSS is used by the other s6-supervise and
s6-log processes.

I have tried the "--enable-allstatic" flag and unfortunately I don't see
any improvement. If you were referring to shared memory, then yes, we are
good there: it is using 2.1 MiB for 129 instances. But the private memory
is around 12.2 MiB, and I am not sure whether this is a normal value or
not. If possible, can you please share with us reference smaps and ps_mem
data for s6-supervise? That would really help.

Dewayne, even if we pipe to a file, we will still have an s6-supervise
process for the log service. Maybe I didn't understand it well. Sorry
about that. Please help me with that.

Thanks,
Arjun

On Wed, Jun 9, 2021 at 8:18 AM Dewayne Geraghty <
dewayne@heuristicsystems.com.au> wrote:

> Thanks Laurent, that's really interesting. By comparison, my FBSD
> system uses:
> ... snip ...
Apologies, I'd implied that we have multiple s6-supervise processes
running, and their children pipe to one file which is read by one s6-log
process.

You can achieve this outcome by using s6-rc, where one consumer can
receive input from multiple producers.

There is a special (but not unique) case with a program such as apache,
which has explicit log files (defined in apache's config file) to record
web-page accesses and error logs on a per-server basis. Because all the
supervised apache instances can write to one error logfile, I instructed
apache to write to a pipe: multiple supervised apache instances use the
one pipe (aka funnel), which is read by one s6-log. This reduces the
number of s6-log processes. I could do the same with the access logs and
use the regex function of s6-log, but I tend toward simplicity.
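For illustration, a minimal sketch of such a funnel, assuming plain
/bin/sh run scripts in ordinary s6 service directories; the FIFO path
/run/log-funnel, the log directory /var/log/funnel and the daemon name
"mydaemon" are placeholders, not taken from the setup described above:

  #!/bin/sh
  # run script for a single "funnel" logger service: one s6-log instance
  # reads the FIFO that several producers write into.
  # Assumes /var/log/funnel already exists and is writable by this service.
  test -p /run/log-funnel || mkfifo -m 0622 /run/log-funnel
  exec s6-log -b n10 s1000000 T /var/log/funnel < /run/log-funnel

  #!/bin/sh
  # run script for one of the producers: instead of having its own
  # dedicated s6-log, it sends its output to the shared FIFO.
  exec mydaemon > /run/log-funnel 2>&1

With a plain shell redirection the reader sees EOF when the last writer
closes the FIFO and exits, and s6-supervise simply restarts it; execline's
redirfd gives finer control over how the FIFO is opened if that matters.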
Dewayne,

Thanks for the details. We already have such an implementation (multiple
producers with one consumer), but our number of s6-log instances is still
high. Many of our services require direct logger services. We could reduce
the number of direct logger services by creating a funnel and using
regexes to separate the logs, but that is indeed a risky and complicated
process.

I am just interested in confirming the memory usage of the s6-log and
s6-supervise processes.

Thanks,
Arjun

On Wed, Jun 9, 2021 at 9:11 AM Dewayne Geraghty <
dewayne@heuristicsystems.com.au> wrote:

> Apologies, I'd implied that we have multiple s6-supervise processes
> running, and their children pipe to one file which is read by one s6-log
> process.
> ... snip ...
On Wed, Jun 09, 2021 at 09:00:38AM +0530, Arjun D R wrote:
> Thanks Laurent for the detailed explanation. That really helps.
>
> I have checked the Private_Dirty memory in "smaps" of an s6-supervise
> process and I don't see any mapping consuming above 8 kB. Just posting
> it here for reference.
>
> grep Private_Dirty /proc/991/smaps
> Private_Dirty:         0 kB
> Private_Dirty:         4 kB
> Private_Dirty:         4 kB
> Private_Dirty:         8 kB
> ... snip...

In a fully dynamic world a large number of dirty pages is expected, even
if each segment is only one or two pages. There's nothing particularly
surprising here when using a distro-provided s6 that's linked dynamically.

> I have tried the "--enable-allstatic" flag and unfortunately I don't see
> any improvement. If you were referring to shared memory, then yes, we
> are good there: it is using 2.1 MiB for 129 instances. But the private
> memory is around 12.2 MiB, and I am not sure whether this is a normal
> value or not.

If you're building against glibc then allstatic probably won't help. The
default config options for s6 will build with static links against all
libraries but the libc, which is a pretty decent tradeoff between
efficiency and convenience; but to see super low dirty memory you'll need
to use a libc that can actually be linked statically, such as musl.

For your needs, using a musl toolchain and building a fully static s6
should get you significantly better memory usage.

--
Colin Booth
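A rough sketch of what such a build might look like, assuming a musl
toolchain unpacked under /opt/musl-cross (an illustrative path), that the
resulting binaries run on the build host, and the configure switches as I
recall them from the skarnet.org build system - check each package's
./configure --help for the exact names before relying on them:

  # Build skalibs, then execline, then s6, all with the same musl toolchain.
  export CC=/opt/musl-cross/bin/x86_64-linux-musl-gcc

  ( cd skalibs-*  && ./configure --enable-static --disable-shared && make && make install )
  ( cd execline-* && ./configure --enable-static-libc && make && make install )
  ( cd s6-*       && ./configure --enable-static-libc && make && make install )

The ordering matters: execline and s6 link against skalibs, so it has to
be built and installed first, with the same toolchain.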
>I have checked the Private_Dirty memory in "smaps" of an s6-supervise
>process and I don't see any mapping consuming above 8 kB. Just posting it
>here for reference.

Indeed, each mapping is small, but you have *a lot* of them. The sum of
all the Private_Dirty in your mappings, which should be shown in
smaps_rollup, is 96 kB. 24 pages! That is _huge_.

In this list, the mappings that are really used by s6-supervise (i.e. the
incompressible amount of unshareable memory) are the following:

- the /bin/s6-supervise section: this is static data; s6-supervise needs a
little, but it should not take more than one page.

- the [heap] section: this is dynamically allocated memory, and for
s6-supervise it should not be bigger than 4 kB. s6-supervise does not
allocate dynamic memory itself; the presence of a heap section is due to
opendir(), which needs dynamic buffers. The size of the buffer is
determined by the libc, and anything more than one page is wasteful.

( - anonymous mappings are memory dynamically allocated for internal libc
purposes; they do not show up in [heap] because they're not obtained via
malloc(). No function used by s6-supervise should ever need those; any
anonymous mapping you see is libc shenanigans and counts as overhead. )

- the [stack] section: this is difficult to control because the amount of
stack a process uses depends a lot on the compiler, the compilation flags,
etc. When built with -O2, s6-supervise should not use more than 2-3 pages
of stack. This includes a one-page buffer to read from notification-fd; I
can probably reduce the size of this buffer and make sure the number of
needed stack pages never goes above 2.

So in total, the incompressible amount of private mappings is 4 to 5 pages
(16 to 20 kB). All the other mappings are libc overhead:

- the libpthread-2.31.so mappings use 8 kB
- the librt-2.31.so mappings use 8 kB
- the libc-2.31.so mappings use 16 kB
- the libskarnet.so mappings use 12 kB
- ld.so, the dynamic linker itself, uses 16 kB
- there are 16 kB of anonymous mappings

This is some serious waste; unfortunately, it's pretty much to be expected
from glibc, which suffers from decades of misdesign and tunnel vision,
especially where dynamic linking is concerned. We are, unfortunately,
experiencing the consequences of technical debt.

Linking against the static version of skalibs (--enable-allstatic) should
save you at least 12 kB (and probably 16) per instance of s6-supervise.
You should have noticed the improvement: your amount of private memory
should have dropped by at least 1.5 MB when you switched to
--enable-allstatic. But I understand it is not enough.

Unfortunately, once you have removed the libskarnet.so mappings, it's
basically down to the libc, and to achieve further improvements I have no
suggestion other than to change libcs.

>If possible, can you please share with us reference smaps and ps_mem data
>for s6-supervise. That would really help.

I don't use ps_mem, but here are the details of a s6-supervise process on
the skarnet.org server.
s6 is linked statically against the musl libc, which means:

- the text segments are bigger (drawback of static linking)
- there are fewer mappings (advantage of static linking, but even when
you're linking dynamically against musl, it maps as little as it can)
- the mappings have little libc overhead (advantage of musl)

# cat smaps_rollup
00400000-7ffd53096000 ---p 00000000 00:00 0      [rollup]
Rss:                  64 kB
Pss:                  36 kB
Pss_Anon:             20 kB
Pss_File:             16 kB
Pss_Shmem:             0 kB
Shared_Clean:         40 kB
Shared_Dirty:          0 kB
Private_Clean:         8 kB
Private_Dirty:        16 kB
Referenced:           64 kB
Anonymous:            20 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB

You can see 40 kB of shared, 16 kB of Private_Dirty, and 8 kB of
Private_Clean - apparently there's one Private_Clean page of static data
and one of stack; I have no idea what this corresponds to in the code, I
will need to investigate and see if it can be trimmed down.

# grep -E '[[:space:]](-|r)(-|w)(-|x)(-|p)[[:space:]]|^Private_Dirty:' smaps
00400000-00409000 r-xp 00000000 ca:00 659178     /command/s6-supervise
Private_Dirty:         0 kB
00609000-0060b000 rw-p 00009000 ca:00 659178     /command/s6-supervise
Private_Dirty:         4 kB
02462000-02463000 ---p 00000000 00:00 0          [heap]
Private_Dirty:         0 kB
02463000-02464000 rw-p 00000000 00:00 0          [heap]
Private_Dirty:         4 kB
7ffd53036000-7ffd53057000 rw-p 00000000 00:00 0  [stack]
Private_Dirty:         8 kB
7ffd53090000-7ffd53094000 r--p 00000000 00:00 0  [vvar]
Private_Dirty:         0 kB
7ffd53094000-7ffd53096000 r-xp 00000000 00:00 0  [vdso]
Private_Dirty:         0 kB

One page of static data, one page of heap, two pages of stack (which I
should probably be able to get down to one). All the other mappings are
shared, except those weird two pages of Private_Clean that I don't
understand yet. As you can see, it is as close to incompressible as it
gets.

If I had 129 of these processes, without changing anything, they would use
something like (16+8) * 129 + 40 = 3136 kB of RAM. Which is still bigger
than the theoretical minimum - I need to get rid of those two
Private_Clean pages - but much more acceptable than the 12.2 MB you get
from glibc.

I was going to post this as is, but for completeness' sake and my peace of
mind, I fired up an Alpine Linux VM and checked /proc for a s6-supervise
process. Alpine Linux uses musl, but with dynamic linking, and
--disable-allstatic. The results are mixed:

- 8 kB of static data (why is it more than the static case?)
- 4 kB of heap
- 8 kB of stack

(So far so good, more or less.)

- 16 kB for libskarnet.so (why is it more than glibc uses?)
- 8 kB of anonymous mappings related to libskarnet.so
- 8 kB for libc.so
- 8 kB of anonymous mappings related to libc.so

That's better than glibc, but it is still 40 kB of overhead compared to a
static build, plus 4 kB of static data that I don't understand. The total
is 60 kB, which would net 7.7 MB + shared for 129 instances. Linking
libskarnet statically would likely save 24 kB per instance, so the total
RAM for --enable-allstatic would be 4.6 MB + shared. Which is starting to
sound close to acceptable.

My takeaway from this is that dynamic linking, despite being essential for
distributions (for ease of upgrade, maintenance, and security reasons), is
definitely _not free_.
It has a high fixed cost in RAM; this is not noticeable when using few
instances of large, bloated processes - which is how a lot of software
operates - but it is very noticeable when using a lot of instances of
small, efficient processes, where the costs of dynamic linking overshadow
the legitimate RAM use of said processes.

In other words: the way s6 works is a worst case for dynamic linking, and
especially dynamic linking with glibc. I'm sorry.

If you want to attempt building static binaries of s6 with musl, you can
find musl toolchains at https://skarnet.org/toolchains/ or at
https://musl.cc/ . Please bear in mind that you will need to build the
whole stack with the same toolchain (skalibs, execline, s6).


Dewayne:

>> Thanks Laurent, that's really interesting. By comparison, my FBSD
>> system uses:
>>
>> # ps -axw -o pid,vsz,rss,time,comm | grep s6

Well, that's the problem with ps: VSZ and RSS won't give you the real
information, because they include shared mappings in their numbers. To get
a reasonably accurate estimation of the marginal cost of one additional
process, you need to know what is shared and what is private, and ps
doesn't tell you that. There is probably a way to get that information on
FreeBSD, but I don't know what it is.

Yes, the FreeBSD libc is relatively large, but it's pretty decent compared
to glibc. I suspect the marginal cost of one s6-supervise process on
FreeBSD is somewhere between what you get with musl and what you get with
glibc.

--
Laurent
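To reproduce this kind of estimate on one's own machine, a small sketch
(assuming Linux's /proc/<pid>/smaps_rollup plus pidof and awk) that scales
one instance's private pages by the instance count and adds the shared
pages once, mirroring the "private * N + shared" arithmetic above:

  # Estimate total RAM for N supervisors from a single instance's rollup.
  # N=129 matches the thread; any s6-supervise pid works as the sample.
  pid=$(pidof -s s6-supervise)
  awk -v n=129 '
    /^Private_(Clean|Dirty):/ { priv += $2 }
    /^Shared_(Clean|Dirty):/  { shared += $2 }
    END { printf "~%d kB for %d instances\n", priv * n + shared, n }
  ' "/proc/$pid/smaps_rollup"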
Thanks Laurent and Colin for the suggestions. I will try to build a fully
static s6 with the musl toolchain. Thanks for the detailed analysis once
again.

--
Arjun

On Wed, Jun 9, 2021 at 5:18 PM Laurent Bercot <ska-supervision@skarnet.org>
wrote:

> >I have checked the Private_Dirty memory in "smaps" of an s6-supervise
> >process and I don't see any mapping consuming above 8 kB. Just posting
> >it here for reference.
>
> Indeed, each mapping is small, but you have *a lot* of them.
> ... snip ...