From: "Laurent Bercot"
To: "Arjun D R", "Dewayne Geraghty"
Cc: supervision@list.skarnet.org
Subject: Re: Query on s6-log and s6-supervise
Date: Wed, 09 Jun 2021 11:48:54 +0000

>I have checked the Private_Dirty memory in "smaps" of a s6-supervise
>process and I don't see any consuming above 8kB. Just posting it here
>for reference.

Indeed, each mapping is small, but you have *a lot* of them. The sum
of all the Private_Dirty values in your mappings, which should be
reported in smaps_rollup, is 96 kB. 24 pages! That is _huge_. (A quick
way to check this yourself is sketched below, after the list of
mappings.)

In this list, the mappings that are really used by s6-supervise (i.e.
the incompressible amount of unshareable memory) are the following:

- the /bin/s6-supervise section: this is static data; s6-supervise
needs a little, but it should not take more than one page.

- the [heap] section: this is dynamically allocated memory, and for
s6-supervise it should not be bigger than 4 kB. s6-supervise does not
allocate dynamic memory itself; the presence of a heap section is due
to opendir(), which needs dynamic buffers. The size of the buffer is
determined by the libc, and anything more than one page is wasteful.

( - anonymous mappings are really memory dynamically allocated for
internal libc purposes; they do not show up in [heap] because they're
not obtained via malloc(). No function used by s6-supervise should
ever need those; any anonymous mapping you see is libc shenanigans and
counts as overhead. )

- the [stack] section: this is difficult to control, because the
amount of stack a process uses depends a lot on the compiler, the
compilation flags, etc. When built with -O2, s6-supervise should not
use more than 2-3 pages of stack. This includes a one-page buffer used
to read from notification-fd; I can probably reduce the size of this
buffer and make sure the amount of needed stack pages never goes
above 2.
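For reference, here is a rough way to check those numbers yourself - a
sketch, assuming a Linux kernel recent enough (4.14+) to expose
smaps_rollup, with $pid standing for the pid of one s6-supervise
process:

# awk '/^Private_Dirty:/ { s += $2 } END { print s " kB" }' /proc/$pid/smaps
# grep Private_Dirty: /proc/$pid/smaps_rollup

The first command sums the per-mapping Private_Dirty values by hand;
the second reads the kernel's precomputed total. The two numbers
should match.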
So in total, the incompressible amount of private mappings is 4 to 5
pages (16 to 20 kB). All the other mappings are libc overhead:

- the libpthread-2.31.so mapping uses 8 kB
- the librt-2.31.so mapping uses 8 kB
- the libc-2.31.so mapping uses 16 kB
- the libskarnet.so mapping uses 12 kB
- ld.so, the dynamic linker itself, uses 16 kB
- there are 16 kB of anonymous mappings

This is some serious waste; unfortunately, it's pretty much to be
expected from glibc, which suffers from decades of misdesign and
tunnel vision, especially where dynamic linking is concerned. We are,
unfortunately, experiencing the consequences of technical debt.

Linking against the static version of skalibs (--enable-allstatic)
should save you at least 12 kB (and probably 16) per instance of
s6-supervise. You should have noticed the improvement: your total
amount of private memory should have dropped by at least 1.5 MB when
you switched to --enable-allstatic.

But I understand it is not enough. Unfortunately, once you have
removed the libskarnet.so mappings, it's basically down to the libc,
and to achieve further improvements I have no suggestion other than
changing libcs.

>If possible, can you please share us a reference smap and ps_mem data on
>s6-supervise. That would really help.

I don't use ps_mem, but here are the details of a s6-supervise process
on the skarnet.org server. s6 is linked statically against the musl
libc, which means:

- the text segments are bigger (drawback of static linking)
- there are fewer mappings (advantage of static linking, but even when
  you're linking dynamically against musl, it maps as little as it can)
- the mappings have little libc overhead (advantage of musl)

# cat smaps_rollup
00400000-7ffd53096000 ---p 00000000 00:00 0                          [rollup]
Rss:                  64 kB
Pss:                  36 kB
Pss_Anon:             20 kB
Pss_File:             16 kB
Pss_Shmem:             0 kB
Shared_Clean:         40 kB
Shared_Dirty:          0 kB
Private_Clean:         8 kB
Private_Dirty:        16 kB
Referenced:           64 kB
Anonymous:            20 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB

You can see 40 kB of shared, 16 kB of Private_Dirty, and 8 kB of
Private_Clean - apparently there's one Private_Clean page of static
data and one of stack; I have no idea what this corresponds to in the
code, and I will need to investigate and see if it can be trimmed
down.

# grep -E '[[:space:]](-|r)(-|w)(-|x)(-|p)[[:space:]]|^Private_Dirty:' smaps
00400000-00409000 r-xp 00000000 ca:00 659178    /command/s6-supervise
Private_Dirty:         0 kB
00609000-0060b000 rw-p 00009000 ca:00 659178    /command/s6-supervise
Private_Dirty:         4 kB
02462000-02463000 ---p 00000000 00:00 0         [heap]
Private_Dirty:         0 kB
02463000-02464000 rw-p 00000000 00:00 0         [heap]
Private_Dirty:         4 kB
7ffd53036000-7ffd53057000 rw-p 00000000 00:00 0 [stack]
Private_Dirty:         8 kB
7ffd53090000-7ffd53094000 r--p 00000000 00:00 0 [vvar]
Private_Dirty:         0 kB
7ffd53094000-7ffd53096000 r-xp 00000000 00:00 0 [vdso]
Private_Dirty:         0 kB

One page of static data, one page of heap, two pages of stack (that I
should probably be able to get down to one). All the other mappings
are shared, except those weird two pages of Private_Clean that I don't
understand yet. As you can see, it is as close to incompressible as it
gets. If I had 129 of these processes, without changing anything, they
would use something like: (16+8) * 129 + 40 = 3136 kB of RAM.
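And if you want to measure the actual aggregate cost on a machine that
runs many instances, a rough sketch (same smaps_rollup assumption as
above) is to sum the private pages of every s6-supervise process - the
shared pages are only paid once, so they can be ignored when looking
at the marginal cost:

for pid in $(pidof s6-supervise) ; do
  awk '/^Private_/ { s += $2 } END { print s }' /proc/$pid/smaps_rollup
done | awk '{ total += $1 } END { print total " kB of private memory" }'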
That is still bigger than the theoretical minimum - I need to get rid
of those two Private_Clean pages - but much more acceptable than the
12.2 MB you get from glibc.

I was going to post this as is, but for completeness' sake and my
peace of mind, I fired up an Alpine Linux VM and checked /proc for a
s6-supervise process. Alpine Linux uses musl, but with dynamic
linking, and --disable-allstatic. The results are mixed:

- 8 kB of static data (why is it more than in the static case?)
- 4 kB of heap
- 8 kB of stack

(So far so good, more or less.)

- 16 kB for libskarnet.so (why is it more than glibc uses?)
- 8 kB of anonymous mappings related to libskarnet.so
- 8 kB for libc.so
- 8 kB of anonymous mappings related to libc.so

That's better than glibc, but it is still 40 kB of overhead compared
to a static build, plus 4 kB of static data that I don't understand.
The total is 60 kB, which would amount to 7.7 MB + shared for 129
instances. Linking libskarnet statically would likely save 24 kB per
instance, so the total RAM for --enable-allstatic would be 4.6 MB +
shared. That is starting to sound close to acceptable.

My takeaway from this is that dynamic linking, despite being essential
for distributions (for ease of upgrade, maintenance, and security
reasons), is definitely _not free_. It has a high fixed cost in RAM;
this is not noticeable when using few instances of large, bloated
processes - which is how a lot of software operates - but it is very
noticeable when using a lot of instances of small, efficient
processes, where the costs of dynamic linking overshadow the
legitimate RAM use of said processes. In other words: the way s6 works
is a worst case for dynamic linking, and especially dynamic linking
with glibc. I'm sorry.

If you want to attempt building static binaries of s6 with musl, you
can find musl toolchains at https://skarnet.org/toolchains/ or at
https://musl.cc/ . Please bear in mind that you will need to build the
whole stack with the same toolchain (skalibs, execline, s6); a rough
sketch of the build sequence is appended at the end of this message.

Dewayne:

>> Thanks Laurent, that's really interesting. By comparison, my FBSD
>> system uses:
>>
>> # ps -axw -o pid,vsz,rss,time,comm | grep s6

Well, that's the problem with ps: VSZ and RSS won't give you the real
information, because they include shared mappings in their numbers. To
get a reasonably accurate estimate of the marginal increase for one
additional process, you need to know what is shared and what is
private, and ps doesn't tell you that. There is probably a way to get
that information on FreeBSD, but I don't know what it is.

Yes, the FreeBSD libc is relatively large, but it's pretty decent
compared to glibc. I suspect the marginal increase for one
s6-supervise process on FreeBSD is somewhere between what you get with
musl and what you get with glibc.

--
 Laurent
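P.S.: for what it's worth, here is a rough sketch of what the static
musl build sequence could look like. The option names are from memory
and may vary between versions, so run ./configure --help in each
package to confirm them; if you use a cross toolchain, you will also
need to tell each ./configure about it (again, see --help).

# Build in dependency order, with the same toolchain for every package.
# --enable-static-libc requests fully static binaries (and thus also a
# static link against skalibs, like --enable-allstatic).
( cd skalibs  && ./configure                      && make && make install )
( cd execline && ./configure --enable-static-libc && make && make install )
( cd s6       && ./configure --enable-static-libc && make && make install )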