From: "Laurent Bercot"
To: "Arjun D R", supervision@list.skarnet.org
Subject: Re: Query on s6-log and s6-supervise
Date: Tue, 08 Jun 2021 11:04:37 +0000

> 1. Why do we need to have separate supervisors for producer and
> consumer long-run services? Is it possible to have one supervisor for
> both producer and consumer, because anyhow the consumer service need
> not run when the producer is down. I can understand that the s6
> supervisor is meant to monitor only one service, but why not monitor
> a couple of services when it is logically valid, if I am not wrong.

Hi Arjun,

The logic of the supervisor is already complex enough when it has to
monitor one process. It would be quadratically more complex if it had
to monitor two. In all likelihood, the first impact of such a change
would be more bugs, because the logic would be a lot more difficult to
understand and maintain.

The amount of memory used by the s6 logic itself would not change (or
would *increase* somewhat) if the code were organized differently in
order to reduce the number of processes, and you would see an overall
decrease in code quality. Worsening the design to offset operational
costs is not a good trade-off - it is not "logically valid", as you
put it. I would not do it even if the high amount of memory consumed
by your processes were due to s6 itself. But that is not the case:
your operational costs are due to something else. See below.

> 2. Is it possible to have a single supervisor for a bundle of
> services? Like, one supervisor for a bundle (consisting of a few
> services)?

Again, there would be no engineering benefit to that. You would likely
see operational benefits, yes, but s6 is the wrong place to try and
get those benefits, because it is not the cause of your operational
costs.

> 3. Generally how many instances of s6-supervise can run? We are
> running into a problem where we have 129 instances of s6-supervise
> that leads to higher memory consumption.
> We are migrating from systemd to s6 init system considering the
> light weight, but we have a lot of s6-log and s6-supervise instances
> that results in higher memory usage compared to systemd. Is it fine
> to have this many s6-supervise instances? ps_mem data:
> 5.5 MiB s6-log (46), 14.3 MiB s6-supervise (129)

It is normally totally fine to have this many s6-supervise instances
(and s6-log instances); it is the intended usage. The skarnet.org
server only has 256 MB of RAM, and currently sports 93 instances of
s6-supervise (and 44 instances of s6-log) without any trouble. It
could triple that amount without breaking a sweat.

The real problem here is that your instances appear to use so much
memory: *that* is not normal. Every s6-supervise process should use at
most 4 pages (16k) of private dirty memory, so for 129 processes I
would expect the memory usage to be around 2.1 MB. Your reported total
is about 7 times that, which sounds totally out of bounds to me; even
accounting for normal operational overhead, a factor of 7 is
*completely bonkers*.

There are two possible explanations here:
- Either ps_mem is not accurately tallying the memory used by a given
  set of processes;
- Or you are using a libc with an incredible amount of overhead, and
  your libc (in particular, I suspect, its dynamic linking management)
  is the culprit for the insane amount of memory that the s6-supervise
  processes seem to be eating.

The easiest way to understand what's going on is to find an
s6-supervise process's pid, and to perform

# cat /proc/$pid/smaps_rollup

That will tell you what's going on for the chosen s6-supervise process
(they're all similar, so the numbers for the other s6-supervise
processes won't be far off). In particular, look at the Private_Dirty
line: that is the "real" amount of incompressible memory used by that
process. It should be around 16k, tops. Anything over that is overhead
from your libc. (A one-liner to tally this across all your instances
at once is sketched a little further down.)

If the value is not too much over 16k, then ps_mem is simply lying to
you and there is nothing to worry about, except that you should use
another tool to tally memory usage. But if the value is much higher,
then it is time to dig deeper:

# cat /proc/$pid/smaps

That will show you all the mappings performed by your libc, and the
amount of memory that each of these mappings uses. Again, the most
important lines are the Private_Dirty ones - these are the values that
add up for every s6-supervise instance. My hunch is that you will see
*a lot* of mappings, each using 4k or 8k, or in some cases even 12k,
of Private_Dirty memory.

If that is the case, there is unfortunately nothing I can do about it,
because that overhead is entirely caused by your libc. However, there
is something *you* can do about it:

- If "ldd /bin/s6-supervise" gives you a line mentioning libs6.so or
  libskarnet.so, try recompiling s6 with --enable-allstatic (a build
  sketch follows below). This will link against the static versions of
  libs6 and libskarnet, which will alleviate the costs of dynamic
  linking. (The price is that the *text* of s6-supervise will be a
  little bigger, but that doesn't matter: text is Shared_Clean, so the
  cost is only incurred once.) That alone should decrease your memory
  usage by a lot.

If that is still not enough, then it means your libc is trash. Sorry,
there is no other word.
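By the way, here is a quick way to tally those Private_Dirty totals
across every s6-supervise instance at once. This is only a rough
sketch: it assumes a reasonably recent Linux (smaps_rollup appeared in
4.14), a POSIX shell run as root, and that the pidof utility is
available on your system.

  for pid in $(pidof s6-supervise) ; do
    grep Private_Dirty: /proc/$pid/smaps_rollup
  done | awk '{ sum += $2 } END { print sum " kB total" }'

For 129 healthy instances the total should land around 2000 kB; a
figure several times higher points to per-process libc overhead rather
than to s6 itself.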
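And for completeness, the --enable-allstatic rebuild mentioned above
is just a switch on s6's ./configure. A minimal sketch, assuming you
are building from the s6 source tree and that the static skalibs
library (libskarnet.a) is available - if your distribution only ships
the shared one, you may need to rebuild skalibs from source first:

  ./configure --enable-allstatic
  make
  make install   # as root, or via your packaging system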
If you are on Linux and using glibc (which, indeed, is trash), you can
try building skalibs+execline+s6 against the musl libc
(https://musl.libc.org/). Not only will that allow you to use s6 as
intended, with hundreds of s6-supervise instances, without having to
worry about memory usage - because musl has very little overhead - but
your s6 binaries will also be smaller and faster.

I hope this will help you, and I hope this unfortunate report can
serve as an illustration of *why* it is important to minimize overhead
at every level of a system, especially at lower levels and
particularly in the libc.

--
Laurent