From: "Laurent Bercot"
To: "Arjun D R", "Dewayne Geraghty"
Cc: supervision@list.skarnet.org
Subject: Re: Query on s6-log and s6-supervise
Date: Wed, 09 Jun 2021 11:48:54 +0000

>I have checked the Private_Dirty memory in "smaps" of a s6-supervise
>process and I don't see any consuming above 8kB. Just posting it here
>for reference.

Indeed, each mapping is small, but you have *a lot* of them. The sum
of all the Private_Dirty values in your mappings, which should be
reported in smaps_rollup, is 96 kB. 24 pages! That is _huge_. (A quick
way to check this yourself is sketched below, after the list of
mappings.)

In this list, the mappings that are really used by s6-supervise (i.e.
the incompressible amount of unshareable memory) are the following:

- the /bin/s6-supervise section: this is static data; s6-supervise
needs a little, but it should not take more than one page.

- the [heap] section: this is dynamically allocated memory, and for
s6-supervise it should not be bigger than 4 kB. s6-supervise does not
allocate dynamic memory itself; the presence of a heap section is due
to opendir(), which needs dynamic buffers. The size of the buffer is
determined by the libc, and anything more than one page is wasteful.

( - anonymous mappings are really memory dynamically allocated for
internal libc purposes; they do not show up in [heap] because they're
not obtained via malloc(). No function used by s6-supervise should
ever need those; any anonymous mapping you see is libc shenanigans and
counts as overhead. )

- the [stack] section: this is difficult to control, because the
amount of stack a process uses depends a lot on the compiler, the
compilation flags, etc. When built with -O2, s6-supervise should not
use more than 2-3 pages of stack. This includes a one-page buffer used
to read from notification-fd; I can probably reduce the size of this
buffer and make sure the amount of needed stack pages never goes
above 2.
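For reference, here is a rough way to check those numbers yourself - a
sketch, assuming a Linux kernel recent enough (4.14+) to expose
smaps_rollup, with $pid standing for the pid of one s6-supervise
process:

# awk '/^Private_Dirty:/ { s += $2 } END { print s " kB" }' /proc/$pid/smaps
# grep Private_Dirty: /proc/$pid/smaps_rollup

The first command sums the per-mapping Private_Dirty values by hand;
the second reads the kernel's precomputed total. The two numbers
should match.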
So in total, the incompressible amount of private mappings is 4 to 5
pages (16 to 20 kB). All the other mappings are libc overhead:

- the libpthread-2.31.so mapping uses 8 kB
- the librt-2.31.so mapping uses 8 kB
- the libc-2.31.so mapping uses 16 kB
- the libskarnet.so mapping uses 12 kB
- ld.so, the dynamic linker itself, uses 16 kB
- there are 16 kB of anonymous mappings

This is some serious waste; unfortunately, it's pretty much to be
expected from glibc, which suffers from decades of misdesign and
tunnel vision, especially where dynamic linking is concerned. We are,
unfortunately, experiencing the consequences of technical debt.

Linking against the static version of skalibs (--enable-allstatic)
should save you at least 12 kB (and probably 16) per instance of
s6-supervise. You should have noticed the improvement: your total
amount of private memory should have dropped by at least 1.5 MB when
you switched to --enable-allstatic.

But I understand it is not enough. Unfortunately, once you have
removed the libskarnet.so mappings, it's basically down to the libc,
and to achieve further improvements I have no suggestion other than
changing libcs.

>If possible, can you please share us a reference smap and ps_mem data on
>s6-supervise. That would really help.

I don't use ps_mem, but here are the details of a s6-supervise process
on the skarnet.org server. s6 is linked statically against the musl
libc, which means:

- the text segments are bigger (drawback of static linking)
- there are fewer mappings (advantage of static linking, but even when
  you're linking dynamically against musl, it maps as little as it can)
- the mappings have little libc overhead (advantage of musl)

# cat smaps_rollup
00400000-7ffd53096000 ---p 00000000 00:00 0                          [rollup]
Rss:                  64 kB
Pss:                  36 kB
Pss_Anon:             20 kB
Pss_File:             16 kB
Pss_Shmem:             0 kB
Shared_Clean:         40 kB
Shared_Dirty:          0 kB
Private_Clean:         8 kB
Private_Dirty:        16 kB
Referenced:           64 kB
Anonymous:            20 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB

You can see 40 kB of shared, 16 kB of Private_Dirty, and 8 kB of
Private_Clean - apparently there's one Private_Clean page of static
data and one of stack; I have no idea what this corresponds to in the
code, and I will need to investigate and see if it can be trimmed
down.

# grep -E '[[:space:]](-|r)(-|w)(-|x)(-|p)[[:space:]]|^Private_Dirty:' smaps
00400000-00409000 r-xp 00000000 ca:00 659178    /command/s6-supervise
Private_Dirty:         0 kB
00609000-0060b000 rw-p 00009000 ca:00 659178    /command/s6-supervise
Private_Dirty:         4 kB
02462000-02463000 ---p 00000000 00:00 0         [heap]
Private_Dirty:         0 kB
02463000-02464000 rw-p 00000000 00:00 0         [heap]
Private_Dirty:         4 kB
7ffd53036000-7ffd53057000 rw-p 00000000 00:00 0 [stack]
Private_Dirty:         8 kB
7ffd53090000-7ffd53094000 r--p 00000000 00:00 0 [vvar]
Private_Dirty:         0 kB
7ffd53094000-7ffd53096000 r-xp 00000000 00:00 0 [vdso]
Private_Dirty:         0 kB

One page of static data, one page of heap, two pages of stack (that I
should probably be able to get down to one). All the other mappings
are shared, except those weird two pages of Private_Clean that I don't
understand yet. As you can see, it is as close to incompressible as it
gets. If I had 129 of these processes, without changing anything, they
would use something like: (16+8) * 129 + 40 = 3136 kB of RAM.
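And if you want to measure the actual aggregate cost on a machine that
runs many instances, a rough sketch (same smaps_rollup assumption as
above) is to sum the private pages of every s6-supervise process - the
shared pages are only paid once, so they can be ignored when looking
at the marginal cost:

for pid in $(pidof s6-supervise) ; do
  awk '/^Private_/ { s += $2 } END { print s }' /proc/$pid/smaps_rollup
done | awk '{ total += $1 } END { print total " kB of private memory" }'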
That is still bigger than the theoretical minimum - I need to get rid
of those two Private_Clean pages - but much more acceptable than the
12.2 MB you get from glibc.

I was going to post this as is, but for completeness' sake and my
peace of mind, I fired up an Alpine Linux VM and checked /proc for a
s6-supervise process. Alpine Linux uses musl, but with dynamic
linking, and --disable-allstatic. The results are mixed:

- 8 kB of static data (why is it more than in the static case?)
- 4 kB of heap
- 8 kB of stack

(So far so good, more or less.)

- 16 kB for libskarnet.so (why is it more than glibc uses?)
- 8 kB of anonymous mappings related to libskarnet.so
- 8 kB for libc.so
- 8 kB of anonymous mappings related to libc.so

That's better than glibc, but it is still 40 kB of overhead compared
to a static build, plus 4 kB of static data that I don't understand.
The total is 60 kB, which would amount to 7.7 MB + shared for 129
instances. Linking libskarnet statically would likely save 24 kB per
instance, so the total RAM for --enable-allstatic would be 4.6 MB +
shared. That is starting to sound close to acceptable.

My takeaway from this is that dynamic linking, despite being essential
for distributions (for ease of upgrade, maintenance, and security
reasons), is definitely _not free_. It has a high fixed cost in RAM;
this is not noticeable when using few instances of large, bloated
processes - which is how a lot of software operates - but it is very
noticeable when using a lot of instances of small, efficient
processes, where the costs of dynamic linking overshadow the
legitimate RAM use of said processes. In other words: the way s6 works
is a worst case for dynamic linking, and especially dynamic linking
with glibc. I'm sorry.

If you want to attempt building static binaries of s6 with musl, you
can find musl toolchains at https://skarnet.org/toolchains/ or at
https://musl.cc/ . Please bear in mind that you will need to build the
whole stack with the same toolchain (skalibs, execline, s6); a rough
sketch of the build sequence is appended at the end of this message.

Dewayne:

>> Thanks Laurent, that's really interesting. By comparison, my FBSD
>> system uses:
>>
>> # ps -axw -o pid,vsz,rss,time,comm | grep s6

Well, that's the problem with ps: VSZ and RSS won't give you the real
information, because they include shared mappings in their numbers. To
get a reasonably accurate estimate of the marginal increase for one
additional process, you need to know what is shared and what is
private, and ps doesn't tell you that. There is probably a way to get
that information on FreeBSD, but I don't know what it is.

Yes, the FreeBSD libc is relatively large, but it's pretty decent
compared to glibc. I suspect the marginal increase for one
s6-supervise process on FreeBSD is somewhere between what you get with
musl and what you get with glibc.

--
 Laurent
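P.S.: for what it's worth, here is a rough sketch of what the static
musl build sequence could look like. The option names are from memory
and may vary between versions, so run ./configure --help in each
package to confirm them; if you use a cross toolchain, you will also
need to tell each ./configure about it (again, see --help).

# Build in dependency order, with the same toolchain for every package.
# --enable-static-libc requests fully static binaries (and thus also a
# static link against skalibs, like --enable-allstatic).
( cd skalibs  && ./configure                      && make && make install )
( cd execline && ./configure --enable-static-libc && make && make install )
( cd s6       && ./configure --enable-static-libc && make && make install )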