mailing list of musl libc
From: Rich Felker <dalias@aerifal.cx>
To: musl@lists.openwall.com
Subject: debloating data, bss
Date: Sun, 18 Nov 2012 16:03:53 -0500
Message-ID: <20121118210353.GA4844@brightrain.aerifal.cx>

Hi all,

I've been doing a bit of checking for unneeded bloat, and here's what
I've found in data & bss:

   1045       0     288    1333     535 mntent.o (ex lib/libc.a)

This is simply a line buffer. It seems to me we could instead use
getline or fgetln, but for the latter some consideration is needed to
determine whether its semantics suffice.
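
As a rough sketch of the getline option (longest_line is a hypothetical
helper for illustration, not the mntent code): the line buffer is
allocated and grown by getline on demand, so no fixed-size static array
ends up in bss.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical helper, not musl's getmntent: reads f line by line using
 * a heap buffer that getline allocates and grows as needed, and returns
 * the length of the longest line seen (including its newline). */
static size_t longest_line(FILE *f)
{
	char *line = NULL;   /* getline allocates this on first use */
	size_t cap = 0;      /* capacity of line, maintained by getline */
	size_t max = 0;
	ssize_t n;

	while ((n = getline(&line, &cap, f)) != -1)
		if ((size_t)n > max) max = (size_t)n;

	free(line);          /* safe even if nothing was allocated */
	return max;
}
```

The tradeoff is that getline can fail on allocation, which a fixed
static buffer cannot.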

    665       0      32     697     2b9 mq_notify.o (ex lib/libc.a)

This is purely gcc being stupid and putting static const char[32] into
bss rather than text (wasting 32 bytes of writable memory to save 32
bytes on disk). Since the buffer is junk, we could just use "char[32]"
(uninitialized); that would shrink the code but would result in
warnings. Or, we could use a pointer to any static object or code of
at least 32 bytes in size.

     86       0     544     630     276 gethostbyaddr.o (ex lib/libc.a)
     78       0     544     622     26e gethostbyname2.o (ex lib/libc.a)

Some ugly gigantic buffers for results. Since getaddrinfo requires
dynamic allocation anyway, it would be reasonable to dynamically
allocate these too; it would not be introducing a failure case that
did not already previously exist.
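
A minimal sketch of the dynamic-allocation idea, assuming hypothetical
names (get_result_buf, store_result) that are not the actual resolver
code: only a pointer stays in bss, and the 544 bytes are allocated the
first time a result is needed. Like the legacy interfaces themselves,
this sketch is not thread-safe.

```c
#include <stdlib.h>
#include <string.h>

#define RESULT_BUF_SIZE 544  /* bss size reported for these objects */

/* Hypothetical stand-in for the result storage: allocated on first
 * use and reused afterwards, instead of a static array in bss. */
static char *get_result_buf(void)
{
	static char *buf;   /* one pointer in bss instead of 544 bytes */
	if (!buf) buf = malloc(RESULT_BUF_SIZE);
	return buf;         /* NULL on allocation failure */
}

/* Hypothetical caller: copies name into the shared buffer. Returns
 * NULL if the buffer cannot be allocated or name does not fit. */
static const char *store_result(const char *name)
{
	char *buf = get_result_buf();
	size_t len = strlen(name);
	if (!buf || len >= RESULT_BUF_SIZE) return NULL;
	memcpy(buf, name, len + 1);
	return buf;
}
```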

      6       0     512     518     206 res_state.o (ex lib/libc.a)

This is pure junk; it's just there to satisfy broken programs that try
to peek/poke at the resolver state. I wonder if we could make it
smaller without breaking anything.

    908     160      12    1080     438 random.o (ex lib/libc.a)

Unfortunately I think random really does have that much state...

     43       0     128     171      ab sigisemptyset.o (ex lib/libc.a)

This is another case of gcc stupidly putting uninitialized static
const in bss instead of text. I have a better workaround, though. In
any case this code needs to be fixed because it's comparing the
whole 1024-bit bit-array even though we treat all but the first 64/128
bits as padding now.
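
One possible shape for such a fix, as a sketch only (struct bitset,
USED_BITS and bitset_is_empty are hypothetical names, and this is not
necessarily the workaround meant above): loop over just the meaningful
words, which also removes the need for an all-zero object to compare
against.

```c
#include <stddef.h>

/* Hypothetical 1024-bit set type standing in for sigset_t. */
#define SET_WORDS (1024 / (8 * sizeof(unsigned long)))
struct bitset { unsigned long bits[SET_WORDS]; };

/* Assumed number of meaningful bits; everything after is padding. */
#define USED_BITS 128
#define USED_WORDS (USED_BITS / (8 * sizeof(unsigned long)))

/* Returns 1 if none of the meaningful bits are set. Padding words are
 * never read, and no static zero-filled object is required. */
static int bitset_is_empty(const struct bitset *s)
{
	for (size_t i = 0; i < USED_WORDS; i++)
		if (s->bits[i]) return 0;
	return 1;
}
```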

    209       0    8192    8401    20d1 pthread_key_create.o (ex lib/libc.a)

I'm considering replacing pthread_key_create with a new implementation
that makes a fake DSO with TLS instead of having the pthread
thread-specific data being part of the main thread block.

Aside from this, the main issue that's making libc.so's dirty-page
cost so high is that the bss isn't sorted; most of bss is unused in
most programs, but because the commonly-used stuff isn't grouped
together, several pages end up dirty. With this in mind, I'm
considering one of the following 3 approaches to get the commonly-used
data all together in one page:

1. Explicitly initialize everything that's always-used, so it ends up
in .data rather than .bss, and thus on the first page.

2. Reorder object files in the linking so that the bloated junk is all
at the end.

3. Find a way to get the linker to sort it for us, possibly with
alignment and alignment-based sorting.
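
To illustrate approach 1, a small sketch with hypothetical variables
(not actual musl symbols). It assumes the usual gcc behavior that
zero-initialized statics go to .bss, so the always-used object needs a
nonzero initializer to land in .data (or a build with
-fno-zero-initialized-in-bss).

```c
/* Rarely used and zero-initialized: placed in .bss. */
static char rare_buf[8192];

/* Always used. The nonzero initializer makes the compiler emit it
 * into .data alongside the other initialized data. The stored value
 * is biased by one so that the logical initial state is still 0. */
static int hot_state = 1;

static int get_hot_state(void) { return hot_state - 1; }
static void set_hot_state(int v) { hot_state = v + 1; }
static char *get_rare_buf(void) { return rare_buf; }
```

The cost is that each such object now takes space in the file on disk,
which .bss objects do not.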

With the above changes, I think we should be able to cut 2-3 pages of
commit charge off of libc.so and drop the minimum dirty pages for
dynamic linking from 20k (5 pages) to 12k (3 pages, only one of which
is in libc.so; the others are the main app's data and stack).

Major work on debloating will probably not begin until after the next
release.

Rich

