Results of Aboriginal/musl CFLAGS experiment

mailing list of musl libc

* Results of Aboriginal/musl CFLAGS experiment
@ 2015-10-23  7:35 Denys Vlasenko
  2015-10-27  1:29 ` Rich Felker
  0 siblings, 1 reply; 2+ messages in thread
From: Denys Vlasenko @ 2015-10-23  7:35 UTC
  To: Rob Landley, Rich Felker, musl

Hi Rob, Rich,

I decided to take a look at how well building busybox against musl
would fare compared to building it against the custom-configured
uclibc I had been using for quite some time.

Instead of reinventing the wheel, I decided to use Rob's excellent
Aboriginal Linux build scripts. Here's what I did.

I took Aboriginal's tip.tar.bz2, which was aboriginal-0b3b780ea942.
I built "./build.sh x86_64" without any tweaking.

Then I started adding the gcc options I had used in my old custom uclibc
build to sources/sections/musl.build, without changing anything else:

--- a.0/sources/sections/musl.build     2015-10-11 10:10:26.000000000 +0200
+++ a.1/sources/sections/musl.build     2015-10-23 02:37:45.803972995 +0200
@@ -1,7 +1,10 @@
 # Build and install musl

+(
+export CFLAGS="-Wl,--sort-section,alignment -Wl,--sort-common"
+
 CC= CROSS_COMPILE=${ARCH}- ./configure --prefix=/ &&
 DESTDIR="$STAGE_DIR" make -j $CPUS CROSS_COMPILE=${ARCH}- all install &&
 echo '#define __MUSL__' >> "$STAGE_DIR"/include/features.h &&
 ln -s libc.so "$STAGE_DIR/lib/ld-musl.so.0"
-
+)
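The ( ... ) added around the build runs it in a subshell, presumably so that the exported CFLAGS only affects musl and does not leak into later Aboriginal build stages. A minimal sketch of that scoping rule; `demo_subshell_scope` is a made-up name, not part of Aboriginal:

```shell
#!/bin/sh
# Hypothetical demo: an export inside ( ... ) only affects that subshell.
demo_subshell_scope() {
    CFLAGS="-Os"                       # value seen by the rest of the script
    (
        # Same idea as the musl.build patch: extra flags for this step only.
        export CFLAGS="-Wl,--sort-section,alignment -Wl,--sort-common"
        echo "inside:  $CFLAGS"
    )
    echo "outside: $CFLAGS"            # unchanged once the subshell exits
}

demo_subshell_scope
```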

I did this in four cumulative steps:
step 1 - CFLAGS+="-Wl,--sort-section,alignment -Wl,--sort-common"
step 2 - CFLAGS+="-ffunction-sections -fdata-sections"
step 3 - CFLAGS+="-falign-jumps=1 -falign-labels=1"
step 4 - CFLAGS+="-falign-functions=1 -falign-loops=1"
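Because every step uses CFLAGS+=, each build keeps all the flags of the earlier steps (a.2 was built with the step 1 and step 2 flags, and so on). A small sketch of that accumulation; `step_cflags` is a hypothetical helper, not part of Aboriginal's scripts, and step 0 corresponds to the unmodified a.0 build:

```shell
#!/bin/sh
# Hypothetical helper: print the cumulative CFLAGS used for step N (0-4).
step_cflags() {
    flags=""
    i=1
    for add in \
        "-Wl,--sort-section,alignment -Wl,--sort-common" \
        "-ffunction-sections -fdata-sections" \
        "-falign-jumps=1 -falign-labels=1" \
        "-falign-functions=1 -falign-loops=1"
    do
        [ "$i" -gt "$1" ] && break
        flags="${flags:+$flags }$add"   # append with a single space
        i=$((i + 1))
    done
    printf '%s\n' "$flags"
}

step_cflags 2   # step 1 flags followed by the step 2 flags
```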

and after each step collected size information for libc and several executables:
ls -l */build/native-compiler-x86_64/usr/lib/libc.a
size */build/native-compiler-x86_64/usr/lib/libc.so
size */build/root-filesystem-x86_64/usr/bin/toybox
size */build/root-filesystem-x86_64/usr/bin/busybox
size */build/native-compiler-x86_64/usr/bin/as
size */build/native-compiler-x86_64/usr/bin/ld
size */build/native-compiler-x86_64/usr/bin/bash
size */build/native-compiler-x86_64/usr/x86_64-unknown-linux/bin/collect2
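To turn two of these `size` reports into an absolute and relative change, a helper along these lines could be used. The function names are hypothetical, and the sketch assumes the default Berkeley output of GNU size, where the fourth column of the second line is `dec`:

```shell
#!/bin/sh
# Hypothetical helpers for comparing two builds of the same file.

# dec_of FILE: total size (the "dec" column) as reported by `size`.
dec_of() {
    size "$1" | awk 'NR == 2 { print $4 }'
}

# pct_change OLD NEW: print "DELTA PERCENT%" for two byte counts.
pct_change() {
    awk -v o="$1" -v n="$2" \
        'BEGIN { printf "%d %.2f%%\n", n - o, (n - o) * 100 / o }'
}

# compare_builds OLD_FILE NEW_FILE, e.g. a.1/... versus a.2/...
compare_builds() {
    pct_change "$(dec_of "$1")" "$(dec_of "$2")"
}
```

For example, `pct_change 333221 330259` (the busybox dec values for a.1 and a.2) prints `-2962 -0.89%`.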

Here is what I discovered.

Step 1, which added "-Wl,--sort-section,alignment -Wl,--sort-common",
affects only the size of libc.so:

   text    data     bss     dec filename
 572242    1920   11640  585802 a.0/native-compiler/lib/libc.so
 572068    1916   11576  585560 a.1/native-compiler/lib/libc.so

What it does is reduce the chance that, when sections are merged
during linking, a small section with no alignment requirement (such as
one resulting from "static char flag_var") gets lodged between two
bigger ones (say, "static int global_cnt") that want, e.g., 32-bit
alignment.

Without section sorting, the byte-sized "flag_var" ends up followed by
3 bytes of padding.

With section sorting by alignment, one-byte flag variables have a
higher chance of being grouped together and not requiring padding.
(It could be done even better; the linker is too dumb.)
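The effect can be illustrated with a toy model of the linker placing input sections one after another. This is a simplified sketch, not how GNU ld is implemented: each hypothetical item is SIZE:ALIGN in bytes, the offset is rounded up to each item's alignment, and sorting by alignment in descending order stands in for --sort-section=alignment:

```shell
#!/bin/sh
# Toy model: place SIZE:ALIGN items in order and print the end offset.
# Any padding after the last item is ignored.
layout_size() {
    off=0
    for item in "$@"; do
        bytes=${item%%:*}
        align=${item##*:}
        off=$(( (off + align - 1) / align * align ))   # round up to alignment
        off=$(( off + bytes ))
    done
    echo "$off"
}

# Same items, but placed in descending order of alignment first.
# Items contain no whitespace or glob characters, so word splitting is safe.
layout_sorted() {
    layout_size $(printf '%s\n' "$@" | sort -t: -k2,2nr)
}

# char flag_a, int cnt_a, char flag_b, int cnt_b
layout_size   1:1 4:4 1:1 4:4   # 16: each char is followed by 3 bytes of padding
layout_sorted 1:1 4:4 1:1 4:4   # 10: the two ints first, then the two chars
```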

Step 2: adding "-ffunction-sections -fdata-sections"

The previous optimization doesn't work very well, because data objects
don't live in separate sections: they are all grouped into one .data
and one .bss section per *.o file.

"-ffunction-sections -fdata-sections" fix this by putting every function
and data object into its own section. Then section sorting eliminates
many more padding gaps:

   text    data     bss     dec filename
 572068    1916   11576  585560 a.1/native-compiler/lib/libc.so
 570356    1900   11480  583736 a.2/native-compiler/lib/libc.so

There is more to it. Object files in the static libc.a also have each of
their functions and data objects in its own section. This means that
programs linked with -Wl,--gc-sections (toybox and busybox do this)
can drop unused code and data not on a per-.o-file basis,
but on a per-function and per-object basis, resulting in a ~0.5-1% size decrease!

   text    data     bss     dec filename
 338047    6608   22384  367039 a.1/root-filesystem/usr/bin/toybox
 336143    6560   22352  365055 a.2/root-filesystem/usr/bin/toybox
   text    data     bss     dec filename
 324711     862    7648  333221 a.1/root-filesystem/bin/busybox
 321913     826    7520  330259 a.2/root-filesystem/bin/busybox
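The per-function granularity can be observed on a toy program. This sketch assumes a GCC-compatible `cc` and a binutils-style `nm` on the build host; `gc_demo`, `has_symbol`, the file names and the `unused` function are made up for illustration. Both links use -Wl,--gc-sections, but only the build with -ffunction-sections -fdata-sections lets the linker discard the uncalled function:

```shell
#!/bin/sh
# Hypothetical demo: can --gc-sections drop a function that is never called?
# Assumes a GCC-compatible "cc" and binutils-style "nm" on the host.

# has_symbol EXE NAME: succeed if NAME appears in the symbol table of EXE.
has_symbol() {
    nm "$1" | awk -v s="$2" '$NF == s { found = 1 } END { exit !found }'
}

gc_demo() {
    dir=$(mktemp -d) || return 1
    cat > "$dir/demo.c" <<'EOF'
int used(int x) { return x + 1; }
int unused(int x) { return x * 42 + 7; }  /* never called */
int main(void) { return used(0) - 1; }
EOF
    # -O0 avoids inlining, so only the section layout differs between builds.
    # plain: one .text per object file; "unused" shares it with main.
    # split: one section per function; .text.unused has no references.
    if cc -O0 -Wl,--gc-sections -o "$dir/plain" "$dir/demo.c" &&
       cc -O0 -ffunction-sections -fdata-sections -Wl,--gc-sections \
          -o "$dir/split" "$dir/demo.c"
    then
        for exe in plain split; do
            if has_symbol "$dir/$exe" unused; then
                echo "$exe: unused kept"
            else
                echo "$exe: unused dropped"
            fi
        done
        status=0
    else
        status=1
    fi
    rm -rf "$dir"
    return "$status"
}
```

Running `gc_demo` should print `plain: unused kept` followed by `split: unused dropped`: --gc-sections can only discard whole sections, so the uncalled function survives as long as it shares .text with main.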

Most programs, alas, don't use -Wl,--gc-sections, but they still get
a tiny bit smaller:

   text    data     bss     dec filename
1029977    8752   60192 1098921 a.1/native-compiler/bin/as
1029945    8720   60192 1098857 a.2/native-compiler/bin/as
   text    data     bss     dec filename
1122513    9328   25120 1156961 a.1/native-compiler/bin/ld
1122513    9296   25120 1156929 a.2/native-compiler/bin/ld
   text    data     bss     dec filename
 425757   50652   16448  492857 a.1/native-compiler/bin/bash
 425725   50604   16416  492745 a.2/native-compiler/bin/bash
   text    data     bss     dec filename
 140624     880    9472  150976 a.1/native-compiler/x86_64-unknown-linux/bin/collect2
 140624     848    9440  150912 a.2/native-compiler/x86_64-unknown-linux/bin/collect2

I would say there is no reason not to always do steps 1 and 2.
They don't pessimize execution speed. They simply get rid of some
data padding and, with --gc-sections, let the linker drop dead,
unreachable code.

Step 3: add "-falign-jumps=1 -falign-labels=1"
Step 4: add "-falign-functions=1 -falign-loops=1"

Not particularly interesting: they do reduce the size of every program I measured,
but some (many?) people would prefer to let gcc decide when
and how to align code, for speed reasons. Anyway, here are the stats:

 -rw-r--r-- 1 root root 2514966 a.2/native-compiler/lib/libc.a
 -rw-r--r-- 1 root root 2514726 a.3/native-compiler/lib/libc.a
 -rw-r--r-- 1 root root 2514646 a.4/native-compiler/lib/libc.a
   text    data     bss     dec filename
 570356    1900   11480  583736 a.2/native-compiler/lib/libc.so
 570148    1900   11480  583528 a.3/native-compiler/lib/libc.so
 569637    1900   11480  583017 a.4/native-compiler/lib/libc.so
   text    data     bss     dec filename
 336143    6560   22352  365055 a.2/root-filesystem/usr/bin/toybox
 335999    6560   22352  364911 a.3/root-filesystem/usr/bin/toybox
 335743    6560   22352  364655 a.4/root-filesystem/usr/bin/toybox
   text    data     bss     dec filename
 321913     826    7520  330259 a.2/root-filesystem/bin/busybox
 321801     826    7520  330147 a.3/root-filesystem/bin/busybox
 321541     826    7520  329887 a.4/root-filesystem/bin/busybox
   text    data     bss     dec filename
1029945    8720   60192 1098857 a.2/native-compiler/bin/as
1029817    8720   60192 1098729 a.3/native-compiler/bin/as
1029609    8720   60192 1098521 a.4/native-compiler/bin/as
   text    data     bss     dec filename
1122513    9296   25120 1156929 a.2/native-compiler/bin/ld
1122369    9296   25120 1156785 a.3/native-compiler/bin/ld
1122161    9296   25120 1156577 a.4/native-compiler/bin/ld
   text    data     bss     dec filename
 425725   50604   16416  492745 a.2/native-compiler/bin/bash
 425629   50604   16416  492649 a.3/native-compiler/bin/bash
 425437   50604   16416  492457 a.4/native-compiler/bin/bash
   text    data     bss     dec filename
 140624     848    9440  150912 a.2/native-compiler/x86_64-unknown-linux/bin/collect2
 140560     848    9440  150848 a.3/native-compiler/x86_64-unknown-linux/bin/collect2
 140336     848    9440  150624 a.4/native-compiler/x86_64-unknown-linux/bin/collect2


* Re: Results of Aboriginal/musl CFLAGS experiment
  2015-10-23  7:35 Results of Aboriginal/musl CFLAGS experiment Denys Vlasenko
@ 2015-10-27  1:29 ` Rich Felker
  0 siblings, 0 replies; 2+ messages in thread
From: Rich Felker @ 2015-10-27  1:29 UTC
  To: musl

On Fri, Oct 23, 2015 at 09:35:39AM +0200, Denys Vlasenko wrote:
> Step 3: add "-falign-jumps=1 -falign-labels=1"
> Step 4: add "-falign-functions=1 -falign-loops=1"
> 
> Not particularly interesting: they do reduce the size of every program I measured,
> but some (many?) people would prefer to let gcc decide when
> and how to align code, for speed reasons. Anyway, here are the stats:

We had these a long time ago, but I removed them in commit
a80847d86a8865a78fdbebe7f9e2533f7a74e010 because I thought they were
the default at -Os and only relevant to debloating -O3. However, I've
heard some suggestions that -Os is no longer worthwhile and that -O2
with overrides to turn off the useless/harmful alignment would be a
better default. Do you have any input on this topic?

Rich



Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/musl/
