* Results of Aboriginal/musl CFLAGS experiment
@ 2015-10-23 7:35 Denys Vlasenko
2015-10-27 1:29 ` Rich Felker
0 siblings, 1 reply; 2+ messages in thread
From: Denys Vlasenko @ 2015-10-23 7:35 UTC
To: Rob Landley, Rich Felker, musl
Hi Rob, Rich,
I decided to take a look at how well building busybox against musl
would fare compared to building it against a custom-configured
uClibc I have been using for quite some time.
Instead of reinventing the wheel, I decided to use Rob's excellent
Aboriginal Linux build scripts. Here's what I did.
I took Aboriginal's tip.tar.bz2, which was aboriginal-0b3b780ea942.
I built "./build.sh x86_64" without any tweaking.
Then I started adding the gcc options from my old custom uClibc build
to sources/sections/musl.build, without changing anything else:
--- a.0/sources/sections/musl.build 2015-10-11 10:10:26.000000000 +0200
+++ a.1/sources/sections/musl.build 2015-10-23 02:37:45.803972995 +0200
@@ -1,7 +1,10 @@
# Build and install musl
+(
+export CFLAGS="-Wl,--sort-section,alignment -Wl,--sort-common"
+
CC= CROSS_COMPILE=${ARCH}- ./configure --prefix=/ &&
DESTDIR="$STAGE_DIR" make -j $CPUS CROSS_COMPILE=${ARCH}- all install &&
echo '#define __MUSL__' >> "$STAGE_DIR"/include/features.h &&
ln -s libc.so "$STAGE_DIR/lib/ld-musl.so.0"
-
+)
I did this in four steps:
step 1 - CFLAGS+="-Wl,--sort-section,alignment -Wl,--sort-common"
step 2 - CFLAGS+="-ffunction-sections -fdata-sections"
step 3 - CFLAGS+="-falign-jumps=1 -falign-labels=1"
step 4 - CFLAGS+="-falign-functions=1 -falign-loops=1"
and collected size information from several executables after each step:
ls -l */build/native-compiler-x86_64/usr/lib/libc.a
size */build/native-compiler-x86_64/usr/lib/libc.so
size */build/root-filesystem-x86_64/usr/bin/toybox
size */build/root-filesystem-x86_64/usr/bin/busybox
size */build/native-compiler-x86_64/usr/bin/as
size */build/native-compiler-x86_64/usr/bin/ld
size */build/native-compiler-x86_64/usr/bin/bash
size */build/native-compiler-x86_64/usr/x86_64-unknown-linux/bin/collect2
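For anyone who wants to post-process numbers like these, here is a minimal sketch of parsing rows in the layout that `size` prints and cross-checking that "dec" equals text + data + bss. The function name parse_size_output and the SAMPLE string are illustrative assumptions; the two sample rows are the libc.so measurements reported in this message.

```python
def parse_size_output(text):
    """Parse rows in the column layout printed by `size`.

    Returns a list of dicts, one per measured file; the header row is skipped.
    Only the first four numeric columns and the last column (filename) are
    used, so an extra "hex" column would not confuse it.
    """
    rows = []
    for line in text.strip().splitlines():
        fields = line.split()
        if not fields or fields[0] == "text":
            continue  # blank line or header row
        text_sz, data_sz, bss_sz, dec = (int(f) for f in fields[:4])
        rows.append({"text": text_sz, "data": data_sz, "bss": bss_sz,
                     "dec": dec, "filename": fields[-1]})
    return rows

# Sample rows copied from the libc.so table in this message.
SAMPLE = """
text data bss dec filename
572242 1920 11640 585802 a.0/native-compiler/lib/libc.so
572068 1916 11576 585560 a.1/native-compiler/lib/libc.so
"""

rows = parse_size_output(SAMPLE)
for row in rows:
    # "dec" should be the sum of the three section sizes
    assert row["dec"] == row["text"] + row["data"] + row["bss"]

saved = rows[0]["dec"] - rows[1]["dec"]
print("bytes saved between the two builds:", saved)
```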
Here is what I discovered.
Step 1, which added "-Wl,--sort-section,alignment -Wl,--sort-common"
affects only the size of libc.so:
   text    data     bss     dec filename
 572242    1920   11640  585802 a.0/native-compiler/lib/libc.so
 572068    1916   11576  585560 a.1/native-compiler/lib/libc.so
What it does is reduce the chance that, when sections are merged during
linking, a small section (such as one resulting from "static char flag_var")
with no alignment restrictions gets lodged between two bigger ones
(say, "static int global_cnt") which want, e.g., 32-bit alignment.
Without section sorting, the byte-sized "flag_var" gets 3 bytes of padding.
With section sorting by alignment, one-byte flag variables have
a higher chance of being grouped together and not requiring padding.
(It could be made even better. The linker is too dumb.)
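The padding effect described above can be illustrated with a toy layout model. This is only a sketch under simplifying assumptions: each input section is a hypothetical (size, alignment) pair, and sorting by alignment is modelled as "largest alignment first". It is not a description of what the linker does internally.

```python
def layout_size(sections):
    """Return the end offset after placing sections in the given order.

    Each section is a (size, alignment) pair. Before placing a section the
    offset is rounded up to its alignment, which is where padding comes from.
    """
    offset = 0
    for size, align in sections:
        offset = (offset + align - 1) // align * align  # round up to alignment
        offset += size
    return offset

# Hypothetical input: 4-byte ints wanting 4-byte alignment, interleaved
# with 1-byte flags that have no alignment requirement.
INT = (4, 4)
FLAG = (1, 1)
interleaved = [INT, FLAG, INT, FLAG, INT, FLAG, INT, FLAG]

# Toy model of sorting by alignment: largest alignment first, so the
# one-byte flags end up grouped together at the end.
by_alignment = sorted(interleaved, key=lambda sec: sec[1], reverse=True)

print("interleaved:", layout_size(interleaved), "bytes")
print("sorted:     ", layout_size(by_alignment), "bytes")
```

In this made-up input, every flag that is followed by an int costs 3 bytes of padding in the interleaved order, and none in the sorted order.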
Step 2: adding "-ffunction-sections -fdata-sections"
The previous optimization doesn't work too well on its own because data
objects don't live in separate sections: they are all grouped into one .data
and one .bss section per *.o file.
"-ffunction-sections -fdata-sections" fix this by putting every function
and data object into its own section. Then section sorting eliminates
many more padding gaps:
   text    data     bss     dec filename
 572068    1916   11576  585560 a.1/native-compiler/lib/libc.so
 570356    1900   11480  583736 a.2/native-compiler/lib/libc.so
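Why per-object sections help the sorting can be sketched with the same kind of toy model. The two "object files" and their contents below are hypothetical, and a per-file .data section is modelled simply as its objects packed in order, aligned to the largest alignment among them.

```python
def place(sections):
    """Lay out (size, alignment) sections in order; return the end offset."""
    offset = 0
    for size, align in sections:
        offset = (offset + align - 1) // align * align  # pad up to alignment
        offset += size
    return offset

def merged_section(objs):
    """Model one per-file .data section: objects packed in order, with the
    whole section taking the largest alignment of its members."""
    return (place(objs), max(align for _, align in objs))

# Hypothetical object files: each defines one 4-byte int (4-byte aligned)
# and one 1-byte flag.
file_a = [(4, 4), (1, 1)]
file_b = [(4, 4), (1, 1)]

# One .data blob per file: the linker can only move whole blobs around.
per_file = [merged_section(file_a), merged_section(file_b)]

# One section per object: sorting by alignment can regroup objects
# across files, so the flags end up next to each other.
per_object = sorted(file_a + file_b, key=lambda sec: sec[1], reverse=True)

print("one section per file:  ", place(per_file), "bytes")
print("one section per object:", place(per_object), "bytes")
```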
There is more to it. Object files in the static libc.a also have each
function and object in its own section. This means that programs
linked with -Wl,--gc-sections (toybox and busybox do this)
are able to drop unused code and data not on a per-.o-file basis,
but on a per-function and per-object basis, resulting in a size decrease
of up to ~1% (about 0.5% for toybox and 0.9% for busybox):
   text    data     bss     dec filename
 338047    6608   22384  367039 a.1/root-filesystem/usr/bin/toybox
 336143    6560   22352  365055 a.2/root-filesystem/usr/bin/toybox
   text    data     bss     dec filename
 324711     862    7648  333221 a.1/root-filesystem/bin/busybox
 321913     826    7520  330259 a.2/root-filesystem/bin/busybox
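The difference between dropping code per .o file and per function can be sketched as a reachability walk over a reference graph. Everything here (the function names, sizes, and the util.o file) is made up for illustration; it is a conceptual model of garbage-collecting sections, not of the linker's actual implementation.

```python
def reachable(roots, refs):
    """Return every section reachable from the roots via the refs graph."""
    seen, stack = set(), list(roots)
    while stack:
        sec = stack.pop()
        if sec not in seen:
            seen.add(sec)
            stack.extend(refs.get(sec, ()))
    return seen

# Hypothetical program: main.o plus a library object util.o that defines
# three functions, one of which is never referenced.
sizes = {"main": 100, "used_fn": 40, "helper": 30, "unused_fn": 500}
refs = {"main": ["used_fn"], "used_fn": ["helper"]}
files = {"main.o": ["main"], "util.o": ["used_fn", "helper", "unused_fn"]}

# One section per function: only reachable functions are kept.
kept_fine = reachable(["main"], refs)
size_fine = sum(sizes[fn] for fn in kept_fine)

# One .text section per object file: util.o is kept or dropped as a whole,
# so unused_fn comes along because used_fn lives in the same file.
kept_files = [f for f, fns in files.items() if kept_fine & set(fns)]
size_coarse = sum(sizes[fn] for f in kept_files for fn in files[f])

print("per-function granularity:", size_fine, "bytes")
print("per-file granularity:    ", size_coarse, "bytes")
```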
Most programs, alas, don't use -Wl,--gc-sections, but they still get
a tiny bit smaller:
   text    data     bss     dec filename
1029977    8752   60192 1098921 a.1/native-compiler/bin/as
1029945    8720   60192 1098857 a.2/native-compiler/bin/as
   text    data     bss     dec filename
1122513    9328   25120 1156961 a.1/native-compiler/bin/ld
1122513    9296   25120 1156929 a.2/native-compiler/bin/ld
   text    data     bss     dec filename
 425757   50652   16448  492857 a.1/native-compiler/bin/bash
 425725   50604   16416  492745 a.2/native-compiler/bin/bash
   text    data     bss     dec filename
 140624     880    9472  150976 a.1/native-compiler/x86_64-unknown-linux/bin/collect2
 140624     848    9440  150912 a.2/native-compiler/x86_64-unknown-linux/bin/collect2
I would say there is no reason not to always do steps 1 and 2.
They don't pessimize execution speed. They simply get rid of some
data padding, and drop dead, unreachable code.
Step 3: add "-falign-jumps=1 -falign-labels=1"
Step 4: add "-falign-functions=1 -falign-loops=1"
Not particularly interesting - they do reduce the size of every program I
measured, but some (many?) people would prefer to leave it to gcc to decide
when and how to align code, for speed reasons. Anyway, here are the stats:
-rw-r--r-- 1 root root 2514966 a.2/native-compiler/lib/libc.a
-rw-r--r-- 1 root root 2514726 a.3/native-compiler/lib/libc.a
-rw-r--r-- 1 root root 2514646 a.4/native-compiler/lib/libc.a
   text    data     bss     dec filename
 570356    1900   11480  583736 a.2/native-compiler/lib/libc.so
 570148    1900   11480  583528 a.3/native-compiler/lib/libc.so
 569637    1900   11480  583017 a.4/native-compiler/lib/libc.so
   text    data     bss     dec filename
 336143    6560   22352  365055 a.2/root-filesystem/usr/bin/toybox
 335999    6560   22352  364911 a.3/root-filesystem/usr/bin/toybox
 335743    6560   22352  364655 a.4/root-filesystem/usr/bin/toybox
   text    data     bss     dec filename
 321913     826    7520  330259 a.2/root-filesystem/bin/busybox
 321801     826    7520  330147 a.3/root-filesystem/bin/busybox
 321541     826    7520  329887 a.4/root-filesystem/bin/busybox
   text    data     bss     dec filename
1029945    8720   60192 1098857 a.2/native-compiler/bin/as
1029817    8720   60192 1098729 a.3/native-compiler/bin/as
1029609    8720   60192 1098521 a.4/native-compiler/bin/as
   text    data     bss     dec filename
1122513    9296   25120 1156929 a.2/native-compiler/bin/ld
1122369    9296   25120 1156785 a.3/native-compiler/bin/ld
1122161    9296   25120 1156577 a.4/native-compiler/bin/ld
   text    data     bss     dec filename
 425725   50604   16416  492745 a.2/native-compiler/bin/bash
 425629   50604   16416  492649 a.3/native-compiler/bin/bash
 425437   50604   16416  492457 a.4/native-compiler/bin/bash
   text    data     bss     dec filename
 140624     848    9440  150912 a.2/native-compiler/x86_64-unknown-linux/bin/collect2
 140560     848    9440  150848 a.3/native-compiler/x86_64-unknown-linux/bin/collect2
 140336     848    9440  150624 a.4/native-compiler/x86_64-unknown-linux/bin/collect2
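To put the steps in proportion, the "dec" totals from the tables in this message can be turned into byte and percentage savings. This is just arithmetic on the reported numbers; the dict layout and the helper name saving are illustrative assumptions.

```python
# "dec" totals copied from the tables in this message.
dec = {
    "libc.so": {"a.0": 585802, "a.1": 585560, "a.2": 583736,
                "a.3": 583528, "a.4": 583017},
    "toybox":  {"a.1": 367039, "a.2": 365055, "a.3": 364911, "a.4": 364655},
    "busybox": {"a.1": 333221, "a.2": 330259, "a.3": 330147, "a.4": 329887},
}

def saving(name, before, after):
    """Return (bytes saved, percent saved) between two build steps."""
    old, new = dec[name][before], dec[name][after]
    return old - new, 100.0 * (old - new) / old

for name, totals in dec.items():
    first = min(totals)  # earliest step that has a measurement for this file
    for step in ("a.2", "a.4"):
        saved, pct = saving(name, first, step)
        print(f"{name}: {first} -> {step}: {saved} bytes ({pct:.2f}%)")
```

By this arithmetic, most of the total saving for toybox and busybox comes from step 2, with steps 3 and 4 adding a few hundred bytes each.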
* Re: Results of Aboriginal/musl CFLAGS experiment
From: Rich Felker @ 2015-10-27 1:29 UTC
To: musl
On Fri, Oct 23, 2015 at 09:35:39AM +0200, Denys Vlasenko wrote:
> Step 3: add "-falign-jumps=1 -falign-labels=1"
> Step 4: add "-falign-functions=1 -falign-loops=1"
>
> Not particularly interesting - they do reduce the size of every program I
> measured, but some (many?) people would prefer to leave it to gcc to decide
> when and how to align code, for speed reasons. Anyway, here are the stats:
We had these a long time ago, but I removed them in commit
a80847d86a8865a78fdbebe7f9e2533f7a74e010 because I thought they were
the default at -Os and only relevant to debloating -O3. However, I've
heard some suggestions that -Os is no longer worthwhile and that -O2
with overrides to turn off the useless/harmful alignment would be a
better default. Do you have any input on this topic?
Rich