From: Andre McCurdy
To: musl@lists.openwall.com
Subject: Re: building musl libc.so with gcc -flto
Date: Wed, 22 Apr 2015 22:34:40 -0700
In-Reply-To: <20150423022309.GH6817@brightrain.aerifal.cx>
References: <1429742932-6026-1-git-send-email-armccurdy@gmail.com> <20150423022309.GH6817@brightrain.aerifal.cx>

On Wed, Apr 22, 2015 at 7:23 PM, Rich Felker wrote:
> On Wed, Apr 22, 2015 at 03:48:52PM -0700, Andre McCurdy wrote:
>> Hi all,
>>
>> Below are some observations from building musl libc.so with gcc's -flto
>> (link time optimization) option.
>
> Interesting!
>
>> 1) With today's master (afbcac68), adding -flto to CFLAGS causes the
>> build to fail:
>>
>> | `_dlstart_c' referenced in section `.text' of /tmp/cc8ceNIy.ltrans0.ltrans.o: defined in discarded section `.text' of src/ldso/dlstart.lo (symbol from plugin)
>> | collect2: error: ld returned 1 exit status
>> | make: *** [lib/libc.so] Error 1
>>
>> Reverting f1faa0e1 (make _dlstart_c function use hidden visibility)
>> seems to be a workaround.
>
> I think the problem is that LTO is garbage collecting "unused" symbols
> before it gets to the step of linking with asm for which there is no
> IR code, thereby losing anything that's only referenced from asm. A
> better workaround might be to define _dlstart_c with a different name
> as a non-hidden function (e.g. call it __dls1) and then make
> _dlstart_c a hidden alias for it via:
>
> __attribute__((__visibility__("hidden")))
> void _dlstart_c(size_t *, size_t *);
>
> weak_alias(__dls1, _dlstart_c);
>
> If you get a chance to try that, let me know if it works.

That change does fix the build, but the resulting binary fails to run:

$ gdb ./lib/libc.so
...
(gdb) run
Starting program: /home/andre/.../lib/libc.so

Program received signal SIGILL, Illegal instruction.
0x56572ab8 in _dlstart ()
(gdb) disassemble
Dump of assembler code for function _dlstart:
   0x56572aa0 <+0>:     xor    %ebp,%ebp
   0x56572aa2 <+2>:     mov    %esp,%eax
   0x56572aa4 <+4>:     and    $0xfffffff0,%esp
   0x56572aa7 <+7>:     push   %eax
   0x56572aa8 <+8>:     push   %eax
   0x56572aa9 <+9>:     call   0x56572aae <_dlstart+14>
   0x56572aae <+14>:    addl   $0x7864a,(%esp)
   0x56572ab5 <+21>:    push   %eax
   0x56572ab6 <+22>:    call   0x56572ab7 <_dlstart+23>
   0x56572abb <+27>:    nop
   0x56572abc <+28>:    lea    0x0(%esi,%eiz,1),%esi
End of assembler dump.
(gdb)

> Another
> option might be adding -Wl,-u,_dlstart_c to LDFLAGS.

That change alone doesn't fix the build.

>> 2) With f1faa0e1 reverted, the build succeeds, but with a warning about
>> differing declarations for dummy_tsd and __pthread_tsd_main:
>>
>> | src/thread/pthread_create.c:169:1: warning: type of '__pthread_tsd_main' does not match original declaration
>> | weak_alias(dummy_tsd, __pthread_tsd_main);
>> | ^
>> | src/thread/pthread_key_create.c:4:7: note: previously declared here
>> | void *__pthread_tsd_main[PTHREAD_KEYS_MAX] = { 0 };
>> | ^
>
> This should be harmless but perhaps there's a better way it could be
> done.
>
>> 3) Overall build times are similar, but achieving the best results
>> with -flto relies on manually duplicating any 'make -j' options for
>> the linker. Times below are from a quad core + hyperthreading system
>> running 'make -j8 lib/libc.so':
>>
>>   original : real 0m8.501s
>>   -flto    : real 0m18.034s
>>   -flto=4  : real 0m9.885s
>>   -flto=8  : real 0m8.876s
>
> Yeah that would be expected.
>
>> 4) Changes in code size seem to be minor, except when compiling with
>> -O3, where the code gets noticeably larger (presumably due to -flto
>> giving a lot more scope for inlining?).
>> Results below are from building with gcc 4.9.2 for 32bit x86:
>>
>>    text    data     bss     dec     hex filename
>>
>>  536405    1416    8800  546621   8573d lib/libc.so     ( -Os )
>>  536324    1324    8780  546428   8567c lib/libc.so.lto ( -Os )
>>
>>  612028    1416    8928  622372   97f24 lib/libc.so     ( -O2 )
>>  611701    1304    9132  622137   97e39 lib/libc.so.lto ( -O2 )
>>
>>  687708    1416    8992  698116   aa704 lib/libc.so     ( -O3 )
>>  713704    1312    9208  724224   b0d00 lib/libc.so.lto ( -O3 )
>
> Also seems rather like what I would expect. Any idea if performance is
> significantly better? It's not very comprehensive but you could try
> libc-bench.

I modified libc-bench so that it loops through everything in main() ten
times and then ran the same libc-bench binary with each version of
libc.so, sending output to /dev/null. The -O3 -flto build seems to be
consistently very slightly *slower* than the non -flto version...

---- ./lib/libc.so.Os ----
19.92user 9.88system 0:25.38elapsed 117%CPU (0avgtext+0avgdata 39344maxresident)k
0inputs+195360outputs (0major+416745minor)pagefaults 0swaps

---- ./lib/libc.so.O2 ----
18.72user 9.83system 0:24.20elapsed 117%CPU (0avgtext+0avgdata 39348maxresident)k
0inputs+195360outputs (0major+417364minor)pagefaults 0swaps

---- ./lib/libc.so.O3 ----
17.97user 9.77system 0:23.48elapsed 118%CPU (0avgtext+0avgdata 39344maxresident)k
0inputs+195360outputs (0major+418251minor)pagefaults 0swaps

---- ./lib/libc.so.lto.Os ----
20.52user 9.79system 0:26.05elapsed 116%CPU (0avgtext+0avgdata 39344maxresident)k
0inputs+195360outputs (0major+418684minor)pagefaults 0swaps

---- ./lib/libc.so.lto.O2 ----
18.58user 9.85system 0:24.13elapsed 117%CPU (0avgtext+0avgdata 39348maxresident)k
0inputs+195360outputs (0major+419825minor)pagefaults 0swaps

---- ./lib/libc.so.lto.O3 ----
18.85user 9.77system 0:24.38elapsed 117%CPU (0avgtext+0avgdata 39344maxresident)k
0inputs+195360outputs (0major+419888minor)pagefaults 0swaps

>
> Rich
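
For reference, the hidden-alias construct from point 1 can be sketched in
isolation as below. This is only an illustration under stated assumptions,
not the musl source: dls1_demo and dlstart_c_demo are hypothetical
stand-ins for __dls1 and _dlstart_c, the function body is invented so that
the alias has something observable to do, and the macro is written to have
the same shape as musl's weak_alias (it relies on GCC/Clang extensions and
an ELF target). Note that in the real -flto build this approach fixed the
link but the resulting libc.so died with SIGILL, so the sketch shows the
construct, not a working fix.

```c
#include <stddef.h>

/* Same shape as musl's weak_alias macro; GCC/Clang extension, ELF only. */
#define weak_alias(old, new) \
	extern __typeof(old) new __attribute__((__weak__, __alias__(#old)))

/* Hypothetical stand-in for __dls1: an ordinary, non-hidden function.
 * The body is made up purely so the call has a visible effect. */
void dls1_demo(size_t *sp, size_t *dynv)
{
	dynv[0] = sp[0] + 1;
}

/* Hypothetical stand-in for _dlstart_c: declared hidden first, so the
 * alias defined below inherits hidden visibility. */
__attribute__((__visibility__("hidden")))
void dlstart_c_demo(size_t *, size_t *);

/* Make the hidden name a weak alias for the non-hidden definition. */
weak_alias(dls1_demo, dlstart_c_demo);
```

Calling dlstart_c_demo(sp, dynv) then runs the body of dls1_demo. In
libc.so itself the hidden name is only referenced from the asm entry
point, which appears to be the case LTO has trouble with.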