* building musl libc.so with gcc -flto
@ 2015-04-22 22:48 Andre McCurdy
2015-04-23 2:23 ` Rich Felker
0 siblings, 1 reply; 16+ messages in thread
From: Andre McCurdy @ 2015-04-22 22:48 UTC (permalink / raw)
To: musl; +Cc: Andre McCurdy
Hi all,
Below are some observations from building musl libc.so with gcc's -flto
(link time optimization) option.
1) With today's master (afbcac68), adding -flto to CFLAGS causes the
build to fail:
| `_dlstart_c' referenced in section `.text' of /tmp/cc8ceNIy.ltrans0.ltrans.o: defined in discarded section `.text' of src/ldso/dlstart.lo (symbol from plugin)
| collect2: error: ld returned 1 exit status
| make: *** [lib/libc.so] Error 1
Reverting f1faa0e1 (make _dlstart_c function use hidden visibility)
seems to be a workaround.
2) With f1faa0e1 reverted, the build succeeds, but with a warning about
differing declarations for dummy_tsd and __pthread_tsd_main:
| src/thread/pthread_create.c:169:1: warning: type of '__pthread_tsd_main' does not match original declaration
| weak_alias(dummy_tsd, __pthread_tsd_main);
| ^
| src/thread/pthread_key_create.c:4:7: note: previously declared here
| void *__pthread_tsd_main[PTHREAD_KEYS_MAX] = { 0 };
| ^
3) Overall build times are similar, but achieving the best results
with -flto relies on manually duplicating any 'make -j' options for
the linker. Times below are from a quad core + hyperthreading system
running 'make -j8 lib/libc.so':
original : real 0m8.501s
-flto    : real 0m18.034s
-flto=4  : real 0m9.885s
-flto=8  : real 0m8.876s
4) Changes in code size seem to be minor, except when compiling with
-O3, where the code gets noticeably larger (presumably due to -flto
giving a lot more scope for inlining?). Results below are from building
with gcc 4.9.2 for 32bit x86:
  text   data    bss     dec    hex  filename
536405   1416   8800  546621  8573d  lib/libc.so     ( -Os )
536324   1324   8780  546428  8567c  lib/libc.so.lto ( -Os )
612028   1416   8928  622372  97f24  lib/libc.so     ( -O2 )
611701   1304   9132  622137  97e39  lib/libc.so.lto ( -O2 )
687708   1416   8992  698116  aa704  lib/libc.so     ( -O3 )
713704   1312   9208  724224  b0d00  lib/libc.so.lto ( -O3 )
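For reference, the text-size differences in the table work out to -81 bytes at -Os, -327 bytes at -O2, and +25996 bytes (about 3.8%) at -O3. A minimal sketch of that arithmetic, with hypothetical helper names that are not part of musl or any tool:

```c
/* Signed change in bytes from the non-LTO build to the LTO build. */
long size_delta(long base, long lto)
{
	return lto - base;
}

/* The same change expressed as a percentage of the non-LTO size. */
double size_delta_pct(long base, long lto)
{
	return 100.0 * (double)(lto - base) / (double)base;
}
```

Calling size_delta(687708, 713704) gives the -O3 text growth in bytes; the other rows can be checked the same way.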
Andre
--
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: building musl libc.so with gcc -flto
2015-04-22 22:48 building musl libc.so with gcc -flto Andre McCurdy
@ 2015-04-23 2:23 ` Rich Felker
2015-04-23 5:34 ` Andre McCurdy
2015-04-30 20:46 ` Andy Lutomirski
0 siblings, 2 replies; 16+ messages in thread
From: Rich Felker @ 2015-04-23 2:23 UTC (permalink / raw)
To: musl
On Wed, Apr 22, 2015 at 03:48:52PM -0700, Andre McCurdy wrote:
> Hi all,
>
> Below are some observations from building musl libc.so with gcc's -flto
> (link time optimization) option.
Interesting!
> 1) With today's master (afbcac68), adding -flto to CFLAGS causes the
> build to fail:
>
> | `_dlstart_c' referenced in section `.text' of /tmp/cc8ceNIy.ltrans0.ltrans.o: defined in discarded section `.text' of src/ldso/dlstart.lo (symbol from plugin)
> | collect2: error: ld returned 1 exit status
> | make: *** [lib/libc.so] Error 1
>
> Reverting f1faa0e1 (make _dlstart_c function use hidden visibility)
> seems to be a workaround.
I think the problem is that LTO is garbage collecting "unused" symbols
before it gets to the step of linking with asm for which there is no
IR code, thereby losing anything that's only referenced from asm. A
better workaround might be to define _dlstart_c with a different name
as a non-hidden function (e.g. call it __dls1) and then make
_dlstart_c a hidden alias for it via:
__attribute__((__visibility__("hidden")))
void _dlstart_c(size_t *, size_t *);
weak_alias(__dls1, _dlstart_c);
If you get a chance to try that, let me know if it works. Another
option might be adding -Wl,-u,_dlstart_c to LDFLAGS.
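The alias idea above can be sketched in isolation as follows. This is only an illustration of the mechanism, assuming GNU C and an ELF target: the macro is modeled on musl's weak_alias (its exact in-tree definition is an assumption here), and stage1_impl / stage1_entry are hypothetical stand-ins for __dls1 / _dlstart_c. It says nothing about whether the approach survives LTO; later in this thread it is reported to build but not to run.

```c
#include <stddef.h>

/* Alias macro modeled on musl's weak_alias; treat the exact form as an
 * assumption. It declares `new` as a weak alias for the symbol `old`. */
#define weak_alias(old, new) \
	extern __typeof(old) new __attribute__((__weak__, __alias__(#old)))

/* Hypothetical stand-in for __dls1: the real, non-hidden definition.
 * It returns a value only so the sketch can be checked; the function
 * discussed in the thread returns void. */
size_t stage1_impl(size_t *sp, size_t *dynv)
{
	return sp[0] + dynv[0];
}

/* Hypothetical stand-in for _dlstart_c: declared hidden first ... */
__attribute__((__visibility__("hidden")))
size_t stage1_entry(size_t *, size_t *);

/* ... then bound to the non-hidden function as a weak alias. */
weak_alias(stage1_impl, stage1_entry);
```

Both names then refer to the same code, but only stage1_impl has default visibility.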
> 2) With f1faa0e1 reverted, the build succeeds, but with a warning about
> differing declarations for dummy_tsd and __pthread_tsd_main:
>
> | src/thread/pthread_create.c:169:1: warning: type of '__pthread_tsd_main' does not match original declaration
> | weak_alias(dummy_tsd, __pthread_tsd_main);
> | ^
> | src/thread/pthread_key_create.c:4:7: note: previously declared here
> | void *__pthread_tsd_main[PTHREAD_KEYS_MAX] = { 0 };
> | ^
This should be harmless but perhaps there's a better way it could be
done.
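For context, the warning is about one symbol being declared with two different types in two files. A minimal sketch of the weak-fallback side of that pattern, assuming dummy_tsd is a one-element placeholder array (its definition is not shown in this thread) and using the hypothetical name tsd_main_fallback in place of __pthread_tsd_main:

```c
#include <stddef.h>

/* Alias macro modeled on musl's weak_alias (assumed form, GNU C, ELF). */
#define weak_alias(old, new) \
	extern __typeof(old) new __attribute__((__weak__, __alias__(#old)))

/* Assumed shape of the placeholder: a single slot. */
static void *dummy_tsd[1] = { 0 };

/* The alias takes the placeholder's type, void *[1]. A strong definition
 * in another file with a larger array type would override this weak one
 * at link time, and LTO can then see both types and warn. */
weak_alias(dummy_tsd, tsd_main_fallback);

/* Accessor used only to show that the alias names the same storage. */
void **tsd_slot0(void)
{
	return &tsd_main_fallback[0];
}
```

Without LTO each file is compiled separately, so the compiler never sees both declarations together.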
> 3) Overall build times are similar, but achieving the best results
> with -flto relies on manually duplicating any 'make -j' options for
> the linker. Times below are from a quad core + hyperthreading system
> running 'make -j8 lib/libc.so':
>
> original : real 0m8.501s
> -flto : real 0m18.034s
> -flto=4 : real 0m9.885s
> -flto=8 : real 0m8.876s
Yeah that would be expected.
> 4) Changes in code size seem to be minor, except when compiling with
> -O3, where the code gets noticeably larger (presumably due to -flto
> giving a lot more scope for inlining?). Results below are from building
> with gcc 4.9.2 for 32bit x86:
>
> text data bss dec hex filename
>
> 536405 1416 8800 546621 8573d lib/libc.so ( -Os )
> 536324 1324 8780 546428 8567c lib/libc.so.lto ( -Os )
>
> 612028 1416 8928 622372 97f24 lib/libc.so ( -O2 )
> 611701 1304 9132 622137 97e39 lib/libc.so.lto ( -O2 )
>
> 687708 1416 8992 698116 aa704 lib/libc.so ( -O3 )
> 713704 1312 9208 724224 b0d00 lib/libc.so.lto ( -O3 )
Also seems rather like what I would expect. Any idea if performance is
significantly better? It's not very comprehensive but you could try
libc-bench.
Rich
* Re: building musl libc.so with gcc -flto
2015-04-23 2:23 ` Rich Felker
@ 2015-04-23 5:34 ` Andre McCurdy
2015-04-23 9:45 ` Rich Felker
2015-04-30 20:46 ` Andy Lutomirski
1 sibling, 1 reply; 16+ messages in thread
From: Andre McCurdy @ 2015-04-23 5:34 UTC (permalink / raw)
To: musl
On Wed, Apr 22, 2015 at 7:23 PM, Rich Felker <dalias@libc.org> wrote:
> On Wed, Apr 22, 2015 at 03:48:52PM -0700, Andre McCurdy wrote:
>> Hi all,
>>
>> Below are some observations from building musl libc.so with gcc's -flto
>> (link time optimization) option.
>
> Interesting!
>
>> 1) With today's master (afbcac68), adding -flto to CFLAGS causes the
>> build to fail:
>>
>> | `_dlstart_c' referenced in section `.text' of /tmp/cc8ceNIy.ltrans0.ltrans.o: defined in discarded section `.text' of src/ldso/dlstart.lo (symbol from plugin)
>> | collect2: error: ld returned 1 exit status
>> | make: *** [lib/libc.so] Error 1
>>
>> Reverting f1faa0e1 (make _dlstart_c function use hidden visibility)
>> seems to be a workaround.
>
> I think the problem is that LTO is garbage collecting "unused" symbols
> before it gets to the step of linking with asm for which there is no
> IR code, thereby losing anything that's only referenced from asm. A
> better workaround might be to define _dlstart_c with a different name
> as a non-hidden function (e.g. call it __dls1) and then make
> _dlstart_c a hidden alias for it via:
>
> __attribute__((__visibility__("hidden")))
> void _dlstart_c(size_t *, size_t *);
>
> weak_alias(__dls1, _dlstart_c);
>
> If you get a chance to try that, let me know if it works.
That change does fix the build, but the resulting binary fails to run:
$ gdb ./lib/libc.so
...
(gdb) run
Starting program: /home/andre/.../lib/libc.so
Program received signal SIGILL, Illegal instruction.
0x56572ab8 in _dlstart ()
(gdb) disassemble
Dump of assembler code for function _dlstart:
0x56572aa0 <+0>: xor %ebp,%ebp
0x56572aa2 <+2>: mov %esp,%eax
0x56572aa4 <+4>: and $0xfffffff0,%esp
0x56572aa7 <+7>: push %eax
0x56572aa8 <+8>: push %eax
0x56572aa9 <+9>: call 0x56572aae <_dlstart+14>
0x56572aae <+14>: addl $0x7864a,(%esp)
0x56572ab5 <+21>: push %eax
0x56572ab6 <+22>: call 0x56572ab7 <_dlstart+23>
0x56572abb <+27>: nop
0x56572abc <+28>: lea 0x0(%esi,%eiz,1),%esi
End of assembler dump.
(gdb)
> Another
> option might be adding -Wl,-u,_dlstart_c to LDFLAGS.
That change alone doesn't fix the build.
>> 2) With f1faa0e1 reverted, the build succeeds, but with a warning about
>> differing declarations for dummy_tsd and __pthread_tsd_main:
>>
>> | src/thread/pthread_create.c:169:1: warning: type of '__pthread_tsd_main' does not match original declaration
>> | weak_alias(dummy_tsd, __pthread_tsd_main);
>> | ^
>> | src/thread/pthread_key_create.c:4:7: note: previously declared here
>> | void *__pthread_tsd_main[PTHREAD_KEYS_MAX] = { 0 };
>> | ^
>
> This should be harmless but perhaps there's a better way it could be
> done.
>
>> 3) Overall build times are similar, but achieving the best results
>> with -flto relies on manually duplicating any 'make -j' options for
>> the linker. Times below are from a quad core + hyperthreading system
>> running 'make -j8 lib/libc.so':
>>
>> original : real 0m8.501s
>> -flto : real 0m18.034s
>> -flto=4 : real 0m9.885s
>> -flto=8 : real 0m8.876s
>
> Yeah that would be expected.
>
>> 4) Changes in code size seem to be minor, except when compiling with
>> -O3, where the code gets noticeably larger (presumably due to -flto
>> giving a lot more scope for inlining?). Results below are from building
>> with gcc 4.9.2 for 32bit x86:
>>
>> text data bss dec hex filename
>>
>> 536405 1416 8800 546621 8573d lib/libc.so ( -Os )
>> 536324 1324 8780 546428 8567c lib/libc.so.lto ( -Os )
>>
>> 612028 1416 8928 622372 97f24 lib/libc.so ( -O2 )
>> 611701 1304 9132 622137 97e39 lib/libc.so.lto ( -O2 )
>>
>> 687708 1416 8992 698116 aa704 lib/libc.so ( -O3 )
>> 713704 1312 9208 724224 b0d00 lib/libc.so.lto ( -O3 )
>
> Also seems rather like what I would expect. Any idea if performance is
> significantly better? It's not very comprehensive but you could try
> libc-bench.
I modified libc-bench so that it loops through everything in main() ten
times and then ran the same libc-bench binary with each version of
libc.so, sending output to /dev/null.
The -O3 -flto build seems to be consistently very slightly *slower*
than the non -flto version...
----
./lib/libc.so.Os
----
19.92user 9.88system 0:25.38elapsed 117%CPU (0avgtext+0avgdata
39344maxresident)k
0inputs+195360outputs (0major+416745minor)pagefaults 0swaps
----
./lib/libc.so.O2
----
18.72user 9.83system 0:24.20elapsed 117%CPU (0avgtext+0avgdata
39348maxresident)k
0inputs+195360outputs (0major+417364minor)pagefaults 0swaps
----
./lib/libc.so.O3
----
17.97user 9.77system 0:23.48elapsed 118%CPU (0avgtext+0avgdata
39344maxresident)k
0inputs+195360outputs (0major+418251minor)pagefaults 0swaps
----
./lib/libc.so.lto.Os
----
20.52user 9.79system 0:26.05elapsed 116%CPU (0avgtext+0avgdata
39344maxresident)k
0inputs+195360outputs (0major+418684minor)pagefaults 0swaps
----
./lib/libc.so.lto.O2
----
18.58user 9.85system 0:24.13elapsed 117%CPU (0avgtext+0avgdata
39348maxresident)k
0inputs+195360outputs (0major+419825minor)pagefaults 0swaps
----
./lib/libc.so.lto.O3
----
18.85user 9.77system 0:24.38elapsed 117%CPU (0avgtext+0avgdata
39344maxresident)k
0inputs+195360outputs (0major+419888minor)pagefaults 0swaps
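Going by the user times above, the LTO builds come out at roughly +3.0% (-Os), -0.7% (-O2), and +4.9% (-O3) relative to the non-LTO builds. These are single runs, so small differences may be noise. A sketch of the calculation, with a hypothetical struct and function name and the figures copied from the results above:

```c
struct bench_row {
	const char *opt;   /* optimization level */
	double plain_user; /* user seconds with the non-LTO libc.so */
	double lto_user;   /* user seconds with the LTO libc.so */
};

/* User times copied from the libc-bench runs listed above. */
const struct bench_row bench_rows[] = {
	{ "-Os", 19.92, 20.52 },
	{ "-O2", 18.72, 18.58 },
	{ "-O3", 17.97, 18.85 },
};

/* Percentage change in user time going from non-LTO to LTO;
 * a positive result means the LTO build was slower. */
double lto_user_change_pct(const struct bench_row *r)
{
	return 100.0 * (r->lto_user - r->plain_user) / r->plain_user;
}
```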
>
> Rich
* Re: building musl libc.so with gcc -flto
2015-04-23 5:34 ` Andre McCurdy
@ 2015-04-23 9:45 ` Rich Felker
2015-04-28 0:16 ` Andre McCurdy
0 siblings, 1 reply; 16+ messages in thread
From: Rich Felker @ 2015-04-23 9:45 UTC (permalink / raw)
To: musl
On Wed, Apr 22, 2015 at 10:34:40PM -0700, Andre McCurdy wrote:
> On Wed, Apr 22, 2015 at 7:23 PM, Rich Felker <dalias@libc.org> wrote:
> > On Wed, Apr 22, 2015 at 03:48:52PM -0700, Andre McCurdy wrote:
> >> Hi all,
> >>
> >> Below are some observations from building musl libc.so with gcc's -flto
> >> (link time optimization) option.
> >
> > Interesting!
> >
> >> 1) With today's master (afbcac68), adding -flto to CFLAGS causes the
> >> build to fail:
> >>
> >> | `_dlstart_c' referenced in section `.text' of /tmp/cc8ceNIy.ltrans0.ltrans.o: defined in discarded section `.text' of src/ldso/dlstart.lo (symbol from plugin)
> >> | collect2: error: ld returned 1 exit status
> >> | make: *** [lib/libc.so] Error 1
> >>
> >> Reverting f1faa0e1 (make _dlstart_c function use hidden visibility)
> >> seems to be a workaround.
> >
> > I think the problem is that LTO is garbage collecting "unused" symbols
> > before it gets to the step of linking with asm for which there is no
> > IR code, thereby losing anything that's only referenced from asm. A
> > better workaround might be to define _dlstart_c with a different name
> > as a non-hidden function (e.g. call it __dls1) and then make
> > _dlstart_c a hidden alias for it via:
> >
> > __attribute__((__visibility__("hidden")))
> > void _dlstart_c(size_t *, size_t *);
> >
> > weak_alias(__dls1, _dlstart_c);
> >
> > If you get a chance to try that, let me know if it works.
>
> That change does fix the build, but the resulting binary fails to run:
>
> $ gdb ./lib/libc.so
> ....
> (gdb) run
> Starting program: /home/andre/.../lib/libc.so
>
> Program received signal SIGILL, Illegal instruction.
> 0x56572ab8 in _dlstart ()
> (gdb) disassemble
> Dump of assembler code for function _dlstart:
> 0x56572aa0 <+0>: xor %ebp,%ebp
> 0x56572aa2 <+2>: mov %esp,%eax
> 0x56572aa4 <+4>: and $0xfffffff0,%esp
> 0x56572aa7 <+7>: push %eax
> 0x56572aa8 <+8>: push %eax
> 0x56572aa9 <+9>: call 0x56572aae <_dlstart+14>
> 0x56572aae <+14>: addl $0x7864a,(%esp)
> 0x56572ab5 <+21>: push %eax
> 0x56572ab6 <+22>: call 0x56572ab7 <_dlstart+23>
> 0x56572abb <+27>: nop
> 0x56572abc <+28>: lea 0x0(%esi,%eiz,1),%esi
> End of assembler dump.
> (gdb)
OK, it looks like the _dlstart_c symbol got removed before linking the
asm. What about selectively compiling this file with -fno-lto via
something like this in config.mak:
src/ldso/dlstart.lo: CFLAGS += -fno-lto
> > Also seems rather like what I would expect. Any idea if performance is
> > significantly better? It's not very comprehensive but you could try
> > libc-bench.
>
> I modified libc-bench so that it loops through everything in main() ten
> times and then ran the same libc-bench binary with each version of
> libc.so, sending output to /dev/null.
>
> The -O3 -flto build seems to be consistently very slightly *slower*
> than the non -flto version...
That makes the whole thing somewhat less interesting. LTO is probably
more interesting for static libc.
Rich
* Re: building musl libc.so with gcc -flto
2015-04-23 9:45 ` Rich Felker
@ 2015-04-28 0:16 ` Andre McCurdy
2015-04-28 0:24 ` Rich Felker
0 siblings, 1 reply; 16+ messages in thread
From: Andre McCurdy @ 2015-04-28 0:16 UTC (permalink / raw)
To: musl
On Thu, Apr 23, 2015 at 2:45 AM, Rich Felker <dalias@libc.org> wrote:
> On Wed, Apr 22, 2015 at 10:34:40PM -0700, Andre McCurdy wrote:
>> On Wed, Apr 22, 2015 at 7:23 PM, Rich Felker <dalias@libc.org> wrote:
>> > On Wed, Apr 22, 2015 at 03:48:52PM -0700, Andre McCurdy wrote:
>> >> Hi all,
>> >>
>> >> Below are some observations from building musl libc.so with gcc's -flto
>> >> (link time optimization) option.
>> >
>> > Interesting!
>> >
>> >> 1) With today's master (afbcac68), adding -flto to CFLAGS causes the
>> >> build to fail:
>> >>
>> >> | `_dlstart_c' referenced in section `.text' of /tmp/cc8ceNIy.ltrans0.ltrans.o: defined in discarded section `.text' of src/ldso/dlstart.lo (symbol from plugin)
>> >> | collect2: error: ld returned 1 exit status
>> >> | make: *** [lib/libc.so] Error 1
>> >>
>> >> Reverting f1faa0e1 (make _dlstart_c function use hidden visibility)
>> >> seems to be a workaround.
>> >
>> > I think the problem is that LTO is garbage collecting "unused" symbols
>> > before it gets to the step of linking with asm for which there is no
>> > IR code, thereby losing anything that's only referenced from asm. A
>> > better workaround might be to define _dlstart_c with a different name
>> > as a non-hidden function (e.g. call it __dls1) and then make
>> > _dlstart_c a hidden alias for it via:
>> >
>> > __attribute__((__visibility__("hidden")))
>> > void _dlstart_c(size_t *, size_t *);
>> >
>> > weak_alias(__dls1, _dlstart_c);
>> >
>> > If you get a chance to try that, let me know if it works.
>>
>> That change does fix the build, but the resulting binary fails to run:
>>
>> $ gdb ./lib/libc.so
>> ....
>> (gdb) run
>> Starting program: /home/andre/.../lib/libc.so
>>
>> Program received signal SIGILL, Illegal instruction.
>> 0x56572ab8 in _dlstart ()
>> (gdb) disassemble
>> Dump of assembler code for function _dlstart:
>> 0x56572aa0 <+0>: xor %ebp,%ebp
>> 0x56572aa2 <+2>: mov %esp,%eax
>> 0x56572aa4 <+4>: and $0xfffffff0,%esp
>> 0x56572aa7 <+7>: push %eax
>> 0x56572aa8 <+8>: push %eax
>> 0x56572aa9 <+9>: call 0x56572aae <_dlstart+14>
>> 0x56572aae <+14>: addl $0x7864a,(%esp)
>> 0x56572ab5 <+21>: push %eax
>> 0x56572ab6 <+22>: call 0x56572ab7 <_dlstart+23>
>> 0x56572abb <+27>: nop
>> 0x56572abc <+28>: lea 0x0(%esi,%eiz,1),%esi
>> End of assembler dump.
>> (gdb)
>
> OK, it looks like the _dlstart_c symbol got removed before linking the
> asm. What about selectively compiling this file with -fno-lto via
> something like this in config.mak:
>
> src/ldso/dlstart.lo: CFLAGS += -fno-lto
That works. Should I send a patch?
>> > Also seems rather like what I would expect. Any idea if performance is
>> > significantly better? It's not very comprehensive but you could try
>> > libc-bench.
>>
>> I modified libc-bench so that it loops through everything in main() ten
>> times and then ran the same libc-bench binary with each version of
>> libc.so, sending output to /dev/null.
>>
>> The -O3 -flto build seems to be consistently very slightly *slower*
>> than the non -flto version...
>
> That makes the whole thing somewhat less interesting. LTO is probably
> more interesting for static libc.
Yes, quite disappointing...
I'll try to experiment a little with static linking.
>
> Rich
* Re: building musl libc.so with gcc -flto
2015-04-28 0:16 ` Andre McCurdy
@ 2015-04-28 0:24 ` Rich Felker
2015-04-28 6:23 ` Andre McCurdy
0 siblings, 1 reply; 16+ messages in thread
From: Rich Felker @ 2015-04-28 0:24 UTC (permalink / raw)
To: musl
On Mon, Apr 27, 2015 at 05:16:12PM -0700, Andre McCurdy wrote:
> > OK, it looks like the _dlstart_c symbol got removed before linking the
> > asm. What about selectively compiling this file with -fno-lto via
> > something like this in config.mak:
> >
> > src/ldso/dlstart.lo: CFLAGS += -fno-lto
>
> That works. Should I send a patch?
Yes, but configure would need to detect support for -fno-lto and add
it appropriately. See what's done for CFLAGS_NOSSP. I suspect the crt
files also need -fno-lto in principle even if they're not currently
breaking for lack of it.
> >> > Also seems rather like what I would expect. Any idea if performance is
> >> > significantly better? It's not very comprehensive but you could try
> >> > libc-bench.
> >>
> >> I modified libc-bench so that it loops through everything in main() ten
> >> times and then ran the same libc-bench binary with each version of
> >> libc.so, sending output to /dev/null.
> >>
> >> The -O3 -flto build seems to be consistently very slightly *slower*
> >> than the non -flto version...
> >
> > That makes the whole thing somewhat less interesting. LTO is probably
> > more interesting for static libc.
>
> Yes, quite disappointing...
>
> I'll try to experiment a little with static linking.
Great. Let us know how it goes.
Rich
* Re: building musl libc.so with gcc -flto
2015-04-28 0:24 ` Rich Felker
@ 2015-04-28 6:23 ` Andre McCurdy
2015-04-28 13:44 ` Rich Felker
0 siblings, 1 reply; 16+ messages in thread
From: Andre McCurdy @ 2015-04-28 6:23 UTC (permalink / raw)
To: musl
On Mon, Apr 27, 2015 at 5:24 PM, Rich Felker <dalias@libc.org> wrote:
> On Mon, Apr 27, 2015 at 05:16:12PM -0700, Andre McCurdy wrote:
>> > OK, it looks like the _dlstart_c symbol got removed before linking the
>> > asm. What about selectively compiling this file with -fno-lto via
>> > something like this in config.mak:
>> >
>> > src/ldso/dlstart.lo: CFLAGS += -fno-lto
>>
>> That works. Should I send a patch?
>
> Yes, but configure would need to detect support for -fno-lto and add
> it appropriately. See what's done for CFLAGS_NOSSP. I suspect the crt
> files also need -fno-lto in principle even if they're not currently
> breaking for lack of it.
Patch sent.
I think the crt files might be OK as they are, since the _start_c
symbol isn't being hidden?
>> >> > Also seems rather like what I would expect. Any idea if performance is
>> >> > significantly better? It's not very comprehensive but you could try
>> >> > libc-bench.
>> >>
>> >> I modified libc-bench so that it loops through everything in main() ten
>> >> times and then ran the same libc-bench binary with each version of
>> >> libc.so, sending output to /dev/null.
>> >>
>> >> The -O3 -flto build seems to be consistently very slightly *slower*
>> >> than the non -flto version...
>> >
>> > That makes the whole thing somewhat less interesting. LTO is probably
>> > more interesting for static libc.
>>
>> Yes, quite disappointing...
>>
>> I'll try to experiment a little with static linking.
>
> Great. Let us know how it goes.
>
> Rich
* Re: building musl libc.so with gcc -flto
2015-04-28 6:23 ` Andre McCurdy
@ 2015-04-28 13:44 ` Rich Felker
2015-04-29 1:42 ` Andre McCurdy
0 siblings, 1 reply; 16+ messages in thread
From: Rich Felker @ 2015-04-28 13:44 UTC (permalink / raw)
To: musl
On Mon, Apr 27, 2015 at 11:23:40PM -0700, Andre McCurdy wrote:
> On Mon, Apr 27, 2015 at 5:24 PM, Rich Felker <dalias@libc.org> wrote:
> > On Mon, Apr 27, 2015 at 05:16:12PM -0700, Andre McCurdy wrote:
> >> > OK, it looks like the _dlstart_c symbol got removed before linking the
> >> > asm. What about selectively compiling this file with -fno-lto via
> >> > something like this in config.mak:
> >> >
> >> > src/ldso/dlstart.lo: CFLAGS += -fno-lto
> >>
> >> That works. Should I send a patch?
> >
> > Yes, but configure would need to detect support for -fno-lto and add
> > it appropriately. See what's done for CFLAGS_NOSSP. I suspect the crt
> > files also need -fno-lto in principle even if they're not currently
> > breaking for lack of it.
>
> Patch sent.
>
> I think the crt files might be OK as they are, since the _start_c
> symbol isn't being hidden?
I think you'll find the exact same thing happens if you use a crt1.o
produced from crt1.c for static linking with LTO. Note that on i386
(and x86_64) we still have a crt1.s which overrides crt1.c; I want to
remove it at some point. Temporarily removing/renaming it yourself
will allow you to test what happens with LTO on this file.
Rich
* Re: building musl libc.so with gcc -flto
2015-04-28 13:44 ` Rich Felker
@ 2015-04-29 1:42 ` Andre McCurdy
2015-04-29 3:27 ` Rich Felker
0 siblings, 1 reply; 16+ messages in thread
From: Andre McCurdy @ 2015-04-29 1:42 UTC (permalink / raw)
To: musl
On Tue, Apr 28, 2015 at 6:44 AM, Rich Felker <dalias@libc.org> wrote:
> On Mon, Apr 27, 2015 at 11:23:40PM -0700, Andre McCurdy wrote:
>> On Mon, Apr 27, 2015 at 5:24 PM, Rich Felker <dalias@libc.org> wrote:
>> > On Mon, Apr 27, 2015 at 05:16:12PM -0700, Andre McCurdy wrote:
>> >> > OK, it looks like the _dlstart_c symbol got removed before linking the
>> >> > asm. What about selectively compiling this file with -fno-lto via
>> >> > something like this in config.mak:
>> >> >
>> >> > src/ldso/dlstart.lo: CFLAGS += -fno-lto
>> >>
>> >> That works. Should I send a patch?
>> >
>> > Yes, but configure would need to detect support for -fno-lto and add
>> > it appropriately. See what's done for CFLAGS_NOSSP. I suspect the crt
>> > files also need -fno-lto in principle even if they're not currently
>> > breaking for lack of it.
>>
>> Patch sent.
>>
>> I think the crt files might be OK as they are, since the _start_c
>> symbol isn't being hidden?
>
> I think you'll find the exact same thing happens if you use a crt1.o
> produced from crt1.c for static linking with LTO. Note that on i386
> (and x86_64) we still have a crt1.s which overrides crt1.c; I want to
> remove it at some point. Temporarily removing/renaming it yourself
> will allow you to test what happens with LTO on this file.
Yes, you're right. The same workaround is needed for crt1.c, so my
original patch is incomplete.
The next issue, as Khem mentioned, is that AR and RANLIB need to be changed to:
AR = $(CROSS_COMPILE)gcc-ar
RANLIB = $(CROSS_COMPILE)gcc-ranlib
Is it safe to use these gcc-xx wrappers in all cases (after having
configure test that the toolchain provides them)? Or should they only
be used with LTO?
Beyond that I'm not able to statically link a hello.c test app using
LTO yet, since LTO linking with .a archives requires the gold linker
and I specifically have that disabled (to avoid issues I had seen
previously with musl+gold+dynamic linking). It looks like I need to
enable gold again before continuing. Are there any known issues, or
known success stories, when using musl with gold?
>
> Rich
* Re: building musl libc.so with gcc -flto
2015-04-29 1:42 ` Andre McCurdy
@ 2015-04-29 3:27 ` Rich Felker
2015-05-01 5:48 ` Andre McCurdy
0 siblings, 1 reply; 16+ messages in thread
From: Rich Felker @ 2015-04-29 3:27 UTC (permalink / raw)
To: musl
On Tue, Apr 28, 2015 at 06:42:13PM -0700, Andre McCurdy wrote:
> On Tue, Apr 28, 2015 at 6:44 AM, Rich Felker <dalias@libc.org> wrote:
> > On Mon, Apr 27, 2015 at 11:23:40PM -0700, Andre McCurdy wrote:
> >> On Mon, Apr 27, 2015 at 5:24 PM, Rich Felker <dalias@libc.org> wrote:
> >> > On Mon, Apr 27, 2015 at 05:16:12PM -0700, Andre McCurdy wrote:
> >> >> > OK, it looks like the _dlstart_c symbol got removed before linking the
> >> >> > asm. What about selectively compiling this file with -fno-lto via
> >> >> > something like this in config.mak:
> >> >> >
> >> >> > src/ldso/dlstart.lo: CFLAGS += -fno-lto
> >> >>
> >> >> That works. Should I send a patch?
> >> >
> >> > Yes, but configure would need to detect support for -fno-lto and add
> >> > it appropriately. See what's done for CFLAGS_NOSSP. I suspect the crt
> >> > files also need -fno-lto in principle even if they're not currently
> >> > breaking for lack of it.
> >>
> >> Patch sent.
> >>
> >> I think the crt files might be OK as they are, since the _start_c
> >> symbol isn't being hidden?
> >
> > I think you'll find the exact same thing happens if you use a crt1.o
> > produced from crt1.c for static linking with LTO. Note that on i386
> > (and x86_64) we still have a crt1.s which overrides crt1.c; I want to
> > remove it at some point. Temporarily removing/renaming it yourself
> > will allow you to test what happens with LTO on this file.
>
> Yes, you're right. The same workaround is needed for crt1.c, so my
> original patch is incomplete.
>
> The next issue, as Khem mentioned, is that AR and RANLIB need to be changed to:
>
> AR = $(CROSS_COMPILE)gcc-ar
> RANLIB = $(CROSS_COMPILE)gcc-ranlib
>
> Is it safe to use these gcc-xx wrappers in all cases (after having
> configure test that the toolchain provides them)? Or should they only
> be used with LTO?
I have no idea. But why can't a normal archive produced with ar be
used for LTO? Inability to use standard toolchain components/build
scripts sounds like a fairly big problem for LTO deployability.
> Beyond that I'm not able to statically link a hello.c test app using
> LTO yet, since LTO linking with .a archives requires the gold linker
> and I specifically have that disabled (to avoid issues I had seen
> previously with musl+gold+dynamic linking). It looks like I need to
> enable gold again before continuing. Are there any known issues, or
> known success stories, when using musl with gold?
I would actually love to see new reports on gold with musl dynamic
linking. The new dynamic linker bootstrap design should be a lot more
robust against poor choices by the linker, but there could be stupid
details that are breaking still which might call for fixes on musl's
side.
If you have citations for the problems that previously existed with
musl+gold, it would be nice to see those again too just to make sure
the issues were properly taken care of and not just swept under a rug.
Rich
* Re: building musl libc.so with gcc -flto
2015-04-23 2:23 ` Rich Felker
2015-04-23 5:34 ` Andre McCurdy
@ 2015-04-30 20:46 ` Andy Lutomirski
2015-04-30 23:44 ` Rich Felker
1 sibling, 1 reply; 16+ messages in thread
From: Andy Lutomirski @ 2015-04-30 20:46 UTC (permalink / raw)
To: musl
On 04/22/2015 07:23 PM, Rich Felker wrote:
> On Wed, Apr 22, 2015 at 03:48:52PM -0700, Andre McCurdy wrote:
>> Hi all,
>>
>> Below are some observations from building musl libc.so with gcc's -flto
>> (link time optimization) option.
>
> Interesting!
>
>> 1) With today's master (afbcac68), adding -flto to CFLAGS causes the
>> build to fail:
>>
>> | `_dlstart_c' referenced in section `.text' of /tmp/cc8ceNIy.ltrans0.ltrans.o: defined in discarded section `.text' of src/ldso/dlstart.lo (symbol from plugin)
>> | collect2: error: ld returned 1 exit status
>> | make: *** [lib/libc.so] Error 1
>>
>> Reverting f1faa0e1 (make _dlstart_c function use hidden visibility)
>> seems to be a workaround.
>
> I think the problem is that LTO is garbage collecting "unused" symbols
> before it gets to the step of linking with asm for which there is no
> IR code, thereby losing anything that's only referenced from asm. A
> better workaround might be to define _dlstart_c with a different name
> as a non-hidden function (e.g. call it __dls1) and then make
> _dlstart_c a hidden alias for it via:
>
> __attribute__((__visibility__("hidden")))
> void _dlstart_c(size_t *, size_t *);
>
> weak_alias(__dls1, _dlstart_c);
>
> If you get a chance to try that, let me know if it works. Another
> option might be adding -Wl,-u,_dlstart_c to LDFLAGS.
Wouldn't adding __attribute__((externally_visible)) to the relevant
symbols be more appropriate? It's intended to solve exactly this problem.
--Andy
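For readers unfamiliar with the attribute, this is roughly how it is spelled in GNU C. The function name is hypothetical, and the sketch only shows the syntax; it does not demonstrate any effect on LTO, and it does not address how the attribute interacts with hidden visibility. GCC accepts the attribute, while other compilers may ignore it with a warning.

```c
/* Hypothetical function that is meant to be reachable from outside the
 * code the compiler can see, for example from a separate asm file. */
__attribute__((externally_visible))
int entry_from_asm(int x)
{
	return x + 1;
}
```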
* Re: Re: building musl libc.so with gcc -flto
2015-04-30 20:46 ` Andy Lutomirski
@ 2015-04-30 23:44 ` Rich Felker
2015-05-01 6:57 ` Alexander Monakov
0 siblings, 1 reply; 16+ messages in thread
From: Rich Felker @ 2015-04-30 23:44 UTC (permalink / raw)
To: musl
On Thu, Apr 30, 2015 at 01:46:21PM -0700, Andy Lutomirski wrote:
> On 04/22/2015 07:23 PM, Rich Felker wrote:
> >On Wed, Apr 22, 2015 at 03:48:52PM -0700, Andre McCurdy wrote:
> >>Hi all,
> >>
> >>Below are some observations from building musl libc.so with gcc's -flto
> >>(link time optimization) option.
> >
> >Interesting!
> >
> >>1) With today's master (afbcac68), adding -flto to CFLAGS causes the
> >>build to fail:
> >>
> >> | `_dlstart_c' referenced in section `.text' of /tmp/cc8ceNIy.ltrans0.ltrans.o: defined in discarded section `.text' of src/ldso/dlstart.lo (symbol from plugin)
> >> | collect2: error: ld returned 1 exit status
> >> | make: *** [lib/libc.so] Error 1
> >>
> >>Reverting f1faa0e1 (make _dlstart_c function use hidden visibility)
> >>seems to be a workaround.
> >
> >I think the problem is that LTO is garbage collecting "unused" symbols
> >before it gets to the step of linking with asm for which there is no
> >IR code, thereby losing anything that's only referenced from asm. A
> >better workaround might be to define _dlstart_c with a different name
> >as a non-hidden function (e.g. call it __dls1) and then make
> >_dlstart_c a hidden alias for it via:
> >
> >__attribute__((__visibility__("hidden")))
> >void _dlstart_c(size_t *, size_t *);
> >
> >weak_alias(__dls1, _dlstart_c);
> >
> >If you get a chance to try that, let me know if it works. Another
> >option might be adding -Wl,-u,_dlstart_c to LDFLAGS.
>
> Wouldn't adding __attribute__((externally_visible)) to the relevant
> symbols be more appropriate? It's intended to solve exactly this
> problem.
I'm not clear whether it would be reliable to use this or not.
Semantically externally_visible and visibility=hidden are
contradictory. Even if we weren't trying to avoid relying on
additional GNU C features, I think it would be a bad idea to rely on
this working since the behavior under such contradictory annotations
could potentially vary widely between compilers.
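
For comparison, the alias workaround quoted earlier in this message could be
sketched roughly as below. This is only an illustration: the weak_alias macro
is a local stand-in modelled on musl's macro of the same name (an assumption),
the body of __dls1 is a placeholder, and whether the arrangement actually
survives LTO is the open question asked in the quoted text, not something this
sketch shows.

```c
#include <stddef.h>

/* Local stand-in modelled on musl's weak_alias macro (assumption). */
#define weak_alias(old, new) \
	extern __typeof(old) new __attribute__((__weak__, __alias__(#old)))

/* Non-hidden definition under a different name. The body is a
   placeholder; the real function performs dynamic linker bootstrap. */
void __dls1(size_t *sp, size_t *dynv)
{
	sp[0] = dynv[0];
}

/* Hidden declaration of the name that the startup asm references. */
__attribute__((__visibility__("hidden")))
void _dlstart_c(size_t *, size_t *);

/* Make _dlstart_c a hidden alias for the non-hidden __dls1. */
weak_alias(__dls1, _dlstart_c);
```

Calling _dlstart_c(sp, dynv) from the same translation unit then runs the body
of __dls1, since both names refer to the same code.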
Rich
* Re: building musl libc.so with gcc -flto
2015-04-29 3:27 ` Rich Felker
@ 2015-05-01 5:48 ` Andre McCurdy
2015-05-01 10:10 ` Szabolcs Nagy
0 siblings, 1 reply; 16+ messages in thread
From: Andre McCurdy @ 2015-05-01 5:48 UTC (permalink / raw)
To: musl
On Tue, Apr 28, 2015 at 8:27 PM, Rich Felker <dalias@libc.org> wrote:
>
> I would actually love to see new reports on gold with musl dynamic
> linking. The new dynamic linker bootstrap design should be a lot more
> robust against poor choices by the linker, but there could be stupid
> details that are breaking still which might call for fixes on musl's
> side.
>
> If you have citations for the problems that previously existed with
> musl+gold, it would be nice to see those again too just to make sure
> the issues were properly taken care of and not just swept under a rug.
>
I'm able to reproduce the problem I saw previously with the latest
musl git version. Behaviour is that some binaries dynamically linked
with gold (notably busybox) seem to run well but most binaries
segfault at startup.
I'm using gcc 4.9.2 and binutils 2.25, but I should also mention that
I'm using OpenEmbedded to build the toolchain and musl support in OE
is still quite experimental...
Below is a link to a base64 encoded tar file containing two
dynamically linked "hello world" x86 binaries. Both were created using
the same OE toolchain (the only difference was the -fuse-ld=XXX option
used). "hello.bfd" runs well, "hello.gold" segfaults. Hopefully they
can give some clues about what's happening.
http://pastebin.com/raw.php?i=RKJBqAg1
Andre
* Re: Re: building musl libc.so with gcc -flto
2015-04-30 23:44 ` Rich Felker
@ 2015-05-01 6:57 ` Alexander Monakov
0 siblings, 0 replies; 16+ messages in thread
From: Alexander Monakov @ 2015-05-01 6:57 UTC (permalink / raw)
To: musl
> > Wouldn't adding __attribute__((externally_visible)) to the relevant
> > symbols be more appropriate? It's intended to solve exactly this
> > problem.
>
> I'm not clear whether it would be reliable to use this or not.
> Semantically externally_visible and visibility=hidden are
> contradictory. Even if we weren't trying to avoid relying on
> additional GNU C features, I think it would be a bad idea to rely on
> this working since the behavior under such contradictory annotations
> could potentially vary widely between compilers.
The attribute that's suitable for this case is "used", not
"externally_visible" (um, I already mentioned that in the other thread where I
explained why the reference from the asm is 'ignored'). Andy is not correct
in saying that "it's intended to solve exactly this problem". To quote the GCC
manual:

externally_visible
    This attribute, attached to a global variable or function, nullifies
    the effect of the -fwhole-program command-line option, so the object
    remains visible outside the current compilation unit.

used
    This attribute, attached to a function, means that code must be
    emitted for the function even if it appears that the function is not
    referenced. This is useful, for example, when the function is
    referenced only in inline assembly.

( https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html )
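
A minimal sketch of the "used" case, assuming an ELF target and a
GNU-compatible assembler. The names c_entry and asm_entry are hypothetical and
do not come from musl. The function is referenced only from a top-level asm
statement, which the compiler does not parse, so without the attribute an
optimizing build would be free to drop it.

```c
/* Assumes an ELF target and a GNU-compatible assembler. */

/* Referenced only from the top-level asm below, so the compiler sees no
   C-level use. The "used" attribute forces code to be emitted anyway. */
__attribute__((used))
static int c_entry(void)
{
	return 42;
}

/* Top-level asm that makes the global symbol asm_entry an alias for
   c_entry. The compiler treats this string as opaque, so it cannot see
   the reference to c_entry. */
__asm__(".globl asm_entry\n"
        ".set asm_entry, c_entry\n");

/* C declaration for the symbol defined by the asm above. */
int asm_entry(void);
```

A caller can then invoke asm_entry() like any other function. Removing the
attribute and compiling with optimization would typically leave asm_entry
pointing at an undefined symbol.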
Alexander
* Re: building musl libc.so with gcc -flto
2015-05-01 5:48 ` Andre McCurdy
@ 2015-05-01 10:10 ` Szabolcs Nagy
2015-05-01 15:49 ` Rich Felker
0 siblings, 1 reply; 16+ messages in thread
From: Szabolcs Nagy @ 2015-05-01 10:10 UTC (permalink / raw)
To: musl
* Andre McCurdy <armccurdy@gmail.com> [2015-04-30 22:48:09 -0700]:
> I'm able to reproduce the problem I saw previously with the latest
> musl git version. Behaviour is that some binaries dynamically linked
> with gold (notably busybox) seem to run well but most binaries
> segfault at startup.
>
> I'm using gcc 4.9.2 and binutils 2.25, but I should also mention that
> I'm using OpenEmbedded to build the toolchain and musl support in OE
> is still quite experimental...
>
> Below is a link to a base64 encoded tar file containing two
> dynamically linked "hello world" x86 binaries. Both were created using
> the same OE toolchain (the only difference was the -fuse-ld=XXX option
> used). "hello.bfd" runs well, "hello.gold" segfaults. Hopefully they
> can give some clues about what's happening.
>
> http://pastebin.com/raw.php?i=RKJBqAg1
$ nm -D hello.gold
         w _ITM_deregisterTMCloneTable
         w _ITM_registerTMCloneTable
         w _Jv_RegisterClasses
08049670 A __bss_start
         w __deregister_frame_info
         U __libc_start_main
         w __register_frame_info
08049670 A _edata
08049690 A _end
         U printf

$ nm -D hello.bfd
08049564 B __bss_start
         U __libc_start_main
08049564 D _edata
08049584 B _end
         U printf

i'm not sure where gold expects __register_frame_info to be defined..
the call chain is
__dls3 -> do_init_fini -> _init -> frame_dummy -> __register_frame_info@plt -> 0
objdump:
0804846e <frame_dummy>:
804846e: 55 push %ebp
804846f: b8 70 83 04 08 mov $0x8048370,%eax
8048474: 89 e5 mov %esp,%ebp
8048476: 83 ec 08 sub $0x8,%esp
8048479: 85 c0 test %eax,%eax
804847b: 74 14 je 8048491 <frame_dummy+0x23>
804847d: 50 push %eax
804847e: 50 push %eax
804847f: 68 78 96 04 08 push $0x8049678
8048484: 68 18 85 04 08 push $0x8048518
8048489: e8 e2 fe ff ff call 8048370 <__register_frame_info@plt>
...
08048370 <__register_frame_info@plt>:
8048370: ff 25 50 96 04 08 jmp *0x8049650
...
GOT at this point:
0x08049648: 0xf7f92263 (__libc_start_main)
0x0804964c: 0x00000000 (should be __deregister_frame_info?)
0x08049650: 0x00000000 (should be __register_frame_info?)
0x08049654: 0xf7fbedeb (printf)
readelf says:
Relocation section '.rel.plt' at offset 0x308 contains 4 entries:
 Offset     Info    Type            Sym.Value  Sym. Name
08049648  00000107 R_386_JUMP_SLOT   00000000   __libc_start_main
0804964c  00000907 R_386_JUMP_SLOT   08048360   __deregister_frame_inf
08049650  00000607 R_386_JUMP_SLOT   08048370   __register_frame_info
08049654  00000507 R_386_JUMP_SLOT   00000000   printf

Symbol table '.dynsym' contains 11 entries:
   Num:    Value  Size Type    Bind   Vis      Ndx Name
     0: 00000000     0 NOTYPE  LOCAL  DEFAULT  UND
     1: 00000000     0 FUNC    GLOBAL DEFAULT  UND __libc_start_main
     2: 00000000     0 NOTYPE  WEAK   DEFAULT  UND _ITM_deregisterTMCloneTab
     3: 00000000     0 NOTYPE  WEAK   DEFAULT  UND _ITM_registerTMCloneTable
     4: 00000000     0 NOTYPE  WEAK   DEFAULT  UND _Jv_RegisterClasses
     5: 00000000     0 FUNC    GLOBAL DEFAULT  UND printf
     6: 08048370     0 FUNC    WEAK   DEFAULT  UND __register_frame_info
     7: 08049690     0 NOTYPE  GLOBAL DEFAULT  ABS _end
     8: 08049670     0 NOTYPE  GLOBAL DEFAULT  ABS _edata
     9: 08048360     0 FUNC    WEAK   DEFAULT  UND __deregister_frame_info
    10: 08049670     0 NOTYPE  GLOBAL DEFAULT  ABS __bss_start

Symbol table '.symtab' contains 39 entries:
   Num:    Value  Size Type    Bind   Vis      Ndx Name
...
    32: 00000000     0 FUNC    WEAK   DEFAULT  UND __deregister_frame_info
    33: 00000000     0 FUNC    WEAK   DEFAULT  UND __register_frame_info
* Re: building musl libc.so with gcc -flto
2015-05-01 10:10 ` Szabolcs Nagy
@ 2015-05-01 15:49 ` Rich Felker
0 siblings, 0 replies; 16+ messages in thread
From: Rich Felker @ 2015-05-01 15:49 UTC (permalink / raw)
To: musl
On Fri, May 01, 2015 at 12:10:16PM +0200, Szabolcs Nagy wrote:
> * Andre McCurdy <armccurdy@gmail.com> [2015-04-30 22:48:09 -0700]:
> > I'm able to reproduce the problem I saw previously with the latest
> > musl git version. Behaviour is that some binaries dynamically linked
> > with gold (notably busybox) seem to run well but most binaries
> > segfault at startup.
> >
> > I'm using gcc 4.9.2 and binutils 2.25, but I should also mention that
> > I'm using OpenEmbedded to build the toolchain and musl support in OE
> > is still quite experimental...
> >
> > Below is a link to a base64 encoded tar file containing two
> > dynamically linked "hello world" x86 binaries. Both were created using
> > the same OE toolchain (the only difference was the -fuse-ld=XXX option
> > used). "hello.bfd" runs well, "hello.gold" segfaults. Hopefully they
> > can give some clues about what's happening.
> >
> > http://pastebin.com/raw.php?i=RKJBqAg1
>
> $ nm -D hello.gold
> w _ITM_deregisterTMCloneTable
> w _ITM_registerTMCloneTable
> w _Jv_RegisterClasses
> 08049670 A __bss_start
> w __deregister_frame_info
> U __libc_start_main
> w __register_frame_info
> 08049670 A _edata
> 08049690 A _end
> U printf
>
> $ nm -D hello.bfd
> 08049564 B __bss_start
> U __libc_start_main
> 08049564 D _edata
> 08049584 B _end
> U printf
>
> i'm not sure where gold expects __register_frame_info to be defined..
>
> the call chain is
>
> __dls3 -> do_init_fini -> _init -> frame_dummy -> __register_frame_info@plt -> 0
>
> objdump:
>
> 0804846e <frame_dummy>:
> 804846e: 55 push %ebp
> 804846f: b8 70 83 04 08 mov $0x8048370,%eax
> 8048474: 89 e5 mov %esp,%ebp
> 8048476: 83 ec 08 sub $0x8,%esp
> 8048479: 85 c0 test %eax,%eax
> 804847b: 74 14 je 8048491 <frame_dummy+0x23>
> 804847d: 50 push %eax
> 804847e: 50 push %eax
> 804847f: 68 78 96 04 08 push $0x8049678
> 8048484: 68 18 85 04 08 push $0x8048518
> 8048489: e8 e2 fe ff ff call 8048370 <__register_frame_info@plt>
> ....
> 08048370 <__register_frame_info@plt>:
> 8048370: ff 25 50 96 04 08 jmp *0x8049650
> ....
The problem is that gold does not know how to process relocations for
undefined weak references correctly. When the code in question is
PIC/PIE, the weak reference can be kept for resolving at runtime.
Instead of:

    804846f: b8 70 83 04 08 mov $0x8048370,%eax

where the linker filled in a fixed address (the PLT slot) which the
code happily sees is non-zero and then calls it, PIC code would read
the address from the GOT. In non-PIC code, the linker (ld) *MUST*
resolve undefined weak references to the address zero; they are not
overridable at runtime because non-PIC doesn't support that.
This is a bug in gold, but I have no idea how gold-linked binaries work
at all, even with glibc. The same issue should arise in gcc's crt files.
You can probably work around it for now by building the app as PIE.
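
For readers unfamiliar with the pattern, the test in frame_dummy corresponds
roughly to the C idiom sketched below: declare an optional function as weak,
leave it undefined, and call it only if its address is non-null. The name
hypothetical_frame_hook is made up for illustration and stands in for
__register_frame_info; the sketch assumes GCC or Clang on an ELF target. With
a correct linker the address is null when nothing defines the symbol, so the
call is skipped.

```c
#include <stddef.h>

/* Hypothetical optional hook: declared weak and never defined here, so
   it is an undefined weak reference, like __register_frame_info in the
   crt code discussed above. */
extern void hypothetical_frame_hook(const void *, void *)
	__attribute__((weak));

/* Returns 1 if the hook was present and called, 0 if its address
   resolved to null and the call was skipped. */
int call_hook_if_present(void)
{
	static char object[24];   /* scratch storage; the size is arbitrary */

	if (hypothetical_frame_hook) {
		hypothetical_frame_hook(NULL, object);
		return 1;
	}
	return 0;
}
```

In the failing binary the equivalent test compares against the PLT slot address
that gold filled in, which is never zero, so the call is made and jumps through
a GOT entry that still holds zero.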
Rich