Further dynamic linker optimizations


* Further dynamic linker optimizations
@ 2015-06-30 20:04 Rich Felker
  2015-07-01  5:41 ` Timo Teras
  2015-07-07 18:39 ` Alexander Monakov
  0 siblings, 2 replies; 12+ messages in thread
From: Rich Felker @ 2015-06-30 20:04 UTC (permalink / raw)
  To: musl

Discussion on #musl with Timo Teräs has produced the following
results:

- Moving the bloom filter size to struct dso gives a 5% improvement in
  clang (built as 110 .so's) start time, simply because it reduces the
  number of instructions in the hot path. So I think we should apply
  that patch.

- The whole outer for loop in find_sym is the hot path for
  performance. As such, eliminating the lazy calculation of gnu_hash
  and simply doing it before the loop should be a measurable win, just
  by removing the if (!ghm) branch.

- Even the check if (!dso->global) continue; has nontrivial cost.
  Since I want to replace this representation with a separate
  linked-list chain for global dsos anyway (for other reasons) I think
  that's worth prioritizing for performance too.

- We still don't save and reuse the last symbol lookup in do_relocs.
  Doing so could improve performance a lot when the same symbol is
  referenced multiple times from global data. When the only references
  are the GOT (thus only one per symbol), it's not going to help, but
  since it's outside the find_sym dso loop, it should not have
  measurable cost anyway.

- String comparison (dl_strcmp) is costly, but nontrivial to optimize.
  Word-at-a-time optimizations have issues with crossing pages, even
  on archs that don't require aligned access. Probably the right way
  forward here is to get an optimized general strcmp, then add a
  mechanism (function pointer in struct dso? or global?) for the
  dynamic linker to call dl_strcmp when relocating itself but the real
  strcmp later.

- The strength-reduction of remainder operations does not seem to
  provide worthwhile benefits yet, simply because so little of the
  overall time is spent on the division/remainder.
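
The point about saving and reusing the last symbol lookup in do_relocs
can be sketched roughly as follows. This is only an illustration under
stated assumptions: toy_sym, toy_table, slow_find_sym, resolve_all and
the slow_lookups counter are hypothetical names invented here, not
musl's actual code, and the "symbol table" is a three-entry toy array.

```c
#include <stddef.h>
#include <string.h>

/* Toy stand-ins for illustration only; none of these are musl's types. */
struct toy_sym { const char *name; int value; };

static const struct toy_sym toy_table[] = {
	{ "foo", 1 }, { "bar", 2 }, { "baz", 3 },
};

static int slow_lookups; /* counts trips through the expensive path */

/* Stand-in for the expensive find_sym walk over all dsos. */
static const struct toy_sym *slow_find_sym(const char *name)
{
	slow_lookups++;
	for (size_t i = 0; i < sizeof toy_table / sizeof *toy_table; i++)
		if (!strcmp(toy_table[i].name, name))
			return &toy_table[i];
	return 0;
}

/* Process relocations given as symbol indices, remembering the last
 * index looked up so that consecutive references to the same symbol
 * skip the expensive lookup. Returns the sum of the resolved values. */
static int resolve_all(const int *sym_idx, size_t n, const char *const *names)
{
	int last_idx = -1;
	const struct toy_sym *last_def = 0;
	int sum = 0;

	for (size_t i = 0; i < n; i++) {
		if (sym_idx[i] != last_idx) {
			last_def = slow_find_sym(names[sym_idx[i]]);
			last_idx = sym_idx[i];
		}
		if (last_def) sum += last_def->value;
	}
	return sum;
}
```

With relocations referencing indices 0,0,0,1,1,2 this takes the slow
path three times instead of six. When every symbol is referenced only
once, the extra cost is one integer compare per relocation.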

Rich



* Re: Further dynamic linker optimizations
  2015-06-30 20:04 Further dynamic linker optimizations Rich Felker
@ 2015-07-01  5:41 ` Timo Teras
  2015-07-01 14:03   ` Rich Felker
  2015-07-07 18:39 ` Alexander Monakov
  1 sibling, 1 reply; 12+ messages in thread
From: Timo Teras @ 2015-07-01  5:41 UTC (permalink / raw)
  To: Rich Felker; +Cc: musl

On Tue, 30 Jun 2015 16:04:54 -0400
Rich Felker <dalias@libc.org> wrote:

> Discussion on #musl with Timo Teräs has produced the following
> results:

Nice summary. Thanks!

> - The whole outer for loop in find_sym is the hot path for
>   performance. As such, eliminating the lazy calculation of gnu_hash
>   and simply doing it before the loop should be a measurable win, just
>   by removing the if (!ghm) branch.

Additional thought. We could do a skip list here. If we calculate the
gnu-hash unconditionally, we could use the bloom filter bits to
construct a skip list.

That is, we would have a next_symlookup[] array holding one pointer per
bit of a machine word (or potentially a small multiple of that). We
would link each dso into the next_symlookup chain corresponding to each
bit set in its bloom filter (a dso without gnu-hash would have to go on
all of them). Then on lookup we could just use the bloom filter bits
calculated for the symbol to follow the correct symlookup chain's next
pointers.

If the pointer array size is less than the bloom filter size, the bloom
filter can always be reduced by ORing (|=) its individual elements
together. Though, it'd probably need some analysis of how this would
work out. If ORing all elements together always yields all bits set,
this is kind of useless.
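
The folding step and the "all bits set" concern can be sketched as
below. This is a minimal sketch, not anything from musl: fold_bloom,
folded_may_contain and folded_is_saturated are hypothetical helpers, the
filter words are assumed to be 64 bits wide, and shift2 stands for the
filter's second-hash shift (assumed to be less than 32).

```c
#include <stddef.h>
#include <stdint.h>

/* OR all words of a bloom filter into one summary word. */
static uint64_t fold_bloom(const uint64_t *words, size_t n)
{
	uint64_t acc = 0;
	for (size_t i = 0; i < n; i++)
		acc |= words[i];
	return acc;
}

/* Test the two bloom bits of hash h1 against the folded word.
 * Zero means the symbol is definitely absent; nonzero means "maybe". */
static int folded_may_contain(uint64_t folded, uint32_t h1, unsigned shift2)
{
	uint64_t mask = (uint64_t)1 << (h1 % 64)
	              | (uint64_t)1 << ((h1 >> shift2) % 64);
	return (folded & mask) == mask;
}

/* A folded word with every bit set filters nothing out. */
static int folded_is_saturated(uint64_t folded)
{
	return folded == UINT64_MAX;
}
```

Because the bit positions within a word do not depend on which word was
selected, the folded word is still a valid filter, only a weaker one.
Checking folded_is_saturated over real libraries would be one way to do
the analysis mentioned above.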

This should be a significant win in cases like clang where there are
tens of thousands of symbol lookups and 100+ dsos. The trade-off is of
course a little memory and a little extra time to set up the additional
chains.

Thoughts?


* Re: Further dynamic linker optimizations
  2015-07-01  5:41 ` Timo Teras
@ 2015-07-01 14:03   ` Rich Felker
  2015-07-01 14:10     ` Timo Teras
  0 siblings, 1 reply; 12+ messages in thread
From: Rich Felker @ 2015-07-01 14:03 UTC (permalink / raw)
  To: musl

On Wed, Jul 01, 2015 at 08:41:29AM +0300, Timo Teras wrote:
> On Tue, 30 Jun 2015 16:04:54 -0400
> Rich Felker <dalias@libc.org> wrote:
> 
> > Discussion on #musl with Timo Teräs has produced the following
> > results:
> 
> Nice summary. Thanks!
> 
> > - The whole outer for loop in find_sym is the hot path for
> >   performance. As such, eliminating the lazy calculation of gnu_hash
> >   and simply doing it before the loop should be a measurable win, just
> >   by removing the if (!ghm) branch.
> 
> Additional thought. We could do a skip list here. If we calculate the
> gnu-hash unconditionally, we could bloom filter bits to construct a
> skip list.

This wasn't something I recall discussing...

> That is, we have next_symlookup[] array that has pointer for each
> wordsize bits (or potentially a small multiple of it). And we would link
> each dso in next_symlookup array corresponding to each bloom filter
> bit (for dso without gnu-hash it'd have to go to all of them). Then on
> lookup we could just use the calculated bloomfilter to follow the
> correct symlookup chain next pointers.

This is a very large size increase, and perhaps notable startup time
increase, just for the sake of mislinked (clang) applications. It's
something like the idea I wanted to do in a static linker, albeit with
a much larger global table for where to start based on hash%largeval
rather than local next/skip tables per module. But I don't think it's
appropriate for dynamic linking.

Rich



* Re: Further dynamic linker optimizations
  2015-07-01 14:03   ` Rich Felker
@ 2015-07-01 14:10     ` Timo Teras
  0 siblings, 0 replies; 12+ messages in thread
From: Timo Teras @ 2015-07-01 14:10 UTC (permalink / raw)
  To: Rich Felker; +Cc: musl

On Wed, 1 Jul 2015 10:03:27 -0400
Rich Felker <dalias@libc.org> wrote:

> On Wed, Jul 01, 2015 at 08:41:29AM +0300, Timo Teras wrote:
> > On Tue, 30 Jun 2015 16:04:54 -0400
> > Rich Felker <dalias@libc.org> wrote:
> > 
> > > Discussion on #musl with Timo Teräs has produced the following
> > > results:
> > 
> > Nice summary. Thanks!
> > 
> > > - The whole outer for loop in find_sym is the hot path for
> > >   performance. As such, eliminating the lazy calculation of
> > > gnu_hash and simply doing it before the loop should be a
> > > measurable win, just by removing the if (!ghm) branch.
> > 
> > Additional thought. We could do a skip list here. If we calculate
> > the gnu-hash unconditionally, we could bloom filter bits to
> > construct a skip list.
> 
> This wasn't something I recall discussing...

Yes. That's why I said "additional thought". :)

> > That is, we have next_symlookup[] array that has pointer for each
> > wordsize bits (or potentially a small multiple of it). And we would
> > link each dso in next_symlookup array corresponding to each bloom
> > filter bit (for dso without gnu-hash it'd have to go to all of
> > them). Then on lookup we could just use the calculated bloomfilter
> > to follow the correct symlookup chain next pointers.
> 
> This is a very large size increase, and perhaps notable startup time
> increase, just for the sake of mislinked (clang) applications. It's
> something like the idea I wanted to do in a static linker, albeit with
> a much larger global table for where to start based on hash%largeval
> rather than local next/skip tables per module. But I don't think it's
> appropriate for dynamic linking.

Yes. And after some trivial benchmarking, it seems not to give any
significant improvement. A bloom filter using only one machine word is
not enough for anything, and anything larger gives too much memory
overhead.

Just moving the gnu-hash calculation out of the loop and doing it
always, plus removing the ->global check, already gives quite a
noticeable boost.
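
As a rough picture of those two changes together, here is a toy lookup
that computes the gnu hash once before the loop and walks a chain that
contains only global dsos, so the loop body has neither a lazy-hash
branch nor a ->global test. It is a sketch under assumptions: toy_dso,
syms_next, hashes and toy_find_sym are made-up names, and each toy dso
stores a flat array of precomputed hashes instead of a real GNU hash
table.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* The usual GNU symbol hash: h = h*33 + c, starting from 5381. */
static uint32_t gnu_hash(const char *s0)
{
	const unsigned char *s = (const unsigned char *)s0;
	uint32_t h = 5381;
	for (; *s; s++)
		h += h*32 + *s;
	return h;
}

/* Toy dso: a flat symbol list with one precomputed hash per name. */
struct toy_dso {
	struct toy_dso *syms_next;  /* hypothetical chain of global dsos only */
	const char *const *names;
	const uint32_t *hashes;
	size_t nsyms;
};

/* Return the first dso on the chain that defines name, or 0. */
static const struct toy_dso *toy_find_sym(const struct toy_dso *head,
                                          const char *name)
{
	uint32_t gh = gnu_hash(name);  /* computed once, unconditionally */

	for (const struct toy_dso *d = head; d; d = d->syms_next)
		for (size_t i = 0; i < d->nsyms; i++)
			if (d->hashes[i] == gh && !strcmp(d->names[i], name))
				return d;
	return 0;
}
```

Non-global dsos simply never appear on the syms_next chain in this
sketch, which is what removes the per-iteration flag test.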

Thanks.
Timo




* Re: Further dynamic linker optimizations
  2015-06-30 20:04 Further dynamic linker optimizations Rich Felker
  2015-07-01  5:41 ` Timo Teras
@ 2015-07-07 18:39 ` Alexander Monakov
  2015-07-07 18:55   ` Rich Felker
  1 sibling, 1 reply; 12+ messages in thread
From: Alexander Monakov @ 2015-07-07 18:39 UTC (permalink / raw)
  To: musl


On Tue, 30 Jun 2015, Rich Felker wrote:

> Discussion on #musl with Timo Teräs has produced the following
> results:
> 
> - Moving bloom filter size to struct dso gives 5% improvement in clang
>   (built as 110 .so's) start time, simply because of a reduction of
>   number of instructions in the hot path. So I think we should apply
>   that patch.

I think most of the improvement here actually comes from fewer cache misses.
As a result, I think we should take this idea further and shuffle struct dso a
little bit so that fields accessed in the hot find_sym loop are packed
together, if possible.

> - The whole outer for loop in find_sym is the hot path for
>   performance. As such, eliminating the lazy calculation of gnu_hash
>   and simply doing it before the loop should be a measurable win, just
>   by removing the if (!ghm) branch.

On a related note, it's possible to avoid calculating sysv hash, if gnu-hash
is enabled system-wide, by not setting 'global' flag on the vdso item (as
mentioned on IRC in your conversation with Timo).

> - Even the check if (!dso->global) continue; has nontrivial cost.
>   Since I want to replace this representation with a separate
>   linked-list chain for global dsos anyway (for other reasons) I think
>   that's worth prioritizing for performance too.

I'm curious what the other reasons are? :)

> - The strength-reduction of remainder operations does not seem to
>   provide worthwhile benefits yet, simply because so little of the
>   overall time is spent on the division/remainder.

On IRC we noted that on AArch64 it's slower than native div/mod on our
microbenchmark, and on ARM the speedup is smaller than expected.  My testing
on x86 indicates that it's not profitable in the dynamic linker (not sure
why).

Alexander


* Re: Further dynamic linker optimizations
  2015-07-07 18:39 ` Alexander Monakov
@ 2015-07-07 18:55   ` Rich Felker
  2015-07-08  5:48     ` Timo Teras
  0 siblings, 1 reply; 12+ messages in thread
From: Rich Felker @ 2015-07-07 18:55 UTC (permalink / raw)
  To: musl


On Tue, Jul 07, 2015 at 09:39:09PM +0300, Alexander Monakov wrote:
> On Tue, 30 Jun 2015, Rich Felker wrote:
> 
> > Discussion on #musl with Timo Teräs has produced the following
> > results:
> > 
> > - Moving bloom filter size to struct dso gives 5% improvement in clang
> >   (built as 110 .so's) start time, simply because of a reduction of
> >   number of instructions in the hot path. So I think we should apply
> >   that patch.
> 
> I think most of the improvement here actually comes from fewer cache misses.
> As a result, I think we should take this idea further and shuffle struct dso a
> little bit so that fields accessed in the hot find_sym loop are packed
> together, if possible.

I'm not entirely convinced; the 5% seems consistent with the number of
instructions in the code path. Can you confirm this with cache miss
measurements? Or just by obtaining better timings reordering data for
cache locality? Note that the head of struct dso has to remain fixed
(it's gdb ABI :/) but the rest is free to change.
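
To make the constraint concrete, a layout along these lines would keep
the debugger-visible head untouched while grouping the fields read in
the find_sym loop. Everything here is a hypothetical illustration:
toy_dso, its field names and order, and hot_span are assumptions made
for the sketch, not musl's real struct dso, and the 64-byte figure in
the note below is just a typical cache line size.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout for illustration; field names are assumptions. */
struct toy_dso {
	/* Head: order must stay fixed because a debugger reads it. */
	unsigned char *base;
	char *name;
	size_t *dynv;
	struct toy_dso *next, *prev;

	/* Hot fields read on every lookup iteration, kept adjacent. */
	struct toy_dso *syms_next;
	void *syms;
	char *strings;
	uint32_t *ghashtab;
	uint32_t ghashmask;

	/* Cold fields follow. */
	char *shortname;
	size_t map_len;
};

/* Bytes from the first hot field to the end of the last one. */
static size_t hot_span(void)
{
	return offsetof(struct toy_dso, ghashmask) + sizeof(uint32_t)
	     - offsetof(struct toy_dso, syms_next);
}
```

A span of at most 64 bytes means the hot fields can share one cache
line, but only if the allocation happens to be aligned suitably, so
whether this helps would still need the measurements asked for above.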

> > - The whole outer for loop in find_sym is the hot path for
> >   performance. As such, eliminating the lazy calculation of gnu_hash
> >   and simply doing it before the loop should be a measurable win, just
> >   by removing the if (!ghm) branch.
> 
> On a related note, it's possible to avoid calculating sysv hash, if gnu-hash
> is enabled system-wide, by not setting 'global' flag on the vdso item (as
> mentioned on IRC in your conversation with Timo).

Yes, and I think this sounds like a worthwhile approach. Seeing
timings for it would be great. :-)

> > - Even the check if (!dso->global) continue; has nontrivial cost.
> >   Since I want to replace this representation with a separate
> >   linked-list chain for global dsos anyway (for other reasons) I think
> >   that's worth prioritizing for performance too.
> 
> I'm curious what the other reasons are? :)

Depending on an open question I have to the Austin Group list (sorry,
I can't get the archives to work to provide a link), changes may be
needed for semantic correctness. It's easier to describe the issue
with code. Compile the attached test case with the following commands:

gcc -shared -fPIC -DLIB -o libA.so dlorder.c
gcc -shared -fPIC -DLIB -o libB.so dlorder.c
gcc -o dlorder dlorder.c

On musl it prints 2 different addresses (the subsequent RTLD_GLOBAL
changes the definition of a symbol) which I think is wrong, but I
haven't yet checked what other implementations do.

> > - The strength-reduction of remainder operations does not seem to
> >   provide worthwhile benefits yet, simply because so little of the
> >   overall time is spent on the division/remainder.
> 
> On IRC we noted that on AArch64 it's slower than native div/mod on our
> microbenchmark, and on ARM the speedup is smaller than expected.  My testing
> on x86 indicates that it's not profitable in the dynamic linker (not sure
> why).

Agreed, but I think we do know why it's not profitable: at least in
the cases tested, the time spent on remainders is negligible anyway.
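
For readers who have not seen the technique, one known way to
strength-reduce a remainder by a divisor that is fixed at run time is to
precompute a 64-bit reciprocal and replace the division with two
multiplications. The sketch below uses that approach; it is not
necessarily the variant that was benchmarked here, fastmod_magic and
fastmod_u32 are names made up for this example, and it relies on the
unsigned __int128 extension of GCC and Clang.

```c
#include <stdint.h>

/* Precompute ceil(2^64 / d) for a divisor d >= 1.
 * For d == 1 this wraps to 0, which still gives the right answer. */
static uint64_t fastmod_magic(uint32_t d)
{
	return UINT64_MAX / d + 1;
}

/* Compute a % d using two multiplications and no division. */
static uint32_t fastmod_u32(uint32_t a, uint64_t magic, uint32_t d)
{
	uint64_t lowbits = magic * a;  /* fractional part of a/d, scaled by 2^64 */
	return (uint32_t)(((unsigned __int128)lowbits * d) >> 64);
}
```

In a dynamic linker the divisor would be something like a per-dso bucket
count, so the magic value could be computed once per library. That only
pays off if the remainder is a meaningful share of the lookup time.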

Rich

[-- Attachment #2: dlorder.c --]
[-- Type: text/plain, Size: 367 bytes --]

#ifdef LIB

int foo = 42;

#else

#include <dlfcn.h>
#include <stdio.h>

int main()
{
	void *h1, *h2, *hg;
	h1 = dlopen("./libA.so", RTLD_NOW|RTLD_LOCAL);
	h2 = dlopen("./libB.so", RTLD_NOW|RTLD_GLOBAL);
	hg = dlopen(0, RTLD_NOW|RTLD_GLOBAL);
	printf("%p\n", dlsym(hg, "foo"));
	dlopen("./libA.so", RTLD_NOW|RTLD_GLOBAL);
	printf("%p\n", dlsym(hg, "foo"));
}

#endif


* Re: Further dynamic linker optimizations
  2015-07-07 18:55   ` Rich Felker
@ 2015-07-08  5:48     ` Timo Teras
  2015-08-05 22:37       ` Andy Lutomirski
  0 siblings, 1 reply; 12+ messages in thread
From: Timo Teras @ 2015-07-08  5:48 UTC (permalink / raw)
  To: Rich Felker; +Cc: musl

On Tue, 7 Jul 2015 14:55:05 -0400
Rich Felker <dalias@libc.org> wrote:

> On Tue, Jul 07, 2015 at 09:39:09PM +0300, Alexander Monakov wrote:
> > On Tue, 30 Jun 2015, Rich Felker wrote:
> > 
> > > Discussion on #musl with Timo Teräs has produced the following
> > > results:
> > > 
> > > - Moving bloom filter size to struct dso gives 5% improvement in
> > > clang (built as 110 .so's) start time, simply because of a
> > > reduction of number of instructions in the hot path. So I think
> > > we should apply that patch.
> > 
> > I think most of the improvement here actually comes from fewer
> > cache misses. As a result, I think we should take this idea further
> > and shuffle struct dso a little bit so that fields accessed in the
> > hot find_sym loop are packed together, if possible.
> 
> I'm not entirely convinced; the 5% seems consistent with the number of
> instructions in the code path. Can you confirm this with cache miss
> measurements? Or just by obtaining better timings reordering data for
> cache locality? Note that the head of struct dso has to remain fixed
> (it's gdb ABI :/) but the rest is free to change.

I used cachegrind and callgrind to benchmark. In my case there was no
change in the number of cache misses - the speed-up came purely from
running fewer instructions on the hot path.

Though, I ran this on an i7 with a lot of cache. Cache misses could
become an issue on smaller cpus. But I suspect the bloom filter is doing
a good enough job to keep cache usage at sensible levels.

> > > - The whole outer for loop in find_sym is the hot path for
> > >   performance. As such, eliminating the lazy calculation of
> > > gnu_hash and simply doing it before the loop should be a
> > > measurable win, just by removing the if (!ghm) branch.
> > 
> > On a related note, it's possible to avoid calculating sysv hash, if
> > gnu-hash is enabled system-wide, by not setting 'global' flag on
> > the vdso item (as mentioned on IRC in your conversation with Timo).
> 
> Yes, and I think this sounds like a worthwhile approach. Seeing
> timings for it would be great. :-)

I reported them earlier on IRC. But on the same i7 box, running "clang
--version" which has 100+ DT_NEEDED entries, removing the vdso and thus
the sysv hashing had an effect on the order of tens of milliseconds.
(I wonder how it'd perform if we calculated both the sysv and gnu
hashes at the same time.)
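
Calculating both hashes in one pass over the name could look roughly
like this. It is an untimed sketch: both_hashes and hash_both are
hypothetical names, and the two recurrences are the standard SysV ELF
hash and the GNU hash.

```c
#include <stdint.h>

struct both_hashes { uint32_t sysv, gnu; };

/* Walk the symbol name once, updating both hash states per byte. */
static struct both_hashes hash_both(const char *s0)
{
	const unsigned char *s = (const unsigned char *)s0;
	uint32_t sv = 0, gh = 5381;

	for (; *s; s++) {
		/* SysV ELF hash step: shift in 4 bits, fold the top nibble. */
		sv = 16*sv + *s;
		sv ^= sv>>24 & 0xf0;
		/* GNU hash step: h = h*33 + c. */
		gh += gh*32 + *s;
	}
	return (struct both_hashes){ sv & 0xfffffff, gh };
}
```

The string is read only once, but the sysv arithmetic is still executed
for every byte, so whether this beats skipping the sysv hash entirely
would have to be measured.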

Removing the 'global' flag test and making the gnu-hash calculation
unconditional together also gave a measurable speed-up, around 5-10
milliseconds.

For reference, "time clang --version" on my Intel(R) Core(TM) i7-4510U:
- current musl release: ~160 ms
- current git master: ~90 ms
- ghashmask added: ~83 ms
- sysv hash calc removed: ~77 ms
- global test removed, unconditional gnu-hash: ~71 ms

As another reference, "clang --version" currently takes about 3 seconds
on a Wandboard ARM box. But I have no numbers for the speed-up on that
box.

Thanks,
Timo



* Re: Further dynamic linker optimizations
  2015-07-08  5:48     ` Timo Teras
@ 2015-08-05 22:37       ` Andy Lutomirski
  2015-08-06  3:04         ` Rich Felker
  2015-08-06  4:32         ` Isaac Dunham
  0 siblings, 2 replies; 12+ messages in thread
From: Andy Lutomirski @ 2015-08-05 22:37 UTC (permalink / raw)
  To: musl, Rich Felker

On 07/07/2015 10:48 PM, Timo Teras wrote:
> On Tue, 7 Jul 2015 14:55:05 -0400
> Rich Felker <dalias@libc.org> wrote:
>
>> On Tue, Jul 07, 2015 at 09:39:09PM +0300, Alexander Monakov wrote:
>>> On Tue, 30 Jun 2015, Rich Felker wrote:
>>>
>>>> Discussion on #musl with Timo Teräs has produced the following
>>>> results:
>>>>
>>>> - Moving bloom filter size to struct dso gives 5% improvement in
>>>> clang (built as 110 .so's) start time, simply because of a
>>>> reduction of number of instructions in the hot path. So I think
>>>> we should apply that patch.
>>>
>>> I think most of the improvement here actually comes from fewer
>>> cache misses. As a result, I think we should take this idea further
>>> and shuffle struct dso a little bit so that fields accessed in the
>>> hot find_sym loop are packed together, if possible.
>>
>> I'm not entirely convinced; the 5% seems consistent with the number of
>> instructions in the code path. Can you confirm this with cache miss
>> measurements? Or just by obtaining better timings reordering data for
>> cache locality? Note that the head of struct dso has to remain fixed
>> (it's gdb ABI :/) but the rest is free to change.
>
> I used cachegrind and callgrind to benchmark. In my case there was no
> change in cache miss number - the speed up was purely based on running
> less instructions on the hot path.
>
> Though, I ran this on i7 with lot of cache. Cache misses could become
> issue on smaller cpus. But I suspect the bloom filter is doing good
> enough job to keep cache usage on sensible levels.
>
>>>> - The whole outer for loop in find_sym is the hot path for
>>>>    performance. As such, eliminating the lazy calculation of
>>>> gnu_hash and simply doing it before the loop should be a
>>>> measurable win, just by removing the if (!ghm) branch.
>>>
>>> On a related note, it's possible to avoid calculating sysv hash, if
>>> gnu-hash is enabled system-wide, by not setting 'global' flag on
>>> the vdso item (as mentioned on IRC in your conversation with Timo).
>>
>> Yes, and I think this sounds like a worthwhile approach. Seeing
>> timings for it would be great. :-)
>
> I told them earlier in IRC. But on the same i7 box and running "clang
> --version" which has 100+ DT_NEEDED... removing vdso and thus sysv
> hashing had magnitude of tens of milliseconds. (I wonder how it'd
> perform if we calculated both sysv and gnu hashes at same time.)

/me dons vdso maintainer hat.

I can add a GNU hash to the vdso quite easily (for Linux 4.3).  Would 
that be helpful?

--Andy



* Re: Re: Further dynamic linker optimizations
  2015-08-05 22:37       ` Andy Lutomirski
@ 2015-08-06  3:04         ` Rich Felker
  2015-08-06  4:32         ` Isaac Dunham
  1 sibling, 0 replies; 12+ messages in thread
From: Rich Felker @ 2015-08-06  3:04 UTC (permalink / raw)
  To: Andy Lutomirski; +Cc: musl

On Wed, Aug 05, 2015 at 03:37:25PM -0700, Andy Lutomirski wrote:
> >>>>- The whole outer for loop in find_sym is the hot path for
> >>>>   performance. As such, eliminating the lazy calculation of
> >>>>gnu_hash and simply doing it before the loop should be a
> >>>>measurable win, just by removing the if (!ghm) branch.
> >>>
> >>>On a related note, it's possible to avoid calculating sysv hash, if
> >>>gnu-hash is enabled system-wide, by not setting 'global' flag on
> >>>the vdso item (as mentioned on IRC in your conversation with Timo).
> >>
> >>Yes, and I think this sounds like a worthwhile approach. Seeing
> >>timings for it would be great. :-)
> >
> >I told them earlier in IRC. But on the same i7 box and running "clang
> >--version" which has 100+ DT_NEEDED... removing vdso and thus sysv
> >hashing had magnitude of tens of milliseconds. (I wonder how it'd
> >perform if we calculated both sysv and gnu hashes at same time.)
> 
> /me dons vdso maintainer hat.
> 
> I can add a GNU hash to the vdso quite easily (for Linux 4.3).
> Would that be helpful?

Yes, and I'd lean towards doing this unless you can see any
disadvantages to weigh it against (using more pages? would that
matter?). But either way I think we should make the change on the musl
side too. It doesn't make sense for the vdso to appear in the global
namespace unless it was actually pulled in by dlopen/RTLD_GLOBAL. For
actually using the vdso symbols, we don't use the dynamic linker
anyway; we look them up directly so that they work with static linking
(and because the way the dynamic linker/libc is linked precludes vdso
symbols getting used to resolve its own references, anyway).

Rich



* Re: Re: Further dynamic linker optimizations
  2015-08-05 22:37       ` Andy Lutomirski
  2015-08-06  3:04         ` Rich Felker
@ 2015-08-06  4:32         ` Isaac Dunham
  2015-08-06  9:33           ` Szabolcs Nagy
  1 sibling, 1 reply; 12+ messages in thread
From: Isaac Dunham @ 2015-08-06  4:32 UTC (permalink / raw)
  To: musl; +Cc: Rich Felker

On Wed, Aug 05, 2015 at 03:37:25PM -0700, Andy Lutomirski wrote:
> On 07/07/2015 10:48 PM, Timo Teras wrote:
> >On Tue, 7 Jul 2015 14:55:05 -0400
> >Rich Felker <dalias@libc.org> wrote:
> >
> >>On Tue, Jul 07, 2015 at 09:39:09PM +0300, Alexander Monakov wrote:
> >>>On Tue, 30 Jun 2015, Rich Felker wrote:
> >>>
> >>>>Discussion on #musl with Timo Teräs has produced the following
> >>>>results:
> >>>>
> >>>>- Moving bloom filter size to struct dso gives 5% improvement in
> >>>>clang (built as 110 .so's) start time, simply because of a
> >>>>reduction of number of instructions in the hot path. So I think
> >>>>we should apply that patch.
> >>>
> >>>I think most of the improvement here actually comes from fewer
> >>>cache misses. As a result, I think we should take this idea further
> >>>and shuffle struct dso a little bit so that fields accessed in the
> >>>hot find_sym loop are packed together, if possible.
> >>
> >>I'm not entirely convinced; the 5% seems consistent with the number of
> >>instructions in the code path. Can you confirm this with cache miss
> >>measurements? Or just by obtaining better timings reordering data for
> >>cache locality? Note that the head of struct dso has to remain fixed
> >>(it's gdb ABI :/) but the rest is free to change.
> >
> >I used cachegrind and callgrind to benchmark. In my case there was no
> >change in cache miss number - the speed up was purely based on running
> >less instructions on the hot path.
> >
> >Though, I ran this on i7 with lot of cache. Cache misses could become
> >issue on smaller cpus. But I suspect the bloom filter is doing good
> >enough job to keep cache usage on sensible levels.
> >
> >>>>- The whole outer for loop in find_sym is the hot path for
> >>>>   performance. As such, eliminating the lazy calculation of
> >>>>gnu_hash and simply doing it before the loop should be a
> >>>>measurable win, just by removing the if (!ghm) branch.
> >>>
> >>>On a related note, it's possible to avoid calculating sysv hash, if
> >>>gnu-hash is enabled system-wide, by not setting 'global' flag on
> >>>the vdso item (as mentioned on IRC in your conversation with Timo).
> >>
> >>Yes, and I think this sounds like a worthwhile approach. Seeing
> >>timings for it would be great. :-)
> >
> >I told them earlier in IRC. But on the same i7 box and running "clang
> >--version" which has 100+ DT_NEEDED... removing vdso and thus sysv
> >hashing had magnitude of tens of milliseconds. (I wonder how it'd
> >perform if we calculated both sysv and gnu hashes at same time.)
> 
> /me dons vdso maintainer hat.
> 
> I can add a GNU hash to the vdso quite easily (for Linux 4.3).  Would that
> be helpful?

Would this require a binutils version that supports GNU hashes?
And if so, would it be a hard build-time requirement?

Thanks,
Isaac Dunham



* Re: Re: Further dynamic linker optimizations
  2015-08-06  4:32         ` Isaac Dunham
@ 2015-08-06  9:33           ` Szabolcs Nagy
  2015-08-06 15:13             ` Andy Lutomirski
  0 siblings, 1 reply; 12+ messages in thread
From: Szabolcs Nagy @ 2015-08-06  9:33 UTC (permalink / raw)
  To: Isaac Dunham; +Cc: musl, Rich Felker, Andy Lutomirski

* Isaac Dunham <ibid.ag@gmail.com> [2015-08-05 21:32:53 -0700]:
> On Wed, Aug 05, 2015 at 03:37:25PM -0700, Andy Lutomirski wrote:
> > 
> > I can add a GNU hash to the vdso quite easily (for Linux 4.3).  Would that
> > be helpful?
> 
> Would this require a binutils version that supports GNU hashes?
> And if so, would it be a hard build-time requirement?
> 

The vdso is only used at runtime, so static linker support is not
needed when you build applications.

I guess that for building the kernel itself, linking the vdso.so
will depend on --hash-style=gnu support in the target ld,
that is, binutils 2.18.



* Re: Re: Further dynamic linker optimizations
  2015-08-06  9:33           ` Szabolcs Nagy
@ 2015-08-06 15:13             ` Andy Lutomirski
  0 siblings, 0 replies; 12+ messages in thread
From: Andy Lutomirski @ 2015-08-06 15:13 UTC (permalink / raw)
  To: Isaac Dunham, musl, Rich Felker, Andy Lutomirski

On Thu, Aug 6, 2015 at 2:33 AM, Szabolcs Nagy <nsz@port70.net> wrote:
> * Isaac Dunham <ibid.ag@gmail.com> [2015-08-05 21:32:53 -0700]:
>> On Wed, Aug 05, 2015 at 03:37:25PM -0700, Andy Lutomirski wrote:
>> >
>> > I can add a GNU hash to the vdso quite easily (for Linux 4.3).  Would that
>> > be helpful?
>>
>> Would this require a binutils version that supports GNU hashes?
>> And if so, would it be a hard build-time requirement?
>>
>
> vdso is only used at runtime, so static linker support is not
> needed when you build applications.
>
> i guess for building the kernel itself linking the vdso.so
> will depend on --hash-style=gnu support in the target ld,
> that is binutils 2.18.

Yes, exactly.  I'll do this for x86, and I'll encourage the other arch
vdso maintainers to do the same thing.  If a kernel is built with old
binutils, then the gnu hash won't be there.

--Andy

