Re: Further dynamic linker optimizations

mailing list of musl libc

From: Rich Felker <dalias@libc.org>
To: musl@lists.openwall.com
Subject: Re: Further dynamic linker optimizations
Date: Tue, 7 Jul 2015 14:55:05 -0400
Message-ID: <20150707185505.GI1173@brightrain.aerifal.cx>
In-Reply-To: <alpine.LNX.2.11.1507072131470.11647@monopod.intra.ispras.ru>

[-- Attachment #1: Type: text/plain, Size: 3060 bytes --]

On Tue, Jul 07, 2015 at 09:39:09PM +0300, Alexander Monakov wrote:
> On Tue, 30 Jun 2015, Rich Felker wrote:
> 
> > Discussion on #musl with Timo Teräs has produced the following
> > results:
> > 
> > - Moving bloom filter size to struct dso gives 5% improvement in clang
> >   (built as 110 .so's) start time, simply because of a reduction of
> >   number of instructions in the hot path. So I think we should apply
> >   that patch.
> 
> I think most of the improvement here actually comes from fewer cache misses.
> As a result, I think we should take this idea further and shuffle struct dso a
> little bit so that fields accessed in the hot find_sym loop are packed
> together, if possible.

I'm not entirely convinced; the 5% seems consistent with the reduction
in the number of instructions in the code path. Can you confirm this
with cache miss measurements? Or simply by showing better timings after
reordering the data for cache locality? Note that the head of struct
dso has to remain fixed (it's gdb ABI :/) but the rest is free to change.
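
To make the layout question concrete, here is a minimal sketch of what
"fixed head, hot fields packed together" could look like. It is not the
real struct dso: the struct name, the ghash_bloom_words field, and the
two helper functions are hypothetical, and the field list is only
loosely modeled on what a symbol lookup loop reads. The helpers report
how many bytes the hot group spans and how many cache lines it would
touch if the struct started on a line boundary.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative only; not the actual musl definition. */
struct dso_sketch {
	/* Fixed head: mirrors the debugger-visible link map, must not move. */
	unsigned char *base;
	char *name;
	size_t *dynv;
	struct dso_sketch *next, *prev;

	/* Hot group: read on every iteration of the lookup loop. */
	uint32_t *ghashtab;
	uint32_t *hashtab;
	void *syms;
	char *strings;
	int16_t *versym;
	uint32_t ghash_bloom_words; /* hypothetical cached bloom filter size */
	char global;

	/* Cold fields (paths, TLS info, ...) would follow here. */
	char *shortname;
};

/* Bytes from the first hot field through the last one. */
static size_t hot_span(void)
{
	return offsetof(struct dso_sketch, global) + sizeof(char)
	     - offsetof(struct dso_sketch, ghashtab);
}

/* Cache lines touched by the hot group, assuming the struct itself
 * starts on a line boundary (an assumption; malloc need not do that). */
static size_t hot_lines(size_t line)
{
	size_t first = offsetof(struct dso_sketch, ghashtab);
	size_t last = offsetof(struct dso_sketch, global);
	return last / line - first / line + 1;
}
```

A span that fits in one line is necessary but not sufficient: because
the fixed head comes first, the hot group can still straddle a line
boundary, which is why only measurements can settle the question.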

> > - The whole outer for loop in find_sym is the hot path for
> >   performance. As such, eliminating the lazy calculation of gnu_hash
> >   and simply doing it before the loop should be a measurable win, just
> >   by removing the if (!ghm) branch.
> 
> On a related note, it's possible to avoid calculating sysv hash, if gnu-hash
> is enabled system-wide, by not setting 'global' flag on the vdso item (as
> mentioned on IRC in your conversation with Timo).

Yes, and I think this sounds like a worthwhile approach. Seeing
timings for it would be great. :-)
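
As a rough sketch of the two ideas above (compute the GNU hash once
before the loop, and skip the SysV hash when nothing needs it), the
lookup key could be prepared like this. The two hash functions are the
standard ELF ones; struct sym_key, make_key, and the need_sysv flag are
hypothetical names used only for illustration.

```c
#include <stdint.h>

/* GNU-style symbol hash: h = h*33 + c, starting from 5381. */
static uint32_t gnu_hash(const char *s0)
{
	const unsigned char *s = (const unsigned char *)s0;
	uint32_t h = 5381;
	for (; *s; s++)
		h = h * 33 + *s;
	return h;
}

/* Classic SysV ELF symbol hash. */
static uint32_t sysv_hash(const char *s0)
{
	const unsigned char *s = (const unsigned char *)s0;
	uint32_t h = 0;
	while (*s) {
		h = 16 * h + *s++;
		h ^= h >> 24 & 0xf0;
	}
	return h & 0xfffffff;
}

/* Hypothetical lookup key, filled in once before the per-library loop. */
struct sym_key {
	const char *name;
	uint32_t gh; /* always computed: no lazy-init branch in the loop */
	uint32_t h;  /* only meaningful when need_sysv was nonzero */
};

static struct sym_key make_key(const char *name, int need_sysv)
{
	struct sym_key k = { name, gnu_hash(name),
	                     need_sysv ? sysv_hash(name) : 0 };
	return k;
}
```

With a key built this way, the loop body only reads k.gh (or k.h) and
never has to test whether the hash has been computed yet.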

> > - Even the check if (!dso->global) continue; has nontrivial cost.
> >   Since I want to replace this representation with a separate
> >   linked-list chain for global dsos anyway (for other reasons) I think
> >   that's worth prioritizing for performance too.
> 
> I'm curious what the other reasons are? :)

Depending on the answer to an open question I posted to the Austin
Group list (sorry, I can't get the archives to work to provide a link),
changes may be needed for semantic correctness. It's easier to describe
the issue with code. Compile the attached test case with the following
commands:

gcc -shared -fPIC -DLIB -o libA.so dlorder.c
gcc -shared -fPIC -DLIB -o libB.so dlorder.c
gcc -o dlorder dlorder.c

On musl it prints 2 different addresses (the subsequent dlopen of
libA.so with RTLD_GLOBAL changes which definition of foo is found),
which I think is wrong, but I haven't yet checked what other
implementations do.
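
A toy model may help show how a separate chain for global dsos relates
to this test case. This is only a sketch under stated assumptions, not
the planned implementation: the struct layout, the syms_next field, and
the load/make_global/find_global helpers are all hypothetical, and it
assumes a library is appended to the global chain at the moment it first
becomes global. Under that assumption, promoting libA later cannot
displace libB's earlier definition.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified per-library record. */
struct dso {
	const char *name;
	const char *sym;        /* the one symbol this toy library defines */
	struct dso *next;       /* load-order chain: every loaded library */
	struct dso *syms_next;  /* global chain: only globally visible ones */
	int in_global_chain;
};

struct chains {
	struct dso *head, *tail;           /* load order */
	struct dso *syms_head, *syms_tail; /* global lookup order */
};

/* Append a newly loaded library to the load-order chain. */
static void load(struct chains *c, struct dso *d)
{
	d->next = 0;
	d->syms_next = 0;
	d->in_global_chain = 0;
	if (c->tail) c->tail->next = d; else c->head = d;
	c->tail = d;
}

/* Make a library global: append it to the global chain exactly once. */
static void make_global(struct chains *c, struct dso *d)
{
	if (d->in_global_chain) return;
	d->in_global_chain = 1;
	if (c->syms_tail) c->syms_tail->syms_next = d; else c->syms_head = d;
	c->syms_tail = d;
}

/* Global lookup walks only the global chain; no per-dso flag check. */
static struct dso *find_global(struct chains *c, const char *sym)
{
	for (struct dso *d = c->syms_head; d; d = d->syms_next)
		if (!strcmp(d->sym, sym)) return d;
	return 0;
}
```

Replaying the test case in this model (load A, load B, make B global,
look up foo, make A global, look up foo) returns libB both times, while
a walk over the load-order chain with a global flag check would switch
to libA on the second lookup.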

> > - The strength-reduction of remainder operations does not seem to
> >   provide worthwhile benefits yet, simply because so little of the
> >   overall time is spent on the division/remainder.
> 
> On IRC we noted that on AArch64 it's slower than native div/mod on our
> microbenchmark, and on ARM the speedup is smaller than expected.  My testing
> on x86 indicates that it's not profitable in the dynamic linker (not sure
> why).

Agreed, but I think we do know why it's not profitable: at least in
the cases tested, the time spent on remainders is negligible anyway.
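
For readers unfamiliar with the technique, here is one common form of
strength-reducing a remainder by a loop-invariant divisor. It is a
generic sketch, not necessarily the variant that was benchmarked; the
struct fast_mod type and both function names are made up for this
example. The divisor's reciprocal is precomputed once, and each
remainder then costs two multiplies, a shift, and one conditional
subtraction.

```c
#include <stdint.h>

/* Precomputed state for taking remainders by a fixed divisor d > 0. */
struct fast_mod {
	uint32_t d;
	uint64_t inv; /* floor(2^32 / d) */
};

static struct fast_mod fast_mod_init(uint32_t d)
{
	struct fast_mod m = { d, (UINT64_C(1) << 32) / d };
	return m;
}

/* Returns n % m.d without a divide instruction. */
static uint32_t fast_mod_u32(uint32_t n, struct fast_mod m)
{
	/* The estimated quotient is either exact or one too small. */
	uint32_t q = (uint32_t)((n * m.inv) >> 32);
	uint32_t r = n - q * m.d;
	if (r >= m.d)
		r -= m.d;
	return r;
}
```

In a hash table lookup, fast_mod_init would run once per table and
fast_mod_u32 once per probe; the saving only matters if the divide was a
noticeable share of the time to begin with.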

Rich

[-- Attachment #2: dlorder.c --]
[-- Type: text/plain, Size: 367 bytes --]

#ifdef LIB

int foo = 42;

#else

#include <dlfcn.h>
#include <stdio.h>

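int main(void)
{
	void *h1, *h2, *hg;
	/* Load libA privately first, so it precedes libB in load order. */
	h1 = dlopen("./libA.so", RTLD_NOW|RTLD_LOCAL);
	/* libB is the first library to export foo globally. */
	h2 = dlopen("./libB.so", RTLD_NOW|RTLD_GLOBAL);
	/* Handle for the main program, used for global-scope lookups. */
	hg = dlopen(0, RTLD_NOW|RTLD_GLOBAL);
	printf("%p\n", dlsym(hg, "foo"));
	/* Promote the already-loaded libA to global, then look up foo again. */
	dlopen("./libA.so", RTLD_NOW|RTLD_GLOBAL);
	printf("%p\n", dlsym(hg, "foo"));
}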

#endif
