From mboxrd@z Thu Jan  1 00:00:00 1970
X-Msuck: nntp://news.gmane.org/gmane.linux.lib.musl.general/8262
Path: news.gmane.org!not-for-mail
From: Isaac Dunham <ibid.ag@gmail.com>
Newsgroups: gmane.linux.lib.musl.general
Subject: Re: Re: Further dynamic linker optimizations
Date: Wed, 5 Aug 2015 21:32:53 -0700
Message-ID: <20150806043252.GB1900@localhost>
References: <20150630200454.GA28127@brightrain.aerifal.cx>
 <alpine.LNX.2.11.1507072131470.11647@monopod.intra.ispras.ru>
 <20150707185505.GI1173@brightrain.aerifal.cx>
 <20150708084816.6d557b73@vostro>
 <55C29025.8070507@kernel.org>
Reply-To: musl@lists.openwall.com
NNTP-Posting-Host: plane.gmane.org
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Trace: ger.gmane.org 1438835593 32183 80.91.229.3 (6 Aug 2015 04:33:13 GMT)
X-Complaints-To: usenet@ger.gmane.org
NNTP-Posting-Date: Thu, 6 Aug 2015 04:33:13 +0000 (UTC)
Cc: Rich Felker <dalias@libc.org>
To: musl@lists.openwall.com
Original-X-From: musl-return-8275-gllmg-musl=m.gmane.org@lists.openwall.com Thu Aug 06 06:33:10 2015
Return-path: <musl-return-8275-gllmg-musl=m.gmane.org@lists.openwall.com>
Envelope-to: gllmg-musl@m.gmane.org
Original-Received: from mother.openwall.net ([195.42.179.200])
	by plane.gmane.org with smtp (Exim 4.69)
	(envelope-from <musl-return-8275-gllmg-musl=m.gmane.org@lists.openwall.com>)
	id 1ZNCrd-00066R-Qe
	for gllmg-musl@m.gmane.org; Thu, 06 Aug 2015 06:33:09 +0200
Original-Received: (qmail 7444 invoked by uid 550); 6 Aug 2015 04:33:07 -0000
Mailing-List: contact musl-help@lists.openwall.com; run by ezmlm
Precedence: bulk
List-Post: <mailto:musl@lists.openwall.com>
List-Help: <mailto:musl-help@lists.openwall.com>
List-Unsubscribe: <mailto:musl-unsubscribe@lists.openwall.com>
List-Subscribe: <mailto:musl-subscribe@lists.openwall.com>
Original-Received: (qmail 7420 invoked from network); 6 Aug 2015 04:33:07 -0000
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=gmail.com; s=20120113;
        h=date:from:to:cc:subject:message-id:references:mime-version
         :content-type:content-disposition:in-reply-to:user-agent;
        bh=9DRLpvN9Ew/gPOdjjbH9MjEi9GsxVPNLQuEKGMdtmIo=;
        b=dTTYg2v8t1nXGajL8goBEZiMw8uxxcb8vAsIvjBuW1oanCzRXKwpnEghfzbxVeptve
         LRZemnjI2MnARw3mEQ0PSyV9haq2rrSH5+pARidA+KlSap2zJAmnGt1jUQnnu9PDAn0K
         rUQdld4EFlfYWEjVORLdN3WR91U6qZdslRYc0WEQsiJNwiRnJ4H7Rh8j/hG1l2MdJUuC
         x2z3DUQCgbUeK/F27CnJ83YGaVzlMGUuv3ImCzoJF+dkyebCA+jNU6T2SvsMFPaRC/L4
         qaAiukFNIpGMKoOBTzKIqFERWuWKGzs69QW02vCpMENJ246JdJMiDnCLAuBDTLDKgNcE
         NQcg==
X-Received: by 10.68.253.6 with SMTP id zw6mr26978790pbc.150.1438835575158;
        Wed, 05 Aug 2015 21:32:55 -0700 (PDT)
Content-Disposition: inline
In-Reply-To: <55C29025.8070507@kernel.org>
User-Agent: Mutt/1.5.23 (2014-03-12)
Xref: news.gmane.org gmane.linux.lib.musl.general:8262
Archived-At: <http://permalink.gmane.org/gmane.linux.lib.musl.general/8262>

On Wed, Aug 05, 2015 at 03:37:25PM -0700, Andy Lutomirski wrote:
> On 07/07/2015 10:48 PM, Timo Teras wrote:
> >On Tue, 7 Jul 2015 14:55:05 -0400
> >Rich Felker <dalias@libc.org> wrote:
> >
> >>On Tue, Jul 07, 2015 at 09:39:09PM +0300, Alexander Monakov wrote:
> >>>On Tue, 30 Jun 2015, Rich Felker wrote:
> >>>
> >>>>Discussion on #musl with Timo Teras has produced the following
> >>>>results:
> >>>>
> >>>>- Moving bloom filter size to struct dso gives 5% improvement in
> >>>>clang (built as 110 .so's) start time, simply because of a
> >>>>reduction of number of instructions in the hot path. So I think
> >>>>we should apply that patch.
> >>>
> >>>I think most of the improvement here actually comes from fewer
> >>>cache misses. As a result, I think we should take this idea further
> >>>and shuffle struct dso a little bit so that fields accessed in the
> >>>hot find_sym loop are packed together, if possible.
> >>
> >>I'm not entirely convinced; the 5% seems consistent with the number of
> >>instructions in the code path. Can you confirm this with cache miss
> >>measurements? Or just by obtaining better timings reordering data for
> >>cache locality? Note that the head of struct dso has to remain fixed
> >>(it's gdb ABI :/) but the rest is free to change.
> >
> >I used cachegrind and callgrind to benchmark. In my case there was no
> >change in cache miss number - the speed up was purely based on running
> >less instructions on the hot path.
> >
> >Though, I ran this on i7 with lot of cache. Cache misses could become
> >issue on smaller cpus. But I suspect the bloom filter is doing good
> >enough job to keep cache usage on sensible levels.
> >
> >>>>- The whole outer for loop in find_sym is the hot path for
> >>>>performance. As such, eliminating the lazy calculation of
> >>>>gnu_hash and simply doing it before the loop should be a
> >>>>measurable win, just by removing the if (!ghm) branch.
> >>>
> >>>On a related note, it's possible to avoid calculating sysv hash, if
> >>>gnu-hash is enabled system-wide, by not setting 'global' flag on
> >>>the vdso item (as mentioned on IRC in your conversation with Timo).
> >>
> >>Yes, and I think this sounds like a worthwhile approach. Seeing
> >>timings for it would be great. :-)
> >
> >I told them earlier in IRC. But on the same i7 box and running "clang
> >--version" which has 100+ DT_NEEDED... removing vdso and thus sysv
> >hashing had magnitude of tens of milliseconds. (I wonder how it'd
> >perform if we calculated both sysv and gnu hashes at same time.)
> 
> /me dons vdso maintainer hat.
> 
> I can add a GNU hash to the vdso quite easily (for Linux 4.3).  Would that
> be helpful?

Would adding a GNU hash to the vdso require a binutils version that
supports --hash-style=gnu? If so, would that become a hard build-time
requirement for the kernel?
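
On Timo's aside above about calculating both sysv and gnu hashes at the
same time: a minimal sketch of what a single pass over the symbol name
might look like. This is not musl's code. The names hash_both and
struct sym_hashes are made up for illustration; the two recurrences are
the standard SysV ELF hash and the GNU hash (h = h*33 + c, starting
from 5381).

```c
#include <stdint.h>

/* Hypothetical container for the two hash values of one symbol name. */
struct sym_hashes {
	uint32_t sysv; /* System V ELF hash, used with DT_HASH */
	uint32_t gnu;  /* GNU hash, used with DT_GNU_HASH */
};

/* Compute both hashes in one walk over the NUL-terminated name.
 * Illustrative only; musl's find_sym is structured differently. */
static struct sym_hashes hash_both(const char *name)
{
	const unsigned char *s = (const unsigned char *)name;
	uint32_t h = 0, g;   /* SysV state */
	uint32_t gh = 5381;  /* GNU state */

	for (; *s; s++) {
		/* SysV: shift in 4 bits, fold the top nibble back down. */
		h = (h << 4) + *s;
		g = h & 0xf0000000u;
		if (g) h ^= g >> 24;
		h &= ~g;

		/* GNU: multiply by 33 and add, wrapping mod 2^32. */
		gh = gh * 33 + *s;
	}
	return (struct sym_hashes){ h, gh };
}
```

For the name "exit" this yields 0x0006cf04 (sysv) and 0x7c967e3f (gnu).
Whether fusing the loops beats skipping the sysv hash entirely would
need to be measured.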

Thanks,
Isaac Dunham