From: Isaac Dunham
Newsgroups: gmane.linux.lib.musl.general
Subject: Re: Re: Further dynamic linker optimizations
Date: Wed, 5 Aug 2015 21:32:53 -0700
Message-ID: <20150806043252.GB1900@localhost>
References: <20150630200454.GA28127@brightrain.aerifal.cx> <20150707185505.GI1173@brightrain.aerifal.cx> <20150708084816.6d557b73@vostro> <55C29025.8070507@kernel.org>
In-Reply-To: <55C29025.8070507@kernel.org>
Reply-To: musl@lists.openwall.com
To: musl@lists.openwall.com
Cc: Rich Felker

On Wed, Aug 05, 2015 at 03:37:25PM -0700, Andy Lutomirski wrote:
> On 07/07/2015 10:48 PM, Timo Teras wrote:
> >On Tue, 7 Jul 2015 14:55:05 -0400
> >Rich Felker wrote:
> >
> >>On Tue, Jul 07, 2015 at 09:39:09PM +0300, Alexander Monakov wrote:
> >>>On Tue, 30 Jun 2015, Rich Felker wrote:
> >>>
> >>>>Discussion on #musl with Timo Teräs has produced the following
> >>>>results:
> >>>>
> >>>>- Moving bloom filter size to struct dso gives 5% improvement in
> >>>>clang (built as 110 .so's) start time, simply because of a
> >>>>reduction of number of instructions in the hot path. So I think
> >>>>we should apply that patch.
> >>>
> >>>I think most of the improvement here actually comes from fewer
> >>>cache misses. As a result, I think we should take this idea further
> >>>and shuffle struct dso a little bit so that fields accessed in the
> >>>hot find_sym loop are packed together, if possible.
> >>
> >>I'm not entirely convinced; the 5% seems consistent with the number of
> >>instructions in the code path. Can you confirm this with cache miss
> >>measurements? Or just by obtaining better timings reordering data for
> >>cache locality? Note that the head of struct dso has to remain fixed
> >>(it's gdb ABI :/) but the rest is free to change.
> >
> >I used cachegrind and callgrind to benchmark. In my case there was no
> >change in cache miss number - the speed up was purely based on running
> >less instructions on the hot path.
> >
> >Though, I ran this on i7 with lot of cache. Cache misses could become
> >issue on smaller cpus. But I suspect the bloom filter is doing good
> >enough job to keep cache usage on sensible levels.
> >
> >>>>- The whole outer for loop in find_sym is the hot path for
> >>>> performance. As such, eliminating the lazy calculation of
> >>>>gnu_hash and simply doing it before the loop should be a
> >>>>measurable win, just by removing the if (!ghm) branch.
> >>>
> >>>On a related note, it's possible to avoid calculating sysv hash, if
> >>>gnu-hash is enabled system-wide, by not setting 'global' flag on
> >>>the vdso item (as mentioned on IRC in your conversation with Timo).
> >>
> >>Yes, and I think this sounds like a worthwhile approach. Seeing
> >>timings for it would be great. :-)
> >
> >I told them earlier in IRC. But on the same i7 box and running "clang
> >--version" which has 100+ DT_NEEDED... removing vdso and thus sysv
> >hashing had magnitude of tens of milliseconds. (I wonder how it'd
> >perform if we calculated both sysv and gnu hashes at same time.)
>
> /me dons vdso maintainer hat.
>
> I can add a GNU hash to the vdso quite easily (for Linux 4.3). Would that
> be helpful?

Would this require a binutils version that supports GNU hashes?
And if so, would it be a hard build-time requirement?

Thanks,
Isaac Dunham
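
For reference, the "both sysv and gnu hashes at same time" idea quoted above could be sketched roughly as follows. This is only an illustration using the standard SysV ELF hash and the GNU (times-33, seed 5381) hash. The names `struct sym_hashes` and `hash_both` are made up for this example and are not taken from musl's dynamic linker, and nothing here says whether a combined loop would actually be faster.

```c
#include <stdint.h>

/* Hypothetical container for the two hash values; not a musl type. */
struct sym_hashes {
	uint32_t sysv; /* SysV ELF hash (DT_HASH) */
	uint32_t gnu;  /* GNU hash (DT_GNU_HASH) */
};

/* Compute both symbol-name hashes in a single pass over the string. */
struct sym_hashes hash_both(const char *name)
{
	const unsigned char *s = (const unsigned char *)name;
	uint32_t h = 0;    /* SysV hash accumulator */
	uint32_t g = 5381; /* GNU hash accumulator */

	for (; *s; s++) {
		/* SysV step: shift in 4 bits, fold the top nibble back in. */
		h = (h << 4) + *s;
		uint32_t top = h & 0xf0000000u;
		h ^= top >> 24;
		h &= ~top;

		/* GNU step: g = g*33 + c, wrapping modulo 2^32. */
		g = g * 33 + *s;
	}
	return (struct sym_hashes){ h, g };
}
```

A caller would compute `struct sym_hashes hv = hash_both("printf");` once per lookup and then use `hv.gnu` or `hv.sysv` depending on which table each library provides. Whether this beats computing the sysv hash lazily would need timings like the ones discussed above.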