From mboxrd@z Thu Jan  1 00:00:00 1970
X-Msuck: nntp://news.gmane.org/gmane.linux.lib.musl.general/8260
Path: news.gmane.org!not-for-mail
From: Andy Lutomirski
Newsgroups: gmane.linux.lib.musl.general
Subject: Re: Further dynamic linker optimizations
Date: Wed, 5 Aug 2015 15:37:25 -0700
Message-ID: <55C29025.8070507@kernel.org>
References: <20150630200454.GA28127@brightrain.aerifal.cx> <20150707185505.GI1173@brightrain.aerifal.cx> <20150708084816.6d557b73@vostro>
Reply-To: musl@lists.openwall.com
NNTP-Posting-Host: plane.gmane.org
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
X-Trace: ger.gmane.org 1438814270 22295 80.91.229.3 (5 Aug 2015 22:37:50 GMT)
X-Complaints-To: usenet@ger.gmane.org
NNTP-Posting-Date: Wed, 5 Aug 2015 22:37:50 +0000 (UTC)
To: musl@lists.openwall.com, Rich Felker
Original-X-From: musl-return-8273-gllmg-musl=m.gmane.org@lists.openwall.com Thu Aug 06 00:37:50 2015
Return-path:
Envelope-to: gllmg-musl@m.gmane.org
Original-Received: from mother.openwall.net ([195.42.179.200]) by plane.gmane.org with smtp (Exim 4.69) (envelope-from ) id 1ZN7Jj-0000yQ-UQ for gllmg-musl@m.gmane.org; Thu, 06 Aug 2015 00:37:48 +0200
Original-Received: (qmail 32198 invoked by uid 550); 5 Aug 2015 22:37:44 -0000
Mailing-List: contact musl-help@lists.openwall.com; run by ezmlm
Precedence: bulk
List-Post:
List-Help:
List-Unsubscribe:
List-Subscribe:
Original-Received: (qmail 32143 invoked from network); 5 Aug 2015 22:37:40 -0000
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.1.0
In-Reply-To: <20150708084816.6d557b73@vostro>
X-Spam-Status: No, score=-0.2 required=5.0 tests=BAYES_00,UNPARSEABLE_RELAY, URIBL_BLACK autolearn=no version=3.3.1
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on mail.kernel.org
X-Virus-Scanned: ClamAV using ClamSMTP
Xref: news.gmane.org gmane.linux.lib.musl.general:8260
Archived-At:

On 07/07/2015 10:48 PM, Timo Teras wrote:
> On Tue, 7 Jul 2015 14:55:05 -0400
> Rich Felker wrote:
>
>> On Tue, Jul 07, 2015 at 09:39:09PM +0300, Alexander Monakov wrote:
>>> On Tue, 30 Jun 2015, Rich Felker wrote:
>>>
>>>> Discussion on #musl with Timo Teräs has produced the following
>>>> results:
>>>>
>>>> - Moving bloom filter size to struct dso gives 5% improvement in
>>>>   clang (built as 110 .so's) start time, simply because of a
>>>>   reduction of number of instructions in the hot path. So I think
>>>>   we should apply that patch.
>>>
>>> I think most of the improvement here actually comes from fewer
>>> cache misses. As a result, I think we should take this idea further
>>> and shuffle struct dso a little bit so that fields accessed in the
>>> hot find_sym loop are packed together, if possible.
>>
>> I'm not entirely convinced; the 5% seems consistent with the number of
>> instructions in the code path. Can you confirm this with cache miss
>> measurements? Or just by obtaining better timings reordering data for
>> cache locality? Note that the head of struct dso has to remain fixed
>> (it's gdb ABI :/) but the rest is free to change.
>
> I used cachegrind and callgrind to benchmark. In my case there was no
> change in cache miss number - the speed up was purely based on running
> less instructions on the hot path.
>
> Though, I ran this on i7 with lot of cache. Cache misses could become
> issue on smaller cpus. But I suspect the bloom filter is doing good
> enough job to keep cache usage on sensible levels.
>
>>>> - The whole outer for loop in find_sym is the hot path for
>>>>   performance. As such, eliminating the lazy calculation of
>>>>   gnu_hash and simply doing it before the loop should be a
>>>>   measurable win, just by removing the if (!ghm) branch.
>>>
>>> On a related note, it's possible to avoid calculating sysv hash, if
>>> gnu-hash is enabled system-wide, by not setting 'global' flag on
>>> the vdso item (as mentioned on IRC in your conversation with Timo).
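The two optimizations discussed above (keeping the bloom-filter parameters directly in struct dso, and computing the GNU hash once before the find_sym loop instead of lazily inside it) can be illustrated with a small standalone C sketch. It is not musl's actual find_sym: the struct `dso_sketch`, the helpers `bloom_build`, `bloom_may_contain` and `find_sym_sketch`, and the `BLOOM_WORDS` size are all hypothetical, and a linear strcmp scan stands in for the real bucket/chain lookup. The hash function and the two-bit bloom test follow the usual DT_GNU_HASH scheme for 64-bit bloom words.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* GNU hash (DT_GNU_HASH): h = h*33 + c, seeded with 5381. */
static uint32_t gnu_hash(const char *s)
{
	uint32_t h = 5381;
	for (const unsigned char *p = (const unsigned char *)s; *p; p++)
		h = h * 33 + *p;
	return h;
}

#define BLOOM_WORDS 4 /* hypothetical size; must be a power of two */

/* Hypothetical per-library record. The bloom index mask and shift live
 * directly in the struct, so the hot loop does not re-derive them from
 * the hash table header for every library it visits. */
struct dso_sketch {
	const char *name;
	const char *const *syms; /* symbols this mock library defines */
	size_t nsyms;
	uint64_t bloom[BLOOM_WORDS];
	uint32_t bloom_mask;  /* BLOOM_WORDS - 1 */
	uint32_t bloom_shift;
};

/* Set the two bits per symbol that a GNU-style bloom filter uses. */
static void bloom_build(struct dso_sketch *d, uint32_t shift)
{
	assert((BLOOM_WORDS & (BLOOM_WORDS - 1)) == 0);
	memset(d->bloom, 0, sizeof d->bloom);
	d->bloom_mask = BLOOM_WORDS - 1;
	d->bloom_shift = shift;
	for (size_t i = 0; i < d->nsyms; i++) {
		uint32_t h = gnu_hash(d->syms[i]);
		d->bloom[(h / 64) & d->bloom_mask] |=
			((uint64_t)1 << (h % 64)) | ((uint64_t)1 << ((h >> shift) % 64));
	}
}

/* A clear bit means the library definitely does not define the symbol. */
static int bloom_may_contain(const struct dso_sketch *d, uint32_t h)
{
	uint64_t w = d->bloom[(h / 64) & d->bloom_mask];
	uint64_t m = ((uint64_t)1 << (h % 64)) |
	             ((uint64_t)1 << ((h >> d->bloom_shift) % 64));
	return (w & m) == m;
}

/* The hash is computed once, before the loop, rather than lazily inside
 * it behind an "if not yet computed" branch. Returns the index of the
 * first library defining `name`, or -1 if none does. */
static int find_sym_sketch(const struct dso_sketch *dsos, size_t n, const char *name)
{
	uint32_t h = gnu_hash(name);
	for (size_t i = 0; i < n; i++) {
		const struct dso_sketch *d = &dsos[i];
		if (!bloom_may_contain(d, h))
			continue; /* cheap rejection: no string compares */
		for (size_t j = 0; j < d->nsyms; j++) /* stand-in for bucket/chain walk */
			if (!strcmp(d->syms[j], name))
				return (int)i;
	}
	return -1;
}
```

With the hash hoisted, a library that cannot define the symbol costs only an index computation, one load and a mask test in the outer loop, which is the rejection path that runs most often when a binary like clang has 100+ libraries to search.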
>>
>> Yes, and I think this sounds like a worthwhile approach. Seeing
>> timings for it would be great. :-)
>
> I told them earlier in IRC. But on the same i7 box and running "clang
> --version" which has 100+ DT_NEEDED... removing vdso and thus sysv
> hashing had magnitude of tens of milliseconds. (I wonder how it'd
> perform if we calculated both sysv and gnu hashes at same time.)

/me dons vdso maintainer hat.

I can add a GNU hash to the vdso quite easily (for Linux 4.3). Would
that be helpful?

--Andy
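Timo's aside about calculating the SysV and GNU hashes at the same time could look something like the following sketch, which walks the symbol name once and updates both hashes per byte. `both_hashes` is a hypothetical helper, not a musl function; `sysv_hash` is the classic ELF hash from the System V ABI (DT_HASH) and `gnu_hash` is the DT_GNU_HASH function. Whether the fused loop is actually faster than two separate passes was not measured in this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Classic System V ELF hash, as used by DT_HASH tables. */
static uint32_t sysv_hash(const char *s)
{
	uint32_t h = 0, g;
	for (const unsigned char *p = (const unsigned char *)s; *p; p++) {
		h = (h << 4) + *p;
		g = h & 0xf0000000u;
		if (g)
			h ^= g >> 24;
		h &= ~g;
	}
	return h;
}

/* GNU hash, as used by DT_GNU_HASH tables: h = h*33 + c, seed 5381. */
static uint32_t gnu_hash(const char *s)
{
	uint32_t h = 5381;
	for (const unsigned char *p = (const unsigned char *)s; *p; p++)
		h = h * 33 + *p;
	return h;
}

/* Hypothetical helper: compute both hashes in a single pass, so each
 * byte of the symbol name is loaded only once. */
static void both_hashes(const char *s, uint32_t *sysv, uint32_t *gnu)
{
	uint32_t hs = 0, hg = 5381, g;
	assert(s && sysv && gnu);
	for (const unsigned char *p = (const unsigned char *)s; *p; p++) {
		/* SysV step */
		hs = (hs << 4) + *p;
		g = hs & 0xf0000000u;
		if (g)
			hs ^= g >> 24;
		hs &= ~g;
		/* GNU step */
		hg = hg * 33 + *p;
	}
	*sysv = hs;
	*gnu = hg;
}
```

The separate `sysv_hash` and `gnu_hash` functions are kept only so the fused version can be checked against them.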