From mboxrd@z Thu Jan  1 00:00:00 1970
X-Msuck: nntp://news.gmane.org/gmane.linux.lib.musl.general/8072
Path: news.gmane.org!not-for-mail
From: Rich Felker <dalias@libc.org>
Newsgroups: gmane.linux.lib.musl.general
Subject: Further dynamic linker optimizations
Date: Tue, 30 Jun 2015 16:04:54 -0400
Message-ID: <20150630200454.GA28127@brightrain.aerifal.cx>
Reply-To: musl@lists.openwall.com
NNTP-Posting-Host: plane.gmane.org
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
X-Trace: ger.gmane.org 1435694721 24235 80.91.229.3 (30 Jun 2015 20:05:21 GMT)
X-Complaints-To: usenet@ger.gmane.org
NNTP-Posting-Date: Tue, 30 Jun 2015 20:05:21 +0000 (UTC)
To: musl@lists.openwall.com
Original-X-From: musl-return-8085-gllmg-musl=m.gmane.org@lists.openwall.com Tue Jun 30 22:05:19 2015
Return-path: <musl-return-8085-gllmg-musl=m.gmane.org@lists.openwall.com>
Envelope-to: gllmg-musl@m.gmane.org
Original-Received: from mother.openwall.net ([195.42.179.200])
	by plane.gmane.org with smtp (Exim 4.69)
	(envelope-from <musl-return-8085-gllmg-musl=m.gmane.org@lists.openwall.com>)
	id 1ZA1mP-0004Vm-Dv
	for gllmg-musl@m.gmane.org; Tue, 30 Jun 2015 22:05:17 +0200
Original-Received: (qmail 27720 invoked by uid 550); 30 Jun 2015 20:05:15 -0000
Mailing-List: contact musl-help@lists.openwall.com; run by ezmlm
Precedence: bulk
List-Post: <mailto:musl@lists.openwall.com>
List-Help: <mailto:musl-help@lists.openwall.com>
List-Unsubscribe: <mailto:musl-unsubscribe@lists.openwall.com>
List-Subscribe: <mailto:musl-subscribe@lists.openwall.com>
Original-Received: (qmail 26592 invoked from network); 30 Jun 2015 20:05:07 -0000
Content-Disposition: inline
User-Agent: Mutt/1.5.21 (2010-09-15)
Original-Sender: Rich Felker <dalias@aerifal.cx>
Xref: news.gmane.org gmane.linux.lib.musl.general:8072
Archived-At: <http://permalink.gmane.org/gmane.linux.lib.musl.general/8072>

Discussion on #musl with Timo Teräs has produced the following
results:

- Moving the bloom filter size into struct dso gives a 5% improvement
  in clang start time (clang built as 110 .so's), simply because it
  reduces the number of instructions in the hot path. So I think we
  should apply that patch.

- The whole outer for loop in find_sym is the performance-critical hot
  path. As such, computing gnu_hash once before the loop, instead of
  lazily inside it, should be a measurable win, just by removing the
  if (!ghm) branch from the loop body.

- Even the check if (!dso->global) continue; has a nontrivial cost.
  Since I want to replace this representation with a separate
  linked-list chain for global dsos anyway (for other reasons), I think
  that's worth prioritizing for performance too.

- We still don't save and reuse the last symbol lookup in do_relocs.
  Doing so could improve performance a lot when the same symbol is
  referenced multiple times from global data. When the only references
  are from the GOT (thus only one per symbol), it won't help, but since
  the check sits outside the find_sym dso loop, it should not have a
  measurable cost anyway.

- String comparison (dl_strcmp) is costly, but nontrivial to optimize.
  Word-at-a-time optimizations have issues with crossing pages, even
  on archs that don't require aligned access: a word read that starts
  before a string's terminating null can run into an unmapped page,
  and two strings at different alignments can't both be read at
  aligned (never page-crossing) addresses. Probably the right way
  forward here is to get an optimized general strcmp, then add a
  mechanism (function pointer in struct dso? or global?) for the
  dynamic linker to call dl_strcmp while relocating itself but the
  real strcmp afterwards.

- Strength reduction of the remainder operations does not seem to
  provide worthwhile benefits yet, simply because so little of the
  overall time is spent on division/remainder.
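For the bloom filter point, here is a minimal sketch of the idea. It assumes the standard DT_GNU_HASH layout: four 32-bit header words (nbuckets, symoffset, bloom word count, bloom shift) followed by the bloom words. The struct and the names dso_sketch, ghashmask, cache_bloom_mask and bloom_may_contain are hypothetical and are not musl's actual code. The only point is that the mask derived from the bloom word count is computed once per DSO, not recomputed on every lookup.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed-down stand-in for struct dso. */
struct dso_sketch {
	const uint32_t *ghashtab; /* DT_GNU_HASH: nbuckets, symoffset, bloom_size, bloom_shift, ... */
	size_t ghashmask;         /* cached bloom_size - 1 (bloom_size is a power of two) */
};

/* Run once when the DSO is loaded, not on every lookup. */
static void cache_bloom_mask(struct dso_sketch *dso)
{
	dso->ghashmask = dso->ghashtab[2] - 1;
}

/* Bloom test for hash gh; the caller precomputes fofs = gh / wordbits
 * and fmask = 1 << (gh % wordbits).
 * Returns 0 if the symbol is definitely absent, 1 if it may be present. */
static int bloom_may_contain(const struct dso_sketch *dso, uint32_t gh,
                             uint32_t fofs, size_t fmask)
{
	const size_t *bloomwords = (const void *)(dso->ghashtab + 4);
	size_t f = bloomwords[fofs & dso->ghashmask]; /* no ghashtab[2]-1 here */
	if (!(f & fmask)) return 0;
	f >>= (gh >> dso->ghashtab[3]) % (8 * sizeof f);
	return (int)(f & 1);
}
```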
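The point about hoisting gnu_hash out of find_sym's loop can be illustrated with a toy sketch. Each hypothetical dso_sketch "exports" exactly one symbol, which stands in for the real per-DSO bloom/bucket lookup. gnu_hash is the usual DT_GNU_HASH function (h*33 + c, seeded with 5381). The names dso_sketch and find_sym_sketch are made up for illustration. In real code, the bloom offset and mask (gho, ghm) would presumably be derived from the hash at the same point.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* GNU hash as used by DT_GNU_HASH: h = h*33 + c, starting at 5381. */
static uint32_t gnu_hash(const char *s0)
{
	const unsigned char *s = (const void *)s0;
	uint_fast32_t h = 5381;
	for (; *s; s++) h += h * 32 + *s;
	return h; /* truncated to 32 bits */
}

/* Hypothetical DSO exporting one symbol, with its hash precomputed. */
struct dso_sketch {
	struct dso_sketch *next;
	const char *export_name;
	uint32_t export_hash;
};

static const struct dso_sketch *find_sym_sketch(const struct dso_sketch *dso,
                                                const char *s)
{
	/* Computed once, unconditionally, before the loop, so the loop
	 * body has no "compute the hash if not done yet" branch. */
	uint32_t gh = gnu_hash(s);
	for (; dso; dso = dso->next)
		if (dso->export_hash == gh && !strcmp(dso->export_name, s))
			return dso;
	return NULL;
}
```

The trade-off is that the hash is now computed even when no DSO on the chain has a DT_GNU_HASH table. That is one pass over the name, against a branch saved on every DSO visited.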
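A separate chain for global dsos, as proposed above, might look like the following sketch. The fields and helpers (syms_next, syms_head, syms_tail, make_global, count_searchable) are hypothetical names chosen for illustration. The global flag is checked once, when a DSO is promoted to the global namespace, instead of on every iteration of every symbol lookup.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct dso with a second, global-only chain. */
struct dso_sketch {
	struct dso_sketch *next;      /* all loaded DSOs, in load order */
	struct dso_sketch *syms_next; /* only DSOs in the global namespace */
	int global;
};

static struct dso_sketch *syms_head, *syms_tail;

/* Link a DSO into the global chain the first time it becomes global. */
static void make_global(struct dso_sketch *p)
{
	if (p->global) return;
	p->global = 1;
	p->syms_next = NULL;
	if (syms_tail) syms_tail->syms_next = p;
	else syms_head = p;
	syms_tail = p;
}

/* A symbol search would walk this chain the same way; note there is
 * no "if (!p->global) continue;" test per iteration. */
static size_t count_searchable(void)
{
	size_t n = 0;
	for (const struct dso_sketch *p = syms_head; p; p = p->syms_next)
		n++;
	return n;
}
```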
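Saving and reusing the last symbol lookup in do_relocs amounts to a one-entry cache keyed on the symbol index. The following is a minimal sketch. The relocation record, the result type and find_sym_stub (a counting stand-in for find_sym) are all hypothetical. This cache only hits when relocations referencing the same symbol are adjacent. A real version would also have to account for anything else that changes the lookup result, such as the relocation type.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, minimal relocation record and lookup result. */
struct reloc_sketch { uint32_t sym_index; };
struct symdef_sketch { uintptr_t value; };

static size_t lookup_count; /* counts calls to the stand-in lookup */

/* Stand-in for the expensive find_sym walk over the dso chain. */
static struct symdef_sketch find_sym_stub(uint32_t sym_index)
{
	struct symdef_sketch d = { 0x1000 + (uintptr_t)sym_index };
	lookup_count++;
	return d;
}

static void do_relocs_sketch(const struct reloc_sketch *rel, size_t n,
                             uintptr_t *out)
{
	uint32_t last_index = 0; /* 0 (STN_UNDEF) means "nothing saved" */
	struct symdef_sketch last = { 0 };
	for (size_t i = 0; i < n; i++) {
		uint32_t idx = rel[i].sym_index;
		if (!idx) {          /* no symbol: nothing to look up */
			out[i] = 0;
			continue;
		}
		if (idx != last_index) { /* miss: do the real lookup and save it */
			last = find_sym_stub(idx);
			last_index = idx;
		}
		out[i] = last.value;
	}
}
```

When each symbol is referenced exactly once (the GOT-only case), every relocation misses. The extra cost is then one integer comparison per relocation, outside the dso loop.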
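The strcmp idea (use dl_strcmp during self-relocation and the real strcmp afterwards) could be wired up through a global function pointer, which is one of the two options floated above. In this sketch dl_strcmp is a plain byte-at-a-time compare. The hook sym_strcmp and switch_to_libc_strcmp are hypothetical names. The sketch assumes the libc strcmp it switches to is the optimized one.

```c
#include <string.h>

/* Byte-at-a-time compare; calls nothing external, so it is usable
 * while the dynamic linker is still relocating itself. */
static int dl_strcmp(const char *l, const char *r)
{
	for (; *l == *r && *l; l++, r++);
	return *(const unsigned char *)l - *(const unsigned char *)r;
}

/* Hypothetical global hook used by symbol lookup (a per-dso pointer
 * is the other option mentioned in the post). */
static int (*sym_strcmp)(const char *, const char *) = dl_strcmp;

/* Hypothetical: called once self-relocation is done, after which the
 * (assumed optimized) libc strcmp can safely be called. */
static void switch_to_libc_strcmp(void)
{
	sym_strcmp = strcmp;
}
```

Symbol lookup would then call sym_strcmp(name, candidate). The indirect call costs a little, which is presumably why a per-dso pointer versus a global is left open above.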
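To make "strength reduction of remainder operations" concrete, here is one known technique applied to the GNU hash bucket index (gh % nbuckets, where nbuckets is only known at run time). It is the precomputed-reciprocal "fastmod" method popularized by Lemire et al., and is not necessarily what was measured in the discussion. A magic value M = ceil(2^64 / d) is computed once per DSO, and each remainder then costs a few multiplications instead of a divide. The function names are hypothetical. The high-product helper avoids a 128-bit type so that it also works on 32-bit targets.

```c
#include <stdint.h>

/* Precomputed once per divisor (e.g. per DSO for its bucket count):
 * M = ceil(2^64 / d). For d == 1 this wraps to 0, which still works. */
static uint64_t fastmod_magic(uint32_t d)
{
	return UINT64_MAX / d + 1;
}

/* Bits 64..95 of the product x * d, without a 128-bit type. */
static uint32_t mul_hi_64x32(uint64_t x, uint32_t d)
{
	uint64_t lo = (x & 0xffffffffu) * d;
	uint64_t hi = (x >> 32) * d;
	return (uint32_t)((hi + (lo >> 32)) >> 32);
}

/* a % d using multiplications only, given M = fastmod_magic(d). */
static uint32_t fastmod_u32(uint32_t a, uint64_t M, uint32_t d)
{
	uint64_t lowbits = M * a;        /* fractional part of a/d, scaled by 2^64 */
	return mul_hi_64x32(lowbits, d); /* (lowbits * d) >> 64 */
}
```

A bucket lookup would then use fastmod_u32(gh, M, nbuckets) with M cached in the DSO. Per the measurements above, division/remainder is such a small share of the time that this is not expected to pay off yet.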

Rich