From: 邓尧
Newsgroups: gmane.linux.lib.musl.general
Subject: Re: x86[_64] memset and rep stos
Date: Wed, 25 Feb 2015 15:54:31 +0800
In-Reply-To: <20150225061204.GA25485@brightrain.aerifal.cx>
To: musl@lists.openwall.com
Reply-To: musl@lists.openwall.com

I'm not an expert on micro-optimization, but why not use a dynamic routine selection system that picks the optimal routine for the given CPU during program initialization? The selection algorithm could simply be a lookup in a predefined static table. IMO, only a very small number of functions (like memset and memcpy) would benefit from such a system, so the code-size overhead would not be a concern.

On Wed, Feb 25, 2015 at 2:12 PM, Rich Felker wrote:
> Doing some timings on the new proposed memset code, I found it was
> pathologically slow on my Atom D510 (32-bit) when reaching sizes
> around 2k - 16k. Like 4x slower than the old code. Apparently the
> issue is that the work being done to align the destination mod 4
> misaligns it mod higher powers of two, and "rep stos" performs
> pathologically bad when it's not cache-line-aligned, or something like
> that. On my faster 64-bit system alignment mod 16 also seems to make a
> difference, but less - it's 1.5x slower misaligned mod 16.
>
> I also found that on the 32-bit Atom, there seems to be a huge jump in
> speed at size 1024 -- sizes just below 1024 are roughly 2x slower.
> Since it otherwise doesn't make a measurable difference, it seems
> preferable _not_ to try to reduce the length of the rep stos to avoid
> writing the same bytes multiple times but simply use the max allowable
> length.
>
> Combined with the first issue, it seems we should "round up to a
> multiple of 16" rather than "add 16 then round down to a multiple of
> 16". Not only does this avoid reducing the length of the rep stos; it
> also preserves any higher-than-16 alignment that might be preexisting,
> in case even higher alignments are faster.
>
> Rich
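A rough C sketch of the table-driven selection suggested in this reply. Everything in it is hypothetical: the cpu_class enum stands in for whatever a real implementation would derive from cpuid, and the "tuned" variants (memset_atom, memset_rep_stos) just forward to a portable byte loop so the sketch compiles and runs anywhere. It only shows the shape of the idea: a static table, one lookup at init, and an indirect call afterwards.

```c
#include <stddef.h>

/* Hypothetical CPU classes; a real implementation would derive these
 * from cpuid at startup. */
enum cpu_class { CPU_GENERIC, CPU_ATOM, CPU_FAST_REP_STOS, CPU_CLASS_COUNT };

typedef void *(*memset_fn)(void *, int, size_t);

/* Portable byte-at-a-time fallback. */
static void *memset_bytes(void *dest, int c, size_t n)
{
    unsigned char *d = dest;
    while (n--)
        *d++ = (unsigned char)c;
    return dest;
}

/* Placeholders for CPU-tuned variants (e.g. an Atom-friendly loop or a
 * rep-stos path); here they simply forward to the portable loop. */
static void *memset_atom(void *dest, int c, size_t n)
{
    return memset_bytes(dest, c, n);
}

static void *memset_rep_stos(void *dest, int c, size_t n)
{
    return memset_bytes(dest, c, n);
}

/* The predefined static table: one entry per CPU class. */
static const memset_fn memset_table[CPU_CLASS_COUNT] = {
    [CPU_GENERIC] = memset_bytes,
    [CPU_ATOM] = memset_atom,
    [CPU_FAST_REP_STOS] = memset_rep_stos,
};

/* Pointer used by every call; defaults to the portable version. */
static memset_fn selected_memset = memset_bytes;

/* Run once during program initialization: a plain table lookup,
 * falling back to the portable routine for unknown classes. */
static void select_memset(enum cpu_class cls)
{
    if ((unsigned)cls < CPU_CLASS_COUNT)
        selected_memset = memset_table[cls];
    else
        selected_memset = memset_bytes;
}

/* Hypothetical public entry point that dispatches through the pointer. */
static void *dispatch_memset(void *dest, int c, size_t n)
{
    return selected_memset(dest, c, n);
}
```

The selection cost is paid once, but every call still goes through a function pointer; whether that overhead matters for small sizes is the kind of thing timings like the ones quoted would have to settle.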
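To make the two start-address strategies from the quoted message concrete, here is just the address arithmetic, as a sketch. The helper names are made up for illustration, addresses are plain uintptr_t values, and it assumes n is large enough that the head bytes in [d, start) are written separately by other stores, as on the rep stos path.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers: where the bulk "rep stos" region would start.
 * Bytes in [d, start) are assumed to be covered by separate head stores. */

/* "add 16 then round down to a multiple of 16" */
static uintptr_t start_add16_round_down(uintptr_t d)
{
    return (d + 16) & ~(uintptr_t)15;
}

/* "round up to a multiple of 16": an already-aligned d is left unchanged */
static uintptr_t start_round_up16(uintptr_t d)
{
    return (d + 15) & ~(uintptr_t)15;
}

/* Bytes left for the bulk store from start to the end of the buffer;
 * assumes n >= start - d. */
static size_t bulk_len(uintptr_t d, size_t n, uintptr_t start)
{
    return n - (size_t)(start - d);
}
```

For a misaligned destination such as 0x1003, both helpers give 0x1010. For an already-aligned destination such as 0x1000, rounding up keeps the start at 0x1000 (so its 4096-byte alignment and the full length survive), while add-16-then-round-down moves it to 0x1010, which is only 16-byte aligned and 16 bytes shorter.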