From mboxrd@z Thu Jan  1 00:00:00 1970
X-Msuck: nntp://news.gmane.org/gmane.linux.lib.musl.general/7043
Path: news.gmane.org!not-for-mail
From: Rich Felker <dalias@libc.org>
Newsgroups: gmane.linux.lib.musl.general
Subject: Re: [PATCH] x86_64/memset: use "small block" code for blocks
 up to 30 bytes long
Date: Sun, 15 Feb 2015 10:03:13 -0500
Message-ID: <20150215150313.GO23507@brightrain.aerifal.cx>
References: <1423845589-5920-1-git-send-email-vda.linux@googlemail.com>
 <20150214193533.GK23507@brightrain.aerifal.cx>
 <20150215040655.GM23507@brightrain.aerifal.cx>
 <CAK1hOcPQ=mADeAUP3i-Xt3rvHmgUrVVoz2yUEOkUEYQ2xRVN2g@mail.gmail.com>
Reply-To: musl@lists.openwall.com
NNTP-Posting-Host: plane.gmane.org
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Trace: ger.gmane.org 1424012616 3025 80.91.229.3 (15 Feb 2015 15:03:36 GMT)
X-Complaints-To: usenet@ger.gmane.org
NNTP-Posting-Date: Sun, 15 Feb 2015 15:03:36 +0000 (UTC)
Cc: musl <musl@lists.openwall.com>
To: Denys Vlasenko <vda.linux@googlemail.com>
Original-X-From: musl-return-7056-gllmg-musl=m.gmane.org@lists.openwall.com Sun Feb 15 16:03:35 2015
Return-path: <musl-return-7056-gllmg-musl=m.gmane.org@lists.openwall.com>
Envelope-to: gllmg-musl@m.gmane.org
Original-Received: from mother.openwall.net ([195.42.179.200])
	by plane.gmane.org with smtp (Exim 4.69)
	(envelope-from <musl-return-7056-gllmg-musl=m.gmane.org@lists.openwall.com>)
	id 1YN0jP-0001bJ-Em
	for gllmg-musl@m.gmane.org; Sun, 15 Feb 2015 16:03:35 +0100
Original-Received: (qmail 30018 invoked by uid 550); 15 Feb 2015 15:03:34 -0000
Mailing-List: contact musl-help@lists.openwall.com; run by ezmlm
Precedence: bulk
List-Post: <mailto:musl@lists.openwall.com>
List-Help: <mailto:musl-help@lists.openwall.com>
List-Unsubscribe: <mailto:musl-unsubscribe@lists.openwall.com>
List-Subscribe: <mailto:musl-subscribe@lists.openwall.com>
Original-Received: (qmail 29986 invoked from network); 15 Feb 2015 15:03:30 -0000
Content-Disposition: inline
In-Reply-To: <CAK1hOcPQ=mADeAUP3i-Xt3rvHmgUrVVoz2yUEOkUEYQ2xRVN2g@mail.gmail.com>
User-Agent: Mutt/1.5.21 (2010-09-15)
Original-Sender: Rich Felker <dalias@aerifal.cx>
Xref: news.gmane.org gmane.linux.lib.musl.general:7043
Archived-At: <http://permalink.gmane.org/gmane.linux.lib.musl.general/7043>

On Sun, Feb 15, 2015 at 03:07:06PM +0100, Denys Vlasenko wrote:
> On Sun, Feb 15, 2015 at 5:06 AM, Rich Felker <dalias@libc.org> wrote:
> >> The main change whose value I really question is the conditional
> >> widen_rax. If the value isn't used until a few cycles after the imul
> >> instruction, doing it unconditionally is probably cheaper than testing
> >> and branching even when the branch is predictable.
> >
> > To elaborate, simply replacing the unconditional imul with an
> > unconditional xor %eax,%eax in my best variant so far, I was only able
> > to save one cycle. So I don't see any way a test, branch, and
> > conditional imul could be less expensive than the unconditional imul.
> 
> So imul elimination is a (tiny) win even on our CPUs, which happen
> to be the _fastest_ CPUs with regard to 64x64 imul (3 cycles).

No, it's a small (maybe you'd call it tiny) loss on them. That was my
point. It's only a tiny win when you rip out the conditional entirely
and just hard-code memset to always write zeros. (BTW, IIRC one OS had
a bug like that which went unnoticed for years... :)
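For readers following along: "widening" the fill value here presumably means replicating the low byte of c into all eight bytes of %rax, which is commonly done by multiplying it by 0x0101010101010101 (hence the 64x64 imul). The minimal C sketch below contrasts the unconditional multiply with a variant that skips it when the fill byte is zero. The function names are hypothetical, and since the patch itself is assembly, a C compiler may well fold the conditional version back into a plain multiply.

```c
#include <assert.h>
#include <stdint.h>

/* Replicate the low byte of c into all 8 bytes of a 64-bit word.
 * Multiplying by 0x0101...01 places a copy of the byte in each byte
 * lane; no carries occur because the byte is at most 0xff. */
static uint64_t widen_unconditional(int c)
{
    return (uint64_t)(unsigned char)c * 0x0101010101010101ULL;
}

/* Hypothetical conditional variant: test and branch to skip the
 * multiply when the fill byte is zero (the common memset(p, 0, n) case). */
static uint64_t widen_conditional(int c)
{
    unsigned char b = (unsigned char)c;
    if (b == 0)
        return 0;
    return (uint64_t)b * 0x0101010101010101ULL;
}

/* Usage example: both variants produce the same pattern. */
static void widen_selftest(void)
{
    assert(widen_unconditional(0xab) == 0xababababababababULL);
    assert(widen_conditional(0xab) == widen_unconditional(0xab));
    assert(widen_conditional(0) == 0);
}
```

Both compute the same value; the disagreement in the thread is only about whether a test and branch is cheaper than a multiply whose latency may already be hidden.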

> Just because we don't personally see a hit from the 6-cycle imul on AMD CPUs,
> it does not mean people who do use those CPUs don't exist. Have a heart...

Did you test the version I attached? I think there should be at least
4-5 cycles between when the imul is issued and when its result is
used, so I don't see how its latency would be a big deal.
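A rough C sketch of the latency-hiding argument, assuming the usual overlapping-store technique for small blocks (not necessarily what the attached version does): the fill pattern is computed up front, so the multiply can complete while the size comparisons run, and each size class is covered by a pair of stores anchored at the start and the end of the buffer. `small_memset` is a hypothetical name; the 30-byte bound comes from the subject line, although the four 8-byte stores used here would cover up to 32 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical small-block memset for n <= 30. The multiply is issued
 * first; its result is not needed until after the size branches. */
static void *small_memset(void *s, int c, size_t n)
{
    unsigned char *p = s;
    uint64_t v = (uint64_t)(unsigned char)c * 0x0101010101010101ULL;
    uint32_t v4 = (uint32_t)v; /* every byte equals c, so truncation is safe */

    assert(n <= 30);
    if (n >= 16) {
        /* [0,16) and [n-16,n) overlap and together cover [0,n). */
        memcpy(p, &v, 8);
        memcpy(p + 8, &v, 8);
        memcpy(p + n - 16, &v, 8);
        memcpy(p + n - 8, &v, 8);
    } else if (n >= 8) {
        memcpy(p, &v, 8);
        memcpy(p + n - 8, &v, 8);
    } else if (n >= 4) {
        memcpy(p, &v4, 4);
        memcpy(p + n - 4, &v4, 4);
    } else if (n > 0) {
        /* n = 1..3: first, last and middle byte cover everything. */
        p[0] = (unsigned char)c;
        p[n - 1] = (unsigned char)c;
        p[n / 2] = (unsigned char)c;
    }
    return s;
}
```

The memcpy calls stand in for unaligned 8- and 4-byte stores; because every byte of v is identical, the result does not depend on byte order.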

Rich