From mboxrd@z Thu Jan  1 00:00:00 1970
X-Msuck: nntp://news.gmane.org/gmane.linux.lib.musl.general/8809
Path: news.gmane.org!not-for-mail
From: Rich Felker <dalias@libc.org>
Newsgroups: gmane.linux.lib.musl.general
Subject: Re: [PATCH 2/3] i386/memset: do not fetch fill char from
 memory again
Date: Wed, 4 Nov 2015 21:54:33 -0500
Message-ID: <20151105025433.GW8645@brightrain.aerifal.cx>
References: <1444674635-25421-1-git-send-email-vda.linux@googlemail.com>
 <1444674635-25421-2-git-send-email-vda.linux@googlemail.com>
Reply-To: musl@lists.openwall.com
NNTP-Posting-Host: plane.gmane.org
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Trace: ger.gmane.org 1446692091 1303 80.91.229.3 (5 Nov 2015 02:54:51 GMT)
X-Complaints-To: usenet@ger.gmane.org
NNTP-Posting-Date: Thu, 5 Nov 2015 02:54:51 +0000 (UTC)
To: musl@lists.openwall.com
Original-X-From: musl-return-8822-gllmg-musl=m.gmane.org@lists.openwall.com Thu Nov 05 03:54:51 2015
Return-path: <musl-return-8822-gllmg-musl=m.gmane.org@lists.openwall.com>
Envelope-to: gllmg-musl@m.gmane.org
Original-Received: from mother.openwall.net ([195.42.179.200])
	by plane.gmane.org with smtp (Exim 4.69)
	(envelope-from <musl-return-8822-gllmg-musl=m.gmane.org@lists.openwall.com>)
	id 1ZuAhO-0001M9-J5
	for gllmg-musl@m.gmane.org; Thu, 05 Nov 2015 03:54:50 +0100
Original-Received: (qmail 3864 invoked by uid 550); 5 Nov 2015 02:54:48 -0000
Mailing-List: contact musl-help@lists.openwall.com; run by ezmlm
Precedence: bulk
List-Post: <mailto:musl@lists.openwall.com>
List-Help: <mailto:musl-help@lists.openwall.com>
List-Unsubscribe: <mailto:musl-unsubscribe@lists.openwall.com>
List-Subscribe: <mailto:musl-subscribe@lists.openwall.com>
Original-Received: (qmail 3842 invoked from network); 5 Nov 2015 02:54:45 -0000
Content-Disposition: inline
In-Reply-To: <1444674635-25421-2-git-send-email-vda.linux@googlemail.com>
User-Agent: Mutt/1.5.21 (2010-09-15)
Original-Sender: Rich Felker <dalias@aerifal.cx>
Xref: news.gmane.org gmane.linux.lib.musl.general:8809
Archived-At: <http://permalink.gmane.org/gmane.linux.lib.musl.general/8809>

On Mon, Oct 12, 2015 at 08:30:33PM +0200, Denys Vlasenko wrote:
>  shl $16,%edx
>  mov 8(%esp),%dl
>  mov 8(%esp),%dh
> 
> The above code has two register merge stalls, and it goes to load unit
> to fetch the data. I don't know what's worse. Both are not pleasant.

Do you have measurements to back this?

> Replace them with IMUL. It has ~3 cycle latency, but no stalls.

While we probably don't need to care about ancient chips like the 486
or original Pentium for performance purposes (although maybe Quark?),
I'd rather not do anything that would make performance catastrophically
worse on them unless it has a significant (measurable) benefit on
modern systems. The code as it stands was written to be non-hostile to
systems where imul has some nontrivial cost.
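
For readers following along, here is a rough C sketch of the two ways
to spread the fill byte across a 32-bit word. It is an illustration
only, not the i386 asm under discussion: the helper names (splat_shift,
splat_mul, splat_agree) are made up for this example, and the shift/OR
form only approximates the no-multiply approach rather than mirroring
the exact instruction sequence.

```c
#include <stdint.h>

/* Hypothetical helper: replicate the fill byte into all four bytes
 * of a 32-bit word using only shifts and ORs (no multiply). */
static uint32_t splat_shift(unsigned char c)
{
    uint32_t w = c;   /* 0x000000cc */
    w |= w << 8;      /* 0x0000cccc */
    w |= w << 16;     /* 0xcccccccc */
    return w;
}

/* Hypothetical helper: same result with a single multiply, which is
 * the idea behind using IMUL by 0x01010101. */
static uint32_t splat_mul(unsigned char c)
{
    return (uint32_t)c * 0x01010101u;
}

/* Returns 1 if both forms agree for every possible fill byte. */
static int splat_agree(void)
{
    for (unsigned c = 0; c < 256; c++)
        if (splat_shift((unsigned char)c) != splat_mul((unsigned char)c))
            return 0;
    return 1;
}
```

Both forms compute the same word, so the question is purely which one
is cheaper on a given chip, and that needs timings rather than
reasoning.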

> Move it a bit up to hide its latency.

Moving it up puts it before the branch which exits early, so the imul
cost is paid even on the early-exit path, which is probably a huge
performance loss on old CPUs.

Of course, even better than evidence that your code helps a lot on
modern CPUs would be evidence that it doesn't hurt at all on old ones.
Does anyone have a 486 or 586 lying around to run timings on? I suppose
I could see if my old K6 still boots...

Rich