From: Rich Felker
Newsgroups: gmane.linux.lib.musl.general
Subject: Re: [PATCH 2/3] i386/memset: do not fetch fill char from memory again
Date: Wed, 4 Nov 2015 21:54:33 -0500
Message-ID: <20151105025433.GW8645@brightrain.aerifal.cx>
References: <1444674635-25421-1-git-send-email-vda.linux@googlemail.com> <1444674635-25421-2-git-send-email-vda.linux@googlemail.com>
In-Reply-To: <1444674635-25421-2-git-send-email-vda.linux@googlemail.com>
Reply-To: musl@lists.openwall.com
To: musl@lists.openwall.com

On Mon, Oct 12, 2015 at 08:30:33PM +0200, Denys Vlasenko wrote:
> shl $16,%edx
> mov 8(%esp),%dl
> mov 8(%esp),%dh
>
> The above code has two register merge stalls, and it goes to load unit
> to fetch the data. I don't know what's worse. Both are not pleasant.

Do you have measurements to back this?

> Replace them with IMUL. It has ~3 cycle latency, but no stalls.

While we probably don't need to care about ancient chips like 486 or
original Pentium for performance purposes (altho maybe Quark?), I'd
rather not do anything that would make performance catastrophically
worse on them unless it actually has significant (measurable) benefit
for modern systems. The code as is was written to be non-hostile to
systems where imul has some nontrivial cost.

> Move it a bit up to hide its latency.

The movement puts it before the branch which exits early, which is
probably a huge performance loss on old cpus.

Of course even better than evidence that your code helps a lot on
modern cpus would be evidence that it doesn't hurt at all on old ones.
Anyone have a 486 or 586 lying around to run timings on? I suppose I
could see if my old K6 still boots...

Rich
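
For readers following along, here is a rough C sketch of the two ways of broadcasting the fill byte into a 32-bit word that are being compared above. It is not the musl code and not the patch itself: the helper names splat_mul and splat_shift are made up for illustration, and the constant 0x01010101 is assumed to be the multiplier the IMUL variant would use. What a compiler emits for either function will vary, so this only shows that the two computations agree, not how they perform.

```c
#include <stdint.h>

/* Hypothetical helper: broadcast the byte with one multiply.
 * This is the C analogue of the IMUL approach; multiplying by
 * 0x01010101 copies the low byte into all four byte lanes. */
static uint32_t splat_mul(unsigned char c)
{
    return (uint32_t)c * 0x01010101u;
}

/* Hypothetical helper: broadcast the byte with shifts and ORs,
 * avoiding the multiply on CPUs where it is expensive. */
static uint32_t splat_shift(unsigned char c)
{
    uint32_t x = c;   /* 0x000000cc */
    x |= x << 8;      /* 0x0000cccc */
    x |= x << 16;     /* 0xcccccccc */
    return x;
}
```

Both helpers return 0xABABABAB for an input of 0xAB. Which one is cheaper on a given chip is exactly the open question in this thread and would need timings to settle.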