From mboxrd@z Thu Jan 1 00:00:00 1970
X-Msuck: nntp://news.gmane.org/gmane.linux.lib.musl.general/6974
Path: news.gmane.org!not-for-mail
From: Rich Felker
Newsgroups: gmane.linux.lib.musl.general
Subject: Re: [PATCH 1/2] x86_64/memset: simple optimizations
Date: Tue, 10 Feb 2015 17:36:48 -0500
Message-ID: <20150210223648.GN23507@brightrain.aerifal.cx>
References: <1423589457-8407-1-git-send-email-vda.linux@googlemail.com> <20150210205047.GK23507@brightrain.aerifal.cx> <20150210213756.GM23507@brightrain.aerifal.cx>
Reply-To: musl@lists.openwall.com
NNTP-Posting-Host: plane.gmane.org
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Trace: ger.gmane.org 1423607824 27676 80.91.229.3 (10 Feb 2015 22:37:04 GMT)
X-Complaints-To: usenet@ger.gmane.org
NNTP-Posting-Date: Tue, 10 Feb 2015 22:37:04 +0000 (UTC)
To: musl@lists.openwall.com
Original-X-From: musl-return-6987-gllmg-musl=m.gmane.org@lists.openwall.com Tue Feb 10 23:37:04 2015
Return-path:
Envelope-to: gllmg-musl@m.gmane.org
Original-Received: from mother.openwall.net ([195.42.179.200]) by plane.gmane.org with smtp (Exim 4.69) (envelope-from ) id 1YLJQU-0005L6-VL for gllmg-musl@m.gmane.org; Tue, 10 Feb 2015 23:37:03 +0100
Original-Received: (qmail 24454 invoked by uid 550); 10 Feb 2015 22:37:01 -0000
Mailing-List: contact musl-help@lists.openwall.com; run by ezmlm
Precedence: bulk
List-Post:
List-Help:
List-Unsubscribe:
List-Subscribe:
Original-Received: (qmail 24440 invoked from network); 10 Feb 2015 22:37:00 -0000
Content-Disposition: inline
In-Reply-To: <20150210213756.GM23507@brightrain.aerifal.cx>
User-Agent: Mutt/1.5.21 (2010-09-15)
Original-Sender: Rich Felker
Xref: news.gmane.org gmane.linux.lib.musl.general:6974
Archived-At:

On Tue, Feb 10, 2015 at 04:37:56PM -0500, Rich Felker wrote:
> On Tue, Feb 10, 2015 at 10:08:29PM +0100, Denys Vlasenko wrote:
> > On Tue, Feb 10, 2015 at 9:50 PM, Rich Felker wrote:
> > > On Tue, Feb 10, 2015 at 06:30:56PM +0100, Denys Vlasenko wrote:
> > >> "and $0xff,%esi" is a six-byte insn (81 e6 ff 00 00 00), can use
> > >> 4-byte "movzbl %sil,%esi" (40 0f b6 f6) instead.
> > >> [...]
> > >
> > > Do you want to go ahead with these patches as-is, or consider some of
> > > the other ideas we discussed off-list like avoiding the 64-bit imul
> > > entirely in the small-n case? If you think that's easy as another
> > > incremental change I'll go ahead with these
> >
> > I think you can apply these patches without waiting
> > for potential future improvements.
>
> OK. Based on some casual testing on my Celeron 847:
>
> - For small sizes, your patches make significant improvement, 20-30%.
>
> - For rep stosq path, the improvement is minimal (roughly 1-2 cycles).
>
> - Using 32-bit imul instead of 64-bit makes no difference at all.
>
> I'll review the patches again for correctness, but so far they look
> good, and it doesn't look like these are things we'd want to back out
> or rewrite for subsequent improvements anyway.
>
> Thanks!

One more trivial change I might do: since the non-rep-stosq path is
faster for small sizes, changing the jb 1f to jbe 1f significantly
improves 16-byte memsets with no additional code changes.

Rich
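
For anyone following along without the asm open, here is a rough C
sketch of the two points discussed in this thread: the byte replication
the imul performs, and what changing jb to jbe does at the boundary. It
assumes the cutoff being compared against is 16, which the "16-byte
memsets" remark suggests but this mail does not state. The names
CUTOFF, splat_byte, pick_path and sketch_memset are made up for the
illustration and are not from the musl source, and the loops are
portable stand-ins rather than the actual small-size or rep stosq code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed cutoff of 16, inferred from "16-byte memsets"; not taken from the asm. */
#define CUTOFF 16

enum path { PATH_SMALL, PATH_STOSQ };

/* Zero-extend the fill byte (the job of movzbl %sil,%esi or and $0xff,%esi)
 * and replicate it into all eight bytes with one multiply (the imul). */
static uint64_t splat_byte(int c)
{
    return (uint64_t)(unsigned char)c * 0x0101010101010101ULL;
}

/* Which path a length takes. use_jbe == 0 models "jb 1f" (n < CUTOFF goes
 * small); use_jbe != 0 models "jbe 1f" (n <= CUTOFF goes small). */
static enum path pick_path(size_t n, int use_jbe)
{
    if (use_jbe ? n <= CUTOFF : n < CUTOFF)
        return PATH_SMALL;
    return PATH_STOSQ;
}

/* Portable stand-in for the two paths; only the dispatch mirrors the mail. */
static void *sketch_memset(void *dest, int c, size_t n, int use_jbe)
{
    unsigned char *d = dest;
    uint64_t v = splat_byte(c);

    if (pick_path(n, use_jbe) == PATH_SMALL) {
        for (size_t i = 0; i < n; i++)
            d[i] = (unsigned char)v;
        return dest;
    }
    /* n >= CUTOFF here, so n >= 8: store the last 8 bytes, then n/8 whole
     * words from the front, standing in for rep stosq. */
    memcpy(d + n - 8, &v, 8);
    for (size_t i = 0; i < n / 8; i++)
        memcpy(d + 8 * i, &v, 8);
    return dest;
}
```

The only length whose path differs between the two branch conditions is
n == CUTOFF: with jb it takes the rep stosq path, with jbe it takes the
small path. Every other length is dispatched the same way.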