From mboxrd@z Thu Jan  1 00:00:00 1970
From: Rich Felker
Newsgroups: gmane.linux.lib.musl.general
Subject: Re: [PATCH 1/2] x86_64/memset: simple optimizations
Date: Tue, 10 Feb 2015 16:37:56 -0500
Message-ID: <20150210213756.GM23507@brightrain.aerifal.cx>
References: <1423589457-8407-1-git-send-email-vda.linux@googlemail.com> <20150210205047.GK23507@brightrain.aerifal.cx>
Reply-To: musl@lists.openwall.com
To: musl@lists.openwall.com

On Tue, Feb 10, 2015 at 10:08:29PM +0100, Denys Vlasenko wrote:
> On Tue, Feb 10, 2015 at 9:50 PM, Rich Felker wrote:
> > On Tue, Feb 10, 2015 at 06:30:56PM +0100, Denys Vlasenko wrote:
> >> "and $0xff,%esi" is a six-byte insn (81 e6 ff 00 00 00), can use
> >> 4-byte "movzbl %sil,%esi" (40 0f b6 f6) instead.
> >> [...]
> >
> > Do you want to go ahead with these patches as-is, or consider some of
> > the other ideas we discussed off-list like avoiding the 64-bit imul
> > entirely in the small-n case? If you think that's easy as another
> > incremental change I'll go ahead with these
>
> I think you can apply these patches without waiting
> for potential future improvements.

OK. Based on some casual testing on my Celeron 847:

- For small sizes, your patches make significant improvement, 20-30%.
- For rep stosq path, the improvement is minimal (roughly 1-2 cycles).
- Using 32-bit imul instead of 64-bit makes no difference at all.

I'll review the patches again for correctness, but so far they look
good, and it doesn't look like these are things we'd want to back out
or rewrite for subsequent improvements anyway.

Thanks!

Rich
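
For context on the imul being discussed: a common way to build the fill
pattern for memset is to zero-extend the fill byte and multiply it by a
constant made of repeated 0x01 bytes. The C sketch below illustrates that
idea only; it is not the code from the patch. The function names are
hypothetical, and the assumption that the assembly uses this exact constant
is mine.

```c
#include <assert.h>
#include <stdint.h>

/* Keep only the low byte of the fill value, zero-extended.
 * This is the effect of either "and $0xff,%esi" or "movzbl %sil,%esi". */
static inline uint32_t low_byte(int c)
{
    return (uint32_t)(unsigned char)c;
}

/* 64-bit multiply: copies the byte into all eight byte lanes. */
static inline uint64_t broadcast64(int c)
{
    return (uint64_t)low_byte(c) * 0x0101010101010101ULL;
}

/* 32-bit multiply: copies the byte into four byte lanes. */
static inline uint32_t broadcast32(int c)
{
    return low_byte(c) * 0x01010101U;
}

/* Self-check: the low half of the 64-bit pattern equals the 32-bit one,
 * so a 32-bit multiply is enough when at most 4 bytes are stored at once. */
static inline void check_broadcast(void)
{
    int c = 0x1ab;                                 /* high bits must be ignored */
    assert(broadcast64(c) == 0xababababababababULL);
    assert(broadcast32(c) == 0xababababU);
    assert((uint32_t)broadcast64(c) == broadcast32(c));
}
```

Because the byte is below 256, no carries cross byte lanes in either
multiply, which is why the two widths agree on the low 32 bits.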