From: Denys Vlasenko
Newsgroups: gmane.linux.lib.musl.general
Subject: [PATCH 1/2] x86_64/memset: simple optimizations
Date: Tue, 10 Feb 2015 18:30:56 +0100
Message-ID: <1423589457-8407-1-git-send-email-vda.linux@googlemail.com>
Reply-To: musl@lists.openwall.com
To: musl@lists.openwall.com
Cc: Denys Vlasenko

"and $0xff,%esi" is a six-byte insn (81 e6 ff 00 00 00); we can use the
4-byte "movzbl %sil,%esi" (40 0f b6 f6) instead.

64-bit imul is slow, so move it as far up as possible. That gives the
result (rax) more time to become ready before we start using it in
memory stores.

There is no need to shuffle registers in preparation for "rep stosq" if
we are not going to take that code path. Thus, the patch moves the
"jump if len < 16" instructions up and changes the alternate (short
length) code path to use rdx and rdi instead of rcx and r8.
Signed-off-by: Denys Vlasenko
---
 src/string/x86_64/memset.s | 30 ++++++++++++++++--------------
 1 file changed, 16 insertions(+), 14 deletions(-)

diff --git a/src/string/x86_64/memset.s b/src/string/x86_64/memset.s
index fc06eef..263336b 100644
--- a/src/string/x86_64/memset.s
+++ b/src/string/x86_64/memset.s
@@ -1,41 +1,43 @@
 .global memset
 .type memset,@function
 memset:
-	and $0xff,%esi
+	movzbl %sil,%esi
 	mov $0x101010101010101,%rax
-	mov %rdx,%rcx
-	mov %rdi,%r8
+	# 64-bit imul has 3-7 cycles latency, launch early
 	imul %rsi,%rax
-	cmp $16,%rcx
+
+	cmp $16,%rdx
 	jb 1f
 
-	mov %rax,-8(%rdi,%rcx)
+	mov %rdx,%rcx
+	mov %rdi,%r8
 	shr $3,%rcx
+	mov %rax,-8(%rdi,%rdx)
 	rep
 	stosq
 	mov %r8,%rax
 	ret
 
-1:	test %ecx,%ecx
+1:	test %edx,%edx
 	jz 1f
 
 	mov %al,(%rdi)
-	mov %al,-1(%rdi,%rcx)
-	cmp $2,%ecx
+	mov %al,-1(%rdi,%rdx)
+	cmp $2,%edx
 	jbe 1f
 
 	mov %al,1(%rdi)
-	mov %al,-2(%rdi,%rcx)
-	cmp $4,%ecx
+	mov %al,-2(%rdi,%rdx)
+	cmp $4,%edx
 	jbe 1f
 
 	mov %eax,(%rdi)
-	mov %eax,-4(%rdi,%rcx)
-	cmp $8,%ecx
+	mov %eax,-4(%rdi,%rdx)
+	cmp $8,%edx
 	jbe 1f
 
 	mov %eax,4(%rdi)
-	mov %eax,-8(%rdi,%rcx)
+	mov %eax,-8(%rdi,%rdx)
 
-1:	mov %r8,%rax
+1:	mov %rdi,%rax
 	ret
-- 
1.8.1.4
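For readers less at home in AT&T syntax, the C function below is a rough behavioral model of the patched memset.s. It is an illustrative sketch, not code from musl: the name memset_model is hypothetical, and the loop and memcpy calls merely stand in for "rep stosq" and the individual mov stores. Under the System V x86-64 calling convention, dest arrives in rdi, c in esi and n in rdx, which is why the new code can compare rdx against 16 directly and only copies it into rcx on the rep stosq path.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical C model of the patched x86_64 memset.s (not musl source). */
void *memset_model(void *dest, int c, size_t n)
{
    unsigned char *d = dest;

    /* movzbl %sil,%esi: keep only the low byte of c */
    uint64_t byte = (unsigned char)c;
    /* imul by 0x0101010101010101 replicates that byte into all 8 bytes */
    uint64_t pattern = byte * UINT64_C(0x0101010101010101);

    if (n >= 16) {
        /* mov %rax,-8(%rdi,%rdx): store the final 8 bytes first; this also
           covers the n % 8 tail that the qword stores below do not reach */
        memcpy(d + n - 8, &pattern, 8);
        /* shr $3,%rcx; rep stosq: n / 8 qword stores from the start */
        for (size_t i = 0; i < n / 8; i++)
            memcpy(d + 8 * i, &pattern, 8);
        return dest;
    }

    /* Short path (n < 16): pairs of stores from both ends, which may overlap */
    if (n == 0)
        return dest;
    d[0] = (unsigned char)pattern;
    d[n - 1] = (unsigned char)pattern;
    if (n <= 2)
        return dest;
    d[1] = (unsigned char)pattern;
    d[n - 2] = (unsigned char)pattern;
    if (n <= 4)
        return dest;
    uint32_t p32 = (uint32_t)pattern; /* all bytes equal, so byte order is irrelevant */
    memcpy(d, &p32, 4);
    memcpy(d + n - 4, &p32, 4);
    if (n <= 8)
        return dest;
    memcpy(d + 4, &p32, 4);
    memcpy(d + n - 8, &p32, 4);
    return dest;
}
```

Comparing the model against the C library's memset for every length from 0 to 64 exercises both the short path (n < 16) and the rep stosq path. The overlapping stores are the design choice that lets each path avoid a byte-by-byte tail loop.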