mailing list of musl libc
From: Denys Vlasenko <vda.linux@googlemail.com>
To: musl@lists.openwall.com, Rich Felker <dalias@libc.org>
Cc: Denys Vlasenko <vda.linux@googlemail.com>
Subject: [PATCH] x86_64/memset: use "small block" code for blocks up to 30 bytes long
Date: Fri, 13 Feb 2015 17:39:49 +0100
Message-ID: <1423845589-5920-1-git-send-email-vda.linux@googlemail.com>

Before this change, the "small block" code was used only for blocks of
15 bytes or fewer. Measurements on a Sandy Bridge CPU show that "rep stosq"
setup time is high enough to dominate the speed of fills well above that size:

31 byte block: 3.279282 bytes/ns
30 byte block: 3.173499 bytes/ns
..
20 byte block: 2.116552 bytes/ns
..
16 byte block: 1.799337 bytes/ns
15 byte block: 5.074332 bytes/ns
14 byte block: 4.736135 bytes/ns
13 byte block: 4.398852 bytes/ns
12 byte block: 4.060479 bytes/ns
11 byte block: 3.723065 bytes/ns
10 byte block: 3.384556 bytes/ns
 9 byte block: 2.867677 bytes/ns
 8 byte block: 2.257382 bytes/ns
 7 byte block: 1.975605 bytes/ns
 6 byte block: 1.693388 bytes/ns
 5 byte block: 1.411434 bytes/ns
 4 byte block: 1.129147 bytes/ns
 3 byte block: 0.847030 bytes/ns
 2 byte block: 0.616008 bytes/ns
 1 byte block: 0.308069 bytes/ns

The patch does not increase the number of branches, yet the "small block"
code now handles blocks of up to 30 bytes. After the patch, the timings are:

32 byte block: 3.384681 bytes/ns
31 byte block: 3.279118 bytes/ns
30 byte block: 10.128968 bytes/ns
29 byte block: 9.793798 bytes/ns
28 byte block: 9.456081 bytes/ns
27 byte block: 9.120555 bytes/ns
26 byte block: 8.782757 bytes/ns
25 byte block: 8.446654 bytes/ns
24 byte block: 8.109310 bytes/ns
23 byte block: 7.773063 bytes/ns
22 byte block: 7.434663 bytes/ns
21 byte block: 7.098760 bytes/ns
20 byte block: 6.760724 bytes/ns
19 byte block: 6.424286 bytes/ns
18 byte block: 6.086166 bytes/ns
17 byte block: 5.749441 bytes/ns
16 byte block: 5.411120 bytes/ns
15 byte block: 5.074234 bytes/ns
14 byte block: 3.947913 bytes/ns
13 byte block: 3.666643 bytes/ns
12 byte block: 3.384641 bytes/ns
11 byte block: 3.103178 bytes/ns
10 byte block: 2.821105 bytes/ns
 9 byte block: 2.539481 bytes/ns
 8 byte block: 2.257338 bytes/ns
 7 byte block: 1.975530 bytes/ns
 6 byte block: 1.693337 bytes/ns
 5 byte block: 1.411388 bytes/ns
 4 byte block: 1.129111 bytes/ns
 3 byte block: 0.846994 bytes/ns
 2 byte block: 0.615982 bytes/ns
 1 byte block: 0.308056 bytes/ns

Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>
---
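Note on the measurements: the harness that produced the bytes/ns figures
is not part of this message. A minimal sketch of how such a figure could
be obtained is given below. The helper name fill_rate, the 64-byte buffer,
the zero fill value and the use of clock_gettime(CLOCK_MONOTONIC) are all
assumptions made for illustration, not the method actually used.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper, NOT the harness used for the figures in this
 * message: times `iters` fills of `n` bytes and returns bytes per ns. */
static double fill_rate(size_t n, long iters)
{
	static unsigned char buf[64];
	/* Calling through a volatile function pointer keeps the compiler
	 * from inlining memset or dropping the repeated calls. */
	void *(*volatile fill)(void *, int, size_t) = memset;
	struct timespec t0, t1;

	assert(n <= sizeof buf && iters > 0);

	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (long i = 0; i < iters; i++)
		fill(buf, 0, n);	/* zero fill is an assumption */
	clock_gettime(CLOCK_MONOTONIC, &t1);

	/* Elapsed wall time in nanoseconds. */
	double ns = (t1.tv_sec - t0.tv_sec) * 1e9
	          + (double)(t1.tv_nsec - t0.tv_nsec);
	return (double)n * (double)iters / ns;
}
```

Calling fill_rate(n, iters) for n from 1 to 32 with a large iteration
count and printing the result per size would give a table in the same
shape as the ones above. The absolute numbers depend on the CPU, the
compiler and the libc under test.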
 src/string/x86_64/memset.s | 26 ++++++++++++--------------
 1 file changed, 12 insertions(+), 14 deletions(-)

diff --git a/src/string/x86_64/memset.s b/src/string/x86_64/memset.s
index ea61687..81adbb2 100644
--- a/src/string/x86_64/memset.s
+++ b/src/string/x86_64/memset.s
@@ -2,13 +2,13 @@
 .type memset,@function
 memset:
 	movzbq %sil,%rax
-	cmp $16,%rdx
-	jb .Less_than_16
-
 	test %esi,%esi
 	jnz .L_widen_rax  # unlikely
 .L_widened:
 
+	cmp $31,%rdx
+	jb .Less_than_31
+
 	mov %rdi,%r8
 
 	test $7,%dil
@@ -43,7 +43,7 @@ memset:
 	jmp .L_aligned
 
 
-.Less_than_16:
+.Less_than_31:
 	test %edx,%edx
 	jz .L_ret
 
@@ -52,20 +52,18 @@ memset:
 	cmp $2,%edx
 	jbe .L_ret
 
-	mov %al,1(%rdi)
-	mov %al,-2(%rdi,%rdx)
-	# 32-bit imul has 3-4 cycles latency
-	imul $0x1010101,%eax
-	cmp $4,%edx
+	mov %ax,1(%rdi)
+	mov %ax,(-1-2)(%rdi,%rdx)
+	cmp $6,%edx
 	jbe .L_ret
 
-	mov %eax,(%rdi)
-	mov %eax,-4(%rdi,%rdx)
-	cmp $8,%edx
+	mov %eax,(1+2)(%rdi)
+	mov %eax,(-1-2-4)(%rdi,%rdx)
+	cmp $14,%edx
 	jbe .L_ret
 
-	mov %eax,4(%rdi)
-	mov %eax,-8(%rdi,%rdx)
+	mov %rax,(1+2+4)(%rdi)
+	mov %rax,(-1-2-4-8)(%rdi,%rdx)
 .L_ret:
 	mov %rdi,%rax
 	ret
-- 
1.8.1.4
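
The store sequence in the .Less_than_31 path above can be modelled in C
roughly as follows. This is only an illustrative sketch: small_fill is a
hypothetical name, not a musl function, and the two single-byte stores at
the start sit in context lines the diff does not show, so they are
inferred from the offsets used by the later stores. Each step writes a
pair of stores, one anchored at the start of the block and one at the
end, and the pairs overlap in the middle for sizes that are not an exact
fit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical C model of the patched small-block path (n <= 30). */
static void *small_fill(void *dest, int c, size_t n)
{
	unsigned char *d = dest;
	/* Fill byte replicated into all 8 byte lanes, as %rax holds it
	 * once the widening step has run. */
	uint64_t v8 = (unsigned char)c * 0x0101010101010101ULL;
	uint32_t v4 = (uint32_t)v8;
	uint16_t v2 = (uint16_t)v8;

	assert(n <= 30);
	if (n == 0) return dest;

	/* 1-byte stores at both ends (inferred, not shown in the diff). */
	d[0] = (unsigned char)c;
	d[n - 1] = (unsigned char)c;
	if (n <= 2) return dest;

	/* 2-byte stores at offsets 1 and n-3: covers up to 6 bytes. */
	memcpy(d + 1, &v2, 2);
	memcpy(d + n - 3, &v2, 2);
	if (n <= 6) return dest;

	/* 4-byte stores at offsets 3 and n-7: covers up to 14 bytes. */
	memcpy(d + 3, &v4, 4);
	memcpy(d + n - 7, &v4, 4);
	if (n <= 14) return dest;

	/* 8-byte stores at offsets 7 and n-15: covers up to 30 bytes. */
	memcpy(d + 7, &v8, 8);
	memcpy(d + n - 15, &v8, 8);
	return dest;
}
```

The head stores cover 1+2+4+8 = 15 bytes and the tail stores cover
another 15, which is where the 30-byte limit comes from. The memcpy
calls stand in for the unaligned mov instructions; a compiler will
typically turn each into a single store, but that is not guaranteed.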



