From: Rich Felker <dalias@libc.org>
To: musl@lists.openwall.com
Cc: Denys Vlasenko <vda.linux@googlemail.com>
Subject: Draft of improved memset.s for i386
Date: Mon, 23 Feb 2015 20:09:52 -0500
Message-ID: <20150224010952.GA10683@brightrain.aerifal.cx>
[-- Attachment #1: Type: text/plain, Size: 1156 bytes --]
Here's a draft of an improved i386 memset.s based on the principles
Denys Vlasenko and I discussed on his and my x86_64 versions. Compared
to the current code, it reduces entry/exit overhead, increases the
length supported in the non-rep-stosl path, and aligns the rep-stosl.
My tests don't measure the misalignment penalty, but even in the
aligned case the rep-stosl path is slightly faster (~5 cycles per run,
out of at least 64 cycles), and the non-rep-stosl path is significantly
faster (e.g. 33 vs 51 cycles at size 16 and 40 vs 57 at size 32).
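
As an illustration of what "aligns the rep-stosl" means here, the large
path of the attached draft can be modeled in C roughly as below. This is
only a sketch under assumptions: `large_memset` is a made-up name, the
memcpy calls stand in for unaligned 32-bit mov instructions, and the
loop stands in for rep stosl. One unaligned dword at the head and two at
the tail absorb the ragged edges, so the bulk fill can start on a 4-byte
boundary and still never write past the end.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical C model of the n > 62 path; not the code musl ships. */
static void *large_memset(void *dest, int c, size_t n)
{
	unsigned char *s = dest;
	uint32_t v = (uint32_t)(unsigned char)c * 0x01010101u; /* imul */
	unsigned char *p;
	size_t words;

	if (n <= 62) return NULL;   /* short lengths take the unrolled path */

	memcpy(s, &v, 4);           /* unaligned head dword */
	memcpy(s + n - 8, &v, 4);   /* two unaligned tail dwords */
	memcpy(s + n - 4, &v, 4);

	/* add $4,%edi; and $-4,%edi: first 4-byte boundary after s */
	p = (unsigned char *)(((uintptr_t)s + 4) & ~(uintptr_t)3);
	words = (n - 5) >> 2;       /* lea -5(%ecx),%ecx; shr $2,%ecx */
	while (words--) {           /* rep stosl */
		memcpy(p, &v, 4);
		p += 4;
	}
	return dest;
}
```

The aligned fill starts between 1 and 4 bytes into the buffer, which the
head dword covers. It stops at most 7 bytes short of the end, which the
two tail dwords cover.
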
Empirically the byte-register-access/left-shift method of extending
the fill value to a word performs better than imul for me, but the
margin is very small (at most 1 cycle). Since we support much older
cpus (like actual 486) where imul could be really slow, I think this
is the right approach in principle too. I used imul in the rep-stosl
path but haven't tested whether it's faster there.
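
For readers less used to the byte-register idiom, the two ways of
extending the fill byte to a 32-bit word can be written in C as below.
This is a sketch for comparison only; the function names `splat_mul`
and `splat_shift` are made up for illustration, and a compiler is free
to emit different instructions than the ones named in the comments.

```c
#include <stdint.h>

/* Multiply method: one imul by 0x01010101 copies the byte to all lanes. */
static uint32_t splat_mul(unsigned char c)
{
	return (uint32_t)c * 0x01010101u;
}

/* Shift method, mirroring the byte-register sequence in the draft. */
static uint32_t splat_shift(unsigned char c)
{
	uint32_t x = c;        /* movzbl:                0x000000cc */
	x |= x << 8;           /* mov %dl,%dh:           0x0000cccc */
	x = (x << 8) | c;      /* shl $8; mov %dh,%dl:   0x00cccccc */
	x = (x << 8) | c;      /* shl $8; mov %dh,%dl:   0xcccccccc */
	return x;
}
```

Both return the same value for every byte. The shift version also has
the 16-bit pattern available after its first step, which is what the
short path stores through %dx before it needs the full word.
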
The non-rep-stosl path only goes up to size 62. I think sizes up to
126 could benefit from it, but the string of stores was getting really
long.
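
The 62 comes from the tier structure of the stores: each tier writes a
block at the head and a mirror-image block at the tail, with widths 1,
2, 4, 8 and 16 bytes, so the two ends together reach 2*(1+2+4+8+16) = 62
bytes. A rough C model of that scheme follows. It is a sketch, not the
attached code: `small_memset` and `fill` are hypothetical names, and the
loop replaces what the assembly spells out as a straight-line string of
stores.

```c
#include <stddef.h>

/* Hypothetical helper standing in for one or more mov stores. */
static void fill(unsigned char *p, unsigned char c, size_t k)
{
	while (k--) *p++ = c;
}

/* Hypothetical C model of the n <= 62 path. */
static void *small_memset(void *dest, int c, size_t n)
{
	unsigned char *s = dest;
	size_t off = 0;   /* bytes already written at each end */
	size_t w;

	if (n > 62) return NULL;          /* the draft jumps to rep stosl */
	for (w = 1; w <= 16; w *= 2) {
		if (n <= 2 * off) break;  /* test / cmp $2, $6, $14, $30 */
		fill(s + off, (unsigned char)c, w);          /* head block */
		fill(s + n - off - w, (unsigned char)c, w);  /* tail block */
		off += w;
	}
	return dest;
}
```

Head and tail blocks may overlap, which is harmless since both write the
same value. Extending the scheme to 126 would add one more tier of 32
bytes at each end, i.e. sixteen more 4-byte stores.
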
Correctness has not been tested so there may be stupid bugs.

Rich
[-- Attachment #2: memset-draft.s --]
[-- Type: text/plain, Size: 1091 bytes --]
.global memset
.type memset,@function
memset:
	mov 12(%esp),%ecx
	cmp $62,%ecx
	ja 2f
	movzbl 8(%esp),%edx
	mov 4(%esp),%eax
	test %ecx,%ecx
	jz 1f
	mov %dl,%dh
	mov %dl,(%eax)
	mov %dl,-1(%eax,%ecx)
	cmp $2,%ecx
	jbe 1f
	mov %dx,1(%eax)
	mov %dx,(-1-2)(%eax,%ecx)
	cmp $6,%ecx
	jbe 1f
	shl $8,%edx
	mov %dh,%dl
	shl $8,%edx
	mov %dh,%dl
	mov %edx,(1+2)(%eax)
	mov %edx,(-1-2-4)(%eax,%ecx)
	cmp $14,%ecx
	jbe 1f
	mov %edx,(1+2+4)(%eax)
	mov %edx,(1+2+4+4)(%eax)
	mov %edx,(-1-2-4-8)(%eax,%ecx)
	mov %edx,(-1-2-4-4)(%eax,%ecx)
	cmp $30,%ecx
	jbe 1f
	mov %edx,(1+2+4+8)(%eax)
	mov %edx,(1+2+4+8+4)(%eax)
	mov %edx,(1+2+4+8+8)(%eax)
	mov %edx,(1+2+4+8+12)(%eax)
	mov %edx,(-1-2-4-8-16)(%eax,%ecx)
	mov %edx,(-1-2-4-8-12)(%eax,%ecx)
	mov %edx,(-1-2-4-8-8)(%eax,%ecx)
	mov %edx,(-1-2-4-8-4)(%eax,%ecx)
1:	ret
2:	mov %edi,12(%esp)
	movzbl 8(%esp),%eax
	mov $0x01010101,%edx
	imul %edx,%eax
	mov %ecx,%edx
	lea -5(%ecx),%ecx
	mov 4(%esp),%edi
	shr $2, %ecx
	mov %eax,(%edi)
	mov %eax,-8(%edi,%edx)
	mov %eax,-4(%edi,%edx)
	add $4,%edi
	and $-4,%edi
	rep
	stosl
	mov 4(%esp),%eax
	mov 12(%esp),%edi	# restore the callee-saved %edi stashed at label 2
	ret