mailing list of musl libc
From: Rich Felker <dalias@libc.org>
To: musl@lists.openwall.com
Cc: Denys Vlasenko <vda.linux@googlemail.com>
Subject: Draft of improved memset.s for i386
Date: Mon, 23 Feb 2015 20:09:52 -0500
Message-ID: <20150224010952.GA10683@brightrain.aerifal.cx>

[-- Attachment #1: Type: text/plain, Size: 1156 bytes --]

Here's a draft of an improved i386 memset.s based on the principles
Denys Vlasenko and I discussed on his and my x86_64 versions. Compared
to the current code, it reduces entry/exit overhead, increases the
length supported in the non-rep-stosl path, and aligns the rep-stosl.
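
As a rough illustration of what "aligning the rep-stosl" means, the
long path of the attached draft can be modelled in C. This is only a
sketch: large_memset is a hypothetical name, memcpy stands in for the
unaligned dword stores, and the model assumes n is at least 8 (the
draft only takes this path for n > 62).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical C model of the draft's rep-stosl path (requires n >= 8). */
void *large_memset(void *dest, int c, size_t n)
{
    unsigned char *s = dest;
    uint32_t c32 = (unsigned char)c * UINT32_C(0x01010101);
    size_t words = (n - 5) >> 2;   /* lea -5(%ecx),%ecx; shr $2,%ecx */
    unsigned char *p;

    memcpy(s, &c32, 4);            /* unaligned head dword */
    memcpy(s + n - 8, &c32, 4);    /* two unaligned tail dwords */
    memcpy(s + n - 4, &c32, 4);

    /* add $4,%edi; and $-4,%edi: first 4-byte boundary above dest.
     * The pointer/integer round trip is implementation-defined in C. */
    p = (unsigned char *)(((uintptr_t)s + 4) & ~(uintptr_t)3);

    /* rep stosl: every store in this loop is 4-byte aligned */
    while (words--) {
        memcpy(p, &c32, 4);
        p += 4;
    }
    return dest;
}
```

The head store covers the gap of up to 4 bytes before the aligned
start, and the two tail stores cover the up to 7 bytes that the
rounded-down dword count can leave at the end, so the aligned loop
never has to write outside the buffer.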

My tests don't measure the misalignment penalty, but even in the
aligned case the rep-stosl path is slightly faster (~5 cycles per run,
out of at least 64 cycles), and the non-rep-stosl path is significantly
faster (e.g. 33 vs 51 cycles at size 16 and 40 vs 57 at size 32).

Empirically the byte-register-access/left-shift method of extending
the fill value to a word performs better than imul for me, but the
margin is very small (at most 1 cycle). Since we support much older
CPUs (like an actual 486) where imul could be really slow, I think this
is the right approach in principle too. I used imul in the rep-stosl
path but haven't tested whether it's faster there.
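
For reference, the two ways of widening the fill byte to a 32-bit word
can be modelled in C as below. This is a sketch, not part of the
patch: splat_mul and splat_shift are hypothetical names, and the C
compiler is free to generate different instructions than the
hand-written assembly uses.

```c
#include <stdint.h>

/* imul method: multiplying by 0x01010101 replicates the low byte. */
uint32_t splat_mul(unsigned char c)
{
    return (uint32_t)c * UINT32_C(0x01010101);
}

/* Byte-register/left-shift method, mirroring the draft's sequence:
 * mov %dl,%dh; shl $8,%edx; mov %dh,%dl; shl $8,%edx; mov %dh,%dl */
uint32_t splat_shift(unsigned char c)
{
    uint32_t x = c;
    x |= x << 8;                            /* mov %dl,%dh  -> 0x0000cccc */
    x <<= 8;                                /* shl $8,%edx  -> 0x00cccc00 */
    x = (x & ~0xffu) | ((x >> 8) & 0xffu);  /* mov %dh,%dl  -> 0x00cccccc */
    x <<= 8;                                /* shl $8,%edx  -> 0xcccccc00 */
    x = (x & ~0xffu) | ((x >> 8) & 0xffu);  /* mov %dh,%dl  -> 0xcccccccc */
    return x;
}
```

Both functions return the same value for every byte; the difference
discussed above is purely one of instruction cost.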

The non-rep-stosl path only goes up to size 62. I think sizes up to
126 could benefit from it, but the string of stores was getting really
long.
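
The store pattern of the non-rep-stosl path can be sketched in C as
follows. small_memset is a hypothetical name, memcpy stands in for
the unaligned 2- and 4-byte stores, and the model is valid only for
n <= 62, the same limit as the attached draft.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical C model of the draft's short path (valid for n <= 62). */
void *small_memset(void *dest, int c, size_t n)
{
    unsigned char *s = dest;
    uint16_t c16 = (unsigned char)c * 0x0101u;
    uint32_t c32 = (unsigned char)c * UINT32_C(0x01010101);
    size_t k;

    if (n == 0) return dest;
    s[0] = s[n - 1] = (unsigned char)c;      /* 1 byte at each end */
    if (n <= 2) return dest;

    memcpy(s + 1, &c16, 2);                  /* 2 more bytes at each end */
    memcpy(s + n - 3, &c16, 2);
    if (n <= 6) return dest;

    memcpy(s + 3, &c32, 4);                  /* 4 more bytes at each end */
    memcpy(s + n - 7, &c32, 4);
    if (n <= 14) return dest;

    for (k = 0; k < 8; k += 4) {             /* 8 more bytes at each end */
        memcpy(s + 7 + k, &c32, 4);
        memcpy(s + n - 15 + k, &c32, 4);
    }
    if (n <= 30) return dest;

    for (k = 0; k < 16; k += 4) {            /* 16 more bytes at each end */
        memcpy(s + 15 + k, &c32, 4);
        memcpy(s + n - 31 + k, &c32, 4);
    }
    return dest;
}
```

Each stage fills the same number of bytes inward from both ends, so
the head and tail stores simply overlap when n is not exactly at a
threshold. One further stage of 32 bytes per end would raise the
limit from 62 to 126, at the cost of sixteen more dword stores.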

Correctness has not been tested so there may be stupid bugs.

Rich

[-- Attachment #2: memset-draft.s --]
[-- Type: text/plain, Size: 1091 bytes --]

.global memset
.type memset,@function
memset:
	mov 12(%esp),%ecx
	cmp $62,%ecx
	ja 2f

	movzbl 8(%esp),%edx
	mov 4(%esp),%eax
	test %ecx,%ecx
	jz 1f

	mov %dl,%dh

	mov %dl,(%eax)
	mov %dl,-1(%eax,%ecx)
	cmp $2,%ecx
	jbe 1f

	mov %dx,1(%eax)
	mov %dx,(-1-2)(%eax,%ecx)
	cmp $6,%ecx
	jbe 1f

	shl $8,%edx
	mov %dh,%dl
	shl $8,%edx
	mov %dh,%dl

	mov %edx,(1+2)(%eax)
	mov %edx,(-1-2-4)(%eax,%ecx)
	cmp $14,%ecx
	jbe 1f

	mov %edx,(1+2+4)(%eax)
	mov %edx,(1+2+4+4)(%eax)
	mov %edx,(-1-2-4-8)(%eax,%ecx)
	mov %edx,(-1-2-4-4)(%eax,%ecx)
	cmp $30,%ecx
	jbe 1f

	mov %edx,(1+2+4+8)(%eax)
	mov %edx,(1+2+4+8+4)(%eax)
	mov %edx,(1+2+4+8+8)(%eax)
	mov %edx,(1+2+4+8+12)(%eax)
	mov %edx,(-1-2-4-8-16)(%eax,%ecx)
	mov %edx,(-1-2-4-8-12)(%eax,%ecx)
	mov %edx,(-1-2-4-8-8)(%eax,%ecx)
	mov %edx,(-1-2-4-8-4)(%eax,%ecx)

1:	ret 	

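2:	mov %edi,12(%esp)	# save callee-saved %edi in the n slot (n is already in %ecx)
	movzbl 8(%esp),%eax
	mov $0x01010101,%edx
	imul %edx,%eax		# broadcast the fill byte to all four bytes of %eax

	mov %ecx,%edx		# %edx = n
	lea -5(%ecx),%ecx
	mov 4(%esp),%edi
	shr $2,%ecx		# %ecx = (n-5)/4 dwords for rep stosl

	mov %eax,(%edi)		# unaligned head dword
	mov %eax,-8(%edi,%edx)	# two unaligned tail dwords
	mov %eax,-4(%edi,%edx)
	add $4,%edi
	and $-4,%edi		# first 4-byte boundary above dest
	rep
	stosl
	mov 4(%esp),%eax	# return dest
	mov 12(%esp),%edi	# restore callee-saved %edi
	ret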
