mailing list of musl libc
From: Rich Felker <dalias@aerifal.cx>
To: musl@lists.openwall.com
Subject: ARM memcpy post-0.9.12-release thread
Date: Tue, 30 Jul 2013 22:26:31 -0400
Message-ID: <20130731022631.GA6655@brightrain.aerifal.cx>

Hi all (especially Andre),

I've been doing some experimenting with ARM memcpy, and I have not
found any way to beat the Bionic asm file for misaligned copies. The
best I could do with simple inline asm (reading multiple words at a
time and writing a byte at a time, or vice versa) cut the run time by
nearly 40% compared to musl's current code, but it still ran at less
than half the speed of the Bionic asm.

For the aligned case, however, as I've said before, the Bionic code
runs 10% slower for me than the C-with-inline-asm I posted to the
list. Commenting out the prefetch code in the Bionic version brings
the performance up to the same as my version.

I also found that the Bionic code was mysteriously crashing on the
real system I test on (it worked on my toolchain with qemu). On
further investigation, the test system's toolchain had -mthumb (with
thumb2) as the default; adding -marm made it work. Either way the asm
itself was being assembled as ARM; the problem was that the *calling*
code was Thumb. The solution was adding .type memcpy,%function to the
asm file. Without that, the linker cannot know that the symbol it's
resolving is a function name and thus that it has to adjust the low
bit of the relocated address as a flag for whether the code is ARM or
Thumb. With that change the code now seems to work reliably.
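
As a rough illustration of the low-bit convention mentioned above, here
is a small C model of how an interworking branch target is encoded and
decoded. The names interworking_address, target_isa and target_pc are
made up for this sketch; they are not part of any toolchain or of musl,
and the real work is done by the linker and the CPU, not by C code.

```c
#include <assert.h>
#include <stdint.h>

enum isa { ISA_ARM = 0, ISA_THUMB = 1 };

/* Hypothetical model: the value a function symbol resolves to.
   Bit 0 carries the instruction-set flag, so the code address itself
   must have bit 0 clear. */
uintptr_t interworking_address(uintptr_t code_addr, enum isa isa)
{
    assert((code_addr & 1) == 0);
    return code_addr | (uintptr_t)(isa == ISA_THUMB);
}

/* What an interworking branch infers from the low bit of its target. */
enum isa target_isa(uintptr_t addr)
{
    return (addr & 1) ? ISA_THUMB : ISA_ARM;
}

/* The address execution actually continues at: the flag bit is dropped. */
uintptr_t target_pc(uintptr_t addr)
{
    return addr & ~(uintptr_t)1;
}
```

If the symbol is not marked as a function, the linker has no reason to
treat its value as anything but a plain address, so none of this
bookkeeping happens for calls to it.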

Sizes so far:
Current C code: 260 bytes
My best-attempt inline asm: 352 bytes
Bionic (with prefetch removed): 764 bytes

Obviously the Bionic code is a bit larger than the others and than I'd
like it to be, but it looks really hard to trim it down without
ruining performance for misaligned copies; roughly half of the asm
covers the misaligned case, which is expensive because you have three
different code paths for different ways it can be off mod 4.

One other issue we have to consider if we go with the Bionic code is
that we'd need to add sub-arch asm dirs to use it. As-is, the code is
hard-coded for little endian. It will shuffle the byte order badly
when copying on a big endian machine.
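
To make the two points above concrete, here is a minimal C sketch of
the usual shift-and-combine technique for a source that is off by k
bytes mod 4. It is not the Bionic code; copy_shifted, check_offset and
the COMBINE macro are names invented for this example, and the byte
order test assumes the GCC/Clang predefined macros. Each k needs its
own shift amounts, which is why asm versions end up with three
separate loops, and the shift directions swap on big endian.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption: __BYTE_ORDER__ is provided by the compiler (GCC/Clang). */
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
#define COMBINE(w, next, lo, hi) (((w) << (lo)) | ((next) >> (hi)))
#else
#define COMBINE(w, next, lo, hi) (((w) >> (lo)) | ((next) << (hi)))
#endif

/* Copy nwords words to dst from the byte stream that starts k bytes
   (k = 1, 2 or 3) into the aligned array src.  Only aligned word
   loads and stores are used.  Reads src[0] through src[nwords]. */
void copy_shifted(uint32_t *dst, const uint32_t *src, size_t nwords,
                  unsigned k)
{
    assert(k >= 1 && k <= 3);   /* k == 0 would shift by 32 bits */
    unsigned lo = 8 * k, hi = 32 - lo;
    uint32_t w = src[0];
    for (size_t i = 0; i < nwords; i++) {
        uint32_t next = src[i + 1];
        dst[i] = COMBINE(w, next, lo, hi);
        w = next;
    }
}

/* Returns 1 if the shifted copy matches a plain byte copy at offset k. */
int check_offset(unsigned k)
{
    unsigned char bytes[20];
    uint32_t src[5], dst[4];
    for (size_t i = 0; i < sizeof bytes; i++)
        bytes[i] = (unsigned char)(i + 1);
    memcpy(src, bytes, sizeof src);
    copy_shifted(dst, src, 4, k);
    return memcmp(dst, bytes + k, sizeof dst) == 0;
}
```

Using the little endian form of COMBINE on a big endian machine would
assemble each output word from the wrong ends of its two input words,
which is the kind of byte shuffling described above.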

Some rough times (128k copy repeated 10000 times):

Aligned case:
Current C code: 1.2s
My best-attempt C code: 0.75s
My best-attempt inline asm: 0.57s
Bionic asm: 0.63s
Bionic asm without prefetch: 0.57s

Misaligned case:
Current C code: 4.7s
My best-attempt inline asm: 2.9s
Bionic asm: 1.1s
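
For anyone wanting to reproduce this kind of measurement, a harness
along the following lines should do. This is only a sketch, not the
program used for the numbers above: time_copies and copy_fn are
hypothetical names, "misaligned" is assumed to mean a source offset of
1 to 3 bytes from a malloc'd buffer, and clock() reports CPU time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

typedef void *(*copy_fn)(void *, const void *, size_t);

/* Time reps copies of len bytes with the source src_off bytes past an
   aligned address.  Returns seconds of CPU time, or -1.0 on error. */
double time_copies(copy_fn copy, size_t len, size_t src_off, unsigned reps)
{
    assert(copy != NULL);
    if (len == 0)
        return -1.0;

    unsigned char *src = malloc(len + src_off);
    unsigned char *dst = malloc(len);
    if (!src || !dst) {
        free(src);
        free(dst);
        return -1.0;
    }
    memset(src, 0x5a, len + src_off);

    /* Reading the result through a volatile discourages the compiler
       from optimizing the copies away. */
    volatile unsigned char sink = 0;
    clock_t t0 = clock();
    for (unsigned i = 0; i < reps; i++) {
        copy(dst, src + src_off, len);
        sink = dst[i % len];
    }
    clock_t t1 = clock();
    (void)sink;

    free(src);
    free(dst);
    return (double)(t1 - t0) / CLOCKS_PER_SEC;
}
```

Calling time_copies(memcpy, 128 * 1024, 0, 10000) would correspond to
the aligned case, and the same call with a src_off of 1 to the
misaligned case, under the assumptions stated above.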

Rich


