Re: Optimized C memcpy - Andrew Bradford

From: Andrew Bradford <andrew@bradfordembedded.com>
To: musl@lists.openwall.com
Subject: Re: Optimized C memcpy
Date: Thu, 08 Aug 2013 08:59:19 -0400
Message-ID: <1375966759.14128.7503991.4EB49755@webmail.messagingengine.com>
In-Reply-To: <20130807182123.GA17670@brightrain.aerifal.cx>

On Wed, Aug 7, 2013, at 02:21 PM, Rich Felker wrote:
> Attached is the latest version of my "pure C" (modulo aliasing issues)
> memcpy implementation. Compiled with -O3 on arm, it matches the
> performance of the assembly language memcpy from Bionic for aligned
> copies, and is only 25% slower than the asm for misaligned copies. And
> it's only mildly larger. It uses the same principle as the Bionic
> code: large block copies as aligned 32-bit units for aligned copies,
> and aligned-load, bitshift-then-or, aligned-store for misaligned
> copies. This should, in principle, work well on typical risc archs
> that have plenty of registers but no misaligned load or store support.
> 
> Unfortunately it only works on little-endian (I haven't thought much
> yet about how it could be adapted to big-endian), but testing it on
> qemu-ppc with the endian check disabled (thus wrong behavior)
> suggested that this approach would work well on there too if we could
> adapt it. Of course tests under qemu are not worth much; the ARM tests
> were on real hardware and I'd like to see real-hardware results for
> other archs (mipsel?) too.
> 
> This is not a replacement for the ARM asm (which is still better), but
> it's a step towards avoiding the need to have written-by-hand assembly
> for every single new arch we add as a prerequisite for tolerable
> performance.
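
For readers unfamiliar with the misaligned path described above, here is
a minimal sketch of "aligned-load, bitshift-then-or, aligned-store" on a
little-endian machine. It is not the code attached to the original
message; the function name copy_shifted_le and its interface are
hypothetical, chosen only to keep the example self-contained. The source
is passed as an aligned word array plus a byte offset, so every load and
store in the loop is a whole aligned 32-bit word.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper for illustration; not the code from the thread.
 * Writes nwords 32-bit words to dst, taken from the byte stream that
 * begins `off` bytes (1..3) into the aligned word array src.
 * Little-endian only. Reads src[0] through src[nwords], i.e. one word
 * more than it writes. */
static void copy_shifted_le(uint32_t *dst, const uint32_t *src,
                            size_t off, size_t nwords)
{
    assert(off >= 1 && off <= 3);
    unsigned rs = 8 * (unsigned)off;  /* bits dropped from the low end */
    unsigned ls = 32 - rs;            /* bits taken from the next word */
    uint32_t cur = src[0];            /* aligned load */

    for (size_t i = 0; i < nwords; i++) {
        uint32_t next = src[i + 1];           /* aligned load */
        dst[i] = (cur >> rs) | (next << ls);  /* shift, or, aligned store */
        cur = next;                           /* reuse the loaded word */
    }
}
```

Each source word is loaded once and contributes to two output words. On
a big-endian machine the two shift directions would presumably have to
be swapped, which is the adaptation the quoted message leaves open.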

Sorry if this has been discussed before but Google isn't much help.  Why
is 32 bytes chosen as the block size over other sizes?

It seems that the code would be shorter, and hence easier to read,
verify, and understand, if blocks were 4 bytes.  What performance
penalty (I assume there has to be one) am I missing that drives the
choice of 32-byte blocks?

Thanks,
Andrew
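
To make the question concrete, the two loop shapes being compared might
look roughly like the sketch below. The function names copy_words and
copy_blocks32 are hypothetical and neither is taken from the attached
implementation. The usual argument for the larger block is that the loop
counter update and branch are paid once per 32 bytes instead of once per
4, and that grouping the loads ahead of the stores gives the compiler
room to schedule them or merge them into multi-register instructions.
That is a general observation, not a measurement from this thread.

```c
#include <stddef.h>
#include <stdint.h>

/* One 32-bit word per iteration: short, but every word copied also
 * pays for a counter update and a branch. */
static void copy_words(uint32_t *d, const uint32_t *s, size_t nwords)
{
    for (size_t i = 0; i < nwords; i++)
        d[i] = s[i];
}

/* Eight words (32 bytes) per iteration, followed by a word-at-a-time
 * tail for whatever is left over. */
static void copy_blocks32(uint32_t *d, const uint32_t *s, size_t nwords)
{
    for (; nwords >= 8; nwords -= 8, d += 8, s += 8) {
        /* All loads first, then all stores. */
        uint32_t w0 = s[0], w1 = s[1], w2 = s[2], w3 = s[3];
        uint32_t w4 = s[4], w5 = s[5], w6 = s[6], w7 = s[7];
        d[0] = w0; d[1] = w1; d[2] = w2; d[3] = w3;
        d[4] = w4; d[5] = w5; d[6] = w6; d[7] = w7;
    }
    copy_words(d, s, nwords);
}
```

Both functions produce the same result; any difference would show up
only in speed and code size, and would need to be measured on real
hardware for each target.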
