mailing list of musl libc
From: Rich Felker <dalias@aerifal.cx>
To: Kim Walisch <kim.walisch@gmail.com>
Cc: musl@lists.openwall.com
Subject: Re: musl libc, memcpy
Date: Mon, 30 Jul 2012 16:41:00 -0400
Message-ID: <20120730204100.GY544@brightrain.aerifal.cx>
In-Reply-To: <CAE_DJ110cVdicu-wPe_Ndeg9ih+g3AXZ_hNoGgX+DftJm6q=mA@mail.gmail.com>

Hi,

I'm replying with the list CC'd so others can comment too. Sorry I
haven't gotten a chance to try this code or review it in detail yet.
What follows is a short initial commentary but I'll give it some more
attention soon.

On Sun, Jul 29, 2012 at 11:41:47AM +0200, Kim Walisch wrote:
> Hi,
> 
> I have been reading through several libc implementations on the
> internet for the past few days and for fun I have written a fast yet
> portable memcpy implementation. It uses more code than your
> implementation but I do not think it is bloated. Some quick benchmarks
> that I ran on my Intel Core i5-670 3.46GHz (Red Hat 6.2 x86_64)
> indicate that my implementation runs about 50 percent faster than yours
> for aligned data and up to 10 times faster for unaligned data using
> gcc-4.7. The Intel C compiler even vectorizes the main copying loop
> using SSE instructions (if compiled with icc -O2 -xHost), which gives
> better performance than glibc's memcpy on my system. I would be happy
> to hear your opinion about my memcpy implementation.

I'd like to know what block sizes you were looking at, because for
memcpy that makes all the difference in the world:

For very small blocks (down to 1 byte), performance will be dominated
by conditional branches picking what to do.

For very large blocks (larger than cache), performance will be
memory-bound and even byte-at-a-time copying might be competitive.

Theoretically, there's only a fairly small range of sizes where the
algorithm used matters a lot.
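
A quick way to see these regimes separately is to time each block size
in isolation. A rough sketch of what I mean (my own illustration, with
arbitrary buffer sizes and iteration counts, not a harness I have
actually run):

#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>
#include <time.h>

int main(void)
{
	static char src[1 << 24], dst[1 << 24];     /* 16 MiB buffers */
	size_t sizes[] = { 8, 64, 512, 4096, 1 << 20, 1 << 24 };

	for (size_t i = 0; i < sizeof sizes / sizeof *sizes; i++) {
		size_t n = sizes[i];
		size_t iters = ((size_t)1 << 28) / n;   /* ~256 MiB moved per size */
		struct timespec t0, t1;

		clock_gettime(CLOCK_MONOTONIC, &t0);
		for (size_t k = 0; k < iters; k++) {
			memcpy(dst, src, n);
			/* barrier so the copy isn't optimized away */
			__asm__ __volatile__("" : : "r"(dst) : "memory");
		}
		clock_gettime(CLOCK_MONOTONIC, &t1);

		double sec = (t1.tv_sec - t0.tv_sec)
		           + (t1.tv_nsec - t0.tv_nsec) / 1e9;
		printf("%10zu bytes: %8.1f MB/s\n", n, n * iters / sec / 1e6);
	}
	return 0;
}

The small sizes mostly measure branch and call overhead, and the
largest size mostly measures memory bandwidth; the interesting
comparisons are in between.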

> /* CPU architectures that support fast unaligned memory access */
> #if defined(__i386) || defined(__x86_64)
> # define UNALIGNED_MEMORY_ACCESS
> #endif

I don't think this is necessary or useful. If we want better
performance on these archs, a tiny asm file that does almost nothing
but "rep movsd" is known to be the fastest solution on 32-bit x86, and
is at least the second-fastest on 64-bit, with the faster solutions
not being available on all cpus. On pretty much all other archs,
unaligned access is illegal.
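
Roughly what I have in mind, as a sketch only (this is not musl's
actual asm, just an illustration of the idea for 32-bit x86 with gcc
extended asm): let "rep movsl" move the bulk as dwords and finish the
0-3 leftover bytes in C.

#include <stddef.h>

static void *memcpy_rep_movsd(void *dest, const void *src, size_t n)
{
	char *d = dest;
	const char *s = src;
	size_t dwords = n / 4;

	/* copies ecx dwords from [esi] to [edi], advancing all three */
	__asm__ __volatile__(
		"rep movsl"
		: "+D"(d), "+S"(s), "+c"(dwords)
		:
		: "memory");

	for (n &= 3; n; n--)    /* 0-3 trailing bytes */
		*d++ = *s++;

	return dest;
}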

> static void *internal_memcpy_uintptr(void *dest, const void *src, size_t n)
> {
> 	char *d = (char*) dest;
> 	const char *s = (const char*) src;
> 	size_t bytes_iteration = sizeof(uintptr_t) * 8;
> 
> 	while (n >= bytes_iteration)
> 	{
> 		((uintptr_t*)d)[0] = ((const uintptr_t*)s)[0];
> 		((uintptr_t*)d)[1] = ((const uintptr_t*)s)[1];
> 		((uintptr_t*)d)[2] = ((const uintptr_t*)s)[2];
> 		((uintptr_t*)d)[3] = ((const uintptr_t*)s)[3];
> 		((uintptr_t*)d)[4] = ((const uintptr_t*)s)[4];
> 		((uintptr_t*)d)[5] = ((const uintptr_t*)s)[5];
> 		((uintptr_t*)d)[6] = ((const uintptr_t*)s)[6];
> 		((uintptr_t*)d)[7] = ((const uintptr_t*)s)[7];
> 		d += bytes_iteration;
> 		s += bytes_iteration;
> 		n -= bytes_iteration;
> 	}

This is just manual loop unrolling, no? GCC should do the equivalent
if you ask it to aggressively unroll loops, including the
vectorization; if not, that seems like a GCC bug.
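
For what it's worth, the kind of plain loop I have in mind is below
(the function name is mine, and this is just an illustration, not code
from the patch); built with something like -O3 -funroll-loops, or -O3
with the tree vectorizer, it should expand into roughly the same shape
as the hand-unrolled version above, under the same alignment
assumptions.

#include <stddef.h>
#include <stdint.h>

static void *copy_words(void *dest, const void *src, size_t n)
{
	uintptr_t *d = dest;
	const uintptr_t *s = src;
	size_t words = n / sizeof(uintptr_t);

	/* the trailing n % sizeof(uintptr_t) bytes still need a byte
	 * loop, as in your version */
	for (size_t i = 0; i < words; i++)
		d[i] = s[i];

	return dest;
}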

Rich

