mailing list of musl libc
From: Andre Renaud <andre@bluewatersys.com>
To: musl@lists.openwall.com
Subject: Re: Thinking about release
Date: Wed, 10 Jul 2013 10:26:46 +1200
Message-ID: <CAPfzE3ZsMpC9d4VDZyHabhKOffOQW0dnG7Nwpm8EqVBLUXNZKg@mail.gmail.com>
In-Reply-To: <CAPfzE3ZTxynUeJjq7KWijZGhsV==NymW4vqLhnQbEYCXRxVf-g@mail.gmail.com>

Replying to myself

> Certainly if there was a more straight forward C implementation that
> achieved similar results that would be superior. However the existing
> musl C memcpy code is already optimised to some degree (doing 32-bit
> rather than 8-bit copies), and it is difficult to convince gcc to use
> the load-multiple & store-multiple instructions via C code I've found,
> without resorting to pretty horrible C code. It may still be
> preferable to the assembler though. At this stage I haven't
> benchmarked this - I'll see if I can come up with something.

As a comparison, the existing memcpy.c implementation tries to copy
sizeof(size_t) bytes at a time, which on ARM is 4. This ends up being
a standard load/store. However, GCC is smart enough to know that it can
use ldm/stm instructions for copying structures larger than 4 bytes. So
if we change memcpy.c to use a structure larger than 4 bytes (e.g. 16)
instead of size_t as its basic copy unit, we do see some improvements:

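#include <stddef.h>
#include <stdint.h>

/* Copy unit: four words, so that GCC can use ldm/stm for each assignment. */
typedef struct multiple_size_t {
    size_t d[4];
} multiple_size_t;

#define SS (sizeof(multiple_size_t))
#define ALIGN (sizeof(multiple_size_t)-1)

void *my_memcpy(void * restrict dest, const void * restrict src, size_t n)
{
    unsigned char *d = dest;
    const unsigned char *s = src;

    /* Block copies need dest and src at the same offset within a block. */
    if (((uintptr_t)d & ALIGN) != ((uintptr_t)s & ALIGN))
        goto misaligned;

    /* Copy bytes until dest (and therefore src) is block-aligned. */
    for (; ((uintptr_t)d & ALIGN) && n; n--) *d++ = *s++;
    if (n) {
        multiple_size_t *wd = (void *)d;
        const multiple_size_t *ws = (const void *)s;

        /* Copy whole blocks. */
        for (; n>=SS; n-=SS) *wd++ = *ws++;

        d = (void *)wd;
        s = (const void *)ws;
misaligned:
        /* Copy the tail (or everything, if misaligned) byte by byte. */
        for (; n; n--) *d++ = *s++;
    }
    return dest;
}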

This results in 95MB/s on my platform (up from 65MB/s for the existing
memcpy.c, and down from 105MB/s with the asm optimised version). It is
about as readable as the existing memcpy.c. I'm not really familiar
with any other CPU architectures, so I'm not sure whether this would
improve or hurt performance on other platforms.
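
Throughput figures like the ones above could be measured with a small
harness along the following lines. This is only a minimal sketch, not the
harness used for the numbers quoted: copy_throughput and copy_fn are
hypothetical names, and it assumes that clock() CPU time is a good enough
timer and that 1 MB means 10^6 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical name: the signature shared by memcpy and my_memcpy. */
typedef void *(*copy_fn)(void *restrict, const void *restrict, size_t);

/*
 * Copy `size` bytes `iterations` times using `fn` and return the
 * throughput in MB/s (assumption: 1 MB = 1e6 bytes). Returns -1.0 for
 * invalid arguments, on allocation failure, or when the elapsed time is
 * below the resolution of clock().
 */
double copy_throughput(copy_fn fn, size_t size, unsigned iterations)
{
    double result = -1.0;

    if (size == 0 || iterations == 0)
        return result;

    unsigned char *src = malloc(size);
    unsigned char *dst = malloc(size);

    if (src && dst) {
        volatile unsigned char sink = 0;

        memset(src, 0xA5, size);

        clock_t start = clock();
        for (unsigned i = 0; i < iterations; i++) {
            fn(dst, src, size);
            sink ^= dst[size - 1];   /* touch the result of every copy */
        }
        clock_t end = clock();
        (void)sink;

        /* Sanity check: the function under test must copy correctly. */
        assert(memcmp(dst, src, size) == 0);

        double secs = (double)(end - start) / CLOCKS_PER_SEC;
        if (secs > 0.0)
            result = (double)size * iterations / secs / 1e6;
    }
    free(src);
    free(dst);
    return result;
}
```

Calling copy_throughput(my_memcpy, 1 << 20, 1000) and
copy_throughput(memcpy, 1 << 20, 1000) on the same machine would give
comparable figures. The buffer size relative to the cache size is likely
to affect the result, so it is worth trying several sizes.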

Any comments on using something like my_memcpy above for memcpy instead?
Obviously this gives you a higher penalty if the size of the area to
be copied is between sizeof(size_t) and sizeof(multiple_size_t), since
such copies fall through to the byte-by-byte loop.
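
One possible way to reduce that penalty is to keep a size_t copy stage
between the byte loop and the block loop. The following is an untested
idea rather than a benchmarked proposal. The names tiered_memcpy, word_t
and block_t are hypothetical, and the may_alias attribute is a GCC
extension that I have assumed here to keep the type-punned accesses well
defined.

```c
#include <stddef.h>
#include <stdint.h>

#ifdef __GNUC__
#define MAY_ALIAS __attribute__((__may_alias__))
#else
#define MAY_ALIAS   /* assumption: other compilers tolerate the punning */
#endif

typedef size_t MAY_ALIAS word_t;
typedef struct { size_t d[4]; } MAY_ALIAS block_t;

#define WORD_MASK  (sizeof(word_t) - 1)
#define BLOCK_MASK (sizeof(block_t) - 1)

void *tiered_memcpy(void *restrict dest, const void *restrict src, size_t n)
{
    unsigned char *d = dest;
    const unsigned char *s = src;

    /* Word copies need dest and src at the same offset within a word. */
    if ((((uintptr_t)d ^ (uintptr_t)s) & WORD_MASK) == 0) {
        /* Bytes until both pointers are word-aligned. */
        for (; ((uintptr_t)d & WORD_MASK) && n; n--) *d++ = *s++;

        /* Block copies need the same offset within a block as well. */
        if ((((uintptr_t)d ^ (uintptr_t)s) & BLOCK_MASK) == 0) {
            /* Words until both pointers are block-aligned. */
            for (; ((uintptr_t)d & BLOCK_MASK) && n >= sizeof(word_t);
                 n -= sizeof(word_t)) {
                *(word_t *)d = *(const word_t *)s;
                d += sizeof(word_t);
                s += sizeof(word_t);
            }
            /* Whole blocks; skipped when fewer than a block remains. */
            for (; n >= sizeof(block_t); n -= sizeof(block_t)) {
                *(block_t *)d = *(const block_t *)s;
                d += sizeof(block_t);
                s += sizeof(block_t);
            }
        }
        /* Remaining whole words. */
        for (; n >= sizeof(word_t); n -= sizeof(word_t)) {
            *(word_t *)d = *(const word_t *)s;
            d += sizeof(word_t);
            s += sizeof(word_t);
        }
    }
    /* Tail bytes, or the whole copy if the pointers are misaligned. */
    for (; n; n--) *d++ = *s++;
    return dest;
}
```

With this layout a copy smaller than one block still uses word-sized
loads and stores, at the cost of extra branches on every call. Whether
that is a net win would need measuring.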

Regards,
Andre

