mailing list of musl libc
From: Andre Renaud <andre@bluewatersys.com>
To: musl@lists.openwall.com
Subject: Re: Thinking about release
Date: Fri, 12 Jul 2013 10:34:31 +1200
Message-ID: <CAPfzE3aKG7JYE_u3oVDfkF2xDSdhzdrY3ui-H0bUduQXUOQ6Vg@mail.gmail.com>
In-Reply-To: <20130711124613.GO29800@brightrain.aerifal.cx>

Hi Rich,

> You need both instructions in the same asm block, and proper
> constraints. As it is, whether the registers keep their values between
> the two separate asm blocks is up to the compiler's whims.
>
> With the proper constraints ("+r" type), the s+=SS and d+=SS are
> unnecessary, as a bonus. Also there's no reason to force alignment to
> SS for this loop; that will simply prevent it from being used as much
> for smaller copies. I would use SS==sizeof(size_t) and then write 8*SS
> in the for loop.
>
> Last night I was in the process of writing something very similar, but
> I put the for loop in asm too and didn't finish it. If it performs
> just as well with the loop in C, I like your version better.
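To illustrate the point about keeping both instructions in one asm block with "+r" constraints, here is a minimal sketch. The helper name copy_block32 is hypothetical. The load and the store share one asm statement, so r4-r11 cannot be disturbed between them. The "+r" operands, combined with ldm/stm writeback (!), let the asm advance both pointers itself, so no separate s += 32 / d += 32 is needed in C. The sketch assumes 32-bit ARM (not Thumb) mode, and also that r11 is not being used as the frame pointer (for example, in an optimized build). On other targets it falls back to memcpy so that it still compiles.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy one 32-byte block and advance both pointers. */
static void copy_block32(unsigned char **dp, const unsigned char **sp)
{
    unsigned char *d = *dp;
    const unsigned char *s = *sp;
#if defined(__arm__) && !defined(__thumb__)
    /* Load and store in ONE asm statement; "+r" marks s and d as read and
     * updated by the asm (via the "!" writeback), and "memory" tells the
     * compiler that memory is read and written. */
    __asm__ volatile (
        "ldmia %0!, {r4-r11}\n\t"
        "stmia %1!, {r4-r11}"
        : "+r"(s), "+r"(d)
        :
        : "r4", "r5", "r6", "r7", "r8", "r9", "r10", "r11", "memory");
#else
    /* Portable fallback so the sketch builds on non-ARM targets. */
    memcpy(d, s, 32);
    d += 32;
    s += 32;
#endif
    *dp = d;
    *sp = s;
}
```

Each call copies exactly one block and returns both pointers advanced by 32, so a caller's loop only needs to decrement its byte count.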

I've rejiggled it a bit, and it appears to be working. I wasn't
entirely sure what you meant about the proper constraints. There is an
additional reason why 8*4 was used for the alignment: to force the
whole loop to work in cache-line blocks. I now do this explicitly in
the lead-in by doing the first few copies as 32-bit words, then
switching to the full cache-line asm. This performs the same as the
fully native assembler. However, to get that I had to use the same
trick the native assembler uses: loading the next block before storing
the current one. I'm a bit concerned that this means we could be doing
an out-of-bounds read, and I can't entirely see why this wouldn't also
happen with the existing assembler (though I'm presuming it doesn't).
Any comments on this side of it?
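The copy structure described above can be sketched in portable C: a byte lead-in to word alignment, word copies up to a cache-line boundary, whole-line copies, then a byte tail. This is a hypothetical illustration, not the code under discussion. The name line_memcpy and the 8-word line size are assumptions. The "load the next block before storing this one" step is modelled as a single volatile byte read of the next line. That read is issued only while a full line remains after the current one, so it never touches memory at or beyond src + n. Fixed-size word moves go through memcpy to sidestep aliasing issues, which a real libc memcpy could not do.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SS   sizeof(size_t)
#define LINE (8 * SS)   /* assumed cache-line size: 8 words */

/* Hypothetical portable sketch of the lead-in / full-line / tail copy. */
void *line_memcpy(void *restrict dest, const void *restrict src, size_t n)
{
    unsigned char *d = dest;
    const unsigned char *s = src;
    size_t w[8];        /* stands in for the r4-r11 register block */

    if (((uintptr_t)d & (SS - 1)) == ((uintptr_t)s & (SS - 1))) {
        /* 1. bytes until d (and therefore s) is word aligned */
        for (; ((uintptr_t)d & (SS - 1)) && n; n--)
            *d++ = *s++;
        /* 2. single words until d is cache-line aligned */
        for (; ((uintptr_t)d & (LINE - 1)) && n >= SS;
             n -= SS, d += SS, s += SS) {
            memcpy(w, s, SS);
            memcpy(d, w, SS);
        }
        /* 3. whole lines: load 8 words, touch the next line only if a
         *    full line remains after this one, then store */
        for (; n >= LINE; n -= LINE, d += LINE, s += LINE) {
            memcpy(w, s, LINE);
            if (n >= 2 * LINE)
                (void)*(const volatile unsigned char *)(s + LINE);
            memcpy(d, w, LINE);
        }
    }
    /* 4. tail bytes, or everything if the alignments differ */
    for (; n; n--)
        *d++ = *s++;
    return dest;
}
```

The guard `n >= 2 * LINE` is the design choice that matters here. Because the early read is conditioned on the remaining length rather than on whatever state happens to be lying around, the final block never reads past the end of the source.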

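#include <stddef.h>
#include <stdint.h>

#ifndef noinline
#define noinline __attribute__((noinline))
#endif
#define SS sizeof(size_t)
#define ALIGN (SS - 1)

/* 32-bit ARM only: the asm uses ldm/stm and clobbers r4-r12 */
void * noinline my_asm_memcpy(void * restrict dest, const void * restrict src, size_t n)
{
    unsigned char *d = dest;
    const unsigned char *s = src;

    /* Word copies need src and dest to share the same alignment */
    if (((uintptr_t)d & ALIGN) != ((uintptr_t)s & ALIGN))
        goto misaligned;

    /* Byte copies until d (and therefore s) is word aligned */
    for (; ((uintptr_t)d & ALIGN) && n; n--) *d++ = *s++;

    /* Assume 32-byte (8 * SS) cache lines, as on many ARM cores, and
     * use word copies until d is cache-line aligned. n >= SS (not just
     * n != 0) stops n from wrapping around on short copies. */
    for (; ((uintptr_t)d & ((8 * SS) - 1)) && n >= SS; n -= SS) {
        *(size_t *)d = *(size_t *)s;
        d += SS;
        s += SS;
    }

    /* Do full cache line read/writes */
    for (; n >= (8 * SS); n -= (8 * SS)) {
        __asm__ (
            "ldmia %0, {r4-r11}\n"
            "add %0, %0, %2\n"
            "bic r12, %0, %3\n"
            /* Load from the next block before storing this one. No
             * instruction here sets the condition flags, so whether
             * ldrhi executes depends on compiler-generated code; on the
             * last iteration it may read past the end of src. */
            "ldrhi r12, [%0]\n"
            "stmia %1, {r4-r11}\n"
            "add %1, %1, %2"
            : "+r"(s), "+r"(d)
            : "i"(8 * SS), "i"((8 * SS) - 1)
            : "r4", "r5", "r6", "r7", "r8",
              "r9", "r10", "r11", "r12", "memory");
    }

misaligned:
    /* Tail bytes, or the whole copy if the alignments differ */
    for (; n; n--) *d++ = *s++;
    return dest;
}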

Regards,
Andre

