From: Andre Renaud <andre@bluewatersys.com>
To: musl@lists.openwall.com
Subject: Re: Thinking about release
Date: Fri, 12 Jul 2013 10:34:31 +1200
Message-ID: <CAPfzE3aKG7JYE_u3oVDfkF2xDSdhzdrY3ui-H0bUduQXUOQ6Vg@mail.gmail.com>
In-Reply-To: <20130711124613.GO29800@brightrain.aerifal.cx>
Hi Rich,
> You need both instructions in the same asm block, and proper
> constraints. As it is, whether the registers keep their values between
> the two separate asm blocks is up to the compiler's whims.
>
> With the proper constraints ("+r" type), the s+=SS and d+=SS are
> unnecessary, as a bonus. Also there's no reason to force alignment to
> SS for this loop; that will simply prevent it from being used as much
> for smaller copies. I would use SS==sizeof(size_t) and then write 8*SS
> in the for loop.
>
> Last night I was in the process of writing something very similar, but
> I put the for loop in asm too and didn't finish it. If it performs
> just as well with the loop in C, I like your version better.
I've rejigged it a bit, and it appears to be working. I wasn't
entirely sure what you meant about the proper constraints. There is an
additional reason why 8*4 was used for the alignment: to force the
whole loop to work in cache-line blocks. I now do this explicitly in
the lead-in, doing the first few copies as 32-bit words and then
switching to the full cache-line asm. This has the same performance as
the fully native assembler. However, to get that I had to use the same
trick that the native assembler uses: doing a load from the next block
prior to storing this one. I'm a bit concerned that this means we'd be
doing a read that is out of bounds, and I can't entirely see why this
wouldn't be happening with the existing assembler (but I'm presuming
it doesn't). Any comments on this side of it?
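#include <stddef.h>
#include <stdint.h>

/* Assumed definition; the original message does not show it */
#define noinline __attribute__((noinline))
#define SS sizeof(size_t)
#define ALIGN (SS - 1)

void * noinline my_asm_memcpy(void * restrict dest, const void * restrict src, size_t n)
{
	unsigned char *d = dest;
	const unsigned char *s = src;

	if (((uintptr_t)d & ALIGN) != ((uintptr_t)s & ALIGN))
		goto misaligned;

	/* Copy bytes until both pointers are word-aligned */
	for (; ((uintptr_t)d & ALIGN) && n; n--) *d++ = *s++;

	/* ARM has 32-byte cache lines, so get us aligned to that */
	for (; ((uintptr_t)d & ((8 * SS) - 1)) && n >= SS; n -= SS) {
		*(size_t *)d = *(const size_t *)s;
		d += SS;
		s += SS;
	}

	/* Do full cache line read/writes */
	for (; n >= (8 * SS); n -= (8 * SS)) {
		__asm__ __volatile__ (
			"ldmia %0, {r4-r11}\n"
			"add %0, %0, %4\n"
			"bic r12, %0, %5\n"
			/* Early load from the next block. Nothing in this asm
			 * block sets the flags that "hi" tests, and the load
			 * can touch memory past the end of src. */
			"ldrhi r12, [%0]\n"
			"stmia %1, {r4-r11}\n"
			"add %1, %1, %4"
			: "=r"(s), "=r"(d)
			: "0"(s), "1"(d), "i"(8 * SS), "i"((8 * SS) - 1)
			: "r4", "r5", "r6", "r7", "r8",
			  "r9", "r10", "r11", "r12", "memory");
	}

misaligned:
	/* Byte copy for the tail and for mutually misaligned buffers */
	for (; n; n--) *d++ = *s++;
	return dest;
}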
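For comparison, here is a minimal sketch of what the quoted "+r" suggestion might look like: the load and the store sit in one asm block, and read-write constraints plus the "!" writeback advance both pointers, so the separate adds go away. The helper name copy_blocks, the BLOCK macro, and the portable #else branch are assumptions added for illustration and are not taken from the thread. The asm branch targets ARM mode only, has not been benchmarked, and clobbering r7 or r11 may be rejected when the compiler uses one of them as a frame pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* 32 bytes: eight 32-bit words, the cache-line size mentioned in the message */
#define BLOCK (8 * sizeof(uint32_t))

/* Hypothetical helper: copy `blocks` 32-byte blocks from s to d.
 * Both pointers must be 4-byte aligned. */
static void copy_blocks(unsigned char *d, const unsigned char *s, size_t blocks)
{
	assert(((uintptr_t)d & 3) == 0 && ((uintptr_t)s & 3) == 0);

	for (; blocks; blocks--) {
#if defined(__arm__) && !defined(__thumb__)
		/* Load and store in ONE asm block. "+r" marks d and s as
		 * read-write operands, and the "!" writeback advances each
		 * pointer by 32 bytes, so no d += / s += is needed in C. */
		__asm__ __volatile__ (
			"ldmia %1!, {r4-r11}\n\t"
			"stmia %0!, {r4-r11}"
			: "+r"(d), "+r"(s)
			:
			: "r4", "r5", "r6", "r7", "r8",
			  "r9", "r10", "r11", "memory");
#else
		/* Portable stand-in so the sketch also builds on non-ARM hosts */
		memcpy(d, s, BLOCK);
		d += BLOCK;
		s += BLOCK;
#endif
	}
}
```

A caller would handle the lead-in and the tail in C, as in the function above, and pass only the whole-block count, for example copy_blocks(d, s, n / BLOCK).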
Regards,
Andre
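On the out-of-bounds question raised above, one way to keep an early read inside the source buffer is to issue it only while at least one further full block remains. The sketch below shows that guard in portable C. The names copy_with_lookahead and BLOCK are hypothetical, the volatile read merely stands in for the preload load, and nothing here is claimed to match what the existing assembler does.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BLOCK 32 /* assumed cache-line size in bytes */

/* Hypothetical helper: copy n bytes in 32-byte blocks, touching the next
 * source block before storing the current one. Returns the number of
 * trailing bytes (fewer than BLOCK) that were NOT copied. */
static size_t copy_with_lookahead(unsigned char *d, const unsigned char *s, size_t n)
{
	assert(d != NULL && s != NULL);

	while (n >= BLOCK) {
		/* Look ahead only when a whole further block lies inside the
		 * source, so the early read never leaves [s, s + n). */
		if (n >= 2 * BLOCK)
			(void)*(const volatile unsigned char *)(s + BLOCK);

		memcpy(d, s, BLOCK);
		d += BLOCK;
		s += BLOCK;
		n -= BLOCK;
	}
	return n;
}
```

The cost of the guard is one compare per block; the last block of every copy simply goes without a preload.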