mailing list of musl libc
From: Rich Felker <dalias@aerifal.cx>
To: musl@lists.openwall.com
Subject: Re: Thinking about release
Date: Thu, 11 Jul 2013 08:46:13 -0400
Message-ID: <20130711124613.GO29800@brightrain.aerifal.cx>
In-Reply-To: <CAPfzE3aoD4mpO9RrV-enuXxkCvMPY_7rEE6e9w8NuX-ntEqtqA@mail.gmail.com>

On Thu, Jul 11, 2013 at 05:10:41PM +1200, Andre Renaud wrote:
> > I can't see any obvious reason why this shouldn't work, although the
> > assembler as it stands makes pretty heavy use of all the registers,
> > and I can't immediately see how to rework it to free up 2 more (I can
> > free up 1 by dropping the attempted preload). Given my (lack of)
> > skills with ARM assembler, I'm not sure I'll be able to look too
> > deeply into either of these options, but I'll have a go at the inline
> > ASM version to force 8*4byte loads to see if it improves things.
> 
> I've given it a bit of a go, and at first it appears to be working
> (although I don't exactly have a comprehensive test suite, so this is
> very preliminary). Anyone with some more ARM assembler experience is
> welcome to chip in with a comment.
> 
> I also managed to mess up my last set of benchmarking - I'd indicated
> that I got 65 vs 95 vs 105, however I'd overlooked the fact that the
> first call would have poor cache performance. Once I corrected that
> the results have become more like 65(naive) vs 105(typedef) vs
> 113(asm).
> 
> Using the below code, it becomes 65(naive), 113(inline asm), 113(full
> asm). So the inline version performs as we'd expect, assuming that it
> is technically correct (which is probably the biggest question).

It's not.

> #define SS (8 * 4)
> #define ALIGN (SS - 1)
> void * noinline my_asm_memcpy(void * restrict dest, const void * restrict src, size_t n)
> {
>     unsigned char *d = dest;
>     const unsigned char *s = src;
> 
>     if (((uintptr_t)d & ALIGN) != ((uintptr_t)s & ALIGN))
>         goto misaligned;
> 
>     for (; ((uintptr_t)d & ALIGN) && n; n--) *d++ = *s++;
>     if (n) {
>         for (; n>=SS; n-= SS) {
>                 __asm__("ldmia %0, {r4-r11}"
>                                 : "=r" (s)
>                                 : "0" (s)
>                                 : "r4", "r5", "r6", "r7", "r8", "r9", "r10", "r11");
>                 s+=SS;
>                 __asm__("stmia %0, {r4-r11}"
>                                 : "=r" (d)
>                                 :"0" (d));
>                 d+=SS;

You need both instructions in the same asm block, and proper
constraints. As it is, whether the registers keep their values between
the two separate asm blocks is up to the compiler's whims.

With the proper constraints ("+r" type), the s+=SS and d+=SS are
unnecessary, as a bonus. Also there's no reason to force alignment to
SS for this loop; that will simply prevent it from being used as much
for smaller copies. I would use SS==sizeof(size_t) and then write 8*SS
in the for loop.
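A minimal sketch of how the two suggestions above might combine, assuming 32-bit ARM in ARM (not Thumb) mode and GCC-style inline asm. It is not code from this thread: `sketch_memcpy` and `word_t` are hypothetical names, and using the writeback (`!`) forms of ldmia/stmia is an assumption about how "+r" constraints make the explicit `s+=SS` / `d+=SS` unnecessary. The r4-r11 register list is taken from the quoted code; whether r9 and r11 may be clobbered depends on the ABI and on the frame pointer being omitted. On other targets a portable 8-word C loop (using the GCC/Clang `__may_alias__` extension) stands in for the asm, which also lets the sketch be compiled and checked off-target.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* SS is one machine word; the unrolled loop moves 8*SS bytes per pass
   (32 bytes on 32-bit ARM, i.e. eight registers). */
#define SS (sizeof(size_t))
#define ALIGN (SS - 1)

/* Hypothetical type name. GCC/Clang extension: lets the portable path
   access the buffers as words without breaking strict aliasing. */
typedef size_t __attribute__((__may_alias__)) word_t;

/* Hypothetical name; an illustrative sketch, not musl's memcpy. */
void *sketch_memcpy(void *restrict dest, const void *restrict src, size_t n)
{
    unsigned char *d = dest;
    const unsigned char *s = src;

    /* Word copies are only possible when both pointers share alignment;
       otherwise everything goes through the byte loop at the end. */
    if (((uintptr_t)d & ALIGN) == ((uintptr_t)s & ALIGN)) {
        /* Byte-copy until d (and therefore s) is word-aligned. */
        for (; ((uintptr_t)d & ALIGN) && n; n--) *d++ = *s++;

        for (; n >= 8*SS; n -= 8*SS) {
#if defined(__arm__) && !defined(__thumb__)
            /* Load and store in ONE asm statement, so the compiler cannot
               reuse r4-r11 in between. The "!" writeback advances both
               pointers, and "+r" tells the compiler d and s are read and
               modified. "memory" is clobbered because the asm reads and
               writes memory not described by its operands. */
            __asm__ __volatile__(
                "ldmia %1!, {r4-r11}\n\t"
                "stmia %0!, {r4-r11}"
                : "+r"(d), "+r"(s)
                :
                : "r4", "r5", "r6", "r7", "r8", "r9", "r10", "r11", "memory");
#else
            /* Portable stand-in for the ldmia/stmia pair: copy 8 words. */
            word_t *dw = (word_t *)(void *)d;
            const word_t *sw = (const word_t *)(const void *)s;
            for (int i = 0; i < 8; i++) dw[i] = sw[i];
            d += 8*SS;
            s += 8*SS;
#endif
        }
    }

    /* Tail bytes, or the whole copy when the alignments differ. */
    for (; n; n--) *d++ = *s++;
    return dest;
}
```

Because only one asm statement touches r4-r11, there is no point between the load and the store at which the compiler could reuse those registers, which is the failure mode of the two-block version.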

Last night I was in the process of writing something very similar, but
I put the for loop in asm too and didn't finish it. If it performs
just as well with the loop in C, I like your version better.
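On the benchmarking point in the quoted message (the first call running with cold caches), a minimal timing harness might make one untimed warm-up call before measuring. This is a sketch assuming POSIX `clock_gettime` with `CLOCK_MONOTONIC`; `bench_copy` and `copy_fn` are hypothetical names, and the result is in bytes per second, which is not necessarily the unit behind the figures quoted in the thread (that unit is not stated).

```c
#define _POSIX_C_SOURCE 199309L
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Hypothetical type: any function with memcpy's signature, e.g. memcpy
   itself or a candidate implementation under test. */
typedef void *(*copy_fn)(void *restrict, const void *restrict, size_t);

static double elapsed_sec(const struct timespec *a, const struct timespec *b)
{
    return (double)(b->tv_sec - a->tv_sec) +
           (double)(b->tv_nsec - a->tv_nsec) / 1e9;
}

/* Hypothetical helper: bytes copied per second over `iters` timed copies
   of `len` bytes, or 0 if the clock did not advance. */
static double bench_copy(copy_fn fn, void *dst, const void *src,
                         size_t len, int iters)
{
    struct timespec t0, t1;

    /* Untimed warm-up call: the first copy pulls both buffers into cache
       (and may take page faults), which would otherwise penalize whichever
       implementation happens to be measured first. */
    fn(dst, src, len);

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < iters; i++)
        fn(dst, src, len);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double secs = elapsed_sec(&t0, &t1);
    return secs > 0 ? (double)len * iters / secs : 0.0;
}
```

For example, `bench_copy(memcpy, dst, src, len, iters)` and the same call with a candidate implementation are comparable only if each gets its own warm-up and both run on the same buffers.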

Rich


