mailing list of musl libc
From: Andre Renaud <andre@bluewatersys.com>
To: musl@lists.openwall.com
Cc: Rich Felker <dalias@aerifal.cx>
Subject: Re: Thinking about release
Date: Thu, 11 Jul 2013 17:10:41 +1200
Message-ID: <CAPfzE3aoD4mpO9RrV-enuXxkCvMPY_7rEE6e9w8NuX-ntEqtqA@mail.gmail.com>
In-Reply-To: <CAPfzE3ZMGwEvs2n_4LCKzMv0FROS55_1N+HdBw7HgNhexgM+eA@mail.gmail.com>

> I can't see any obvious reason why this shouldn't work, although the
> assembler as it stands makes pretty heavy use of all the registers,
> and I can't immediately see how to rework it to free up 2 more (I can
> free up 1 by dropping the attempted preload). Given my (lack of)
> skills with ARM assembler, I'm not sure I'll be able to look too
> deeply into either of these options, but I'll have a go at the inline
> ASM version to force 8*4byte loads to see if it improves things.

I've given it a bit of a go, and at first glance it appears to be
working (although I don't exactly have a comprehensive test suite, so
this is very preliminary). Anyone with more ARM assembler experience
is welcome to chip in with a comment.
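
In the absence of a test suite, a check along the following lines could
give some basic confidence in a candidate memcpy. This is only a sketch:
the names memcpy_fn and check_memcpy and the buffer sizes are
hypothetical, not taken from musl. It sweeps every source and
destination offset within a 32-byte block, over a range of lengths, and
verifies the copied bytes, the return value, and that nothing outside
the destination range was written.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical signature for the routine under test. */
typedef void *(*memcpy_fn)(void *restrict, const void *restrict, size_t);

enum { MAX_OFF = 32, MAX_LEN = 200, GUARD = 64 };

/* Returns the number of failing (dest offset, src offset, length) cases. */
static int check_memcpy(memcpy_fn fn)
{
    static unsigned char src[GUARD + MAX_OFF + MAX_LEN + GUARD];
    static unsigned char dst[GUARD + MAX_OFF + MAX_LEN + GUARD];
    int failures = 0;

    /* Fill the source with a non-constant pattern. */
    for (size_t i = 0; i < sizeof src; i++)
        src[i] = (unsigned char)(i * 7 + 1);

    /* Sweeping 32 consecutive offsets covers every alignment relative
     * to a 32-byte block, whatever the alignment of the arrays. */
    for (size_t doff = 0; doff < MAX_OFF; doff++)
    for (size_t soff = 0; soff < MAX_OFF; soff++)
    for (size_t len = 0; len <= MAX_LEN; len++) {
        unsigned char *d = dst + GUARD + doff;
        const unsigned char *s = src + GUARD + soff;
        int bad = 0;

        memset(dst, 0xAA, sizeof dst);
        void *ret = fn(d, s, len);

        if (ret != d) bad = 1;                    /* must return dest */
        if (memcmp(d, s, len) != 0) bad = 1;      /* payload must match */

        /* Bytes before and after the destination range must be untouched. */
        for (size_t i = 0; i < GUARD + doff; i++)
            if (dst[i] != 0xAA) bad = 1;
        for (size_t i = GUARD + doff + len; i < sizeof dst; i++)
            if (dst[i] != 0xAA) bad = 1;

        failures += bad;
    }
    return failures;
}
```

Calling check_memcpy(my_asm_memcpy) should return 0 if every case
passes. A pass does not prove the inline asm is valid, since a
miscompile may only show up with different compiler options.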

I also managed to mess up my last set of benchmarks: I'd indicated
that I got 65 vs 95 vs 105, but I hadn't accounted for the fact that
the first call would have poor cache performance. Once I corrected
that, the results became more like 65 (naive) vs 105 (typedef) vs
113 (asm).
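
The correction described above, keeping the cold first call out of the
measurement, might look something like the sketch below. The names
memcpy_fn and bench_memcpy are hypothetical, and the MB/s unit is an
assumption, since the figures quoted in this message carry no unit.
clock() is used only because it is standard C; it measures CPU time at
a coarse resolution.

```c
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Hypothetical signature for the routine being timed. */
typedef void *(*memcpy_fn)(void *restrict, const void *restrict, size_t);

/* Returns throughput in MB/s (1 MB = 1e6 bytes; the unit is an
 * assumption), or 0.0 if the run was too short for clock() to measure. */
static double bench_memcpy(memcpy_fn fn, void *dst, const void *src,
                           size_t len, int iters)
{
    /* Call through a volatile pointer so the compiler cannot inline
     * the copies or drop them as dead code. */
    memcpy_fn volatile call = fn;

    /* Untimed warm-up: the first call pays for cold caches and
     * first-touch page faults, so it is kept out of the measurement. */
    call(dst, src, len);

    clock_t start = clock();
    for (int i = 0; i < iters; i++)
        call(dst, src, len);
    clock_t end = clock();

    double secs = (double)(end - start) / CLOCKS_PER_SEC;
    if (secs <= 0.0)
        return 0.0;
    return (double)len * iters / secs / 1e6;
}
```

Running each candidate through the same function with the same buffers
and iteration count keeps the comparison between the naive, typedef and
asm versions like for like.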

Using the code below, it becomes 65 (naive), 113 (inline asm), 113
(full asm). So the inline version is able to perform as we'd expect,
assuming that it is technically correct (which is probably the biggest
question).

#define SS (8 * 4)
#define ALIGN (SS - 1)
void * noinline my_asm_memcpy(void * restrict dest, const void * restrict src, size_t n)
{
    unsigned char *d = dest;
    const unsigned char *s = src;

    if (((uintptr_t)d & ALIGN) != ((uintptr_t)s & ALIGN))
        goto misaligned;

    for (; ((uintptr_t)d & ALIGN) && n; n--) *d++ = *s++;
    if (n) {
        for (; n>=SS; n-= SS) {
                __asm__("ldmia %0, {r4-r11}"
                                : "=r" (s)
                                : "0" (s)
                                : "r4", "r5", "r6", "r7", "r8", "r9", "r10", "r11");
                s+=SS;
                __asm__("stmia %0, {r4-r11}"
                                : "=r" (d)
                                :"0" (d));
                d+=SS;
        }

misaligned:
        for (; n; n--) *d++ = *s++;
    }
    return dest;
}
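
On the question of technical correctness: as posted, the load and the
store are separate asm statements. Nothing tells the compiler that
r4-r11 must keep their values between the two, and neither statement
declares that memory is read or written. A hedged sketch of one way to
express the same 32-byte block copy as a single statement follows. The
name copy_blocks32 is hypothetical, the asm path is compiled only for
ARM (non-Thumb) targets with a GCC-compatible compiler, and the plain C
fallback exists only so that the sketch builds elsewhere. It has not
been benchmarked.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SS (8 * 4)

/* Hypothetical helper: copies floor(n / SS) blocks of SS bytes from s
 * to d and leaves any tail for the caller. On the asm path both
 * pointers are assumed to be at least 4-byte aligned. */
static void copy_blocks32(unsigned char *d, const unsigned char *s, size_t n)
{
    for (; n >= SS; n -= SS) {
#if defined(__GNUC__) && defined(__arm__) && !defined(__thumb__)
        /* One statement, so the compiler cannot reuse r4-r11 between
         * the load and the store. "!" writes the advanced address back
         * to the base register, which "+r" declares as read-write.
         * The "memory" clobber says memory is read and written.
         * Like the original, this assumes r11 is not in use as a
         * frame pointer. */
        __asm__ __volatile__(
            "ldmia %1!, {r4-r11}\n\t"
            "stmia %0!, {r4-r11}"
            : "+r" (d), "+r" (s)
            :
            : "r4", "r5", "r6", "r7", "r8", "r9", "r10", "r11", "memory");
#else
        /* Portable fallback so the sketch builds on other targets. */
        memcpy(d, s, SS);
        d += SS;
        s += SS;
#endif
    }
}
```

With this form the explicit s+=SS and d+=SS steps are no longer needed,
because the writeback addressing advances both pointers.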

Regards,
Andre

