From mboxrd@z Thu Jan  1 00:00:00 1970
X-Msuck: nntp://news.gmane.org/gmane.linux.lib.musl.general/3630
Path: news.gmane.org!not-for-mail
From: Andre Renaud <andre@bluewatersys.com>
Newsgroups: gmane.linux.lib.musl.general
Subject: Re: Thinking about release
Date: Fri, 12 Jul 2013 10:34:31 +1200
Message-ID: <CAPfzE3aKG7JYE_u3oVDfkF2xDSdhzdrY3ui-H0bUduQXUOQ6Vg@mail.gmail.com>
References: <CAPfzE3a0h=2NFqgnBqXj3J2q7VgYjqZ19Ab=0LAe5u5SvWXHaA@mail.gmail.com>
	<20130613014314.GC29800@brightrain.aerifal.cx>
	<CAPfzE3aerGrdmTkj15o0CTVtt8TZpTyAnSAj1Joau+Jb_cNGUA@mail.gmail.com>
	<20130709053711.GO29800@brightrain.aerifal.cx>
	<CAPfzE3ZTxynUeJjq7KWijZGhsV==NymW4vqLhnQbEYCXRxVf-g@mail.gmail.com>
	<CAPfzE3ZsMpC9d4VDZyHabhKOffOQW0dnG7Nwpm8EqVBLUXNZKg@mail.gmail.com>
	<CAPfzE3YDFjqHxRaZFeiy0CvbYWYGKzgDGEp-71xSz-03GhNTxw@mail.gmail.com>
	<20130711033754.GL29800@brightrain.aerifal.cx>
	<CAPfzE3ZMGwEvs2n_4LCKzMv0FROS55_1N+HdBw7HgNhexgM+eA@mail.gmail.com>
	<CAPfzE3aoD4mpO9RrV-enuXxkCvMPY_7rEE6e9w8NuX-ntEqtqA@mail.gmail.com>
	<20130711124613.GO29800@brightrain.aerifal.cx>
Reply-To: musl@lists.openwall.com
NNTP-Posting-Host: plane.gmane.org
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
X-Trace: ger.gmane.org 1373582086 25103 80.91.229.3 (11 Jul 2013 22:34:46 GMT)
X-Complaints-To: usenet@ger.gmane.org
NNTP-Posting-Date: Thu, 11 Jul 2013 22:34:46 +0000 (UTC)
To: musl@lists.openwall.com
Original-X-From: musl-return-3634-gllmg-musl=m.gmane.org@lists.openwall.com Fri Jul 12 00:34:47 2013
Return-path: <musl-return-3634-gllmg-musl=m.gmane.org@lists.openwall.com>
Envelope-to: gllmg-musl@plane.gmane.org
Original-Received: from mother.openwall.net ([195.42.179.200])
	by plane.gmane.org with smtp (Exim 4.69)
	(envelope-from <musl-return-3634-gllmg-musl=m.gmane.org@lists.openwall.com>)
	id 1UxPRk-0001LP-S9
	for gllmg-musl@plane.gmane.org; Fri, 12 Jul 2013 00:34:44 +0200
Original-Received: (qmail 24526 invoked by uid 550); 11 Jul 2013 22:34:44 -0000
Mailing-List: contact musl-help@lists.openwall.com; run by ezmlm
Precedence: bulk
List-Post: <mailto:musl@lists.openwall.com>
List-Help: <mailto:musl-help@lists.openwall.com>
List-Unsubscribe: <mailto:musl-unsubscribe@lists.openwall.com>
List-Subscribe: <mailto:musl-subscribe@lists.openwall.com>
Original-Received: (qmail 24518 invoked from network); 11 Jul 2013 22:34:44 -0000
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=google.com; s=20120113;
        h=mime-version:in-reply-to:references:date:message-id:subject:from:to
         :content-type:x-gm-message-state;
        bh=x5s+3ttgi8akWxISe4O/ALaJnxkAPgzaQsalRksc06c=;
        b=XF8k6jyYVTi9vNF5vIXohYbL+iEPRDa2Dxlr2/L7tSdTIFJT393PiU4MD2AMBsUgqt
         En01+2N+DsFcxjjzuC6erbmDxfwd0r4PUnyf1Is+GQx2SDCH6jkp4ryw2PTKgrcSaGV8
         vUiiQpIUMIbHPNQ0/LC0oZj1hZZGW6WsKPA3XLkaL+bE84m0rstn+PYBYBBCoNRIMJRW
         GRSy3QTVI4rsteQKA7Mq0TnPE8PGv6qF34gESbHHrBTfcu3w76HXr0EPEv6aEw9Xn6J7
         84ornvbg8b+/gAyGqKq1eSry4EMX0v2Jv3AekC9PHMVNFkGtBbMwPHuSwQRHIY2BvO9u
         bjIg==
X-Received: by 10.58.34.69 with SMTP id x5mr23084932vei.11.1373582071668; Thu,
 11 Jul 2013 15:34:31 -0700 (PDT)
In-Reply-To: <20130711124613.GO29800@brightrain.aerifal.cx>
X-Gm-Message-State: ALoCoQkVM0HrUORBL0s0lhlw5gHqHa5Ut7F/ITsLpA4TsVJw5p6XlVHPVff0suN1e/pNNCmlDR8v
Xref: news.gmane.org gmane.linux.lib.musl.general:3630
Archived-At: <http://permalink.gmane.org/gmane.linux.lib.musl.general/3630>

Hi Rich,

> You need both instructions in the same asm block, and proper
> constraints. As it is, whether the registers keep their values between
> the two separate asm blocks is up to the compiler's whims.
>
> With the proper constraints ("+r" type), the s+=SS and d+=SS are
> unnecessary, as a bonus. Also there's no reason to force alignment to
> SS for this loop; that will simply prevent it from being used as much
> for smaller copies. I would use SS==sizeof(size_t) and then write 8*SS
> in the for loop.
>
> Last night I was in the process of writing something very similar, but
> I put the for loop in asm too and didn't finish it. If it performs
> just as well with the loop in C, I like your version better.

I've reworked it a bit, and it appears to be working. I wasn't
entirely sure what you meant about the proper constraints. There was
an additional reason for using 8*4 as the alignment: it forces the
whole loop to work in cache-line blocks. I now do this explicitly in
the lead-in, copying the first few words 32 bits at a time and then
switching to the full cache-line asm. This gives the same performance
as the fully native assembler. However, to get there I had to use the
same trick the native assembler uses: loading from the next block
before storing this one. I'm a bit concerned that this means we'd be
doing a read that is out of bounds, and I can't see why the same
wouldn't happen with the existing assembler (though I'm presuming it
doesn't). Any comments on this side of it?
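
For reference, one possible reading of the "+r" suggestion is sketched
below. It is an assumption about what was meant, not a tested
replacement: the helper name copy_block32 is hypothetical, the load and
store sit in a single asm block, the pointers are declared as
read-write operands, and base-register writeback advances them so no
separate add is needed. The non-ARM branch exists only so the sketch
builds on other hosts.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not part of the code in this message: copy one
 * 32-byte block and advance both pointers past it.
 * Both pointers must be 4-byte aligned. */
static void copy_block32(unsigned char **dp, const unsigned char **sp)
{
    unsigned char *d = *dp;
    const unsigned char *s = *sp;

    assert(((uintptr_t)d & 3) == 0 && ((uintptr_t)s & 3) == 0);
#if defined(__arm__) && !defined(__thumb__)
    __asm__ __volatile__ (
        "ldmia %1!, {r4-r11}\n\t"   /* load 8 words, write back s += 32 */
        "stmia %0!, {r4-r11}"       /* store 8 words, write back d += 32 */
        : "+r"(d), "+r"(s)          /* both are read and written by the asm */
        :
        : "r4", "r5", "r6", "r7", "r8", "r9", "r10", "r11", "memory");
#else
    /* Fallback so the sketch also builds on non-ARM hosts. */
    memcpy(d, s, 32);
    d += 32;
    s += 32;
#endif
    *dp = d;
    *sp = s;
}
```

The "memory" clobber tells the compiler that the asm reads and writes
memory it cannot see through the operands. Since r11 may serve as the
frame pointer, some builds may reject this clobber list unless frame
pointers are omitted.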

#define SS sizeof(size_t)
#define ALIGN (SS - 1)
void * noinline my_asm_memcpy(void * restrict dest,
                              const void * restrict src, size_t n)
{
    unsigned char *d = dest;
    const unsigned char *s = src;

    if (((uintptr_t)d & ALIGN) != ((uintptr_t)s & ALIGN))
        goto misaligned;

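    /* Byte-copy until both pointers are word-aligned; they share the
     * same misalignment here, so aligning d also aligns s. */
    for (; ((uintptr_t)d & ALIGN) && n; n--) *d++ = *s++;

    /* ARM has 32-byte cache lines, so get us aligned to that */
    for (; ((uintptr_t)d & ((8 * SS) - 1)) && n >= SS; n -= SS) {
        *(size_t *)d = *(const size_t *)s;
        d += SS;
        s += SS;
    }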
    /* Do full cache line read/writes */
    if (n) {
        for (; n>=(8 * SS); n-= (8 * SS)) {
                __asm__ (
                        "ldmia %0, {r4-r11}\n"
                        "add %0, %0, %4\n"
                        "bic r12, %0, %5\n"
                        "ldrhi r12, [%0]\n"
                        "stmia %1, {r4-r11}\n"
                        "add %1, %1, %4"
                        : "=r"(s), "=r"(d)
                        : "0"(s), "1"(d), "i"(8 * SS), "i"((8 * SS) - 1)
                        : "r4", "r5", "r6", "r7", "r8",
                          "r9", "r10", "r11", "r12", "memory");
        }

misaligned:
        for (; n; n--) *d++ = *s++;
    }
    return dest;

}
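
To make the out-of-bounds question concrete, here is a portable sketch
of the same load-ahead pattern with an explicit bounds check. It is
only an illustration under stated assumptions: the name blockwise_copy
and the BLOCK constant are hypothetical, plain memcpy stands in for the
ldmia/stmia pair, and the look-ahead is a single byte read that is
skipped when no source bytes remain after the current block.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BLOCK 32u  /* assumed cache-line size, as in the code above */

/* Hypothetical illustration: copy in BLOCK-sized chunks, touching the
 * next chunk of the source before storing the current one, but only
 * when that touch stays inside the source buffer. */
static void *blockwise_copy(void *restrict dest, const void *restrict src,
                            size_t n)
{
    unsigned char *d = dest;
    const unsigned char *s = src;

    assert(n == 0 || (d != NULL && s != NULL));

    while (n >= BLOCK) {
        unsigned char line[BLOCK];

        memcpy(line, s, BLOCK);      /* load the current block */
        s += BLOCK;
        n -= BLOCK;
        if (n > 0) {
            /* At least one more source byte exists, so this read is
             * in bounds. With n == 0 the look-ahead is skipped. */
            volatile unsigned char touch = *s;
            (void)touch;
        }
        memcpy(d, line, BLOCK);      /* store the current block */
        d += BLOCK;
    }
    for (; n; n--)                   /* copy the remaining tail bytes */
        *d++ = *s++;
    return dest;
}
```

The point of the sketch is the `n > 0` guard: an unconditional
look-ahead after the last full block would read the first byte past the
end of the source whenever n is an exact multiple of the block size.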

Regards,
Andre