From: Andre Renaud
Newsgroups: gmane.linux.lib.musl.general
Subject: Re: Thinking about release
Date: Fri, 12 Jul 2013 10:34:31 +1200
References: <20130613014314.GC29800@brightrain.aerifal.cx> <20130709053711.GO29800@brightrain.aerifal.cx> <20130711033754.GL29800@brightrain.aerifal.cx> <20130711124613.GO29800@brightrain.aerifal.cx>
In-Reply-To: <20130711124613.GO29800@brightrain.aerifal.cx>
Reply-To: musl@lists.openwall.com
To: musl@lists.openwall.com

Hi Rich,

> You need both instructions in the same asm block, and proper
> constraints. As it is, whether the registers keep their values between
> the two separate asm blocks is up to the compiler's whims.
>
> With the proper constraints ("+r" type), the s+=SS and d+=SS are
> unnecessary, as a bonus. Also there's no reason to force alignment to
> SS for this loop; that will simply prevent it from being used as much
> for smaller copies. I would use SS==sizeof(size_t) and then write 8*SS
> in the for loop.
>
> Last night I was in the process of writing something very similar, but
> I put the for loop in asm too and didn't finish it. If it performs
> just as well with the loop in C, I like your version better.

I've rejiggled it a bit, and it appears to be working. I wasn't
entirely sure what you meant about the proper constraints. There is an
additional reason why 8*4 was used for the align - to force the whole
loop to work in cache-line blocks. I've now done this explicitly on
the lead-in by doing the first few copies as 32-bit, then going to the
full cache-line asm.

This has the same performance as the fully native assembler. However,
to get that I had to use the same trick that the native assembler
uses - doing a load of the next block prior to storing this one. I'm a
bit concerned that this would mean we'd be doing a read that was out
of bounds, and I can't entirely see why this wouldn't be happening
with the existing assembler (but I'm presuming it doesn't). Any
comments on this side of it?

#define SS sizeof(size_t)
#define ALIGN (SS - 1)

void * noinline my_asm_memcpy(void * restrict dest, const void * restrict src, size_t n)
{
	unsigned char *d = dest;
	const unsigned char *s = src;

	if (((uintptr_t)d & ALIGN) != ((uintptr_t)s & ALIGN))
		goto misaligned;

	/* ARM has 32-byte cache lines, so get us aligned to that */
	for (; ((uintptr_t)d & ((8 * SS) - 1)) && n; n-=SS) {
		*(size_t *)d = *(size_t *)s;
		d += SS;
		s+= SS;
	}

	/* Do full cache line read/writes */
	if (n) {
		for (; n>=(8 * SS); n-= (8 * SS)) {
			__asm__ (
				"ldmia %0, {r4-r11}\n"
				"add %0, %0, %4\n"
				"bic r12, %0, %5\n"
				"ldrhi r12, [%0]\n"
				"stmia %1, {r4-r11}\n"
				"add %1, %1, %4"
				: "=r"(s), "=r"(d)
				: "0"(s), "1"(d), "i"(8 * SS), "i"((8 * SS) - 1)
				: "r4", "r5", "r6", "r7", "r8", "r9", "r10", "r11", "r12");
		}
misaligned:
		for (; n; n--)
			*d++ = *s++;
	}

	return dest;
}

Regards,
Andre
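
On the question of what "proper constraints" might mean: one possible reading of the quoted suggestion is a single asm block whose pointer operands are declared "+r" (read and written), with the load and store using base-register writeback so that no separate add is needed. The sketch below is only an illustration of that idea, not the code from this thread and not musl's implementation. The names copy_block32, sketch_memcpy and BLOCK are made up for the example. It assumes GCC-style extended asm, ARM (not Thumb) mode, and a build where r11 is not reserved as a frame pointer. It leaves out the cache-line lead-in and the load of the next block, and the non-ARM branch exists only so the sketch can be compiled and checked on other machines.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK 32 /* 8 registers of 4 bytes each on 32-bit ARM */

/* Hypothetical helper: copy one 32-byte block and advance both pointers. */
static inline void copy_block32(unsigned char **dp, const unsigned char **sp)
{
	unsigned char *d = *dp;
	const unsigned char *s = *sp;
#if defined(__arm__) && !defined(__thumb__)
	/* ldm/stm need word-aligned addresses */
	assert((((uintptr_t)d | (uintptr_t)s) & 3) == 0);
	/* "+r": the asm both reads and updates d and s; the "!" writeback
	 * advances each pointer by 32 bytes. The "memory" clobber tells the
	 * compiler that memory is read and written behind its back. */
	__asm__ __volatile__(
		"ldmia %1!, {r4-r11}\n\t"
		"stmia %0!, {r4-r11}"
		: "+r"(d), "+r"(s)
		:
		: "r4", "r5", "r6", "r7", "r8", "r9", "r10", "r11", "memory");
#else
	/* Portable stand-in with the same effect on the pointers */
	memcpy(d, s, BLOCK);
	d += BLOCK;
	s += BLOCK;
#endif
	*dp = d;
	*sp = s;
}

/* Hypothetical wrapper: block copy while possible, then a byte tail. */
void *sketch_memcpy(void *restrict dest, const void *restrict src, size_t n)
{
	unsigned char *d = dest;
	const unsigned char *s = src;

	/* Take the block path only when both pointers are word-aligned */
	if ((((uintptr_t)d | (uintptr_t)s) & 3) == 0)
		for (; n >= BLOCK; n -= BLOCK)
			copy_block32(&d, &s);

	/* Remaining bytes, or the whole copy if the pointers were unaligned */
	for (; n; n--)
		*d++ = *s++;

	return dest;
}
```

Because the pointers are "+r" operands of one asm statement, the compiler knows their values change inside it, so the C loop needs no extra d += and s += and nothing depends on registers surviving between two separate asm blocks.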