From: Andre Renaud
Newsgroups: gmane.linux.lib.musl.general
Subject: Re: Thinking about release
Date: Thu, 11 Jul 2013 17:10:41 +1200
To: musl@lists.openwall.com
Cc: Rich Felker
Reply-To: musl@lists.openwall.com

> I can't see any obvious reason why this shouldn't work, although the
> assembler as it stands makes pretty heavy use of all the registers,
> and I can't immediately see how to rework it to free up 2 more (I can
> free up 1 by dropping the attempted preload). Given my (lack of)
> skills with ARM assembler, I'm not sure I'll be able to look too
> deeply into either of these options, but I'll have a go at the inline
> ASM version to force 8*4byte loads to see if it improves things.

I've given it a bit of a go, and at first glance it appears to be
working (although I don't exactly have a comprehensive test suite, so
this is very preliminary). Anyone with more ARM assembler experience is
welcome to chip in with a comment.

I also managed to mess up my last set of benchmarks. I'd indicated that
I got 65 vs 95 vs 105, but I had overlooked the fact that the first
call would have poor cache performance. Once I corrected that, the
results became more like 65 (naive) vs 105 (typedef) vs 113 (asm).

Using the code at the end of this message, it becomes 65 (naive),
113 (inline asm), 113 (full asm). So the inline version is able to
perform as we'd expect, assuming that it is technically correct (which
is probably the biggest question).
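To illustrate the cold-cache issue mentioned above, here is a minimal
sketch of a timing loop that makes one untimed call before measuring.
It is not the benchmark that produced the numbers in this message; the
names copy_fn and time_copy are hypothetical, and clock() is only one
of several possible timers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Hypothetical type for any memcpy-compatible function under test. */
typedef void *(*copy_fn)(void *restrict, const void *restrict, size_t);

/*
 * Returns the average CPU time in seconds per call of fn over `reps`
 * timed calls. One untimed call is made first so that the buffers are
 * already in cache when measurement starts.
 */
static double time_copy(copy_fn fn, void *dst, const void *src,
                        size_t n, int reps)
{
    assert(reps > 0);

    fn(dst, src, n);                 /* warm-up call, not timed */

    clock_t start = clock();
    for (int i = 0; i < reps; i++)
        fn(dst, src, n);
    clock_t end = clock();

    return (double)(end - start) / CLOCKS_PER_SEC / reps;
}
```

Calling time_copy(memcpy, dst, src, len, reps) and then the same with
the function under test gives figures that can be compared directly,
provided the buffers are small enough to stay cached between calls.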
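#include <stddef.h>
#include <stdint.h>

/* noinline is assumed to be a macro defined elsewhere,
 * e.g. as __attribute__((noinline)). */

#define SS (8 * 4)      /* block size: 8 registers * 4 bytes */
#define ALIGN (SS - 1)

void * noinline my_asm_memcpy(void * restrict dest, const void * restrict src, size_t n)
{
    unsigned char *d = dest;
    const unsigned char *s = src;

    /* The block copy is only used when d and s share the same offset
     * within an SS-byte block. */
    if (((uintptr_t)d & ALIGN) != ((uintptr_t)s & ALIGN))
        goto misaligned;

    /* Byte copy until d (and therefore s) is SS-aligned. */
    for (; ((uintptr_t)d & ALIGN) && n; n--)
        *d++ = *s++;
    if (n) {
        for (; n >= SS; n -= SS) {
            /* Load 8 words from s into r4-r11. */
            __asm__("ldmia %0, {r4-r11}"
                    : "=r" (s)
                    : "0" (s)
                    : "r4", "r5", "r6", "r7", "r8", "r9", "r10", "r11");
            s += SS;
            /* Store r4-r11 to d. This relies on r4-r11 being left
             * untouched between the two asm statements. */
            __asm__("stmia %0, {r4-r11}" : "=r" (d) : "0" (d));
            d += SS;
        }

misaligned:
        /* Byte copy for the tail, or for the whole buffer when the
         * alignments differ. */
        for (; n; n--)
            *d++ = *s++;
    }
    return dest;
}

Regards,
Andre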
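
On the question of correctness and the lack of a test suite: below is a
minimal, portable sketch of a check that could be pointed at any
memcpy-compatible function, including my_asm_memcpy when built for ARM.
Everything in it is an assumption for illustration (copy_fn,
check_copy, naive_copy and the size limits are hypothetical names and
values). It tries every combination of source and destination offset
within a 32-byte block for a range of lengths, and verifies the return
value, the copied bytes, and guard bytes on both sides of the
destination.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical type for any memcpy-compatible function under test. */
typedef void *(*copy_fn)(void *restrict, const void *restrict, size_t);

enum { MAX_OFF = 32, MAX_LEN = 200, PAD = 8,
       BUF = MAX_OFF + MAX_LEN + 2 * PAD };

/* Byte-at-a-time reference copy, standing in for a "naive" version. */
static void *naive_copy(void *restrict dest, const void *restrict src, size_t n)
{
    unsigned char *d = dest;
    const unsigned char *s = src;
    for (; n; n--)
        *d++ = *s++;
    return dest;
}

/*
 * Returns 0 if fn behaves like memcpy for all tested cases, otherwise:
 * 1 = wrong return value, 2 = wrong data, 3 = wrote outside the range.
 */
static int check_copy(copy_fn fn)
{
    static unsigned char src[BUF], dst[BUF];

    for (size_t i = 0; i < BUF; i++)
        src[i] = (unsigned char)(i * 7 + 1);

    for (size_t soff = 0; soff < MAX_OFF; soff++)
        for (size_t doff = 0; doff < MAX_OFF; doff++)
            for (size_t n = 0; n <= MAX_LEN; n++) {
                unsigned char *d = dst + PAD + doff;
                const unsigned char *s = src + PAD + soff;

                memset(dst, 0xAA, BUF);          /* guard pattern */
                if (fn(d, s, n) != d)
                    return 1;
                if (memcmp(d, s, n) != 0)
                    return 2;
                for (size_t i = 0; i < PAD + doff; i++)
                    if (dst[i] != 0xAA)
                        return 3;
                for (size_t i = PAD + doff + n; i < BUF; i++)
                    if (dst[i] != 0xAA)
                        return 3;
            }
    return 0;
}
```

A passing run of check_copy(my_asm_memcpy) would not prove the inline
asm is valid, since the compiler is free to allocate registers
differently in another build, but a failure would settle the question
quickly.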