From: Andre Renaud
Newsgroups: gmane.linux.lib.musl.general
Subject: Re: Thinking about release
Date: Wed, 24 Jul 2013 16:40:16 +1200
To: musl@lists.openwall.com
Reply-To: musl@lists.openwall.com
In-Reply-To: <20130724034843.GP3249@brightrain.aerifal.cx>

Hi Rich,

> It looks buggy as-is; as far as I can tell, it will crash if src/dest
> are aligned with respect to each other but not aligned mod 4, i.e. the
> code starts out copying word-at-a-time rather than byte-at-a-time.

Yes, you are correct, I'd messed that up while looking at the cache
alignment stuff (along with another small size-related bug). Fixing it
is relatively straightforward though:

#define SS sizeof(size_t)
#define ALIGN (SS - 1)
void * noinline my_asm_memcpy(void * restrict dest, const void * restrict src, size_t n)
{
	unsigned char *d = dest;
	const unsigned char *s = src;

	if (((uintptr_t)d & ALIGN) != ((uintptr_t)s & ALIGN))
		goto misaligned;

	/* Get them word aligned */
	for (; ((uintptr_t)d & ALIGN) && n; n--)
		*d++ = *s++;

	/* ARM has 32-byte cache lines, so align to that for performance */
	for (; ((uintptr_t)d & ((8 * SS) - 1)) && n >= SS; n-=SS) {
		*(size_t *)d = *(size_t *)s;
		d += SS;
		s += SS;
	}

	/* Do full cache line read/writes */
	for (; n>=(8 * SS); n-= (8 * SS))
		__asm__ __volatile__(
			"ldmia %1!,{a4,v1,v2,v3,v4,v5,v6,v7}\n\t"
			"ldrhi r12, [%1]\n"
			"stmia %0!,{a4,v1,v2,v3,v4,v5,v6,v7}\n\t"
			: "+r"(d), "+r"(s)
			:
			: "a4", "v1", "v2", "v3", "v4", "v5", "v6", "v7", "r12", "memory");

misaligned:
	for (; n; n--)
		*d++ = *s++;

	return dest;
}

> I think the C version would be acceptable if we get the bugs fixed and
> test it well, but I'd also like to still keep the asm under
> consideration.
> There are lots of cases not covered by the C version,
> like misaligned copies (important for strings, not for much else). Do
> you think these cases are important?

At the moment the misaligned copies perform terribly (18 MB/s vs glibc
at 100 MB/s). However, the existing C implementation in musl is no
different, so we're not degrading the current system. We're
essentially missing the non-congruent copying stuff from the asm code.
I'll have a look at this and see if I can write a similar C version.

Regards,
Andre
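
For reference, a rough sketch of what a C version of the non-congruent
path could look like. This is not code from this thread or from the asm
under discussion; the name sketch_memcpy and the u32 typedef are made up
for illustration. It assumes a little-endian target with 32-bit words and
a GCC/Clang-style may_alias attribute. The idea is to align dest, load
aligned words from src, and merge neighbouring words with shifts so that
every load and store is word aligned. It is untuned and has no cache-line
handling.

```c
#include <stddef.h>
#include <stdint.h>

/* GCC/Clang extension (assumed available): allows word access to byte
 * buffers without breaking strict aliasing. */
typedef uint32_t __attribute__((__may_alias__)) u32;

/* Hypothetical name. Little-endian only. */
void *sketch_memcpy(void *restrict dest, const void *restrict src, size_t n)
{
	unsigned char *d = dest;
	const unsigned char *s = src;

	/* Byte copy until dest is word aligned. */
	for (; ((uintptr_t)d & 3) && n; n--)
		*d++ = *s++;

	unsigned off = (uintptr_t)s & 3;	/* src offset within its word */
	if (!off) {
		/* Congruent case: plain aligned word copy. */
		for (; n >= 4; n -= 4, d += 4, s += 4)
			*(u32 *)d = *(const u32 *)s;
	} else if (n >= 8) {
		unsigned rs = 8 * off, ls = 32 - rs;
		u32 lo = 0, w;

		/* Collect the 4-off bytes that precede the next aligned
		 * src word into the low end of lo. */
		for (unsigned i = 0; i < 4 - off; i++)
			lo |= (u32)s[i] << (8 * i);

		const unsigned char *ws = s + (4 - off);	/* word aligned */

		/* n >= 8 keeps every 4-byte load inside the source. */
		for (; n >= 8; n -= 4, d += 4, ws += 4) {
			w = *(const u32 *)ws;
			*(u32 *)d = lo | (w << ls);	/* merge two src words */
			lo = w >> rs;			/* carry the leftover bytes */
		}
		/* Bytes still held in lo have not been stored yet, so
		 * step back to the first unwritten source byte. */
		s = ws - (4 - off);
	}

	/* Tail, and short non-congruent copies. */
	for (; n; n--)
		*d++ = *s++;
	return dest;
}
```

The loop bound of 8 is deliberately conservative so that no load touches
memory outside the source buffer; the asm versions typically over-read
within an aligned word instead, which C does not allow.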