From: Andre Renaud
Newsgroups: gmane.linux.lib.musl.general
Subject: Re: ARM optimisations
Date: Sun, 3 Mar 2013 09:33:54 +1300
References: <20130228233051.GO20323@brightrain.aerifal.cx> <1362198799.21837.5@driftwood> <20130302113441.GQ6181@port70.net>
Reply-To: musl@lists.openwall.com
To: musl@lists.openwall.com
In-Reply-To: <20130302113441.GQ6181@port70.net>

On 3 March 2013 00:34, Szabolcs Nagy wrote:
> * Rob Landley [2013-03-01 22:33:19 -0600]:
>> I'd actually say that armv5 is probably the one to optimize for,
>> because it's somewhere over 80% of the installed base of arm systems
>> and generally provides an additional 25% speedup from armv4 to armv5.
>> Anything lower than that can use C, anything newer than that can
>> benefit from an armv5 version vs C.
> ...
>> I believe armv6 was mostly just SMP extensions, so not worth
>> optimizing memcpy for. armv7 is nice but not ubiquitous the way
>> armv5 is, and armv7 brings with it the "thumb2" instruction set
>> which means you'd need 2 versions depending on what target you
>> wanted to compile for...
>
> a quick research shows that
>
> glibc has ifdefs for armv5te and armv4t optimizations
> http://sourceware.org/git/?p=glibc.git;a=blob;f=ports/sysdeps/arm/memcpy.S
>
> linaro has armv7 optimized version
> http://bazaar.launchpad.net/~linaro-toolchain-dev/cortex-strings/trunk/view/head:/src/linaro-a9/memcpy.S
>
> olibc (the bionic one not the openbsd one) has armv7+neon optimized memcpy
> https://github.com/olibc/olibc/blob/master/libc/arch-arm/bionic/memcpy.S

The bionic code uses a couple of pre-processor tricks to combine the ARMv4 & ARMv5 code, specifically around the PLD and CALIGN instructions. Since (I assume) bionic is built at compile time for a specific CPU, this is relatively easy to do. However, I got the impression (and may be mistaken) that we were trying to avoid compile-time CPU detection in favour of run-time CPU detection.
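To make the compile-time versus run-time distinction concrete, here is a rough C sketch. Everything in it is hypothetical: the names (cpu_class, memcpy_armv4, select_memcpy and so on) are invented for illustration and are not taken from musl, bionic or glibc, and the three "variants" are plain byte copies standing in for the real assembly. The compile-time branch assumes the compiler predefines __ARM_ARCH and __ARM_NEON, which newer GCC and Clang do but older toolchains may not.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical CPU classes for this sketch only. */
enum cpu_class { CPU_ARMV4, CPU_ARMV5, CPU_ARMV7_NEON };

typedef void *(*memcpy_fn)(void *, const void *, size_t);

/* Portable byte copy used as a stand-in for each assembly variant. */
static void *copy_bytes(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n--)
        *d++ = *s++;
    return dst;
}

/* Placeholders: real versions would differ in PLD use, alignment, NEON. */
static void *memcpy_armv4(void *d, const void *s, size_t n) { return copy_bytes(d, s, n); }
static void *memcpy_armv5(void *d, const void *s, size_t n) { return copy_bytes(d, s, n); }
static void *memcpy_armv7_neon(void *d, const void *s, size_t n) { return copy_bytes(d, s, n); }

/* Compile-time selection: exactly one variant is chosen when the library
 * is built, so only that variant needs to be linked in. */
static memcpy_fn memcpy_build_default(void)
{
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    return memcpy_armv7_neon;
#elif defined(__ARM_ARCH) && __ARM_ARCH >= 5
    return memcpy_armv5;
#else
    return memcpy_armv4;
#endif
}

/* Run-time selection: every variant must be present in the binary, and
 * the choice is made once from whatever the CPU probe reports. */
static memcpy_fn select_memcpy(enum cpu_class cpu)
{
    memcpy_fn impl = NULL;
    switch (cpu) {
    case CPU_ARMV7_NEON: impl = memcpy_armv7_neon; break;
    case CPU_ARMV5:      impl = memcpy_armv5;      break;
    case CPU_ARMV4:      impl = memcpy_armv4;      break;
    }
    assert(impl != NULL);
    return impl;
}
```

The point of the sketch is only the size trade-off: memcpy_build_default() lets the linker drop the unused variants, whereas select_memcpy() keeps all three in the image. How the cpu_class value would actually be probed is left out on purpose.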
If run-time detection is the goal, then you would need two separate implementations (possibly with some code sharing), and I thought that the overall code-size bloat this would bring wouldn't be worth it. This is especially true for ARM NEON/v7, as it is essentially completely different, so you'd end up with somewhere between a 300% and 500% code size increase on ARM to support all three platforms (based on the current implementation going from 1k to 1.5k when I used the ASM-optimised version).

Having said all that, I do tend to agree that the ARMv4 platforms are relatively archaic, and simply not having an optimised version for them could be an acceptable alternative. ARMv5t is probably still too popular to ignore.

Regards,
Andre