From: Rob Landley
Newsgroups: gmane.linux.lib.musl.general
Subject: Re: ARM optimisations
Date: Fri, 01 Mar 2013 22:33:19 -0600
Message-ID: <1362198799.21837.5@driftwood>
In-Reply-To: <20130228233051.GO20323@brightrain.aerifal.cx>
References: <20130228233051.GO20323@brightrain.aerifal.cx>
Reply-To: musl@lists.openwall.com
To: musl@lists.openwall.com
Cc: musl@lists.openwall.com

On 02/28/2013 05:30:51 PM, Rich Felker wrote:
> On Fri, Mar 01, 2013 at 12:15:21PM +1300, Andre Renaud wrote:
> > Hi,
> > Can anyone tell me what the policy for musl is regarding ARM optimised
> > assembly implementations of functions such as memcpy/memmove? I notice
> > that there are i386/x86_64 versions for some of these. Doing some
> > simple testing on an ARM platform I found that an ARM asm
> > implementation of memcpy is ~80% faster than the C one currently in
> > MUSL (this is on an ARMv5, so no NEON instructions or similar).
> >
> > I don't think I'm capable of writing the optimised version entirely
> > myself, however there are various implementations floating around in
> > libraries such as bionic etc... Is it possible to have BSD licensed
> > code brought in to musl (which is MIT licensed)?
>
> ARM optimizations are welcome as long as they're thoroughly tested,
> not heavily bloated, and support all v4 (including no-thumb) and later
> cpu models, either by using universally-available features or
> conditioning use of features on the .hidden __hwcap provided in musl.

Out of curiosity, why armv4 no thumb?

I'd actually say that armv5 is probably the one to optimize for,
because it's somewhere over 80% of the installed base of arm systems
and generally provides an additional 25% speedup from armv4 to armv5.
Anything lower than that can use C, anything newer than that can
benefit from an armv5 version vs C.

The reason armv4 _without_ thumb isn't interesting is you need at
least armv4t to use EABI, and I had to patch my compiler to make even
that work because telling it EABI hardwired output to <= armv5l even
though that wasn't technically required. (Presumably since fixed, but
the point is nobody _noticed_ for several years.)

Newer compilers have dropped support for OABI entirely, and armv4t
systems aren't that common. (They existed, the Tin Can Tools Nail Board
used one, but the generic C code works for them. Point is I'm not sure
they're worth _optimizing_ for if it costs the vast majority of systems
a 25% performance hit and we don't want to maintain multiple versions.
If you _have_ an armv5 version, the armv4 one won't/shouldn't get much
testing.)

I believe armv6 was mostly just SMP extensions, so not worth optimizing
memcpy for. armv7 is nice but not ubiquitous the way armv5 is, and
armv7 brings with it the "thumb2" instruction set, which means you'd
need 2 versions depending on what target you wanted to compile for...

Rob
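
P.S. For anyone following along, the "condition it on __hwcap" approach Rich describes amounts to picking a routine from a feature bitmask at runtime. Here is a rough sketch of that idea only, not musl's actual code: the bit value HYPOTHETICAL_HWCAP_V5 and the names copy_generic, copy_tuned and select_copy are all made up for illustration, and the "tuned" routine is a plain C stand-in for what would really be hand-written assembly.

```c
#include <stddef.h>

/* Hypothetical feature bit, for illustration only. Real bit values are
   defined per architecture by the kernel's hwcap headers. */
#define HYPOTHETICAL_HWCAP_V5 (1UL << 0)

typedef void *(*copy_fn)(void *, const void *, size_t);

/* Byte-at-a-time fallback, standing in for the generic C memcpy. */
static void *copy_generic(void *dest, const void *src, size_t n)
{
    unsigned char *d = dest;
    const unsigned char *s = src;
    while (n--)
        *d++ = *s++;
    return dest;
}

/* Stand-in for a tuned routine (assumption: the real one would be
   assembly). Here it is just the same copy unrolled by four. */
static void *copy_tuned(void *dest, const void *src, size_t n)
{
    unsigned char *d = dest;
    const unsigned char *s = src;
    for (; n >= 4; n -= 4, d += 4, s += 4) {
        d[0] = s[0];
        d[1] = s[1];
        d[2] = s[2];
        d[3] = s[3];
    }
    while (n--)
        *d++ = *s++;
    return dest;
}

/* Pick an implementation from a feature bitmask supplied by the caller. */
static copy_fn select_copy(unsigned long hwcap)
{
    return (hwcap & HYPOTHETICAL_HWCAP_V5) ? copy_tuned : copy_generic;
}
```

The selection would normally happen once, with the resulting pointer cached, so the per-call cost is one indirect branch. Whether that overhead is acceptable on small copies is exactly the sort of thing that would need measuring.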