From: Rich Felker
To: musl@lists.openwall.com
Subject: Re: ARM optimisations
Date: Sat, 2 Mar 2013 01:21:02 -0500
Message-ID: <20130302062102.GP20323@brightrain.aerifal.cx>
In-Reply-To: <1362198799.21837.5@driftwood>

On Fri, Mar 01, 2013 at 10:33:19PM -0600, Rob Landley wrote:
> On 02/28/2013 05:30:51 PM, Rich Felker wrote:
> >On Fri, Mar 01, 2013 at 12:15:21PM +1300, Andre Renaud wrote:
> >> Hi,
> >> Can anyone tell me what the policy for musl is regarding ARM optimised
> >> assembly implementations of functions such as memcpy/memmove? I notice
> >> that there are i386/x86_64 versions for some of these.
> >> Doing some
> >> simple testing on an ARM platform I found that an ARM asm
> >> implementation of memcpy is ~80% faster than the C one currently in
> >> MUSL (this is on an ARMv5, so no NEON instructions or similar).
> >>
> >> I don't think I'm capable of writing the optimised version entirely
> >> myself, however there are various implementations floating around in
> >> libraries such as bionic etc... Is it possible to have BSD licensed
> >> code brought in to musl (which is MIT licensed)?
> >
> >ARM optimizations are welcome as long as they're thoroughly tested,
> >not heavily bloated, and support all v4 (including no-thumb) and later
> >cpu models, either by using universally-available features or
> >conditioning use of features on the .hidden __hwcap provided in musl.
>
> Out of curiosity, why armv4 no thumb?
>
> I'd actually say that armv5 is probably the one to optimize for,
> because it's somewhere over 80% of the installed base of arm systems
> and generally provides an additional 25% speedup from armv4 to armv5.
> Anything lower than that can use C, anything newer than that can
> benefit from an armv5 version vs C.
>
> The reason armv4t _without_ thumb isn't interesting is you need at
> least armv4t to use EABI, and I had to patch my compiler to make

This is a compiler bug. If the compiler can be made to generate proper
return code, EABI works with armv4 (non-thumb) too.

> Newer compilers have dropped support for OABI entirely, and armv4t

OABI is not supported by musl at all. The intent is simply not to
_preclude_ use of non-thumb, even though there are other obstacles to
its use now.

> systems aren't that common. (They existed, the tin can tools nail
> board used one, but the generic C code works for them. Point is I'm
> not sure they're worth _optimizing_ for if it costs the vast
> majority of systems a 25% performance hit and we don't want to
> maintain multiple versions. If you _have_ an armv5 version, the
> armv4 one won't/shouldn't get much testing.)

Can you explain why you think a version that's v4 compatible will be
that much slower? If so, v5 code can be used as long as it checks
__hwcap and falls back to a simple working version...

Rich
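
To illustrate the "check __hwcap and fall back" idea, here is a minimal
C sketch of that kind of runtime dispatch. It is not musl's code: the
names sketch_hwcap, SKETCH_HWCAP_V5, copy_simple, copy_v5 and
sketch_memcpy are all hypothetical, the bit value is made up, and the
"fast" path is only a placeholder for what would really be v5 assembly.

```c
#include <stddef.h>

/* Hypothetical stand-ins: in musl the capability word would be the
 * hidden __hwcap; the bit value here is invented for illustration. */
#define SKETCH_HWCAP_V5 (1UL << 0)
static unsigned long sketch_hwcap;

/* Simple byte-wise copy that works on every supported CPU model. */
static void *copy_simple(void *dst, const void *src, size_t n)
{
	unsigned char *d = dst;
	const unsigned char *s = src;
	while (n--) *d++ = *s++;
	return dst;
}

/* Placeholder for a tuned routine. Real code would be v5-only asm;
 * here it just reuses the simple copy so the sketch stays portable. */
static void *copy_v5(void *dst, const void *src, size_t n)
{
	return copy_simple(dst, src, n);
}

/* Dispatch: take the tuned path only when the capability bit is set,
 * otherwise fall back to the version that runs everywhere. */
void *sketch_memcpy(void *dst, const void *src, size_t n)
{
	if (sketch_hwcap & SKETCH_HWCAP_V5)
		return copy_v5(dst, src, n);
	return copy_simple(dst, src, n);
}
```

With sketch_hwcap left at 0 every call goes through copy_simple, which
is the behaviour a v4 system would need. Setting the bit switches calls
to copy_v5 without the caller changing anything.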