From mboxrd@z Thu Jan  1 00:00:00 1970
X-Msuck: nntp://news.gmane.org/gmane.linux.lib.musl.general/2872
Path: news.gmane.org!not-for-mail
From: Rob Landley <rob@landley.net>
Newsgroups: gmane.linux.lib.musl.general
Subject: Re: ARM optimisations
Date: Fri, 01 Mar 2013 22:33:19 -0600
Message-ID: <1362198799.21837.5@driftwood>
References: <20130228233051.GO20323@brightrain.aerifal.cx>
Reply-To: musl@lists.openwall.com
NNTP-Posting-Host: plane.gmane.org
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii; DelSp=Yes; Format=Flowed
Content-Transfer-Encoding: quoted-printable
X-Trace: ger.gmane.org 1362202954 32461 80.91.229.3 (2 Mar 2013 05:42:34 GMT)
X-Complaints-To: usenet@ger.gmane.org
NNTP-Posting-Date: Sat, 2 Mar 2013 05:42:34 +0000 (UTC)
Cc: musl@lists.openwall.com
To: musl@lists.openwall.com
Original-X-From: musl-return-2873-gllmg-musl=m.gmane.org@lists.openwall.com Sat Mar 02 06:42:58 2013
Return-path: <musl-return-2873-gllmg-musl=m.gmane.org@lists.openwall.com>
Envelope-to: gllmg-musl@plane.gmane.org
Original-Received: from mother.openwall.net ([195.42.179.200])
	by plane.gmane.org with smtp (Exim 4.69)
	(envelope-from <musl-return-2873-gllmg-musl=m.gmane.org@lists.openwall.com>)
	id 1UBfDj-0003U4-3l
	for gllmg-musl@plane.gmane.org; Sat, 02 Mar 2013 06:42:55 +0100
Original-Received: (qmail 18262 invoked by uid 550); 2 Mar 2013 05:42:32 -0000
Mailing-List: contact musl-help@lists.openwall.com; run by ezmlm
Precedence: bulk
List-Post: <mailto:musl@lists.openwall.com>
List-Help: <mailto:musl-help@lists.openwall.com>
List-Unsubscribe: <mailto:musl-unsubscribe@lists.openwall.com>
List-Subscribe: <mailto:musl-subscribe@lists.openwall.com>
Original-Received: (qmail 18254 invoked from network); 2 Mar 2013 05:42:32 -0000
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=google.com; s=20120113;
        h=x-received:date:from:subject:to:cc:in-reply-to:x-mailer:message-id
         :mime-version:content-type:content-disposition
         :content-transfer-encoding:x-gm-message-state;
        bh=UaYxfGsD5YMtMNKohxAJw2uQSon+8DX3qY5payfoJTg=;
        b=k6Qx1F5PfVABpbE8eTv/7P4+J6bA48QLn9qf2Rw0uFnA7q8hNbH/GTw4rG/b7sJdtW
         w5JMSjVGX5ukpKz8eC9MT1zU1TxFLBUw2K4TjKizbny+wuew2p/PgyMcdv4HikMc1ejG
         YxFnyJ8lI/GxpsA/R8sYTH702iwjhVpTFY+tbU3DLg4JK6rF8q9BjElgVXo89fAiODCz
         DtEgdWhkj8Sd7P9PesIShwPDJArM5IZ0NtBYdYRE8wwMiHFDoQhPd9cRFm+ExCasclZI
         t5MIsrT2tzaDTaRyhYhf+jjQFTVp08Vu3OfrYhxCtgG/8zVTQlWJ4FH5r4AEHDiq7MfO
         wpWA==
X-Received: by 10.42.148.71 with SMTP id q7mr10832385icv.53.1362202940463;
        Fri, 01 Mar 2013 21:42:20 -0800 (PST)
In-Reply-To: <20130228233051.GO20323@brightrain.aerifal.cx> (from
	dalias@aerifal.cx on Thu Feb 28 17:30:51 2013)
X-Mailer: Balsa 2.4.11
Content-Disposition: inline
X-Gm-Message-State: ALoCoQlHluy0g2Tjy43We/IqJPihw8RDyFJIPrDy/LkrKK0LbUNl3zBSs63iWdBJqF35nhRsSK+J
Xref: news.gmane.org gmane.linux.lib.musl.general:2872
Archived-At: <http://permalink.gmane.org/gmane.linux.lib.musl.general/2872>

On 02/28/2013 05:30:51 PM, Rich Felker wrote:
> On Fri, Mar 01, 2013 at 12:15:21PM +1300, Andre Renaud wrote:
> > Hi,
> > Can anyone tell me what the policy for musl is regarding ARM optimised
> > assembly implementations of functions such as memcpy/memmove? I notice
> > that there are i386/x86_64 versions for some of these. Doing some
> > simple testing on an ARM platform I found that an ARM asm
> > implementation of memcpy is ~80% faster than the C one currently in
> > MUSL (this is on an ARMv5, so no NEON instructions or similar).
> >
> > I don't think I'm capable of writing the optimised version entirely
> > myself, however there are various implementations floating around in
> > libraries such as bionic etc... Is it possible to have BSD licensed
> > code brought in to musl (which is MIT licensed)?
>
> ARM optimizations are welcome as long as they're thoroughly tested,
> not heavily bloated, and support all v4 (including no-thumb) and later
> cpu models, either by using universally-available features or
> conditioning use of features on the .hidden __hwcap provided in musl.

Out of curiosity, why armv4 no thumb?

I'd actually say that armv5 is probably the one to optimize for,
because it's somewhere over 80% of the installed base of arm systems
and generally provides an additional 25% speedup from armv4 to armv5.
Anything lower than that can use C, anything newer than that can
benefit from an armv5 version vs C.

The reason armv4 _without_ thumb isn't interesting is you need at
least armv4t to use EABI, and I had to patch my compiler to make even
that work because telling it EABI hardwired output to <= armv5l even
though that wasn't technically required. (Presumably since fixed, but
the point is nobody _noticed_ for several years.)

Newer compilers have dropped support for OABI entirely, and armv4t
systems aren't that common. (They existed, the tin can tools nail board
used one, but the generic C code works for them. Point is I'm not sure
they're worth _optimizing_ for if it costs the vast majority of systems
a 25% performance hit and we don't want to maintain multiple versions.
If you _have_ an armv5 version, the armv4 one won't/shouldn't get much
testing.)

I believe armv6 was mostly just SMP extensions, so not worth optimizing
memcpy for. armv7 is nice but not ubiquitous the way armv5 is, and
armv7 brings with it the "thumb2" instruction set which means you'd
need 2 versions depending on what target you wanted to compile for...
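
To make the "2 versions" point concrete, here's a rough sketch of how a
build could pick a variant from the compiler's predefined macros. This
is not anything in musl: the enum, the function names, and the variant
labels are all made up for illustration, and it assumes a compiler new
enough to define __ARM_ARCH (older gcc only gives you per-architecture
macros such as __ARM_ARCH_5TE__). On a non-arm host it just falls
through to the generic C case.

```c
#include <stddef.h>

/* Hypothetical variant labels; none of these identifiers exist in musl. */
enum copy_variant {
    COPY_GENERIC_C,     /* plain C fallback, works everywhere */
    COPY_ARMV5_ARM,     /* asm written against ARM-mode armv5 */
    COPY_ARMV7_THUMB2   /* asm written for a thumb2 build */
};

/* Decide at build time, using only compiler-predefined macros. */
enum copy_variant build_variant(void)
{
#if defined(__arm__) && defined(__thumb2__)
    /* Built as thumb2 code, so ARM-mode asm is the wrong flavor. */
    return COPY_ARMV7_THUMB2;
#elif defined(__arm__) && !defined(__thumb__) && \
      defined(__ARM_ARCH) && __ARM_ARCH >= 5
    /* ARM mode on armv5 or later: the armv5 version applies. */
    return COPY_ARMV5_ARM;
#else
    /* armv4, thumb1 builds, and every non-arm target. */
    return COPY_GENERIC_C;
#endif
}

/* Human-readable name for a variant, or NULL if out of range. */
const char *variant_name(enum copy_variant v)
{
    switch (v) {
    case COPY_GENERIC_C:    return "generic-c";
    case COPY_ARMV5_ARM:    return "armv5-arm";
    case COPY_ARMV7_THUMB2: return "armv7-thumb2";
    }
    return NULL;
}
```

Printing variant_name(build_variant()) from a test program built with
different -march/-mthumb settings shows which version each target would
end up with. A runtime check against hwcap would be a separate thing on
top of this.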

Rob