Date: Thu, 25 Jun 2020 17:50:42 -0400
From: Rich Felker
To: musl@lists.openwall.com
Message-ID: <20200625215041.GT6430@brightrain.aerifal.cx>
References: <20200121185215.5958-1-armccurdy@gmail.com>
In-Reply-To: <20200121185215.5958-1-armccurdy@gmail.com>
Subject: Re: [musl] [PATCH v2] Add big-endian support to ARM assembler memcpy

On Tue, Jan 21, 2020 at 10:52:15AM -0800, Andre McCurdy wrote:
> Allow the existing ARM assembler memcpy implementation to be used for
> both big and little endian targets.
> ---
>
> Exactly the same changes as before but rebased to account for
> whitespace changes in the preceding patch to add Thumb2 support.
>
>  COPYRIGHT                                |   2 +-
>  src/string/arm/{memcpy_le.S => memcpy.S} | 101 ++++++++++++++++++++++-
>  src/string/arm/memcpy.c                  |   3 -
>  3 files changed, 98 insertions(+), 8 deletions(-)
>  rename src/string/arm/{memcpy_le.S => memcpy.S} (82%)
>  delete mode 100644 src/string/arm/memcpy.c
>
> diff --git a/COPYRIGHT b/COPYRIGHT
> index e6472371..d3edc2a2 100644
> --- a/COPYRIGHT
> +++ b/COPYRIGHT
> @@ -127,7 +127,7 @@ Copyright © 2017-2018 Arm Limited
>  and labelled as such in comments in the individual source files. All
>  have been licensed under extremely permissive terms.
>
> -The ARM memcpy code (src/string/arm/memcpy_el.S) is Copyright © 2008
> +The ARM memcpy code (src/string/arm/memcpy.S) is Copyright © 2008
>  The Android Open Source Project and is licensed under a two-clause BSD
>  license. It was taken from Bionic libc, used on Android.
>
> diff --git a/src/string/arm/memcpy_le.S b/src/string/arm/memcpy.S
> similarity index 82%
> rename from src/string/arm/memcpy_le.S
> rename to src/string/arm/memcpy.S
> index 7b35d305..869e3448 100644
> --- a/src/string/arm/memcpy_le.S
> +++ b/src/string/arm/memcpy.S
> @@ -1,5 +1,3 @@
> -#if !__ARMEB__
> -
>  /*
>   * Copyright (C) 2008 The Android Open Source Project
>   * All rights reserved.
> @@ -42,7 +40,7 @@
>   * code safely callable from thumb mode, adjusting the return
>   * instructions to be compatible with pre-thumb ARM cpus, removal of
>   * prefetch code that is not compatible with older cpus and support for
> - * building as thumb 2.
> + * building as thumb 2 and big-endian.
>   */
>
>  .syntax unified
> @@ -227,24 +225,45 @@ non_congruent:
>           * becomes aligned to 32 bits (r5 = nb of words to copy for alignment)
>           */
>          movs    r5, r5, lsl #31
> +
> +#if __ARMEB__
> +        movmi   r3, r3, ror #24
> +        strbmi  r3, [r0], #1
> +        movcs   r3, r3, ror #24
> +        strbcs  r3, [r0], #1
> +        movcs   r3, r3, ror #24
> +        strbcs  r3, [r0], #1
> +#else
>          strbmi  r3, [r0], #1
>          movmi   r3, r3, lsr #8
>          strbcs  r3, [r0], #1
>          movcs   r3, r3, lsr #8
>          strbcs  r3, [r0], #1
>          movcs   r3, r3, lsr #8
> +#endif
>
>          cmp     r2, #4
>          blo     partial_word_tail
>
> +#if __ARMEB__
> +        mov     r3, r3, lsr r12
> +        mov     r3, r3, lsl r12
> +#endif
> +
>          /* Align destination to 32 bytes (cache line boundary) */
>  1:      tst     r0, #0x1c
>          beq     2f
>          ldr     r5, [r1], #4
>          sub     r2, r2, #4
> +#if __ARMEB__
> +        mov     r4, r5, lsr lr
> +        orr     r4, r4, r3
> +        mov     r3, r5, lsl r12
> +#else
>          mov     r4, r5, lsl lr
>          orr     r4, r4, r3
>          mov     r3, r5, lsr r12
> +#endif

Am I missing something or are both cases identical here? That would
either indicate this is gratuitous or there's a bug here and they were
intended not to be the same.

> [...]
> @@ -350,9 +429,15 @@ less_than_thirtytwo:
>
>  1:      ldr     r5, [r1], #4
>          sub     r2, r2, #4
> +#if __ARMEB__
> +        mov     r4, r5, lsr lr
> +        orr     r4, r4, r3
> +        mov     r3, r5, lsl r12
> +#else
>          mov     r4, r5, lsl lr
>          orr     r4, r4, r3
>          mov     r3, r5, lsr r12
> +#endif

And again here.

Rich
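
For comparing the two branches of the quoted mov/orr/mov hunks side by side, here is a rough C model of each three-instruction sequence. As quoted, the __ARMEB__ branch uses lsr lr / lsl r12 and the #else branch uses lsl lr / lsr r12, so the shift directions are swapped. The names merge_le, merge_be, struct merge and merge_demo are hypothetical, and the relation lr = 32 - r12 is an assumption about code outside the quoted context, not something the hunks show. This is only a sketch of the register arithmetic, not of the full copy loop.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of one merge step: r3 = carry, r5 = next, r4 = out. */
struct merge {
    uint32_t out;   /* word that would be stored (r4) */
    uint32_t carry; /* leftover bytes kept for the next step (r3) */
};

/* #else branch: mov r4, r5, lsl lr; orr r4, r4, r3; mov r3, r5, lsr r12 */
struct merge merge_le(uint32_t carry, uint32_t next, unsigned lr, unsigned r12)
{
    struct merge m;
    m.out = (next << lr) | carry;
    m.carry = next >> r12;
    return m;
}

/* __ARMEB__ branch: mov r4, r5, lsr lr; orr r4, r4, r3; mov r3, r5, lsl r12 */
struct merge merge_be(uint32_t carry, uint32_t next, unsigned lr, unsigned r12)
{
    struct merge m;
    m.out = (next >> lr) | carry;
    m.carry = next << r12;
    return m;
}

/* Assumed shift pair: r12 = 24, lr = 32 - r12 = 8 (shifts stay below 32). */
void merge_demo(void)
{
    struct merge le = merge_le(0x000000AAu, 0x11223344u, 8, 24);
    struct merge be = merge_be(0xAA000000u, 0x11223344u, 8, 24);

    assert(le.out == 0x223344AAu && le.carry == 0x00000011u);
    assert(be.out == 0xAA112233u && be.carry == 0x44000000u);
}
```

With the same input word the two models produce different outputs, which is what one would expect if the carry register holds leftover bytes at the low end on little-endian and at the high end on big-endian.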