Date: Thu, 16 Jan 2020 16:21:20 +0100
From: Natanael Copa
To: Andre McCurdy
Cc: musl@lists.openwall.com
Message-ID: <20200116162120.5c1a90ba@ncopa-desktop.copa.dup.pw>
Subject: Re: [musl] [PATCH 2/2] Add big-endian support to ARM assembler memcpy

On Wed, 15 Jan 2020 10:41:08 -0800
Andre McCurdy wrote:

> On Wed, Jan 15, 2020 at 7:46 AM Rich Felker wrote:
> > On Fri, Sep 13, 2019 at 01:38:34PM -0700, Andre McCurdy wrote:
> > > On Fri, Sep 13, 2019 at 11:59 AM Rich Felker wrote:
> > > > On Fri, Sep 13, 2019 at 11:44:32AM -0700, Andre McCurdy wrote:
> > > > > Allow the existing ARM assembler memcpy implementation to be used for
> > > > > both big and little endian targets.
> > > >
> > > > Nice.
> > > > I don't want to merge this just before release, but as long as
> > > > it looks ok I should be able to review and merge it afterward.
> > > >
> > > > Note that I'd really like to replace this giant file with C using
> > > > inline asm just for the inner block copies and C for all the flow
> > > > control, but I don't mind merging this first as long as it's correct.
> > >
> > > Sounds good. I'll wait for your feedback after the upcoming release.
> >
> > Sorry this dropped off my radar. I'd like to merge at least the thumb
> > part since it's simple enough to review quickly and users have
> > actually complained about memcpy being slow on armv7 with -mthumb as
> > default.
>
> Interesting. I wonder what the reference was against which the musl C
> code was compared? From my own benchmarking I didn't find the musl
> assembler to be much faster than the C code. There are armv6 and maybe
> early armv7 CPUs where explicit prefetch instructions make a huge
> difference (much more so than C -vs- assembler). Did the users who
> complained about musl memcpy() compare against a memcpy() which uses
> prefetch? For armv7 using NEON may help, although the latest armv7
> cores seem to perform very well with plain old C code too. There are
> lots of trade offs so it's impossible for a single implementation to
> be universally optimal. The "arm-mem" routines used on Raspberry Pi
> seem to be very fast for many targets, but unfortunately the armv6
> memcpy generates mis-aligned accesses so isn't suitable for armv5.
>
> https://github.com/bavison/arm-mem/

The Alpine user reported it here:
https://gitlab.alpinelinux.org/alpine/aports/issues/11128

I don't know if you got the __builtin_memcpy or the libc version. I do
know that qemu once got surprised that `memcpy` used libc's non-atomic
version instead of gcc's atomic __builtin_memcpy. This happened because
Alpine uses fortify-headers as its FORTIFY_SOURCE implementation. Not
sure if something similar happened here.

-nc