From: Andre McCurdy
Date: Wed, 15 Jan 2020 10:41:08 -0800
To: musl@lists.openwall.com
Subject: Re: [musl] [PATCH 2/2] Add big-endian support to ARM assembler memcpy

On Wed, Jan 15, 2020 at 7:46 AM Rich Felker wrote:
> On Fri, Sep 13, 2019 at 01:38:34PM -0700, Andre McCurdy wrote:
> > On Fri, Sep 13, 2019 at 11:59 AM Rich Felker wrote:
> > > On Fri, Sep 13, 2019 at 11:44:32AM -0700, Andre McCurdy wrote:
> > > > Allow the existing ARM assembler memcpy implementation to be used for
> > > > both big and little endian targets.
> > >
> > > Nice. I don't want to merge this just before release, but as long as
> > > it looks ok I should be able to review and merge it afterward.
> > >
> > > Note that I'd really like to replace this giant file with C using
> > > inline asm just for the inner block copies and C for all the flow
> > > control, but I don't mind merging this first as long as it's correct.
> >
> > Sounds good. I'll wait for your feedback after the upcoming release.
>
> Sorry this dropped off my radar. I'd like to merge at least the thumb
> part since it's simple enough to review quickly and users have
> actually complained about memcpy being slow on armv7 with -mthumb as
> default.

Interesting.
I wonder what reference the musl C code was compared against. From my
own benchmarking I didn't find the musl assembler to be much faster
than the C code.

There are armv6 and maybe early armv7 CPUs where explicit prefetch
instructions make a huge difference (much more so than C vs.
assembler). Did the users who complained about musl memcpy() compare
against a memcpy() which uses prefetch?

For armv7, using NEON may help, although the latest armv7 cores seem
to perform very well with plain old C code too.

There are lots of trade-offs, so it's impossible for a single
implementation to be universally optimal. The "arm-mem" routines used
on Raspberry Pi seem to be very fast for many targets, but
unfortunately the armv6 memcpy generates misaligned accesses, so it
isn't suitable for armv5.

https://github.com/bavison/arm-mem/
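To make the prefetch and alignment points a bit more concrete, here is a rough C sketch. It is not musl's memcpy and not the arm-mem code. It only uses word loads/stores when src and dst share the same alignment (so an armv5 core never sees a misaligned word access) and calls the GCC/Clang builtin `__builtin_prefetch` a fixed distance ahead of the source; on ARM cores that support it, this is typically emitted as a PLD. The names `sketch_memcpy` and `word_t` and the 128-byte `PREFETCH_AHEAD` distance are hypothetical; the right distance is CPU-specific and would need benchmarking.

```c
#include <stddef.h>
#include <stdint.h>

/* Word type allowed to alias any object (GCC/Clang extension, the same idea
 * musl's C memcpy uses), so word accesses don't break strict aliasing. */
typedef uint32_t __attribute__((__may_alias__)) word_t;

/* Hypothetical prefetch distance in bytes; tune per CPU. */
#define PREFETCH_AHEAD 128

/* Illustrative sketch only, not musl's or arm-mem's implementation. */
void *sketch_memcpy(void *restrict dest, const void *restrict src, size_t n)
{
    unsigned char *d = dest;
    const unsigned char *s = src;

    /* Word accesses are only used when both pointers can be brought to
     * 4-byte alignment together; otherwise fall through to byte copies. */
    if (((uintptr_t)d & 3) == ((uintptr_t)s & 3)) {
        /* Copy leading bytes until both pointers are 4-byte aligned. */
        while (n && ((uintptr_t)d & 3)) {
            *d++ = *s++;
            n--;
        }
        /* Main loop: 16 bytes per iteration. Prefetch only while the
         * prefetch address stays inside the source buffer. */
        while (n >= 16) {
            if (n > PREFETCH_AHEAD)
                __builtin_prefetch(s + PREFETCH_AHEAD, 0, 0); /* read, low reuse */
            word_t *dw = (word_t *)d;
            const word_t *sw = (const word_t *)s;
            dw[0] = sw[0];
            dw[1] = sw[1];
            dw[2] = sw[2];
            dw[3] = sw[3];
            d += 16;
            s += 16;
            n -= 16;
        }
        /* Remaining whole words. */
        while (n >= 4) {
            *(word_t *)d = *(const word_t *)s;
            d += 4;
            s += 4;
            n -= 4;
        }
    }
    /* Tail bytes, or everything if src and dst are mutually misaligned. */
    while (n--)
        *d++ = *s++;
    return dest;
}
```

A real implementation (musl's C memcpy among them) handles mutually misaligned src/dst with aligned word loads plus shift-and-merge rather than falling back to bytes; the sketch skips that for brevity.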