Date: Wed, 25 Mar 2020 21:49:11 -0400
From: Rich Felker
To: musl@lists.openwall.com
Subject: Re: [musl] [PATCH] aarch64: add optimized memcpy, memmove and memset
Message-ID: <20200326014911.GA11469@brightrain.aerifal.cx>
In-Reply-To: <20200325214544.GT14278@port70.net>

On Wed, Mar 25, 2020 at 10:45:45PM +0100, Szabolcs Nagy wrote:
> minimal edits to upstream version for easier updates
> and because this code was benchmarked across many cores.
>
> gcc generates slow code for the current c implementations.
>
> the integer memcpy was chosen instead of the simd one,
> this performs better on little cores, i think this is
> the more conservative choice for now.

I think this was discussed before on IRC, and I'm not particularly
opposed to these, especially since aarch64 is one of the most
important archs these days. However, I would really like to avoid
adding more asm source files with the function flow written in asm
when the only thing that really needs to benefit from asm is the
inner loop body. I know nothing has happened on this front since we
last talked about it, so it's very possible that the answer is just
"we need something with decent performance in the short term, nobody
has cycles to spend on doing it better right now, and so we should
just use the asm files"...

> note: there are upcoming security architectures which
> may mean updates to these functions (BTI - landing pads,
> PAUTH - return address signing, MTE - 16byte tag granule
> may affect optimized strcmp etc, not relevant yet), but
> runtime support for these will need other libc changes.

If these mattered, they'd be another reason to prefer having the
function in C with minimal inline asm, or just extensions for
unaligned loads/stores. But MTE is the only one of these that's
interesting, and it doesn't conflict with any current code in musl at
all (nothing does unaligned overreads; they have to be assumed to be
able to fault anyway).

Rich
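A rough sketch of the "function flow in C, with just extensions for unaligned loads/stores" approach mentioned in the reply, assuming GCC or Clang. The typedef uses the GNU __may_alias__ and __aligned__(1) type attributes so that 8-byte loads and stores are valid at any address; the 16-byte inner loop is the only part that might later be worth turning into inline asm; and the tail is copied byte by byte so nothing is read past the end of the source, in line with the "no unaligned overreads" point. The names memcpy_sketch and u64_unaligned are hypothetical; this is not musl's or Arm's code and makes no performance claims.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical typedef using GNU C extensions: may alias any object and
 * has 1-byte alignment, so loads/stores through it are valid at any
 * address. On aarch64 these can compile to plain unaligned ldr/str. */
typedef uint64_t __attribute__((__may_alias__, __aligned__(1))) u64_unaligned;

/* Hypothetical memcpy-style function: all control flow stays in C. */
void *memcpy_sketch(void *restrict dest, const void *restrict src, size_t n)
{
	unsigned char *d = dest;
	const unsigned char *s = src;

	/* Inner loop: 16 bytes per iteration via two unaligned 8-byte
	 * copies. This body is the only part that might be worth
	 * replacing with a small inline asm block (e.g. ldp/stp). */
	while (n >= 16) {
		uint64_t a = *(const u64_unaligned *)s;
		uint64_t b = *(const u64_unaligned *)(s + 8);
		*(u64_unaligned *)d = a;
		*(u64_unaligned *)(d + 8) = b;
		s += 16;
		d += 16;
		n -= 16;
	}

	/* Tail: byte copies, so nothing is read past s + n (no overreads). */
	while (n--)
		*d++ = *s++;

	return dest;
}
```

For example, memcpy_sketch(dst + 1, src + 3, 20) exercises the unaligned path: one 16-byte iteration followed by a 4-byte tail.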