Date: Sat, 28 Aug 2021 15:53:31 -0400
From: Rich Felker
To: tugouxp <13824125580@163.com>
Cc: musl@lists.openwall.com
Message-ID: <20210828195330.GZ13220@brightrain.aerifal.cx>
In-Reply-To: <4e37b6c4.19be.17b8bc750df.Coremail.13824125580@163.com>
Subject: Re: [musl] Why the musl libc did not support neon simd accelerator officially on mem* operations?

On Sat, Aug 28, 2021 at 04:01:40PM +0800, tugouxp wrote:
> Hi guys:
> I found that the current implementation of musl's ARM port memcpy.S
> and the other mem*.S operations do not use ARM NEON instructions.
> This seems different from counterparts like newlib, glibc and
> Bionic libc, which all implement NEON versions of the mem*
> operations. Could you tell me why? Is there any concern about this
> in musl? If I want to implement it myself, how should I do this?
> Are there any mature patches I could use?

Generally we don't have any significant asm implementations that
depend on non-baseline extensions to the ISA.
The same is true for x86, where no SSE/AVX is used.

The asm files we have for ARM are already way too large and complex,
with all the high-level flow gratuitously written in asm. Ideally at
some point we will refactor that to have all the high-level logic in
C, with just the core block-copy primitives provided by the archs.
Whether this would cleanly admit using methods only known to be
available at runtime, I'm not sure.

Do you know what performance difference you're missing out on by not
having NEON? At what block sizes does it matter?

Rich
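The refactoring idea in the reply (high-level logic in C, only a core block-copy primitive per arch) can be illustrated with a minimal sketch. It assumes a hypothetical primitive that only ever sees whole 32-byte blocks with the destination already aligned; the names my_memcpy, block_copy, block_copy_generic and the BLOCK size are illustrative assumptions, not musl's actual code. Calling the primitive through a function pointer is one possible way to admit a runtime-selected version. On 32-bit ARM Linux, such a selection might test getauxval(AT_HWCAP) & HWCAP_NEON, but whether that fits musl's design is the open question in the reply.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical block size handled by the per-arch primitive. */
#define BLOCK 32

/* Portable fallback for the hypothetical arch primitive: copies
   nblocks * BLOCK bytes. An arch could supply a version using wide
   loads/stores (e.g. NEON) instead. */
static void block_copy_generic(unsigned char *restrict d,
                               const unsigned char *restrict s,
                               size_t nblocks)
{
    for (size_t i = 0; i < nblocks * BLOCK; i++)
        d[i] = s[i];
}

/* Hypothetical hook: a runtime-selected primitive could be installed
   here once CPU features are known. */
static void (*block_copy)(unsigned char *restrict,
                          const unsigned char *restrict,
                          size_t) = block_copy_generic;

void *my_memcpy(void *restrict dest, const void *restrict src, size_t n)
{
    unsigned char *d = dest;
    const unsigned char *s = src;

    /* Small copies: handled bytewise entirely in C. */
    if (n < 2 * BLOCK) {
        while (n--)
            *d++ = *s++;
        return dest;
    }

    /* Head: align the destination to a BLOCK boundary
       (consumes at most BLOCK - 1 bytes). */
    while ((uintptr_t)d % BLOCK) {
        *d++ = *s++;
        n--;
    }

    /* Bulk: whole blocks go to the per-arch primitive. */
    size_t nblocks = n / BLOCK;
    block_copy(d, s, nblocks);
    d += nblocks * BLOCK;
    s += nblocks * BLOCK;
    n -= nblocks * BLOCK;

    /* Tail: remaining bytes, again in C. */
    while (n--)
        *d++ = *s++;
    return dest;
}
```

With this split, an arch port would only need to provide the bulk loop, while size thresholds, alignment and tail handling stay in shared C. The cost is an indirect call per large copy, which matters less as block sizes grow.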