Date: Fri, 15 May 2020 23:40:19 -0400
From: Rich Felker
To: musl@lists.openwall.com
Cc: Srinivasan J
Reply-To: musl@lists.openwall.com
Message-ID: <20200516034016.GP21576@brightrain.aerifal.cx>
Subject: Re: [musl] memcpy/memcmp optimizations and possible alternatives

On Sat, May 16, 2020 at 08:37:34AM +0530, Srinivasan J wrote:
> Hi,
> I see that the musl-libc implementations of mem* functions does not
> use AVX/SSE4 instructions. Please let me know if there is a plan to
> support these or if there is a recommended way to use optimized
> versions of mem* functions when running memory intensive applications
> with musl-libc. The application is being run in alpine linux.

You can safely replace memcpy/memcmp with another implementation just
by linking it in your program, as long as it matches the contract of
the standard function and GCC's additional requirements on the
function.

On modern x86, "rep movs" is, or at least until recently was, supposed
to be the fastest way to memcpy.
But AMD and Intel like to keep making regressions in this area so it's
possible it's not true anymore. I haven't kept up with it the past few
years. At least some of the possible alternatives have properties that
might make them slower depending on use or even unsafe with respect to
weakly ordered memory operations.

It's possible that if there are reliably faster approaches, we'll
integrate them at some point. Probably not with any runtime selection,
although it's inexpensive to do the selection in a code path only used
for large n, so it might be ok.

Note that our memcmp right now is really bad, and should be optimized.
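
As a rough illustration of the kind of drop-in replacement described
above, here is a minimal sketch of a memcmp that compares eight bytes
at a time and falls back to a byte loop for the tail and for locating
the first difference. The name sketch_memcmp is hypothetical and this
is not musl's code. It keeps the standard contract: the sign of the
result comes from the first differing byte, compared as unsigned char.
Whether it actually beats a given libc's memcmp is an assumption that
would need measuring on the target hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical name, for illustration only. To replace the libc
 * version it would have to be linked into the program under the
 * standard name memcmp. */
int sketch_memcmp(const void *a, const void *b, size_t n)
{
    const unsigned char *l = a, *r = b;

    /* Debug-build sanity check; not part of the standard contract. */
    assert(n == 0 || (a && b));

    /* Compare 8 bytes at a time for as long as the chunks are equal.
     * The fixed-size memcpy calls are the portable way to express
     * unaligned loads; optimizing compilers typically turn each one
     * into a single load instruction. */
    while (n >= sizeof(uint64_t)) {
        uint64_t x, y;
        memcpy(&x, l, sizeof x);
        memcpy(&y, r, sizeof y);
        if (x != y)
            break;
        l += sizeof x;
        r += sizeof x;
        n -= sizeof x;
    }

    /* Byte loop: handles the tail, and finds the first differing byte
     * inside a chunk that did not match. */
    for (; n; n--, l++, r++)
        if (*l != *r)
            return *l - *r;
    return 0;
}
```

A call such as sketch_memcmp(p, q, len) behaves like memcmp(p, q, len)
for valid inputs. The byte-order-independent fallback is a deliberate
choice here: it avoids having to byte-swap the mismatching words on
little-endian machines, at the cost of up to eight extra byte
comparisons per call.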