Date: Sat, 11 Feb 2023 08:35:33 -0500
From: Rich Felker
To: musl@lists.openwall.com
Subject: Re: [musl] Re:Re: [musl] Re:Re: [musl] Re:Re: [musl] Re:Re: [musl] qsort
Message-ID: <20230211133532.GD4163@brightrain.aerifal.cx>
References: <20230201180115.GB2626@voyager>
 <20230209190316.GU4163@brightrain.aerifal.cx>
 <75d9cfae.35eb.1863ac4e3c0.Coremail.00107082@163.com>
 <20230210131044.GZ4163@brightrain.aerifal.cx>
 <23b37232.4d4c.1863b92aa13.Coremail.00107082@163.com>
 <20230210141955.GA4163@brightrain.aerifal.cx>
 <10dbd851.a99.1863ee385b5.Coremail.00107082@163.com>
 <20230211093936.46b9a2f044052552be38cdb2@zhasha.com>

On Sat, Feb 11, 2023 at 10:06:02AM +0100, alice wrote:
> On Sat Feb 11, 2023 at 9:39 AM CET, Joakim Sindholt wrote:
> > On Sat, 11 Feb 2023 06:44:29 +0100, "alice" wrote:
> > > based on the glibc profiling, glibc also has their natively-loaded-cpu-specific
> > > optimisations, the _avx_ functions in your case. musl doesn't implement any
> > > SIMD optimisations, so this is a bit apples-to-oranges unless musl implements
> > > the same kind of native per-arch optimisation.
> > >
> > > you should rerun these with GLIBC_TUNABLES, from something in:
> > > https://www.gnu.org/software/libc/manual/html_node/Hardware-Capability-Tunables.html
> > > which should let you disable them all (if you just want to compare C to C code).
> > >
> > > ( unrelated, but has there been some historic discussion of implementing
> > > something similar in musl? i feel like i might be forgetting something. )
> >
> > There already are arch-specific asm implementations of functions like
> > memcpy.
>
> apologies, i wasn't quite clear- the difference
> between src/string/x86_64/memcpy.s and the glibc fiesta is that the latter
> utilises subarch-specific SIMD (as you explain below), e.g. AVX like in the
> above benchmarks. a baseline x86_64 asm is more fair-game if the difference is
> as significant as it is for memcpy :)

Folks are missing the point here. It's not anything to do with AVX or
even glibc's memcpy making glibc faster here. Rather, it's that glibc
is *not calling memcpy* for 4-byte (and likely a bunch of other
specialized cases) element sizes. Either they manually special-case
them, or the compiler (due to lack of -ffreestanding and likely -O3 or
something) is inlining the memcpy.

Based on the profiling data, I would predict an instant 2x speed boost
special-casing small sizes to swap directly with no memcpy call.

Incidentally, our memcpy is almost surely at least as fast as glibc's
for 4-byte copies. It's very large sizes where performance is likely
to diverge.

Rich
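
For illustration only, here is a rough sketch of what "special-casing small sizes to swap directly with no memcpy call" could look like. This is not musl's or glibc's actual code: the helper name swap_elems, the 16-byte cutoff, and the 256-byte scratch buffer are all assumptions made up for the example. Small elements are exchanged byte by byte through unsigned char, which avoids both the function call and any alignment or aliasing concerns; larger elements fall back to chunked memcpy.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not taken from musl or glibc): swap two
 * array elements of `width` bytes each. */
static void swap_elems(void *a, void *b, size_t width)
{
    unsigned char *pa = a, *pb = b;

    /* Nothing to do, and avoids memcpy on identical regions below. */
    if (pa == pb)
        return;

    /* Assumed cutoff: for small elements, swap in place byte by byte
     * so that no memcpy call is made at all. */
    if (width <= 16) {
        for (size_t i = 0; i < width; i++) {
            unsigned char t = pa[i];
            pa[i] = pb[i];
            pb[i] = t;
        }
        return;
    }

    /* Larger elements: swap in chunks through a bounded scratch buffer.
     * Distinct elements of one array never overlap, so memcpy is fine. */
    unsigned char tmp[256];
    while (width) {
        size_t n = width < sizeof tmp ? width : sizeof tmp;
        memcpy(tmp, pa, n);
        memcpy(pa, pb, n);
        memcpy(pb, tmp, n);
        pa += n;
        pb += n;
        width -= n;
    }
}
```

A sort loop would call swap_elems(base + i*width, base + j*width, width) wherever it currently exchanges two elements. Whether a byte loop or fixed-width word loads is the better choice for the small case would need measuring; the point of the sketch is only that the 4-byte path never reaches memcpy.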