Date: Sat, 11 Feb 2023 10:06:02 +0100
Reply-To: musl@lists.openwall.com
From: "alice"
In-Reply-To: <20230211093936.46b9a2f044052552be38cdb2@zhasha.com>
Subject: Re: [musl] Re:Re: [musl] Re:Re: [musl] Re:Re: [musl] Re:Re: [musl] qsort

On Sat Feb 11, 2023 at 9:39 AM CET, Joakim Sindholt wrote:
> On Sat, 11 Feb 2023 06:44:29 +0100, "alice" wrote:
> > based on the glibc profiling, glibc also has their natively-loaded-cpu-specific
> > optimisations, the _avx_ functions in your case. musl doesn't implement any
> > SIMD optimisations, so this is a bit apples-to-oranges unless musl implements
> > the same kind of native per-arch optimisation.
> >
> > you should rerun these with GLIBC_TUNABLES, from something in:
> > https://www.gnu.org/software/libc/manual/html_node/Hardware-Capability-Tunables.html
> > which should let you disable them all (if you just want to compare C to C code).
> >
> > ( unrelated, but has there been some historic discussion of implementing
> >   something similar in musl? i feel like i might be forgetting something. )
>
> There already are arch-specific asm implementations of functions like
> memcpy.

apologies, i wasn't quite clear- the difference between
src/string/x86_64/memcpy.s and the glibc fiesta is that the latter utilises
subarch-specific SIMD (as you explain below), e.g. AVX like in the above
benchmarks. a baseline x86_64 asm is more fair-game if the difference is as
significant as it is for memcpy :)

i wonder if anyone has tried such baseline-asm for str*, or for non
i386/x86_64 by now. there seems to only be x86 and mips asm in the tree
currently (base platform support aside).

(purely out of interest of course- i don't have the ability to write such
things (yet), and maybe there are some gains more significant than "2.2%"
possible with just sse2 for instance.)

> As I see it there are 3 issues standing between musl and the
> glibc approach of writing a new function every time Intel or AMD
> releases a new core design:
> 1) ifunc resolvers don't work on statically linked binaries.
> 2) If they did it would mean shipping 12 different implementations of
>    each optimized function, making the binary huge for, for the most
>    part, no good reason.
> 3) The esoteric bug is no longer in memcpy but in either memcpy_c,
>    memcpy_mmx, memcpy_3dnow, memcpy_sse2, memcpy_sse3, memcpy_ssse3,
>    memcpy_sse41, memcpy_sse42, memcpy_avx, memcpy_avx2, memcpy_avx512,
>    or memcpy_amx or whatever else is added in the future in a
>    never-ending spiral of implementations piling up.

3) is admittedly the worst effect- niche esoteric debugging is worse than
"disk space", and having so many implementations is certainly hard to
maintain.

> It is my opinion that musl should remain small and concise to allow it
> to effectively serve both the "small" and "gotta go fast" markets. I say
> both because you can always haul in libreallyreallyfastsort.a/so but you
> can't take the 47 qsort/memcpy implementations out of libc.

yes, i generally find myself having the same opinion :)
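
for anyone following the thread who hasn't seen per-CPU dispatch up close, here is a minimal C sketch of picking between copy variants at runtime. it is not glibc's ifunc mechanism and not musl code. every name in it (copy_fn, copy_bytewise, copy_wordwise, pick_copy, fast_copy) is hypothetical, the "tuned" variant is a portable stand-in rather than real SIMD, and the feature check assumes the GCC/Clang builtin __builtin_cpu_supports is available on x86 targets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical names throughout: none of these are musl or glibc symbols. */
typedef void *(*copy_fn)(void *, const void *, size_t);

/* Baseline variant: plain byte loop, valid on every CPU. */
static void *copy_bytewise(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n--)
        *d++ = *s++;
    return dst;
}

/* Stand-in for a "tuned" variant: moves 8 bytes per step, then the tail.
 * A real per-CPU variant would use SIMD instructions here instead. */
static void *copy_wordwise(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    for (; n >= 8; n -= 8, d += 8, s += 8) {
        uint64_t w;
        memcpy(&w, s, sizeof w);   /* unaligned-safe load */
        memcpy(d, &w, sizeof w);   /* unaligned-safe store */
    }
    while (n--)
        *d++ = *s++;
    return dst;
}

/* Resolver: inspects CPU features and returns the variant to use.
 * Assumption: __builtin_cpu_supports exists (GCC/Clang, x86 only). */
static copy_fn pick_copy(void)
{
#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
    __builtin_cpu_init();
    if (__builtin_cpu_supports("sse2"))
        return copy_wordwise;
#endif
    return copy_bytewise;
}

/* Entry point: resolves lazily on first call, then calls through the
 * cached pointer. The lazy init is not thread-safe; this is a sketch. */
void *fast_copy(void *dst, const void *src, size_t n)
{
    static copy_fn impl;
    if (!impl)
        impl = pick_copy();
    assert(impl == copy_bytewise || impl == copy_wordwise);
    return impl(dst, src, n);
}
```

every variant added this way is one more code path that has to be tested and debugged separately, which is the maintenance cost point 3) is about.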