From mboxrd@z Thu Jan  1 00:00:00 1970
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on inbox.vuxu.org
X-Spam-Level: 
X-Spam-Status: No, score=-1.0 required=5.0 tests=MAILING_LIST_MULTI,
	RCVD_IN_MSPIKE_H2 autolearn=ham autolearn_force=no version=3.4.4
Received: (qmail 1891 invoked from network); 11 Feb 2023 13:35:49 -0000
Received: from second.openwall.net (193.110.157.125)
  by inbox.vuxu.org with ESMTPUTF8; 11 Feb 2023 13:35:49 -0000
Received: (qmail 5151 invoked by uid 550); 11 Feb 2023 13:35:46 -0000
Mailing-List: contact musl-help@lists.openwall.com; run by ezmlm
Precedence: bulk
List-Post: <mailto:musl@lists.openwall.com>
List-Help: <mailto:musl-help@lists.openwall.com>
List-Unsubscribe: <mailto:musl-unsubscribe@lists.openwall.com>
List-Subscribe: <mailto:musl-subscribe@lists.openwall.com>
List-ID: <musl.lists.openwall.com>
Reply-To: musl@lists.openwall.com
Received: (qmail 4094 invoked from network); 11 Feb 2023 13:35:46 -0000
Date: Sat, 11 Feb 2023 08:35:33 -0500
From: Rich Felker <dalias@libc.org>
To: musl@lists.openwall.com
Message-ID: <20230211133532.GD4163@brightrain.aerifal.cx>
References: <20230201180115.GB2626@voyager>
 <20230209190316.GU4163@brightrain.aerifal.cx>
 <75d9cfae.35eb.1863ac4e3c0.Coremail.00107082@163.com>
 <20230210131044.GZ4163@brightrain.aerifal.cx>
 <23b37232.4d4c.1863b92aa13.Coremail.00107082@163.com>
 <20230210141955.GA4163@brightrain.aerifal.cx>
 <10dbd851.a99.1863ee385b5.Coremail.00107082@163.com>
 <CQFHTXNLAUOI.2OHPT1JEJF27G@sumire>
 <20230211093936.46b9a2f044052552be38cdb2@zhasha.com>
 <CQFM48UU024L.3F72QJSEDJMQ@sumire>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <CQFM48UU024L.3F72QJSEDJMQ@sumire>
User-Agent: Mutt/1.5.21 (2010-09-15)
Subject: Re: [musl] Re:Re: [musl] Re:Re: [musl] Re:Re: [musl] Re:Re: [musl]
 qsort

On Sat, Feb 11, 2023 at 10:06:02AM +0100, alice wrote:
> On Sat Feb 11, 2023 at 9:39 AM CET, Joakim Sindholt wrote:
> > On Sat, 11 Feb 2023 06:44:29 +0100, "alice" <alice@ayaya.dev> wrote:
> > > based on the glibc profiling, glibc also has their natively-loaded-cpu-specific
> > > optimisations, the _avx_ functions in your case. musl doesn't implement any
> > > SIMD optimisations, so this is a bit apples-to-oranges unless musl implements
> > > the same kind of native per-arch optimisation.
> > > 
> > > you should rerun these with GLIBC_TUNABLES, from something in:
> > > https://www.gnu.org/software/libc/manual/html_node/Hardware-Capability-Tunables.html
> > > which should let you disable them all (if you just want to compare C to C code).
> > > 
> > > ( unrelated, but has there been some historic discussion of implementing
> > >   something similar in musl? i feel like i might be forgetting something. )
> >
> > There already are arch-specific asm implementations of functions like
> > memcpy.
> 
> apologies, i wasn't quite clear- the difference
> between src/string/x86_64/memcpy.s and the glibc fiesta is that the latter
> utilises subarch-specific SIMD (as you explain below), e.g. AVX like in the
> above benchmarks. a baseline x86_64 asm is more fair-game if the difference is
> as significant as it is for memcpy :)

Folks are missing the point here. Neither AVX nor glibc's memcpy is
what makes glibc faster in this benchmark. Rather, glibc is *not
calling memcpy* at all for 4-byte elements (and likely for a bunch of
other special-cased element sizes). Either they special-case these
sizes by hand, or the compiler is inlining the memcpy (because glibc
is built without -ffreestanding, and likely with -O3 or something).

Based on the profiling data, I would predict an instant 2x speed boost
from special-casing small sizes to swap directly with no memcpy call.
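
A rough sketch of what such special-casing could look like, to make
the idea concrete. This is not musl's or glibc's actual qsort code:
swap_elems, the 16-byte cutoff and the 256-byte scratch buffer are all
made up for illustration, and the right cutoff would have to be
measured.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: exchange two non-overlapping elements of
 * `width` bytes each. */
static void swap_elems(void *a, void *b, size_t width)
{
	unsigned char *pa = a, *pb = b;

	/* Small elements: swap byte by byte, with no memcpy call.
	 * The cutoff of 16 is an assumption, not a tuned value. */
	if (width <= 16) {
		for (size_t i = 0; i < width; i++) {
			unsigned char t = pa[i];
			pa[i] = pb[i];
			pb[i] = t;
		}
		return;
	}

	/* Larger elements: go through a bounded scratch buffer in
	 * chunks, where the cost of the memcpy calls is amortized. */
	unsigned char tmp[256];
	while (width) {
		size_t n = width < sizeof tmp ? width : sizeof tmp;
		memcpy(tmp, pa, n);
		memcpy(pa, pb, n);
		memcpy(pb, tmp, n);
		pa += n;
		pb += n;
		width -= n;
	}
}
```

The byte loop is used for the small case because it makes no
assumption about alignment or about the compiler treating memcpy as a
builtin, which is what matters in a -ffreestanding build.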

Incidentally, our memcpy is almost surely at least as fast as glibc's
for 4-byte copies. It's very large sizes where performance is likely
to diverge.

Rich