From mboxrd@z Thu Jan  1 00:00:00 1970
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on inbox.vuxu.org
X-Spam-Level: 
X-Spam-Status: No, score=-1.1 required=5.0 tests=DKIM_SIGNED,DKIM_VALID,
	DKIM_VALID_AU,FREEMAIL_FROM,MAILING_LIST_MULTI,RCVD_IN_MSPIKE_H2
	autolearn=ham autolearn_force=no version=3.4.4
Received: (qmail 4893 invoked from network); 1 Feb 2023 18:01:35 -0000
Received: from second.openwall.net (193.110.157.125)
  by inbox.vuxu.org with ESMTPUTF8; 1 Feb 2023 18:01:35 -0000
Received: (qmail 32676 invoked by uid 550); 1 Feb 2023 18:01:31 -0000
Mailing-List: contact musl-help@lists.openwall.com; run by ezmlm
Precedence: bulk
List-Post: <mailto:musl@lists.openwall.com>
List-Help: <mailto:musl-help@lists.openwall.com>
List-Unsubscribe: <mailto:musl-unsubscribe@lists.openwall.com>
List-Subscribe: <mailto:musl-subscribe@lists.openwall.com>
List-ID: <musl.lists.openwall.com>
Reply-To: musl@lists.openwall.com
Received: (qmail 32640 invoked from network); 1 Feb 2023 18:01:30 -0000
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=gmx.net; s=s31663417;
	t=1675274476; bh=mdKQduGVme2f2M0aZuBdx+4/0fC4KPb8AThxThQ88Pw=;
	h=X-UI-Sender-Class:Date:From:To:Subject:References:In-Reply-To;
	b=OICrO0ODOfJJby9rEleLevVxi/gBnYqPHFNYy0z2ofE8rgj471kyTbzjp/HuRxLXh
	 bT+cd4jaf/8IsVbCB/VZNEMZRxidMWdlj+AeoWA3/j4KRCWyLLqCIS0cgM9XHjIiN0
	 efblxc5oIAIw+spreiMM64jSgf2/YpOuy6hyEG9Ovq+QZEdS3mmGlGoPs40CVkGf8Y
	 nCrrnXjKu5BSVvZz6z3HWle6g7B02GrsSh0NXFoOnoM67rQlgOWsZuhvTTcK2ZLV/Q
	 hUxCbLcUf8RAmWb2z21sAh7nu0etBb8m/yLGDB0cfhcDf1nMUbA4D9GL/yv0o144KT
	 4Q7sGia+PG8pQ==
X-UI-Sender-Class: 724b4f7f-cbec-4199-ad4e-598c01a50d3a
Date: Wed, 1 Feb 2023 19:01:15 +0100
From: Markus Wichmann <nullplan@gmx.net>
To: musl@lists.openwall.com
Message-ID: <20230201180115.GB2626@voyager>
References: <CAAEi2GextYuWRK-JKtpCLxewyJ2u380m5+s=M_0P=ZBDxyX-xA@mail.gmail.com>
 <CPX182QJK5D6.2Y2DE5NACZ1I@sumire>
 <4d290220.36d6.1860222ca46.Coremail.00107082@163.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <4d290220.36d6.1860222ca46.Coremail.00107082@163.com>
User-Agent: Mutt/1.9.4 (2018-02-28)
X-Provags-ID: V03:K1:QlXdhVbOSSQ6eGZ+JVf73E+X8UCmg+qTpWwig5uRWkfYIyopeES
 zll7o0+B5WyD0XcPxxJlVHEJpTNoFLJPeWMAeD4COr/aVzont3HSre3p4KIGng6mktMH7Id
 RiGFiI8xowWOsGLbeQtZKVGf5mbAEOq8/wTAEawoslkgQBrFr2WmI8ulMhbRyzwcdXSS0st
 K6+jDf7vMWDiDSPoPBrhw==
UI-OutboundReport: notjunk:1;M01:P0:Rkx44kd12lY=;PtBq4iUlKm8PQv7OYdCFKfT14Zd
 jE4xZQhWBX8uvrplTha8FqhfIkbyGyHG2z4VU89ldi1LLghbdsaGytMoAsNFJ0t2/pjcm26VB
 qQAKe3NkrszPdHydO75HwsdqvsUzceX1HpJGDEjfJVYWWazEEY5zolp9bADYWqNMIA1N5F2Z7
 m5PthqIZhzkLCjORNEOO2NRRu+Pwe3SeL4r8q90XoofySRqGD0JY7ACo+YKSZphKxYo3tmFa6
 5ilH/bi0w8hb7x4IlbC+voObvX0sV2ifxZR62oCll/DBppqzZq946sQs8UB2Np96w3IO/cDSp
 xghxMg5KEjyoO21DU/BkjoHCjN4mr+nIw3I4Prvaoro9rCZuVmzV5bL03pAliS8AkKUn6Hl5o
 HxztfEbezY2W4SRqbUhEmPe8Gu7qTwhBVBiXqiC5lYAgm136icX6+vT4rU1dw0lKC42Ze1rab
 cejq5GApzypUwphiDZN5NgLVWLkBdiSbTflAj01zNxRzrK+Iyjh1MsDs9xorYbijtq9iCcIJy
 bGiVTly02O0JO6t5g2Vi31fOREYVk+oRvbDYhEeFyIMeHmtFH6DsM0cUaa6gz6SUjZKgjfxH1
 NGZ2+6QCC7TL7hijGdUPAU/oqD8NIWMpBDEtfWMHRaFawVbq3yBFEc4dhsWd5uOwx+oFDm9gA
 UEusXqwrYcgLTzTMkwvOTNKrMADNwgzi5WLkgnYGh8y+exCP5Thv8OeRIPuJbfadmNj3uFN8H
 pZseRhkAXvzhx540tgqhbe+nN9zqFHNS3HytCpoJ9M+t0jaMRpsv4kvPvP5vwGKdlDmdVEo0O
 +psrL0AZYPnTjS9R6JqVKo2pEmnkY3wuG4CzIwMeWTQWBRiAHrDMnmXKbH/P8T8GJMceXc0Yd
 /ko19OUP6H8u69D0+EV3cxI+SxK74wBwRTv3jWGEhymcbthWubuEBccSi7J6kposvbuF6ouxq
 ugygd1Tt90OPAwn3+pTVL4MG3r0=
Content-Transfer-Encoding: quoted-printable
Subject: Re: [musl] Re:Re: [musl] qsort

On Mon, Jan 30, 2023 at 06:04:39PM +0800, David Wang wrote:
> WIth glibc,  average report is  "qsort:3351271634  vs   sort:4358412048"
> But on alpine,  it is "qsort:9585927992  vs   sort:4354856330"
>
> On average, qsort with musl is significantly slower than c++ sort,
> while with glibc the average performance of qsort is bettern than c++
> sort.
>

It seems to me as though smoothsort does not optimize for the number of
comparisons.

> Is there any story about the implementation of qsort in musl?  I feel
> it focused on performance improvement for some special kind of domain,
> but not clear what it is.
>

Smoothsort has the desirable property of being adaptive. Its runtime
gets closer to O(n) the more sorted the input already is. glibc uses
mergesort or quicksort (the latter as fallback) and neither of them has
that property. Plus, mergesort requires scratch storage and has a
significantly harder time sorting arrays with large elements, because
you end up constantly copying stuff. glibc tries to mitigate this by
indirectly sorting once the elements go above 32 bytes in size.

Basically, glibc is optimized for the comparisons, musl more for the
number of swaps. Although we really shouldn't loose sight of the
compares entirely, since those are indirect function calls, and current
processors seem to dislike those.

Anyway, there is an inefficiency in musl's smoothsort implementation,
namely in the sift() function:


| if(cmp(ar[0], lf, arg) >=3D 0 && cmp(ar[0], rt, arg) >=3D 0) {
| 	break;
| }
| if(cmp(lf, rt, arg) >=3D 0) {
| 	ar[i++] =3D lf;
| 	head =3D lf;
| 	pshift -=3D 1;
| } else {
| 	ar[i++] =3D rt;
| 	head =3D rt;
| 	pshift -=3D 2;
| }

As you can see, this does two or three comparisions, but actually needs
to do only two in any case: If we detect ar[0] >=3D lf && ar[0] < rt, then
by transitivity we have also detected rt > lf, so there is no need to
compare lf and rt directly anymore. I have not really found a good way
to formulate code that takes advantage of that, maybe something like
this?


		if(cmp(ar[0], lf, arg) < 0) {
			if(cmp(lf, rt, arg) >=3D 0) {
				ar[i++] =3D lf;
				head =3D lf;
				pshift -=3D 1;
			} else {
go_right:
				ar[i++] =3D rt;
				head =3D rt;
				pshift -=3D 2;
			}
		} else if (cmp(ar[0], rt, arg) < 0) {
			goto go_right;
		} else {
			break;
		}

Yeah, not great. But it does use only two compares ever. This ought to
lower the number of compares in your test somewhat.

HTH,
Markus