```mailing list of musl libc
help / color / mirror / code / Atom feed```
```From: "David Wang" <00107082@163.com>
To: musl@lists.openwall.com
Subject: Re: [musl] qsort
Date: Thu, 16 Feb 2023 23:15:24 +0800 (CST)	[thread overview]
Message-ID: <45a265c9.2f3.1865acb64f0.Coremail.00107082@163.com> (raw)

At 2023-02-02 02:01:15, "Markus Wichmann" <nullplan@gmx.net> wrote:

>Anyway, there is an inefficiency in musl's smoothsort implementation,
>namely in the sift() function:
>
>
>| if(cmp(ar[0], lf, arg) >= 0 && cmp(ar[0], rt, arg) >= 0) {
>| 	break;
>| }
>| if(cmp(lf, rt, arg) >= 0) {
>| 	ar[i++] = lf;
>| 	pshift -= 1;
>| } else {
>| 	ar[i++] = rt;
>| 	pshift -= 2;
>| }
>
>As you can see, this does two or three comparisions, but actually needs
>to do only two in any case: If we detect ar[0] >= lf && ar[0] < rt, then
>by transitivity we have also detected rt > lf, so there is no need to
>compare lf and rt directly anymore. I have not really found a good way
>to formulate code that takes advantage of that, maybe something like
>this?
>
>
>		if(cmp(ar[0], lf, arg) < 0) {
>			if(cmp(lf, rt, arg) >= 0) {
>				ar[i++] = lf;
>				pshift -= 1;
>			} else {
>go_right:
>				ar[i++] = rt;
>				pshift -= 2;
>			}
>		} else if (cmp(ar[0], rt, arg) < 0) {
>			goto go_right;
>		} else {
>			break;
>		}
>
>Yeah, not great. But it does use only two compares ever. This ought to
>lower the number of compares in your test somewhat.
>
>HTH,
>Markus

I tried this change, but improvement is not much.
When sorting 1<<20 random items, my test shows average count of comparisons decrease from 2.733*nlogn to 2.717*nlogn, improved less than 1%
Three comparisons, in the original code ,only happen when parent is larger than its left child, and smaller then its right child, if (parent, left, right) are independent, the probability for this order right>parent>left is 1/6,  the expected comparison would be 3*1/6+2*5/6 = 13/6 for the original code, and we can get 1/13 improvement, but heapsort procedure may make the probability much less than 1/6 since every round a leaf node is put to the head, and so I got less than 1% improvement.

Strange thing is, I made a textbook implementation of heapsort as in following patch, optimized for size 8, the runtime and comparison improved a lot.

The code patch is
diff --git a/src/stdlib/qsort_nr.c b/src/stdlib/qsort_nr.c
index 8ffe71d0..710b1aaa 100644
--- a/src/stdlib/qsort_nr.c
+++ b/src/stdlib/qsort_nr.c
@@ -1,4 +1,5 @@
#define _BSD_SOURCE
+#include <stdint.h>
#include <stdlib.h>

typedef int (*cmpfun)(const void *, const void *);
@@ -8,7 +9,36 @@ static int wrapper_cmp(const void *v1, const void *v2, void *cmp)
return ((cmpfun)cmp)(v1, v2);
}

+static void _heapifyx2(uint64_t *base, int s, int n, cmpfun cmp) {
+	int ms, ns;
+	uint64_t t;
+	while(1) {
+		ns=s*2+1;
+		if (ns>=n) break;
+		ms=s;
+		if (cmp(base+ms, base+ns)<0) ms=ns;
+		ns++;
+		if (ns<n&&cmp(base+ms, base+ns)<0) ms=ns;
+		if (s==ms) break;
+		t=base[ms]; base[ms]=base[s]; base[s]=t;
+		s=ms;
+	}
+}
+
+static void _hsortx2(uint64_t *base, size_t n, cmpfun cmp) {
+	int i;
+	uint64_t t;
+	for (i=n/2; i>=0; i--) _heapifyx2(base, i, n, cmp);
+	for (n--; n>0; n--) {
+		t=base[0]; base[0]=base[n]; base[n]=t;
+		_heapifyx2(base, 0, n, cmp);
+	}
+}
+
void qsort(void *base, size_t nel, size_t width, cmpfun cmp)
{
+	if (width==8) {
+		return _hsortx2(base, nel, cmp);
+	}
__qsort_r(base, nel, width, wrapper_cmp, (void *)cmp);
}

And the test code is:
-----------------------------------------
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define MAXM 20
#define MAXN 1<<MAXM

long long count=0;
typedef struct { int k, v; } VNode;
VNode vs[MAXN];
int mycmp(const void *a, const void *b) {
count++;
return ((const VNode*)a)->v - ((const VNode*)b)->v;
}

int main() {
int i, j, k, n;
double f;
VNode t;
srand(time(NULL));
for (k=0; k<256; k++) {
for (i=0; i<MAXN; i++) vs[i].v=i;
for (n=MAXN; n>1; n--) {
i=n-1; j=rand()%n;
if (i!=j) { t=vs[i]; vs[i]=vs[j]; vs[j]=t; }
}
qsort(vs, MAXN, sizeof(vs[0]), mycmp);
for (i=0; i<MAXN; i++) if (vs[i].v!=i) { printf("error\n") ;return 1; }
}
f=count; f/=k; f/=MAXN; f/=MAXM; printf("count: %.3f*nlogn\n", f);
return 0;
}
-----------------------

Here is what I collect
+------------+-----------+-------------+
|            |  runtime  |  comparison |
+------------+-----------+-------------+
|   glibc    | 0m32.093s | 0.937*nlogn |
|  heapsort  | 0m54.708s | 1.846*nlogn |
| smoothsort | 1m13.137s | 2.733*nlogn | <-- this version of smoothsort has optimized for 8byte data item.
+------------+-----------+-------------+

I think smoothsort has worse average performance than heapsort based on the number of comparisons made.
But the ~2*nlogn comparison of heapsort could not beat ~1*nlogn of merge sort, used by glibc.

> On Thu, 9 Feb 2023, Rich Felker wrote:
>
>Before we consider any such change though we should probably see
>profiling data to determine where the time is being spent in
>smoothsort, in case it's stupid stuff that's fixable.
>
>Rich

I think, optimize for small size 4/8/16 is practical, since feeding large item to qsort is less efficient than feeding an index or address to qsort, in my opinion,  a typical structure for sorting would be  { long long k, v; } or something alike, where k is index/address referring to the real data. The performance improved a lot if optimize out those memcpy call.
And further improvement could be made, for the average performance, if switch back to heapsort.
(Hope someone else can help confirm those two above).

But heapsort in theory is always significantly slower than mergesort, I think it should be about 2 time slower when using full set of benchmark tests

David

```

```next prev parent reply	other threads:[~2023-02-16 15:15 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-01-20  1:49 Guy
2023-01-20 12:55 ` alice
2023-01-30 10:04   ` [musl] " David Wang
2023-02-01 18:01     ` Markus Wichmann
2023-02-02  2:12       ` [musl] " David Wang
2023-02-03  5:22         ` [musl] " David Wang
2023-02-03  8:03           ` Alexander Monakov
2023-02-03  9:01             ` [musl] " David Wang
2023-02-09 19:03       ` Rich Felker
2023-02-09 19:20         ` Alexander Monakov
2023-02-09 19:52           ` Rich Felker
2023-02-09 20:18             ` Rich Felker
2023-02-09 20:27               ` Pierpaolo Bernardi
2023-02-10  4:10             ` Markus Wichmann
2023-02-10 10:00         ` [musl] " David Wang
2023-02-10 13:10           ` Rich Felker
2023-02-10 13:45             ` [musl] " David Wang
2023-02-10 14:19               ` Rich Felker
2023-02-11  5:12                 ` [musl] " David Wang
2023-02-11  5:44                   ` alice
2023-02-11  8:39                     ` Joakim Sindholt
2023-02-11  9:06                       ` alice
2023-02-11  9:31                         ` [musl] " David Wang
2023-02-11 13:35                         ` Rich Felker
2023-02-11 17:18                           ` David Wang
2023-02-16 15:15       ` David Wang [this message]
2023-02-16 16:07         ` Rich Felker
2023-02-17  1:35           ` [musl] " David Wang
2023-02-17 13:17           ` Alexander Monakov
2023-02-17 15:07             ` Rich Felker
2023-02-11  9:22     ` [musl] " Markus Wichmann
2023-02-11  9:36       ` [musl] " David Wang
2023-02-11  9:51       ` David Wang
2023-01-20 13:32 ` [musl] qsort Valery Ushakov
```

```Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,

Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

switches of git-send-email(1):

git send-email \
--to=00107082@163.com \
--cc=musl@lists.openwall.com \

https://kernel.org/pub/software/scm/git/docs/git-send-email.html

```Code repositories for project(s) associated with this public inbox