[RFC] new qsort implementation

mailing list of musl libc
 help / color / mirror / code / Atom feed

From: Bobby Bingham <koorogi@koorogi.info>
To: musl@lists.openwall.com
Subject: [RFC] new qsort implementation
Date: Mon, 1 Sep 2014 02:12:43 -0500	[thread overview]
Message-ID: <20140901071243.GA6828@duality.lan> (raw)


[-- Attachment #1.1: Type: text/plain, Size: 4034 bytes --]

Hi all,

As I mentioned a while back on IRC, I've been looking into wikisort[1]
and grailsort[2] to see if either of them would be a good candidate for
use as musl's qsort.

The C-language reference implementations of both of these algorithms are
inappropriate, as they're both very large, not type-agnostic, and are
not written in idiomatic C.  Some of the complexity of both comes from
the fact that they are stable sorts, which qsort is not required to be.

Attached is an implementation of qsort based on a predecessor of the
paper that grailsort is based on, which describes an unstable block
merge sort.  The paper is available at [3].  My algorithm deviates a
little bit from what the paper describes, but it should be pretty easy
to follow.

You can find my test program with this algorithm and others at [4].
Some of the implementations included are quicksort variants, so the
"qsort-killer" testcase will trigger quadratic behavior in them.  If you
want to run this you should consider reducing the maximum input size in
testcases.h, disabling the qsort-killer input at the bottom of
testcases.c, or disabling the affected sort algorithms ("freebsd",
"glibc quicksort", and depending on your libc, "system") in sorters.c.

Here are the numbers comparing musl's current smoothsort with the
attached grailsort code for various input patterns and sizes.  The test
was run on x86_64, compiled with gcc 4.8.3 at -Os:

                              random              sorted             reverse            constant        sorted+noise       reverse+noise        qsort-killer    elements
                     compares     ms     compares     ms     compares     ms     compares     ms     compares     ms     compares     ms     compares     ms
musl smoothsort        327818     11        19976      0       268152      8        19976      0       142608      5       289637      8       139090      4       10000
                      4352267     97       199971      2      3327332     59       199971      2      2479200     45      3826143     68      1803634     37      100000
                     54441776    945      1999963     29     40048748    663      1999963     27     32506944    577     47405848    798     21830972    411     1000000
                    652805234  12300     19999960    289    465600753   7505     19999960    293    402201458   6891    572755136   9484    259691645   4741    10000000

grailsort              161696      2        71024      0        41110      0        28004      0       143195      1       125943      1        89027      0       10000
                      1993908     27       753996      2       412840      5       270727      3      1818802     15      1640569     17      1064759     10      100000
                     23428984    330      7686249     27      4177007     74      2729965     41     21581351    192     19909192    211     12325132    119     1000000
                    266520949   3884     75927601    277     42751315    901     28243939    436    248048604   2343    232357446   2575    139177031   1368    10000000

As far as code size, here are before and after sizes as reported by
size(1) for bsearch.o, qsort.o, and a minimal statically linked program
using qsort:

                before      after
    bsearch.o      116        160
    qsort.o       1550       1242
    statictest    2950       2819

At -O2, the before and after sizes show the same basic pattern.  At -O3,
gcc performs more aggressive inlining on the grailsort code, and it
balloons to more than twice the size of musl's current code.

For the sake of soft-float targets, I'd also like to look at replacing
the call to sqrt with an integer approximation.  But before I go much
further, I'd like to post the code here and get some feedback.

1. https://github.com/BonzaiThePenguin/WikiSort
2. https://github.com/Mrrl/GrailSort
3. http://akira.ruc.dk/~keld/teaching/algoritmedesign_f04/Artikler/04/Huang88.pdf
4. http://git.koorogi.info/cgit/qsort_compare/

--
Bobby Bingham

[-- Attachment #1.2: bsearch.c --]
[-- Type: text/x-c, Size: 697 bytes --]

#include <stdlib.h>
#include <limits.h>

size_t __bsearch(const void *key, const void *base, size_t nel, size_t width, int (*cmp)(const void *, const void *))
{
	size_t baseidx = 0, tryidx;
	void *try;
	int sign;

	while (nel > 0) {
		tryidx = baseidx + nel/2;
		try = (char*)base + tryidx*width;
		sign = cmp(key, try);
		if (!sign) return tryidx;
		else if (sign < 0)
			nel /= 2;
		else {
			baseidx = tryidx + 1;
			nel -= nel/2 + 1;
		}
	}

	return ~baseidx;
}

void *bsearch(const void *key, const void *base, size_t nel, size_t width, int (*cmp)(const void *, const void *))
{
	size_t idx = __bsearch(key, base, nel, width, cmp);
	return idx > SSIZE_MAX ? NULL : (char*)base + idx*width;
}

[-- Attachment #1.3: qsort.c --]
[-- Type: text/x-c, Size: 4853 bytes --]

#include <math.h>
#include <stddef.h>
#include <stdint.h>
#include <limits.h>

typedef int (*cmpfun)(const void *, const void *);
size_t __bsearch(const void *, const void *, size_t, size_t, cmpfun);

static inline size_t bsearch_pos(const void *key, const void *haystack, size_t nmel, size_t width, cmpfun cmp)
{
	size_t pos = __bsearch(key, haystack, nmel, width, cmp);
	return pos > SSIZE_MAX ? ~pos : pos;
}

static inline void *tail(char *base, size_t nmel, size_t width)
{
	return base + (nmel-1) * width;
}

static void swap(char *a, char *b, size_t width)
{
#ifdef __GNUC__
	typedef uint32_t __attribute__((__may_alias__)) u32;

	if ((uintptr_t)a % 4 == 0 && (uintptr_t)b % 4 == 0) {
		for (; width >= 4; width -= 4) {
			uint32_t tmp = *((u32*)a);
			*((u32*)a) = *((u32*)b);
			*((u32*)b) = tmp;
			a += 4;
			b += 4;
		}
	}
#endif

	while (width--) {
		char tmp = *a;
		*a++ = *b;
		*b++ = tmp;
	}
}

static void rotate(char *base, size_t size, size_t shift)
{
	int dir = 1;
	while (shift) {
		while (2*shift <= size) {
			swap(base, base + dir*shift, shift);
			size -= shift;
			base += shift*dir;
		}
		shift = size - shift;
		base  = dir > 0 ? base + size - shift : base - shift;
		dir  *= -1;
	}
}

static void distribute_buffer(char *base, size_t bufnmel, size_t sortnmel, size_t width, cmpfun cmp)
{
	while (bufnmel) {
		char  *sorted    = base + bufnmel * width;
		size_t insertpos = bsearch_pos(base, sorted, sortnmel, width, cmp);

		if (insertpos > 0)
			rotate(base, (bufnmel + insertpos) * width, bufnmel * width);

		base     += (insertpos + 1) * width;
		bufnmel  -= 1;
		sortnmel -= insertpos;
	}
}

#define MAX_SORTNET 8

static const uint8_t sortnet[][2] = {
	/* 0: index =  0 */
	/* 1: index =  0 */
	/* 2: index =  0 */
	{0,1},
	/* 3: index =  1 */
	{0,1}, {0,2}, {1,2},
	/* 4: index =  4 */
	{0,1}, {2,3}, {0,2}, {1,3}, {1,2},
	/* 5: index =  9 */
	{0,1}, {3,4}, {2,4}, {2,3}, {1,4}, {0,3}, {0,2}, {1,3}, {1,2},
	/* 6: index = 18 */
	{1,2}, {4,5}, {0,2}, {3,5}, {0,1}, {3,4}, {2,5}, {0,3}, {1,4}, {2,4},
	{1,3}, {2,3},
	/* 7: index = 30 */
	{1,2}, {3,4}, {5,6}, {0,2}, {3,5}, {4,6}, {0,1}, {4,5}, {2,6}, {0,4},
	{1,5}, {0,3}, {2,5}, {1,3}, {2,4}, {2,3},
	/* 8: index = 46 */
	{0,1}, {2,3}, {4,5}, {6,7}, {0,2}, {1,3}, {4,6}, {5,7}, {1,2}, {5,6},
	{0,4}, {3,7}, {1,5}, {2,6}, {1,4}, {3,6}, {2,4}, {3,5}, {3,4},
	/* 9: index = 65 */
};

static const uint8_t sortnet_index[] = { 0, 0, 0, 1, 4, 9, 18, 30, 46, 65 };

static void sorting_network(char *base, size_t nmel, size_t width, cmpfun cmp)
{
	for (int i = sortnet_index[nmel]; i < sortnet_index[nmel+1]; i++) {
		char *elem1 = base + sortnet[i][0] * width;
		char *elem2 = base + sortnet[i][1] * width;
		if (cmp(elem1, elem2) > 0) swap(elem1, elem2, width);
	}
}

/* get index of last block whose head is less than the previous block's tail */
static size_t last_overlap(char *base, size_t bcount, size_t bwidth, size_t width, cmpfun cmp)
{
	for (char *cur = tail(base, bcount, bwidth); --bcount; cur -= bwidth)
		if (cmp(cur - width, cur) > 0) break;
	return bcount;
}

void merge(char *buf, char *base, size_t anmel, size_t bnmel, size_t width, cmpfun cmp)
{
	char *a = buf;
	char *b = base + anmel * width;

	size_t skip = bsearch_pos(b, base, anmel, width, cmp);
	anmel -= skip;
	base  += skip * width;

	swap(base, a, anmel * width);

	while (anmel && bnmel) {
		if (cmp(a, b) <= 0) { swap(base, a, width); a += width; anmel--; }
		else                { swap(base, b, width); b += width; bnmel--; }
		base += width;
	}

	swap(base, a, anmel * width);
}

void qsort(void *unsorted, size_t nmel, size_t width, cmpfun cmp)
{
	char *base = unsorted;

	if (nmel <= MAX_SORTNET) {
		sorting_network(base, nmel, width, cmp);
		return;
	}

	size_t blknmel = sqrt(nmel);               /* elements in a block */
	size_t bufnmel = blknmel + nmel % blknmel; /* elements in the buffer */
	size_t bwidth  = blknmel * width;          /* size of a block in bytes */
	size_t blocks  = nmel / blknmel - 1;       /* number of blocks in a + b */
	size_t acount  = blocks / 2;
	size_t bcount  = blocks - acount;

	char *a = base + bufnmel * width;
	char *b = a    + acount  * bwidth;

	qsort(a, acount * blknmel, width, cmp);
	qsort(b, bcount * blknmel, width, cmp);

	/* if already sorted, nothing to do */
	if (cmp(tail(a, acount * blknmel, width), b) <= 0)
		goto distribute;

	/* sort all the a and b blocks together by their head elements */
	qsort(a, blocks, bwidth, cmp);

	/* merge, starting from the end and working towards the beginning */
	while (blocks > 1) {
		size_t overlap = last_overlap(a, blocks, bwidth, width, cmp);
		if (overlap > 0)
			merge(base, tail(a, overlap, bwidth), blknmel, (blocks - overlap) * blknmel, width, cmp);
		blocks = overlap;
	}

distribute:
	qsort(base, bufnmel, width, cmp);
	distribute_buffer(base, bufnmel, nmel - bufnmel, width, cmp);
}

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

next             reply	other threads:[~2014-09-01  7:12 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-09-01  7:12 Bobby Bingham [this message]
2014-09-01 11:17 ` Szabolcs Nagy
2014-09-01 18:20   ` Bobby Bingham
2014-09-01 20:53     ` Rich Felker
2014-09-01 21:46       ` Bobby Bingham
2014-09-01 11:25 ` Alexander Monakov
2014-09-01 18:27   ` Bobby Bingham
2023-02-17 15:51 ` [musl] " Rich Felker
2023-02-17 22:53   ` Rich Felker

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20140901071243.GA6828@duality.lan \
    --to=koorogi@koorogi.info \
    --cc=musl@lists.openwall.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/musl/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).