From mboxrd@z Thu Jan 1 00:00:00 1970 X-Msuck: nntp://news.gmane.org/gmane.linux.lib.musl.general/6032 Path: news.gmane.org!not-for-mail From: Bobby Bingham Newsgroups: gmane.linux.lib.musl.general Subject: [RFC] new qsort implementation Date: Mon, 1 Sep 2014 02:12:43 -0500 Message-ID: <20140901071243.GA6828@duality.lan> Reply-To: musl@lists.openwall.com NNTP-Posting-Host: plane.gmane.org Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="b5gNqxB1S1yM7hjW" X-Trace: ger.gmane.org 1409555905 4964 80.91.229.3 (1 Sep 2014 07:18:25 GMT) X-Complaints-To: usenet@ger.gmane.org NNTP-Posting-Date: Mon, 1 Sep 2014 07:18:25 +0000 (UTC) To: musl@lists.openwall.com Original-X-From: musl-return-6039-gllmg-musl=m.gmane.org@lists.openwall.com Mon Sep 01 09:18:18 2014 Return-path: Envelope-to: gllmg-musl@plane.gmane.org Original-Received: from mother.openwall.net ([195.42.179.200]) by plane.gmane.org with smtp (Exim 4.69) (envelope-from ) id 1XOLsW-0003IK-Mq for gllmg-musl@plane.gmane.org; Mon, 01 Sep 2014 09:18:16 +0200 Original-Received: (qmail 3114 invoked by uid 550); 1 Sep 2014 07:18:14 -0000 Mailing-List: contact musl-help@lists.openwall.com; run by ezmlm Precedence: bulk List-Post: List-Help: List-Unsubscribe: List-Subscribe: Original-Received: (qmail 3101 invoked from network); 1 Sep 2014 07:18:13 -0000 Content-Disposition: inline X-Operating-System: Linux duality 3.13.5-gentoo User-Agent: Mutt/1.5.22 (2013-10-16) Xref: news.gmane.org gmane.linux.lib.musl.general:6032 Archived-At: --b5gNqxB1S1yM7hjW Content-Type: multipart/mixed; boundary="G4iJoqBmSsgzjUCe" Content-Disposition: inline --G4iJoqBmSsgzjUCe Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Hi all, As I mentioned a while back on IRC, I've been looking into wikisort[1] and grailsort[2] to see if either of them would be a good candidate for use as musl's qsort. The C-language reference implementations of both of these algorithms are inappropriate, as they're both very large, not type-agnostic, and are not written in idiomatic C. Some of the complexity of both comes from the fact that they are stable sorts, which qsort is not required to be. Attached is an implementation of qsort based on a predecessor of the paper that grailsort is based on, which describes an unstable block merge sort. The paper is available at [3]. My algorithm deviates a little bit from what the paper describes, but it should be pretty easy to follow. You can find my test program with this algorithm and others at [4]. Some of the implementations included are quicksort variants, so the "qsort-killer" testcase will trigger quadratic behavior in them. If you want to run this you should consider reducing the maximum input size in testcases.h, disabling the qsort-killer input at the bottom of testcases.c, or disabling the affected sort algorithms ("freebsd", "glibc quicksort", and depending on your libc, "system") in sorters.c. Here are the numbers comparing musl's current smoothsort with the attached grailsort code for various input patterns and sizes. The test was run on x86_64, compiled with gcc 4.8.3 at -Os: random sorted reverse constant sorted+noise reverse+noise qsort-killer elements compares ms compares ms compares ms compares ms compares ms compares ms compares ms musl smoothsort 327818 11 19976 0 268152 8 19976 0 142608 5 289637 8 139090 4 10000 4352267 97 199971 2 3327332 59 199971 2 2479200 45 3826143 68 1803634 37 100000 54441776 945 1999963 29 40048748 663 1999963 27 32506944 577 47405848 798 21830972 411 1000000 652805234 12300 19999960 289 465600753 7505 19999960 293 402201458 6891 572755136 9484 259691645 4741 10000000 grailsort 161696 2 71024 0 41110 0 28004 0 143195 1 125943 1 89027 0 10000 1993908 27 753996 2 412840 5 270727 3 1818802 15 1640569 17 1064759 10 100000 23428984 330 7686249 27 4177007 74 2729965 41 21581351 192 19909192 211 12325132 119 1000000 266520949 3884 75927601 277 42751315 901 28243939 436 248048604 2343 232357446 2575 139177031 1368 10000000 As far as code size, here are before and after sizes as reported by size(1) for bsearch.o, qsort.o, and a minimal statically linked program using qsort: before after bsearch.o 116 160 qsort.o 1550 1242 statictest 2950 2819 At -O2, the before and after sizes show the same basic pattern. At -O3, gcc performs more aggressive inlining on the grailsort code, and it balloons to more than twice the size of musl's current code. For the sake of soft-float targets, I'd also like to look at replacing the call to sqrt with an integer approximation. But before I go much further, I'd like to post the code here and get some feedback. 1. https://github.com/BonzaiThePenguin/WikiSort 2. https://github.com/Mrrl/GrailSort 3. http://akira.ruc.dk/~keld/teaching/algoritmedesign_f04/Artikler/04/Huang88.pdf 4. http://git.koorogi.info/cgit/qsort_compare/ -- Bobby Bingham --G4iJoqBmSsgzjUCe Content-Type: text/x-c; charset=utf-8 Content-Disposition: attachment; filename="bsearch.c" #include #include size_t __bsearch(const void *key, const void *base, size_t nel, size_t width, int (*cmp)(const void *, const void *)) { size_t baseidx = 0, tryidx; void *try; int sign; while (nel > 0) { tryidx = baseidx + nel/2; try = (char*)base + tryidx*width; sign = cmp(key, try); if (!sign) return tryidx; else if (sign < 0) nel /= 2; else { baseidx = tryidx + 1; nel -= nel/2 + 1; } } return ~baseidx; } void *bsearch(const void *key, const void *base, size_t nel, size_t width, int (*cmp)(const void *, const void *)) { size_t idx = __bsearch(key, base, nel, width, cmp); return idx > SSIZE_MAX ? NULL : (char*)base + idx*width; } --G4iJoqBmSsgzjUCe Content-Type: text/x-c; charset=utf-8 Content-Disposition: attachment; filename="qsort.c" #include #include #include #include typedef int (*cmpfun)(const void *, const void *); size_t __bsearch(const void *, const void *, size_t, size_t, cmpfun); static inline size_t bsearch_pos(const void *key, const void *haystack, size_t nmel, size_t width, cmpfun cmp) { size_t pos = __bsearch(key, haystack, nmel, width, cmp); return pos > SSIZE_MAX ? ~pos : pos; } static inline void *tail(char *base, size_t nmel, size_t width) { return base + (nmel-1) * width; } static void swap(char *a, char *b, size_t width) { #ifdef __GNUC__ typedef uint32_t __attribute__((__may_alias__)) u32; if ((uintptr_t)a % 4 == 0 && (uintptr_t)b % 4 == 0) { for (; width >= 4; width -= 4) { uint32_t tmp = *((u32*)a); *((u32*)a) = *((u32*)b); *((u32*)b) = tmp; a += 4; b += 4; } } #endif while (width--) { char tmp = *a; *a++ = *b; *b++ = tmp; } } static void rotate(char *base, size_t size, size_t shift) { int dir = 1; while (shift) { while (2*shift <= size) { swap(base, base + dir*shift, shift); size -= shift; base += shift*dir; } shift = size - shift; base = dir > 0 ? base + size - shift : base - shift; dir *= -1; } } static void distribute_buffer(char *base, size_t bufnmel, size_t sortnmel, size_t width, cmpfun cmp) { while (bufnmel) { char *sorted = base + bufnmel * width; size_t insertpos = bsearch_pos(base, sorted, sortnmel, width, cmp); if (insertpos > 0) rotate(base, (bufnmel + insertpos) * width, bufnmel * width); base += (insertpos + 1) * width; bufnmel -= 1; sortnmel -= insertpos; } } #define MAX_SORTNET 8 static const uint8_t sortnet[][2] = { /* 0: index = 0 */ /* 1: index = 0 */ /* 2: index = 0 */ {0,1}, /* 3: index = 1 */ {0,1}, {0,2}, {1,2}, /* 4: index = 4 */ {0,1}, {2,3}, {0,2}, {1,3}, {1,2}, /* 5: index = 9 */ {0,1}, {3,4}, {2,4}, {2,3}, {1,4}, {0,3}, {0,2}, {1,3}, {1,2}, /* 6: index = 18 */ {1,2}, {4,5}, {0,2}, {3,5}, {0,1}, {3,4}, {2,5}, {0,3}, {1,4}, {2,4}, {1,3}, {2,3}, /* 7: index = 30 */ {1,2}, {3,4}, {5,6}, {0,2}, {3,5}, {4,6}, {0,1}, {4,5}, {2,6}, {0,4}, {1,5}, {0,3}, {2,5}, {1,3}, {2,4}, {2,3}, /* 8: index = 46 */ {0,1}, {2,3}, {4,5}, {6,7}, {0,2}, {1,3}, {4,6}, {5,7}, {1,2}, {5,6}, {0,4}, {3,7}, {1,5}, {2,6}, {1,4}, {3,6}, {2,4}, {3,5}, {3,4}, /* 9: index = 65 */ }; static const uint8_t sortnet_index[] = { 0, 0, 0, 1, 4, 9, 18, 30, 46, 65 }; static void sorting_network(char *base, size_t nmel, size_t width, cmpfun cmp) { for (int i = sortnet_index[nmel]; i < sortnet_index[nmel+1]; i++) { char *elem1 = base + sortnet[i][0] * width; char *elem2 = base + sortnet[i][1] * width; if (cmp(elem1, elem2) > 0) swap(elem1, elem2, width); } } /* get index of last block whose head is less than the previous block's tail */ static size_t last_overlap(char *base, size_t bcount, size_t bwidth, size_t width, cmpfun cmp) { for (char *cur = tail(base, bcount, bwidth); --bcount; cur -= bwidth) if (cmp(cur - width, cur) > 0) break; return bcount; } void merge(char *buf, char *base, size_t anmel, size_t bnmel, size_t width, cmpfun cmp) { char *a = buf; char *b = base + anmel * width; size_t skip = bsearch_pos(b, base, anmel, width, cmp); anmel -= skip; base += skip * width; swap(base, a, anmel * width); while (anmel && bnmel) { if (cmp(a, b) <= 0) { swap(base, a, width); a += width; anmel--; } else { swap(base, b, width); b += width; bnmel--; } base += width; } swap(base, a, anmel * width); } void qsort(void *unsorted, size_t nmel, size_t width, cmpfun cmp) { char *base = unsorted; if (nmel <= MAX_SORTNET) { sorting_network(base, nmel, width, cmp); return; } size_t blknmel = sqrt(nmel); /* elements in a block */ size_t bufnmel = blknmel + nmel % blknmel; /* elements in the buffer */ size_t bwidth = blknmel * width; /* size of a block in bytes */ size_t blocks = nmel / blknmel - 1; /* number of blocks in a + b */ size_t acount = blocks / 2; size_t bcount = blocks - acount; char *a = base + bufnmel * width; char *b = a + acount * bwidth; qsort(a, acount * blknmel, width, cmp); qsort(b, bcount * blknmel, width, cmp); /* if already sorted, nothing to do */ if (cmp(tail(a, acount * blknmel, width), b) <= 0) goto distribute; /* sort all the a and b blocks together by their head elements */ qsort(a, blocks, bwidth, cmp); /* merge, starting from the end and working towards the beginning */ while (blocks > 1) { size_t overlap = last_overlap(a, blocks, bwidth, width, cmp); if (overlap > 0) merge(base, tail(a, overlap, bwidth), blknmel, (blocks - overlap) * blknmel, width, cmp); blocks = overlap; } distribute: qsort(base, bufnmel, width, cmp); distribute_buffer(base, bufnmel, nmel - bufnmel, width, cmp); } --G4iJoqBmSsgzjUCe-- --b5gNqxB1S1yM7hjW Content-Type: application/pgp-signature; name="signature.asc" Content-Description: Digital signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQIcBAEBAgAGBQJUBBxrAAoJEFedFv6RmqC9SYIP/3A2NoEjpXuMOAfMuAAWPKgp 4+iwn/y4A2Xa0LFpq0EPccTbEZQxsadNX81Uwz3Viz1MmWmBMYDlj2SJRDG1I6Pk MCDVojfVuFWyu1AkPFL3pXDT7AjzV49+VLixfg1YsTi0PdK9SVEq0jolWSsESmrs d0lAj/tYqyZEKix4AFSefi3Bjk5ibD5QReIeqMLV6uF6UszFgoV+PyGVARd//MCf QM07CW82XOU0KJQ9lGsCMTE8AoYz38HPyvY9oF+L9yd6SOETtBQleg0/afGvsufH +s3BMIUbmpPlvTVc25d/8XugBq31GQYdSnNAd8XKbZOP9/vYgL2OUznTi8t3fW+X zveKli7vHKgS3uv3555WBxEs4gVo2ZeACZNHcP4HEU0b/eBwT0uMfQ2TvnOikiFV pDMq5SW8zBCgZSXg9BWZcpXlN2erjk4q9WQ0NsPP62s5HYs5lEZLNtC7Tvj1AmGZ Di7vrP6Z44A7THnZFTQKkwy40JbWsDqI4OXYfRHw7tI5Ih/ctXfd4WHWR1qMJvMp nSc7X0BKBs3jzXCewNbr8bX2LDijVPDCJm1p9Q50nm32gyfxM9roKeJ43+IoPhj9 RyXtrI2qJQmVcWFsgxXfiatzrpEhx876qEK+tELe2dTyn64In2W/JOJFsrooKQxF jLN+6pQ9atRDij2YTO0w =N8xK -----END PGP SIGNATURE----- --b5gNqxB1S1yM7hjW--