From mboxrd@z Thu Jan 1 00:00:00 1970 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on inbox.vuxu.org X-Spam-Level: X-Spam-Status: No, score=-3.1 required=5.0 tests=DKIM_INVALID,DKIM_SIGNED, MAILING_LIST_MULTI,RCVD_IN_DNSWL_MED,RCVD_IN_MSPIKE_H3, RCVD_IN_MSPIKE_WL autolearn=ham autolearn_force=no version=3.4.4 Received: (qmail 8495 invoked from network); 9 Jul 2021 13:38:32 -0000 Received: from mother.openwall.net (195.42.179.200) by inbox.vuxu.org with ESMTPUTF8; 9 Jul 2021 13:38:32 -0000 Received: (qmail 16009 invoked by uid 550); 9 Jul 2021 13:38:30 -0000 Mailing-List: contact musl-help@lists.openwall.com; run by ezmlm Precedence: bulk List-Post: List-Help: List-Unsubscribe: List-Subscribe: List-ID: Reply-To: musl@lists.openwall.com Received: (qmail 1912 invoked from network); 9 Jul 2021 11:26:40 -0000 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=insomnia247.nl; s=mail; t=1625829989; bh=/O0A7Vul6Dpan+hDRm3Pq7SBR9FdQV/ygpSqRE+ooTY=; h=To:Subject:Cc:Date:From:From; b=R/kr1Xme2tiz5umE6a1UtzS6Fwzs2y06bJn6EPaZo/OxmgpsEO/O4fvy0Gxkpehb2 PKw4fWPfskNeufnFBXc8e5bqYvbTyI5RP86ZxJ4dttDcVA9ua6xKq6xKBv69St4aZO JqFqfGxyqTP/EI8qQptMoNvp4lUZ5hxB9ahwm5hU= To: musl@lists.openwall.com Cc: jason@insomnia247.nl MIME-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: 8bit Message-Id: <20210709112629.024302221844@gateway02.insomnia247.nl> Date: Fri, 9 Jul 2021 13:26:28 +0200 (CEST) From: jason Subject: [musl] Improvements to qsort I've been trying to understand ngmalloc, and it's a struggle, so for relaxation I had a look at qsort, and have a few suggestions. To avoid burying the lead, the following reduces comparisons by about 4% with no downside: --- a/qsort.c +++ b/qsort.c @@ -100,18 +100,16 @@ static void sift(unsigned char *head, size_t width, cmpfun cmp, int pshift, size rt = head - width; lf = head - width - lp[pshift - 2]; - if((*cmp)(ar[0], lf) >= 0 && (*cmp)(ar[0], rt) >= 0) { - break; - } if((*cmp)(lf, rt) >= 0) { - ar[i++] = lf; head = lf; pshift -= 1; } else { - ar[i++] = rt; head = rt; pshift -= 2; } + if((*cmp)(ar[0], head) >= 0) + break; + ar[i++] = head; } cycle(width, ar, i); } Basically, instead of doing two comparisons to see if ar[0] is greater than both the left and right children and then comparing the children with each other to find the greater one, find the greater child and compare ar[0] with that one. I have several other less-significant changes that I wonder if musl is interested in: * I notice than pntz() is always used in the following context: trail = pntz(p); shr(p, trail); pshift += trail; both pntz() and shr() have branches conditional on whether the shift is more or less than 8*sizeof(size_t) bits. I have replaced this by a single function, which allows the conditional branch to be shared: /* * Re-normalize p[]. Ignore the low-order bit (which must be set!), * find the position of the next-least-significant bit, and shift p[] * down by that much. Return the number of bits shifted. * * There must be a penultimate trailing bit; i.e. p[] must not be {1, 0}. */ static inline int normalize(size_t p[2]) { int s; size_t t = p[0] - 1; if (t == 0) { t = p[1]; s = ntz(t); p[0] = t >> s; p[1] = 0; s += 8*sizeof(size_t); } else { s = ntz(t); p[0] = t >> s | p[1] << (8*sizeof(size_t) - s); p[1] >>= s; } return s; } which is used simply as: pshift += normalize(p); * As part of my learning about the code, I have added a lot of comments, particularly large block comments about the smoothsort algorithm in general, the meaning of p[] and pindex, what a "stepson" is, when it's okay to use sift() instead of trinkle(), what the "trusty" flag means, how cycle() is used, etc. Would these be worth cleaning up and submitting? * In my private copy, I have changed the type of "trusty" to bool, a data type I'm quite fond of as it concisely documents the limited range of values. I notice, however, that musl is completely devoid of any use of , although it uses other C99 features. Is this a conscious decision? * Likewise, I notice that most musl code has a space in "keyword (condition)" like "if (foo != 0)", "while (bar < N)" and "for (i=0; i < N; i++)". But the qsort() code, with one single exception, omits the space. Likewise, most msul code is quite happy to "if (condition) break;" on a single line, with no braces, but the original sift() function above puts the break on a second line and a closing brace on a third. The musl code base is inconsistent enough that it's hard to infer code formatting conventions. Are these discrepancies in qsort a problem worth fixing? * The way the heap-building loop maintains its loop invariants is a trifle peculiar. It begins with the element at *head included in p[] and pindex, but excluded from the heap ordering invariants. Then it establishes the ordering invariants (with sift() or trinkle()) and immediately adds another (un-ordered) element to p[] and pindex. After the main loop exits, a standalone call to trinkle() places the final element in heap order. I have a more conventional (add, then sift) ordering working which seems to simplify the code: while(head < high) { + bool notrinkle = false; + + /* Append an element to the heap */ + head += width; if((p[0] & 3) == 3) { - sift(head, width, cmp, pshift, lp); + /* Two trees of adjacent sizes: merge with head */ shr(p, 2); pshift += 2; - } else { - if(lp[pshift - 1] >= high - head) { - trinkle(head, width, cmp, p, pshift, 0, lp); - } else { + if((size_t)(high - head) > lp[pshift - 1]) { sift(head, width, cmp, pshift, lp); + notrinkle = true; } - - if(pshift == 1) { - shl(p, 1); - pshift = 0; - } else { - shl(p, pshift - 1); - pshift = 1; - } + } else if(pshift == 1) { + /* order-1 exists; new tree is order-0 */ + shl(p, 1); + pshift = 0; + notrinkle = (high != head); + } else { + /* new tree is order-1 */ + shl(p, pshift - 1); + pshift = 1; + notrinkle = ((size_t)(high - head) > lp[0]); } - p[0] |= 1; - head += width; + + /* Re-establish the heap using trinkle */ + if(!notrinkle) + trinkle(head, width, cmp, p, pshift, false, lp); } - trinkle(head, width, cmp, p, pshift, 0, lp); If the original ordering is wanted, then the loop initialization can be changed to start with two elements (p[0] = 3, pindex = 0, head = base + width), since two elements automatically fulfil the required loop invariants. Does anyone have a preference? * Since the first test in qsort() excludes width == 0, the loop over width in cycle() can be changed to "do { ... } while (width);". * The ar[] array in sift() only has to be as large as the lp[] array in qsort(), not 1/6 larger. * The ar[] array in trinkle() only has to be half as large, plus one. * The pindex > 1 case in the heap-consuming loop contains some redundant left-then-right shifting: shl(p, 2); pshift -= 2; p[0] ^= 7; shr(p, 1); This can be optimized. * There are several cases where variables can be moved into smaller local scopes, particularly the "stepson", "rt" and "lf" variables in trinkle(). This is another change I've made privately as I think it makes the code more reasonable. Worth submitting? * GCC emits a couple of signed/unsigned comparison warnings with -W -Wall. Worth fixing?