* [musl] Slightly faster 'scalbn(x,i) = x*2**i'
@ 2026-01-31 23:55 Damian McGuckin
2026-02-05 20:25 ` Szabolcs Nagy
0 siblings, 1 reply; 3+ messages in thread
From: Damian McGuckin @ 2026-01-31 23:55 UTC (permalink / raw)
To: MUSL
In the course of helping a colleage debug a problem with ldexp/scalbn in
Apple's LIBM for ARM, I ported a custom version of scalbn() written in
Chapel to C. The resulting C code was pretty much like MUSL for 2**i being
a normal number (or a number greater than the largest floating point
number). But, for 2**i being subnormal (or effectively even smaller) and
|x| being normal, it avoids the multiplications seen in MUSL.
The X86-64 assembler code size from both GCC11 and CLANG19 is much the
same. Just for fun, I timed it against MUSL's routine and it is about 30%
faster over a broad range of input.
As MUSL's scalbn() is not a performance hog and it easy to read, I see no
reason to suggest changing it.
Let me know if anybody thinks otherwise.
Thanks - Damian
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [musl] Slightly faster 'scalbn(x,i) = x*2**i'
2026-01-31 23:55 [musl] Slightly faster 'scalbn(x,i) = x*2**i' Damian McGuckin
@ 2026-02-05 20:25 ` Szabolcs Nagy
2026-02-08 2:40 ` Damian McGuckin
0 siblings, 1 reply; 3+ messages in thread
From: Szabolcs Nagy @ 2026-02-05 20:25 UTC (permalink / raw)
To: Damian McGuckin; +Cc: MUSL
* Damian McGuckin <damianm@esi.com.au> [2026-02-01 10:55:08 +1100]:
>
> In the course of helping a colleage debug a problem with ldexp/scalbn in
> Apple's LIBM for ARM, I ported a custom version of scalbn() written in
> Chapel to C. The resulting C code was pretty much like MUSL for 2**i being a
> normal number (or a number greater than the largest floating point number).
> But, for 2**i being subnormal (or effectively even smaller) and
> |x| being normal, it avoids the multiplications seen in MUSL.
>
> The X86-64 assembler code size from both GCC11 and CLANG19 is much the same.
> Just for fun, I timed it against MUSL's routine and it is about 30% faster
> over a broad range of input.
nice
>
> As MUSL's scalbn() is not a performance hog and it easy to read, I see no
> reason to suggest changing it.
>
> Let me know if anybody thinks otherwise.
well it's not critical,
but it also does not hurt to have the code on the list
for future reference since you already spent time on it.
>
> Thanks - Damian
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [musl] Slightly faster 'scalbn(x,i) = x*2**i'
2026-02-05 20:25 ` Szabolcs Nagy
@ 2026-02-08 2:40 ` Damian McGuckin
0 siblings, 0 replies; 3+ messages in thread
From: Damian McGuckin @ 2026-02-08 2:40 UTC (permalink / raw)
To: Szabolcs Nagy; +Cc: MUSL
Subsequent to further testing on newer CPUs using more practical values of
the integer argument and a revision to avoid a potential integer overflow
bug that I had introduced, I can safely say that my code runs in much the
same time as MUSL's own code within timing tolerances.
Mine and MUSL's both compile with GCC to the same number of instructions
with GCC using -O3 and -mavx2 with the GCC 11 I am using.
So, no need to change the status quo.
Apologies for the noise - Damian
Below, I provide the code from the other project for those interested.
#include <stdint.h>
#include <assert.h>
#include <stdio.h>
#ifdef LINT
#define __builtin_expect(x, j) (x)
#define UFT 1.0e-38
#else
#define UFT 0x1.0p-1022
#endif
#define ABS fabs
#define unused /*@unused@*/
#define likely(x) __builtin_expect((x),1)
#define unlikely(x) __builtin_expect((x),0)
#define asuint32(__f) ((union{float _f; uint32_t _i;}){__f})._i
#define asuint64(__f) ((union{double _f; uint64_t _i;}){__f})._i
#define asfloat(__u) ((union{uint32_t _i; float _f;}){__u})._f
#define asdouble(__u) ((union{uint64_t _i; double _f;}){__u})._f
/*
* Compute x*2^n wth near minimalist multiplications
*/
double myldexp(double x, int n)
{
static const uint64_t w = 64, p = 53;
static const uint64_t einf = (((uint64_t) 1) << (w - p)) - 1;
static const int b = (int) (einf / 2);
if (unlikely(n > b))
{
/*
* x*2^n = x * 2^*(b-n%1)*(2^(n/2))^2
* where n is limited to b<n<2*b+p+1
*/
static const double inflate[] = { 1 / UFT, 2 / UFT };
static const int N = (int) (einf + p);
const double y = x * inflate[n & 1];
const uint64_t et = (uint64_t) ((((n < N ? n : N) + (b + 1))) / 2);
const double t = asdouble(et << (p - 1));
return (y * t) * t; /* suboptimal if cost of multiplication is high */
}
else if (unlikely(n < 1 - b))
{
/*
* x*2^n = (x/2^(1-b-n))*2^(1-b) = (x/R)*2^(1-b), R = 2^m, m>0
*
* At this point, 1 <= R. If |x| <= 2^(-p), the products (x/R)*2^(1-b)
* and x*2^(1-b) both underflow to zero for any R, i.e. the former
* product can be computed as the latter. As a result, the computation
* of x/R only really needs to be done when x > 2^(-p). It can be shown
* that (x/R) = (x/r) where r = min(2^m,2^(e-1)) = 2^k for all practical
* purposes because once m>e-1, both underflow to zero. The use of a
* (normal) r rather than (potentially subnormal) R has the advantage
* that x/r = x/2^k can be computed rapidly by subtracting the biased
* exponent fields of x and r in-place, or because r has only zero bits
* in the negative and significand bit field, the encodings of x and r.
*/
const uint64_t _x = asuint64(x);
const uint64_t ex = (_x >> (p - 1)) & einf;
if (likely(b - p < ex && ex < einf)) /* only do this is necessary */
{
const uintt64_t e = ex - 1;
const uint64_t m = (uint64_t) ((1 - b) - n); /* never overflows */
const uint64_t _r = (m < e ? m : e) << (p - 1); /* always valid */
x = asdouble(_x - _r); /* replace x by x / r */
}
return x * UFT;
}
return x * asdouble(((uint64_t) (n + b)) << (p - 1));
}
#ifdef TEST
#include <stdio.h>
#include <math.h>
#include <assert.h>
int main(int argc, char *argv[])
{
double t = 0x1.9192d74f53405p-6;
union { double r; unsigned long _r; } _t = { t };
int e = -1027;
double z;
double big = 0x1.0p+1000;
double tiny = 0x1.0p-1020;
double res = myldexp (t, e);
/* double res = (t * 0x1p-1020) * 0x1p-7; */
double ref = t * 0x1p-1027;
printf ("res=%la\nref=%la\n", res, ref);
printf ("0x%016lx\n", _t._r);
_t.r = ref;
printf ("0x%016lx\n", _t._r);
/* test for same bug seen in Apple libm on ARM */
assert(res == ref);
assert(ldexp(big, -2060) == myldexp(big, -2060));
assert(__builtin_ldexp(tiny, 2000) == (z = myldexp(tiny, 2000)));
printf ("%1a\n", z);
return(0);
}
#endif
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2026-02-08 2:41 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-01-31 23:55 [musl] Slightly faster 'scalbn(x,i) = x*2**i' Damian McGuckin
2026-02-05 20:25 ` Szabolcs Nagy
2026-02-08 2:40 ` Damian McGuckin
Code repositories for project(s) associated with this public inbox
https://git.vuxu.org/mirror/musl/
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).