mailing list of musl libc
 help / color / mirror / code / Atom feed
* [musl] Slightly faster 'scalbn(x,i) = x*2**i'
@ 2026-01-31 23:55 Damian McGuckin
  2026-02-05 20:25 ` Szabolcs Nagy
  0 siblings, 1 reply; 3+ messages in thread
From: Damian McGuckin @ 2026-01-31 23:55 UTC (permalink / raw)
  To: MUSL


In the course of helping a colleage debug a problem with ldexp/scalbn in
Apple's LIBM for ARM, I ported a custom version of scalbn() written in 
Chapel to C. The resulting C code was pretty much like MUSL for 2**i being 
a normal number (or a number greater than the largest floating point 
number). But, for 2**i being subnormal (or effectively even smaller) and
|x| being normal, it avoids the multiplications seen in MUSL.

The X86-64 assembler code size from both GCC11 and CLANG19 is much the 
same. Just for fun, I timed it against MUSL's routine and it is about 30% 
faster over a broad range of input.

As MUSL's scalbn() is not a performance hog and it easy to read, I see no 
reason to suggest changing it.

Let me know if anybody thinks otherwise.

Thanks - Damian

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [musl] Slightly faster 'scalbn(x,i) = x*2**i'
  2026-01-31 23:55 [musl] Slightly faster 'scalbn(x,i) = x*2**i' Damian McGuckin
@ 2026-02-05 20:25 ` Szabolcs Nagy
  2026-02-08  2:40   ` Damian McGuckin
  0 siblings, 1 reply; 3+ messages in thread
From: Szabolcs Nagy @ 2026-02-05 20:25 UTC (permalink / raw)
  To: Damian McGuckin; +Cc: MUSL

* Damian McGuckin <damianm@esi.com.au> [2026-02-01 10:55:08 +1100]:
> 
> In the course of helping a colleage debug a problem with ldexp/scalbn in
> Apple's LIBM for ARM, I ported a custom version of scalbn() written in
> Chapel to C. The resulting C code was pretty much like MUSL for 2**i being a
> normal number (or a number greater than the largest floating point number).
> But, for 2**i being subnormal (or effectively even smaller) and
> |x| being normal, it avoids the multiplications seen in MUSL.
> 
> The X86-64 assembler code size from both GCC11 and CLANG19 is much the same.
> Just for fun, I timed it against MUSL's routine and it is about 30% faster
> over a broad range of input.

nice

> 
> As MUSL's scalbn() is not a performance hog and it easy to read, I see no
> reason to suggest changing it.
> 
> Let me know if anybody thinks otherwise.

well it's not critical,
but it also does not hurt to have the code on the list
for future reference since you already spent time on it.

> 
> Thanks - Damian

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [musl] Slightly faster 'scalbn(x,i) = x*2**i'
  2026-02-05 20:25 ` Szabolcs Nagy
@ 2026-02-08  2:40   ` Damian McGuckin
  0 siblings, 0 replies; 3+ messages in thread
From: Damian McGuckin @ 2026-02-08  2:40 UTC (permalink / raw)
  To: Szabolcs Nagy; +Cc: MUSL


Subsequent to further testing on newer CPUs using more practical values of 
the integer argument and a revision to avoid a potential integer overflow 
bug that I had introduced, I can safely say that my code runs in much the 
same time as MUSL's own code within timing tolerances.

Mine and MUSL's both compile with GCC to the same number of instructions 
with GCC using -O3 and -mavx2 with the GCC 11 I am using.

So, no need to change the status quo.

Apologies for the noise - Damian

Below, I provide the code from the other project  for those interested.

#include <stdint.h>
#include <assert.h>
#include <stdio.h>

#ifdef LINT
#define    __builtin_expect(x, j)    (x)
#define    UFT    1.0e-38
#else
#define    UFT    0x1.0p-1022
#endif
#define    ABS            fabs
#define    unused         /*@unused@*/
#define    likely(x)      __builtin_expect((x),1)
#define    unlikely(x)    __builtin_expect((x),0)

#define    asuint32(__f)    ((union{float _f; uint32_t _i;}){__f})._i
#define    asuint64(__f)    ((union{double _f; uint64_t _i;}){__f})._i
#define    asfloat(__u)     ((union{uint32_t _i; float _f;}){__u})._f
#define    asdouble(__u)    ((union{uint64_t _i; double _f;}){__u})._f

/*
  * Compute x*2^n wth near minimalist multiplications
  */
double myldexp(double x, int n)
{
     static const uint64_t w = 64, p = 53;
     static const uint64_t einf = (((uint64_t) 1) << (w - p)) - 1;
     static const int b = (int) (einf / 2);

     if (unlikely(n > b))
     {
         /*
          * x*2^n = x * 2^*(b-n%1)*(2^(n/2))^2
          * where n is limited to b<n<2*b+p+1
          */
         static const double inflate[] = { 1 / UFT, 2 / UFT };
         static const int N = (int) (einf + p);
         const double y = x * inflate[n & 1];
         const uint64_t et = (uint64_t) ((((n < N ? n : N) + (b + 1))) / 2);
         const double t = asdouble(et << (p - 1));

         return (y * t) * t; /* suboptimal if cost of multiplication is high */
     }
     else if (unlikely(n < 1 - b))
     {
         /*
          * x*2^n = (x/2^(1-b-n))*2^(1-b) = (x/R)*2^(1-b), R = 2^m, m>0
          *
          * At this point, 1 <= R. If |x| <= 2^(-p), the products (x/R)*2^(1-b)
          * and x*2^(1-b) both underflow to zero for any R, i.e. the former
          * product can be computed as the latter. As a result, the computation
          * of x/R only really needs to be done when x > 2^(-p). It can be shown
          * that (x/R) = (x/r) where r = min(2^m,2^(e-1)) = 2^k for all practical
          * purposes because once m>e-1, both underflow to zero. The use of a
          * (normal) r rather than (potentially subnormal) R has the advantage
          * that x/r = x/2^k can be computed rapidly by subtracting the biased
          * exponent fields of x and r in-place, or because r has only zero bits
          * in the negative and significand bit field, the encodings of x and r.
          */
         const uint64_t _x = asuint64(x);
         const uint64_t ex = (_x >> (p - 1)) & einf;

         if (likely(b - p < ex && ex < einf)) /* only do this is necessary */
         {
             const uintt64_t e = ex - 1;
             const uint64_t m = (uint64_t) ((1 - b) - n); /* never overflows */
             const uint64_t _r = (m < e ? m : e) << (p - 1); /* always valid */

             x = asdouble(_x - _r); /* replace x by x / r */
         }
         return x * UFT;
     }
     return x * asdouble(((uint64_t) (n + b)) << (p - 1));
}
#ifdef TEST
#include <stdio.h>
#include <math.h>
#include <assert.h>

int main(int argc, char *argv[])
{
     double t = 0x1.9192d74f53405p-6;
     union { double r; unsigned long _r; } _t = { t };
     int e = -1027;
     double z;
     double big = 0x1.0p+1000;
     double tiny = 0x1.0p-1020;
     double res = myldexp (t, e);
     /* double res = (t * 0x1p-1020) * 0x1p-7; */
     double ref = t * 0x1p-1027;
     printf ("res=%la\nref=%la\n", res, ref);
     printf ("0x%016lx\n", _t._r);
     _t.r = ref;
     printf ("0x%016lx\n", _t._r);
     /* test for same bug seen in Apple libm on ARM */
     assert(res == ref);
     assert(ldexp(big, -2060) == myldexp(big, -2060));
     assert(__builtin_ldexp(tiny, 2000) == (z = myldexp(tiny, 2000)));
     printf ("%1a\n", z);
     return(0);
}
#endif

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2026-02-08  2:41 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-01-31 23:55 [musl] Slightly faster 'scalbn(x,i) = x*2**i' Damian McGuckin
2026-02-05 20:25 ` Szabolcs Nagy
2026-02-08  2:40   ` Damian McGuckin

Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/musl/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).