* math todo
@ 2012-05-01 0:05 Szabolcs Nagy
2012-05-01 1:19 ` Rich Felker
0 siblings, 1 reply; 5+ messages in thread
From: Szabolcs Nagy @ 2012-05-01 0:05 UTC (permalink / raw)
To: musl
hello i thought i dump my math todo here
before the next big release
there are various fixes i didn't dare to
do (eg int32 -> uint32 cleanups) without
extensive test suit for math
tests:
there is still a lot to do for testing
it would be nice to have test vectors for every math
functions and cover the difficult cases for the most
common functions (see c99 F.9, G.6, crlibm, ucb test)
there are many open issues (long double handling,
exception and rounding mode testing, randomized tests,
tests based on math identities)
math.h:
__isrel pulls in long double isnan so isless etc cannot be
entirely inlined for float and double arguments
definition of NAN (0/0) raises invalid exception where it is used
signbit >>63 vs !!
>1ulp error:
tan(pi/2-eps) round tozero
exp(-inf) round upward
asm implementations:
add proper comments (using #)
x86_64 asm versions
long double:
drop ld128 support? and move ldshape union to arch/
at least rename the ieeel2 union and use the same union in all *l.c
internal/longdbl.h needs cleanup
efficient long double classification in math.h? (needs arch specific
things)
volatile fix:
-ffloat-store or -fexcess-precision=standard should fix most
volatile issues so some of the volatile hacks can be cleaned up
fp exceptions:
uniform inexact/invalid exception raising:
common hacks to raise inexact when x!=0: (int)x == 0 and huge+x > 0
round:
shouldn't raise inexact exception (or fix lround if it does)
round to int:
use +0x1.8p52 trick where it is applicable (rem_pio2, rint, lgamma_r)
trigonometric functions:
__rem_pio2_large: maybe returning only 2 bits is enough?
__tan: 3rd arg semantics is probably not optimal
sinf,cosf: return sindf(-y) vs -sindf(y)?
__sin,__cos,..: z,w,.. -> x2,x4,.. so degree is easier to see
use long double pi in long double code? (casin, cacos,..)
log2:
dekker vs long double arithmetics
scalbf:
scalb is buggy, do we need the *f and *l version?
generic code fixes:
int32_t -> uint32_t conversion (can be subtle, so testing is needed)
+= 1, -= 1 -> ++, --
TWO52, twom1000 vs tiny (renames where it makes sense)
remove overflow thresholds (sinh, cosh) when result overflows anyway?
sign bit checking convention (sqrt.c)
missing:
sqrtl
tgamma, tgammaf
(long double bessel)
nextafterf on ld64
tgamma:
lanczos approx as in boost/math/special_functions and python/Modules/mathmodule.c
complex
optimizable creal cimag (libm.h macro for internal code?)
include <math.h> and <complex.h> instead of libm.h (once there are efficient creal etc.)
cpack(x,y) vs x+I*y vs union .a[0]=x, .a[1]=y
fix casin[h], cacos[h], catan[h] (complex arith cornercases)
add missing long double versions
Kahan, W. "Branch Cuts for Complex Elementary Functions, or Much Ado About Nothing's Sign Bit." 1987
Hull, Fairgrieve, Tang "Implementing complex elementary functions using exception handling" 1994
Hull, Fairgrieve, Tang "Implementing the complex arcsine and arccosine functions using exception handling" 1997
see boost/math/complex, python/Modules/cmathmodule.c
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: math todo
2012-05-01 0:05 math todo Szabolcs Nagy
@ 2012-05-01 1:19 ` Rich Felker
2012-05-01 9:14 ` Szabolcs Nagy
0 siblings, 1 reply; 5+ messages in thread
From: Rich Felker @ 2012-05-01 1:19 UTC (permalink / raw)
To: musl
On Tue, May 01, 2012 at 02:05:03AM +0200, Szabolcs Nagy wrote:
> hello i thought i dump my math todo here
> before the next big release
Thanks!
> tests:
> there is still a lot to do for testing
> it would be nice to have test vectors for every math
> functions and cover the difficult cases for the most
> common functions (see c99 F.9, G.6, crlibm, ucb test)
> there are many open issues (long double handling,
> exception and rounding mode testing, randomized tests,
> tests based on math identities)
I'd be happy to see more testing. :-)
> math.h:
> __isrel pulls in long double isnan so isless etc cannot be
> entirely inlined for float and double arguments
Indeed, this should be fixed asap. I'll see if I can write some clever
macros to avoid bloating up the header too much.
> definition of NAN (0/0) raises invalid exception where it is used
This is only used as a fallback on compilers that don't have
__builtin_nan, no?
> signbit >>63 vs !!
Hm? This has been fixed I think.
> >1ulp error:
> tan(pi/2-eps) round tozero
> exp(-inf) round upward
What's the issue here? exp(-inf) is an exact zero; there's no rounding
involved. Are you saying in round-upward mode the result is nonzero
with the current implementation?
> asm implementations:
> add proper comments (using #)
> x86_64 asm versions
With yesterday's commit, I think all the x86_64 asm that's possible
with SSE2 is done. If we want asm for non-longdouble versions of
transcendental functions, I think it will just involve moving the
value from an SSE register to an FPU register and back... I have no
idea how this would compare in performance to the C versions using
SSE.
BTW under asm, we may also want to switch to the faster acos
implementation.
> long double:
> drop ld128 support? and move ldshape union to arch/
What ld128 support?
I don't think we really want to "drop" it since it will probably be
needed later,
> at least rename the ieeel2 union and use the same union in all *l.c
> internal/longdbl.h needs cleanup
This cleanup stuff is low priority as long as it works, IMO.
> efficient long double classification in math.h? (needs arch specific
> things)
I think this would potentially be worthwhile. Having bits/math.h would
also allow us to remove float_t/double_t from alltypes.h (they're only
used/needed by one header, math.h).
> volatile fix:
> -ffloat-store or -fexcess-precision=standard should fix most
> volatile issues so some of the volatile hacks can be cleaned up
I think it would be nice to try to ensure that the code still works
with just -ffloat-store for older gcc versions that don't have
-fexcess-precision=standard.
> fp exceptions:
> uniform inexact/invalid exception raising:
> common hacks to raise inexact when x!=0: (int)x == 0 and huge+x > 0
I don't really care how uniform it is as long as it works, but it's
fine to have this sort of cleanup as a low-priority goal.
> round:
> shouldn't raise inexact exception (or fix lround if it does)
> round to int:
> use +0x1.8p52 trick where it is applicable (rem_pio2, rint, lgamma_r)
> trigonometric functions:
> __rem_pio2_large: maybe returning only 2 bits is enough?
> __tan: 3rd arg semantics is probably not optimal
> sinf,cosf: return sindf(-y) vs -sindf(y)?
> __sin,__cos,..: z,w,.. -> x2,x4,.. so degree is easier to see
> use long double pi in long double code? (casin, cacos,..)
> log2:
> dekker vs long double arithmetics
> scalbf:
> scalb is buggy, do we need the *f and *l version?
How so?
> generic code fixes:
> int32_t -> uint32_t conversion (can be subtle, so testing is needed)
> += 1, -= 1 -> ++, --
> TWO52, twom1000 vs tiny (renames where it makes sense)
> remove overflow thresholds (sinh, cosh) when result overflows anyway?
> sign bit checking convention (sqrt.c)
> missing:
> sqrtl
Until we have a better one, couldn't we just use sqrt() as a first
approximation then use Newton's method or a binary-search for the
correct long double result? Or just return sqrt() since the only arch
that doesn't have sqrtl asm yet is the one where ld==double.
> tgamma, tgammaf
Well we have these but they're inaccurate for some inputs.
> (long double bessel)
> nextafterf on ld64
Eh?
> tgamma:
> lanczos approx as in boost/math/special_functions and python/Modules/mathmodule.c
> complex
> optimizable creal cimag (libm.h macro for internal code?)
This was done a long time ago.
> include <math.h> and <complex.h> instead of libm.h (once there are efficient creal etc.)
> cpack(x,y) vs x+I*y vs union .a[0]=x, .a[1]=y
> fix casin[h], cacos[h], catan[h] (complex arith cornercases)
> add missing long double versions
Ok.
Rich
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: math todo
2012-05-01 1:19 ` Rich Felker
@ 2012-05-01 9:14 ` Szabolcs Nagy
2012-05-01 14:44 ` Rich Felker
0 siblings, 1 reply; 5+ messages in thread
From: Szabolcs Nagy @ 2012-05-01 9:14 UTC (permalink / raw)
To: musl
[-- Attachment #1: Type: text/plain, Size: 3674 bytes --]
* Rich Felker <dalias@aerifal.cx> [2012-04-30 21:19:21 -0400]:
> > definition of NAN (0/0) raises invalid exception where it is used
>
> This is only used as a fallback on compilers that don't have
> __builtin_nan, no?
yes
> > signbit >>63 vs !!
>
> Hm? This has been fixed I think.
ok it seems to be fixed
> > >1ulp error:
> > tan(pi/2-eps) round tozero
> > exp(-inf) round upward
>
> What's the issue here? exp(-inf) is an exact zero; there's no rounding
> involved. Are you saying in round-upward mode the result is nonzero
> with the current implementation?
yes, actually
fesetround(FE_UPWARD);
exp(-inf) = 0x1p-1074; // smallest subnormal
instead of 0, which is 1ulp depending on your definition of ulp :)
i'll attach all the known problematic cases
(i386 asm is used, no -ffloat-store, acos is not yet fixed)
(i think acosh,asinh have problems as well but those are not tested)
zero and sign errors are treated as infinite ulp errors
errors >99ulp are printed as 99ulp
> With yesterday's commit, I think all the x86_64 asm that's possible
> with SSE2 is done. If we want asm for non-longdouble versions of
> transcendental functions, I think it will just involve moving the
> value from an SSE register to an FPU register and back... I have no
> idea how this would compare in performance to the C versions using
> SSE.
ok so only some benchmarking is left
> BTW under asm, we may also want to switch to the faster acos
> implementation.
ok i'll add it
(for double it is known to work, but i havent tested it for
the long double case)
> > long double:
> > drop ld128 support? and move ldshape union to arch/
>
> What ld128 support?
> I don't think we really want to "drop" it since it will probably be
> needed later,
ok
in freebsd several functions are implemented in a way that
they work on ld80 and ld128 as well but sometimes macro
hacks and workarounds are needed
grep for LDBP_MANT_DIG == 113
> > volatile fix:
> > -ffloat-store or -fexcess-precision=standard should fix most
> > volatile issues so some of the volatile hacks can be cleaned up
>
> I think it would be nice to try to ensure that the code still works
> with just -ffloat-store for older gcc versions that don't have
> -fexcess-precision=standard.
ok
> > scalbf:
> > scalb is buggy, do we need the *f and *l version?
>
> How so?
it's just non standard and i assume some code may use scalb
instead of scalbn, but scalbf and scalbl is probably not used
> > generic code fixes:
> > int32_t -> uint32_t conversion (can be subtle, so testing is needed)
> > += 1, -= 1 -> ++, --
> > TWO52, twom1000 vs tiny (renames where it makes sense)
> > remove overflow thresholds (sinh, cosh) when result overflows anyway?
> > sign bit checking convention (sqrt.c)
> > missing:
> > sqrtl
>
> Until we have a better one, couldn't we just use sqrt() as a first
> approximation then use Newton's method or a binary-search for the
> correct long double result? Or just return sqrt() since the only arch
> that doesn't have sqrtl asm yet is the one where ld==double.
yes that's a good idea
> > tgamma, tgammaf
>
> Well we have these but they're inaccurate for some inputs.
ah yes i added dummy versions, but they are not usable for serious work
> > (long double bessel)
> > nextafterf on ld64
>
> Eh?
actually it's nexttowardf(float x, long double y)
it's not implemented when long double == double
it cannot be just a simple wrapper around nextafter or nextafterf
> > tgamma:
> > lanczos approx as in boost/math/special_functions and python/Modules/mathmodule.c
> > complex
> > optimizable creal cimag (libm.h macro for internal code?)
>
> This was done a long time ago.
good
[-- Attachment #2: matherr.txt --]
[-- Type: text/plain, Size: 16713 bytes --]
99 ulp want: 0 got: 8000000000000000 round: m acos arg0: 0x1p+0 arg1: 0x0p+0 want: 0x0p+0 got: -0x0p+0
99 ulp want: 8000000000000000 got: 8000000000000001 round: p atan arg0: -0x1p-1074 arg1: 0x0p+0 want: -0x0p+0 got: -0x1p-1074
99 ulp want: 0 got: 1 round: m atan arg0: 0x1p-1074 arg1: 0x0p+0 want: 0x0p+0 got: 0x1p-1074
99 ulp want: 0 got: 1 round: z atan arg0: 0x1p-1074 arg1: 0x0p+0 want: 0x0p+0 got: 0x1p-1074
99 ulp want: 8000000000000000 got: 8000000000000001 round: z atan arg0: -0x1p-1074 arg1: 0x0p+0 want: -0x0p+0 got: -0x1p-1074
-99 ulp want: 3e92559dfa91a5e5 got: 3e92553ab0f98600 round: z cos arg0: 0x1.921fb0aedb52ep+0 arg1: 0x0p+0 want: 0x1.2559dfa91a5e5p-22 got: 0x1.2553ab0f986p-22
-99 ulp want: 3e8e6ec5d3a34c02 got: 3e8e6dff406bfe00 round: z cos arg0: 0x1.921fb1766a171p+0 arg1: 0x0p+0 want: 0x1.e6ec5d3a34c02p-23 got: 0x1.e6dff406bfep-23
-99 ulp want: 3e8cd4571d234c0e got: 3e8cd39089ea2c00 round: z cos arg0: 0x1.921fb1a9b7edep+0 arg1: 0x0p+0 want: 0x1.cd4571d234c0ep-23 got: 0x1.cd39089ea2cp-23
-99 ulp want: 3e6567b0ae8d3130 got: 3e656496613e6000 round: z cos arg0: 0x1.921fb499054c1p+0 arg1: 0x0p+0 want: 0x1.567b0ae8d313p-25 got: 0x1.56496613e6p-25
-99 ulp want: 3e692c3d7c8d312e got: 3e6929232f42a800 round: z cos arg0: 0x1.921fb47ae0e5ap+0 arg1: 0x0p+0 want: 0x1.92c3d7c8d312ep-25 got: 0x1.929232f42a8p-25
-16 ulp want: bfe14f348fed47b6 got: bfe14f348fed47a6 round: p cos arg0: -0x1.e2d0add5684ap+16 arg1: 0x0p+0 want: -0x1.14f348fed47b6p-1 got: -0x1.14f348fed47a6p-1
-99 ulp want: 3fc4c145814a7dd6 got: 3fc4c1458148e377 round: z cos arg0: 0x1.8174c6167647fp+14 arg1: 0x0p+0 want: 0x1.4c145814a7dd6p-3 got: 0x1.4c1458148e377p-3
-99 ulp want: 3fc4c145814a7dd6 got: 3fc4c1458148e377 round: m cos arg0: 0x1.8174c6167647fp+14 arg1: 0x0p+0 want: 0x1.4c145814a7dd6p-3 got: 0x1.4c1458148e377p-3
-99 ulp want: 3fddc6f9c12db1c9 got: 3fddc6f9c12db0cd round: z cos arg0: 0x1.e69f4157cad0cp+12 arg1: 0x0p+0 want: 0x1.dc6f9c12db1c9p-2 got: 0x1.dc6f9c12db0cdp-2
-99 ulp want: 3fddc6f9c12db1c9 got: 3fddc6f9c12db0cd round: m cos arg0: 0x1.e69f4157cad0cp+12 arg1: 0x0p+0 want: 0x1.dc6f9c12db1c9p-2 got: 0x1.dc6f9c12db0cdp-2
-99 ulp want: 3fcced65cabc55bf got: 3fcced65cabbb142 round: z cos arg0: 0x1.79dd7368775c3p+17 arg1: 0x0p+0 want: 0x1.ced65cabc55bfp-3 got: 0x1.ced65cabbb142p-3
-99 ulp want: 3fcced65cabc55bf got: 3fcced65cabbb142 round: m cos arg0: 0x1.79dd7368775c3p+17 arg1: 0x0p+0 want: 0x1.ced65cabc55bfp-3 got: 0x1.ced65cabbb142p-3
99 ulp want: 3fedda3163ed5d27 got: 3fedda3163eda382 round: z cos arg0: 0x1.25d58ebe0a3f9p+15 arg1: 0x0p+0 want: 0x1.dda3163ed5d27p-1 got: 0x1.dda3163eda382p-1
99 ulp want: 3fedda3163ed5d27 got: 3fedda3163eda382 round: m cos arg0: 0x1.25d58ebe0a3f9p+15 arg1: 0x0p+0 want: 0x1.dda3163ed5d27p-1 got: 0x1.dda3163eda382p-1
-99 ulp want: bfcb4f76409126f1 got: bfcb4f76409060a2 round: p cos arg0: -0x1.32171bd2fc39fp+8 arg1: 0x0p+0 want: -0x1.b4f76409126f1p-3 got: -0x1.b4f76409060a2p-3
-99 ulp want: bfd0d646ecbc5238 got: bfd0d646ecbc23a6 round: p cos arg0: 0x1.b77ba27615fd4p+17 arg1: 0x0p+0 want: -0x1.0d646ecbc5238p-2 got: -0x1.0d646ecbc23a6p-2
99 ulp want: 0 got: 1 round: p exp arg0: -inf arg1: 0x0p+0 want: 0x0p+0 got: 0x1p-1074
99 ulp want: 8000000000000000 got: 8000000000000001 round: p expm1 arg0: -0x1p-1074 arg1: 0x0p+0 want: -0x0p+0 got: -0x1p-1074
99 ulp want: 1 got: 0 round: m expm1 arg0: 0x1p-1074 arg1: 0x0p+0 want: 0x1p-1074 got: 0x0p+0
99 ulp want: 1 got: 0 round: z expm1 arg0: 0x1p-1074 arg1: 0x0p+0 want: 0x1p-1074 got: 0x0p+0
99 ulp want: 8000000000000001 got: 8000000000000000 round: z log1p arg0: -0x1p-1074 arg1: 0x0p+0 want: -0x1p-1074 got: -0x0p+0
99 ulp want: 8000000000000000 got: 8000000000000001 round: p sin arg0: -0x1p-1074 arg1: 0x0p+0 want: -0x0p+0 got: -0x1p-1074
99 ulp want: 0 got: 1 round: m sin arg0: 0x1p-1074 arg1: 0x0p+0 want: 0x0p+0 got: 0x1p-1074
99 ulp want: 0 got: 1 round: z sin arg0: 0x1p-1074 arg1: 0x0p+0 want: 0x0p+0 got: 0x1p-1074
99 ulp want: 8000000000000000 got: 8000000000000001 round: z sin arg0: -0x1p-1074 arg1: 0x0p+0 want: -0x0p+0 got: -0x1p-1074
99 ulp want: bfefffffffffffff got: bff000000011553b round: p sin arg0: 0x1.32ce90b32173ep+18 arg1: 0x0p+0 want: -0x1.fffffffffffffp-1 got: -0x1.000000011553bp+0
-2 ulp want: 3fefd1d7f1c8170c got: 3fefd1d7f1c8170a round: z sinh arg0: 0x1.c13876341b62ep-1 arg1: 0x0p+0 want: 0x1.fd1d7f1c8170cp-1 got: 0x1.fd1d7f1c8170ap-1
-2 ulp want: 3fe857954132083d got: 3fe857954132083b round: z sinh arg0: 0x1.67425fe575c88p-1 arg1: 0x0p+0 want: 0x1.857954132083dp-1 got: 0x1.857954132083bp-1
2 ulp want: 3fef1aef6e4181c7 got: 3fef1aef6e4181c9 round: p sinh arg0: 0x1.b911a09cbfdf4p-1 arg1: 0x0p+0 want: 0x1.f1aef6e4181c7p-1 got: 0x1.f1aef6e4181c9p-1
2 ulp want: 3fdac1e17d0da3f7 got: 3fdac1e17d0da3f9 round: p sinh arg0: 0x1.a089039d2a1adp-2 arg1: 0x0p+0 want: 0x1.ac1e17d0da3f7p-2 got: 0x1.ac1e17d0da3f9p-2
2 ulp want: bfeac252c29bb71a got: bfeac252c29bb71c round: p sinh arg0: -0x1.857add62fa2bp-1 arg1: 0x0p+0 want: -0x1.ac252c29bb71ap-1 got: -0x1.ac252c29bb71cp-1
-2 ulp want: 3fcd846ac730bb07 got: 3fcd846ac730bb05 round: m sinh arg0: 0x1.d42ff8f159a07p-3 arg1: 0x0p+0 want: 0x1.d846ac730bb07p-3 got: 0x1.d846ac730bb05p-3
-2 ulp want: 3fcd846ac730bb07 got: 3fcd846ac730bb05 round: z sinh arg0: 0x1.d42ff8f159a07p-3 arg1: 0x0p+0 want: 0x1.d846ac730bb07p-3 got: 0x1.d846ac730bb05p-3
-2 ulp want: bfcfc6f923054efc got: bfcfc6f923054efa round: m sinh arg0: -0x1.f75a54466a762p-3 arg1: 0x0p+0 want: -0x1.fc6f923054efcp-3 got: -0x1.fc6f923054efap-3
-2 ulp want: bfcfc6f923054efb got: bfcfc6f923054ef9 round: z sinh arg0: -0x1.f75a54466a762p-3 arg1: 0x0p+0 want: -0x1.fc6f923054efbp-3 got: -0x1.fc6f923054ef9p-3
2 ulp want: bfcef1a856ac1eee got: bfcef1a856ac1ef0 round: p sinh arg0: -0x1.ea675453c1528p-3 arg1: 0x0p+0 want: -0x1.ef1a856ac1eeep-3 got: -0x1.ef1a856ac1efp-3
-2 ulp want: 3fce6da42ddf35ea got: 3fce6da42ddf35e8 round: m sinh arg0: 0x1.e26145c02af1cp-3 arg1: 0x0p+0 want: 0x1.e6da42ddf35eap-3 got: 0x1.e6da42ddf35e8p-3
-2 ulp want: 3fce6da42ddf35ea got: 3fce6da42ddf35e8 round: z sinh arg0: 0x1.e26145c02af1cp-3 arg1: 0x0p+0 want: 0x1.e6da42ddf35eap-3 got: 0x1.e6da42ddf35e8p-3
-2 ulp want: bfee5dc452bdb7ad got: bfee5dc452bdb7ab round: m sinh arg0: -0x1.b08abbf2543e9p-1 arg1: 0x0p+0 want: -0x1.e5dc452bdb7adp-1 got: -0x1.e5dc452bdb7abp-1
-2 ulp want: bfee5dc452bdb7ac got: bfee5dc452bdb7aa round: z sinh arg0: -0x1.b08abbf2543e9p-1 arg1: 0x0p+0 want: -0x1.e5dc452bdb7acp-1 got: -0x1.e5dc452bdb7aap-1
-2 ulp want: 3fdba7c70c14e1c3 got: 3fdba7c70c14e1c1 round: m sinh arg0: 0x1.adc22d546c638p-2 arg1: 0x0p+0 want: 0x1.ba7c70c14e1c3p-2 got: 0x1.ba7c70c14e1c1p-2
-2 ulp want: 3fdba7c70c14e1c3 got: 3fdba7c70c14e1c1 round: z sinh arg0: 0x1.adc22d546c638p-2 arg1: 0x0p+0 want: 0x1.ba7c70c14e1c3p-2 got: 0x1.ba7c70c14e1c1p-2
-3 ulp want: bfcf1058b810c4ea got: bfcf1058b810c4e7 round: m sinh arg0: -0x1.ec448dacc4d3dp-3 arg1: 0x0p+0 want: -0x1.f1058b810c4eap-3 got: -0x1.f1058b810c4e7p-3
-3 ulp want: bfcf1058b810c4e9 got: bfcf1058b810c4e6 round: z sinh arg0: -0x1.ec448dacc4d3dp-3 arg1: 0x0p+0 want: -0x1.f1058b810c4e9p-3 got: -0x1.f1058b810c4e6p-3
-2 ulp want: 3fda9edcc93490ee got: 3fda9edcc93490ec round: m sinh arg0: 0x1.9e83e23cadd18p-2 arg1: 0x0p+0 want: 0x1.a9edcc93490eep-2 got: 0x1.a9edcc93490ecp-2
-2 ulp want: 3fda9edcc93490ee got: 3fda9edcc93490ec round: z sinh arg0: 0x1.9e83e23cadd18p-2 arg1: 0x0p+0 want: 0x1.a9edcc93490eep-2 got: 0x1.a9edcc93490ecp-2
-2 ulp want: bfdb77760e081cd9 got: bfdb77760e081cd7 round: m sinh arg0: -0x1.aafc2776377eep-2 arg1: 0x0p+0 want: -0x1.b77760e081cd9p-2 got: -0x1.b77760e081cd7p-2
-2 ulp want: bfdb77760e081cd8 got: bfdb77760e081cd6 round: z sinh arg0: -0x1.aafc2776377eep-2 arg1: 0x0p+0 want: -0x1.b77760e081cd8p-2 got: -0x1.b77760e081cd6p-2
-2 ulp want: bfe8ba8c2def4592 got: bfe8ba8c2def4590 round: m sinh arg0: -0x1.6c2af543de11ep-1 arg1: 0x0p+0 want: -0x1.8ba8c2def4592p-1 got: -0x1.8ba8c2def459p-1
-2 ulp want: bfe8ba8c2def4591 got: bfe8ba8c2def458f round: z sinh arg0: -0x1.6c2af543de11ep-1 arg1: 0x0p+0 want: -0x1.8ba8c2def4591p-1 got: -0x1.8ba8c2def458fp-1
2 ulp want: 4119e111386fe780 got: 4119e111386fe782 round: z tan arg0: 0x1.921f8db2b958dp+0 arg1: 0x0p+0 want: 0x1.9e111386fe78p+18 got: 0x1.9e111386fe782p+18
-99 ulp want: 41621225fabc950c got: 41621209abebecb6 round: z tan arg0: 0x1.921fb37eef6fap+0 arg1: 0x0p+0 want: 0x1.21225fabc950cp+23 got: 0x1.21209abebecb6p+23
-99 ulp want: 416a222735f505fd got: 416a21ec023cf830 round: z tan arg0: 0x1.921fb40acae4ap+0 arg1: 0x0p+0 want: 0x1.a222735f505fdp+23 got: 0x1.a21ec023cf83p+23
-99 ulp want: 417c35ac9ef2ad2d got: 417c3522a91e6ea2 round: z tan arg0: 0x1.921fb4b310032p+0 arg1: 0x0p+0 want: 0x1.c35ac9ef2ad2dp+24 got: 0x1.c3522a91e6ea2p+24
-99 ulp want: 418b8c161ee609d8 got: 418b8b0f06800f38 round: z tan arg0: 0x1.921fb4f9ea79bp+0 arg1: 0x0p+0 want: 0x1.b8c161ee609d8p+25 got: 0x1.b8b0f06800f38p+25
-99 ulp want: 418f0f30948c2833 got: 418f0de220c7b376 round: z tan arg0: 0x1.921fb502529c9p+0 arg1: 0x0p+0 want: 0x1.f0f30948c2833p+25 got: 0x1.f0de220c7b376p+25
-99 ulp want: 419cec9642faae58 got: 419cea523ecfdc3c round: z tan arg0: 0x1.921fb520dbabcp+0 arg1: 0x0p+0 want: 0x1.cec9642faae58p+26 got: 0x1.cea523ecfdc3cp+26
-99 ulp want: 41a5125600daedec got: 41a50fee6c9e4ac8 round: z tan arg0: 0x1.921fb52bf682ap+0 arg1: 0x0p+0 want: 0x1.5125600daedecp+27 got: 0x1.50fee6c9e4ac8p+27
-99 ulp want: 41bc21e3cad2dba8 got: 41bc1952f2d3037a round: z tan arg0: 0x1.921fb53b2942dp+0 arg1: 0x0p+0 want: 0x1.c21e3cad2dba8p+28 got: 0x1.c1952f2d3037ap+28
-99 ulp want: 41c21e8a2ab25a5c got: 41c2176f7a40708e round: z tan arg0: 0x1.921fb53d325c1p+0 arg1: 0x0p+0 want: 0x1.21e8a2ab25a5cp+29 got: 0x1.2176f7a40708ep+29
-99 ulp want: 41d153ba1e4c2daa got: 41d146c081651560 round: z tan arg0: 0x1.921fb540913edp+0 arg1: 0x0p+0 want: 0x1.153ba1e4c2daap+30 got: 0x1.146c08165156p+30
-99 ulp want: 420abbf97365a7d4 got: 4209cccd7b4aad92 round: z tan arg0: 0x1.921fb543f6367p+0 arg1: 0x0p+0 want: 0x1.abbf97365a7d4p+33 got: 0x1.9cccd7b4aad92p+33
-99 ulp want: 4220eb7222928f43 got: 421eff70160506b0 round: z tan arg0: 0x1.921fb544248edp+0 arg1: 0x0p+0 want: 0x1.0eb7222928f43p+35 got: 0x1.eff70160506bp+34
-99 ulp want: 422ca793c3e1bafd got: 4228cdc4227d9e22 round: z tan arg0: 0x1.921fb54430f35p+0 arg1: 0x0p+0 want: 0x1.ca793c3e1bafdp+35 got: 0x1.8cdc4227d9e22p+35
-99 ulp want: 42311ed08b4c3e84 got: 422ce1ed44606ca8 round: z tan arg0: 0x1.921fb54433dd9p+0 arg1: 0x0p+0 want: 0x1.11ed08b4c3e84p+36 got: 0x1.ce1ed44606ca8p+35
-99 ulp want: 4248d6d64b805173 got: 424025a0d6fa309c round: z tan arg0: 0x1.921fb5443daa5p+0 arg1: 0x0p+0 want: 0x1.8d6d64b805173p+37 got: 0x1.025a0d6fa309cp+37
-99 ulp want: 427394333b25d835 got: 4251d236ccb8fec8 round: z tan arg0: 0x1.921fb54442005p+0 arg1: 0x0p+0 want: 0x1.394333b25d835p+40 got: 0x1.1d236ccb8fec8p+38
-99 ulp want: 428cf0a054216e88 got: 4254fafafbcf00aa round: z tan arg0: 0x1.921fb544428acp+0 arg1: 0x0p+0 want: 0x1.cf0a054216e88p+41 got: 0x1.4fafafbcf00aap+38
-99 ulp want: 4287f7457f2246a3 got: 425497d3fece47b6 round: z tan arg0: 0x1.921fb544427c1p+0 arg1: 0x0p+0 want: 0x1.7f7457f2246a3p+41 got: 0x1.497d3fece47b6p+38
-99 ulp want: 429ffb97121c3896 got: 42561368abe1d361 round: z tan arg0: 0x1.921fb54442b18p+0 arg1: 0x0p+0 want: 0x1.ffb97121c3896p+42 got: 0x1.61368abe1d361p+38
-99 ulp want: 42a907e7aaaea9f3 got: 42566cd8ee4c1ef3 round: z tan arg0: 0x1.921fb54442bd1p+0 arg1: 0x0p+0 want: 0x1.907e7aaaea9f3p+43 got: 0x1.66cd8ee4c1ef3p+38
-99 ulp want: 42b1a25bb08e2211 got: 42569be266c9f694 round: z tan arg0: 0x1.921fb54442c3p+0 arg1: 0x0p+0 want: 0x1.1a25bb08e2211p+44 got: 0x1.69be266c9f694p+38
-99 ulp want: 42b0908339725fa3 got: 4256946803ed39a9 round: z tan arg0: 0x1.921fb54442c21p+0 arg1: 0x0p+0 want: 0x1.0908339725fa3p+44 got: 0x1.6946803ed39a9p+38
-99 ulp want: 42b7c69d0d4db53d got: 4256b9fdbc883eae round: z tan arg0: 0x1.921fb54442c6cp+0 arg1: 0x0p+0 want: 0x1.7c69d0d4db53dp+44 got: 0x1.6b9fdbc883eaep+38
-99 ulp want: 42b678b0ac797390 got: 4256b4f39e03488f round: z tan arg0: 0x1.921fb54442c62p+0 arg1: 0x0p+0 want: 0x1.678b0ac79739p+44 got: 0x1.6b4f39e03488fp+38
-99 ulp want: 42c0174eaf74279d got: 4256d0c7103a8d29 round: z tan arg0: 0x1.921fb54442c99p+0 arg1: 0x0p+0 want: 0x1.0174eaf74279dp+45 got: 0x1.6d0c7103a8d29p+38
-99 ulp want: 42ca80a300c5bcae got: 4256ea4e5ccffcc5 round: z tan arg0: 0x1.921fb54442ccbp+0 arg1: 0x0p+0 want: 0x1.a80a300c5bcaep+45 got: 0x1.6ea4e5ccffcc5p+38
-99 ulp want: 42c50db55110745c got: 4256e0116643b405 round: z tan arg0: 0x1.921fb54442cb7p+0 arg1: 0x0p+0 want: 0x1.50db55110745cp+45 got: 0x1.6e0116643b405p+38
-99 ulp want: 42c631c0768030fe got: 4256e29fc83e36f9 round: z tan arg0: 0x1.921fb54442cbcp+0 arg1: 0x0p+0 want: 0x1.631c0768030fep+45 got: 0x1.6e29fc83e36f9p+38
-99 ulp want: 42d5a8ff56a9370e got: 4256f9bb09213c9f round: z tan arg0: 0x1.921fb54442ce9p+0 arg1: 0x0p+0 want: 0x1.5a8ff56a9370ep+46 got: 0x1.6f9bb09213c9fp+38
-99 ulp want: 42db788ef1b787b6 got: 4256fee3e0e43221 round: z tan arg0: 0x1.921fb54442cf3p+0 arg1: 0x0p+0 want: 0x1.b788ef1b787b6p+46 got: 0x1.6fee3e0e43221p+38
-99 ulp want: 42d838cf6838a195 got: 4256fc4f2ae3809e round: z tan arg0: 0x1.921fb54442ceep+0 arg1: 0x0p+0 want: 0x1.838cf6838a195p+46 got: 0x1.6fc4f2ae3809ep+38
-99 ulp want: 42dfba01d10013e9 got: 425701792b554663 round: z tan arg0: 0x1.921fb54442cf8p+0 arg1: 0x0p+0 want: 0x1.fba01d10013e9p+46 got: 0x1.701792b554663p+38
-99 ulp want: 42e6fc1002ce31b7 got: 425706a57e50dc6a round: z tan arg0: 0x1.921fb54442d02p+0 arg1: 0x0p+0 want: 0x1.6fc1002ce31b7p+47 got: 0x1.706a57e50dc6ap+38
-99 ulp want: 42eda30b5171acf4 got: 4257093c873fb922 round: z tan arg0: 0x1.921fb54442d07p+0 arg1: 0x0p+0 want: 0x1.da30b5171acf4p+47 got: 0x1.7093c873fb922p+38
-2 ulp want: bfd13f1027a096d1 got: bfd13f1027a096cf round: m tan arg0: 0x1.75c63414b9f57p+19 arg1: 0x0p+0 want: -0x1.13f1027a096d1p-2 got: -0x1.13f1027a096cfp-2
3 ulp want: bfcbe4bb2d89518d got: bfcbe4bb2d895190 round: z tan arg0: -0x1.1551831ffe3eap+6 arg1: 0x0p+0 want: -0x1.be4bb2d89518dp-3 got: -0x1.be4bb2d89519p-3
79 ulp want: bfb54516fc029550 got: bfb54516fc02959f round: z tan arg0: -0x1.94c6c8d914b3dp+3 arg1: 0x0p+0 want: -0x1.54516fc02955p-4 got: -0x1.54516fc02959fp-4
71 ulp want: bfb54516fc029551 got: bfb54516fc029598 round: m tan arg0: -0x1.94c6c8d914b3dp+3 arg1: 0x0p+0 want: -0x1.54516fc029551p-4 got: -0x1.54516fc029598p-4
-99 ulp want: c0618d2afbac399a got: c0618d2afb9c1ea8 round: p tan arg0: 0x1.99fc268aa1543p+8 arg1: 0x0p+0 want: -0x1.18d2afbac399ap+7 got: -0x1.18d2afb9c1ea8p+7
4 ulp want: bfbd1753684282e8 got: bfbd1753684282ec round: z tan arg0: -0x1.ae746bc4ff77p+9 arg1: 0x0p+0 want: -0x1.d1753684282e8p-4 got: -0x1.d1753684282ecp-4
-4 ulp want: bfbd1753684282e9 got: bfbd1753684282e5 round: m tan arg0: -0x1.ae746bc4ff77p+9 arg1: 0x0p+0 want: -0x1.d1753684282e9p-4 got: -0x1.d1753684282e5p-4
99 ulp want: 0 got: 8000000000000000 round: m acos arg0: 0x1p+0 arg1: 0x0p+0 want: 0x0p+0 got: -0x0p+0
99 ulp want: 8000000000000001 got: 8000000000000000 round: p asin arg0: -0x1p-1074 arg1: 0x0p+0 want: -0x1p-1074 got: -0x0p+0
99 ulp want: 8000000000000000 got: 8000000000000001 round: m atan2 arg0: -0x1p-1022 arg1: 0x1.fffffffffffffp+1023 want: -0x0p+0 got: -0x1p-1074
99 ulp want: 8000000000000000 got: 8000000000000001 round: m atan2 arg0: -0x1p-1022 arg1: 0x1.fffffffffffffp+1023 want: -0x0p+0 got: -0x1p-1074
99 ulp want: 0 got: 1 round: p exp arg0: -inf arg1: 0x0p+0 want: 0x0p+0 got: 0x1p-1074
99 ulp want: 0 got: 1 round: p exp arg0: -0x1.75p+9 arg1: 0x0p+0 want: 0x0p+0 got: 0x1p-1074
99 ulp want: 0 got: 1 round: p exp arg0: -0x1.c9c8p+13 arg1: 0x0p+0 want: 0x0p+0 got: 0x1p-1074
-2 ulp want: 3e60000000000001 got: 3e5fffffffffffff round: m sinh arg0: 0x1p-25 arg1: 0x0p+0 want: 0x1.0000000000001p-25 got: 0x1.fffffffffffffp-26
-2 ulp want: 3e60000000000001 got: 3e5fffffffffffff round: z sinh arg0: 0x1p-25 arg1: 0x0p+0 want: 0x1.0000000000001p-25 got: 0x1.fffffffffffffp-26
-2 ulp want: be60000000000001 got: be5fffffffffffff round: z sinh arg0: -0x1p-25 arg1: 0x0p+0 want: -0x1.0000000000001p-25 got: -0x1.fffffffffffffp-26
all: 69734 fail: 12787 failbad: 107 failepic: 72
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: math todo
2012-05-01 9:14 ` Szabolcs Nagy
@ 2012-05-01 14:44 ` Rich Felker
2012-05-01 18:33 ` Szabolcs Nagy
0 siblings, 1 reply; 5+ messages in thread
From: Rich Felker @ 2012-05-01 14:44 UTC (permalink / raw)
To: musl
On Tue, May 01, 2012 at 11:14:58AM +0200, Szabolcs Nagy wrote:
> yes, actually
> fesetround(FE_UPWARD);
> exp(-inf) = 0x1p-1074; // smallest subnormal
> instead of 0, which is 1ulp depending on your definition of ulp :)
This is incorrect (because the result is exact, the error is ==1ulp,
and IEEE requires <1ulp). But for the large finite negative arguments,
0x1p-1074 is the correctly rounded answer in rounds-upward mode.
> > BTW under asm, we may also want to switch to the faster acos
> > implementation.
> ok i'll add it
> (for double it is known to work, but i havent tested it for
> the long double case)
Even if we can't switch ld we could still switch it for the
float/double versions and keep our original asm for ld80.
> > Until we have a better one, couldn't we just use sqrt() as a first
> > approximation then use Newton's method or a binary-search for the
> > correct long double result? Or just return sqrt() since the only arch
> > that doesn't have sqrtl asm yet is the one where ld==double.
> yes that's a good idea
Already committed the latter idea.
> > > tgamma, tgammaf
> >
> > Well we have these but they're inaccurate for some inputs.
> ah yes i added dummy versions, but they are not usable for serious work
Good to know.
> > > (long double bessel)
> > > nextafterf on ld64
> >
> > Eh?
> actually it's nexttowardf(float x, long double y)
> it's not implemented when long double == double
> it cannot be just a simple wrapper around nextafter or nextafterf
All of the nextafter stuff looks overly complicated. Shouldn't these
functions all just be (essentially):
if (x>y) rep_x--;
else if (x<y) rep_x++;
modulo a little handling for sign and ±0?
> 99 ulp want: 0 got: 8000000000000000 round: m acos arg0: 0x1p+0 arg1: 0x0p+0 want: 0x0p+0 got: -0x0p+0
> 99 ulp want: 8000000000000000 got: 8000000000000001 round: p atan arg0: -0x1p-1074 arg1: 0x0p+0 want: -0x0p+0 got: -0x1p-1074
> 99 ulp want: 0 got: 1 round: m atan arg0: 0x1p-1074 arg1: 0x0p+0 want: 0x0p+0 got: 0x1p-1074
> 99 ulp want: 0 got: 1 round: z atan arg0: 0x1p-1074 arg1: 0x0p+0 want: 0x0p+0 got: 0x1p-1074
> 99 ulp want: 8000000000000000 got: 8000000000000001 round: z atan arg0: -0x1p-1074 arg1: 0x0p+0 want: -0x0p+0 got: -0x1p-1074
IMO this is 1ulp not >99ulp. For denormals ulp is not x * 0x1p-52 but
the actual last place of significance.
> 99 ulp want: 0 got: 1 round: p exp arg0: -inf arg1: 0x0p+0 want: 0x0p+0 got: 0x1p-1074
> 99 ulp want: 8000000000000000 got: 8000000000000001 round: p expm1 arg0: -0x1p-1074 arg1: 0x0p+0 want: -0x0p+0 got: -0x1p-1074
> 99 ulp want: 1 got: 0 round: m expm1 arg0: 0x1p-1074 arg1: 0x0p+0 want: 0x1p-1074 got: 0x0p+0
> 99 ulp want: 1 got: 0 round: z expm1 arg0: 0x1p-1074 arg1: 0x0p+0 want: 0x1p-1074 got: 0x0p+0
> 99 ulp want: 8000000000000001 got: 8000000000000000 round: z log1p arg0: -0x1p-1074 arg1: 0x0p+0 want: -0x1p-1074 got: -0x0p+0
> 99 ulp want: 8000000000000000 got: 8000000000000001 round: p sin arg0: -0x1p-1074 arg1: 0x0p+0 want: -0x0p+0 got: -0x1p-1074
> 99 ulp want: 0 got: 1 round: m sin arg0: 0x1p-1074 arg1: 0x0p+0 want: 0x0p+0 got: 0x1p-1074
> 99 ulp want: 0 got: 1 round: z sin arg0: 0x1p-1074 arg1: 0x0p+0 want: 0x0p+0 got: 0x1p-1074
> 99 ulp want: 8000000000000000 got: 8000000000000001 round: z sin arg0: -0x1p-1074 arg1: 0x0p+0 want: -0x0p+0 got: -0x1p-1074
Same for these.
> 99 ulp want: 0 got: 8000000000000000 round: m acos arg0: 0x1p+0 arg1: 0x0p+0 want: 0x0p+0 got: -0x0p+0
> 99 ulp want: 8000000000000001 got: 8000000000000000 round: p asin arg0: -0x1p-1074 arg1: 0x0p+0 want: -0x1p-1074 got: -0x0p+0
> 99 ulp want: 8000000000000000 got: 8000000000000001 round: m atan2 arg0: -0x1p-1022 arg1: 0x1.fffffffffffffp+1023 want: -0x0p+0 got: -0x1p-1074
> 99 ulp want: 8000000000000000 got: 8000000000000001 round: m atan2 arg0: -0x1p-1022 arg1: 0x1.fffffffffffffp+1023 want: -0x0p+0 got: -0x1p-1074
> 99 ulp want: 0 got: 1 round: p exp arg0: -inf arg1: 0x0p+0 want: 0x0p+0 got: 0x1p-1074
> 99 ulp want: 0 got: 1 round: p exp arg0: -0x1.75p+9 arg1: 0x0p+0 want: 0x0p+0 got: 0x1p-1074
> 99 ulp want: 0 got: 1 round: p exp arg0: -0x1.c9c8p+13 arg1: 0x0p+0 want: 0x0p+0 got: 0x1p-1074
And these. Of course in some cases the result is still non-conformant
for other reasons like being outside the range of the function, or
==1ulp error (when correct answer is exact) rather than <1ulp.
> all: 69734 fail: 12787 failbad: 107 failepic: 72
Like your choice of naming. :-)
Rich
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: math todo
2012-05-01 14:44 ` Rich Felker
@ 2012-05-01 18:33 ` Szabolcs Nagy
0 siblings, 0 replies; 5+ messages in thread
From: Szabolcs Nagy @ 2012-05-01 18:33 UTC (permalink / raw)
To: musl
* Rich Felker <dalias@aerifal.cx> [2012-05-01 10:44:38 -0400]:
> On Tue, May 01, 2012 at 11:14:58AM +0200, Szabolcs Nagy wrote:
> > yes, actually
> > fesetround(FE_UPWARD);
> > exp(-inf) = 0x1p-1074; // smallest subnormal
> > instead of 0, which is 1ulp depending on your definition of ulp :)
>
> This is incorrect (because the result is exact, the error is ==1ulp,
> and IEEE requires <1ulp). But for the large finite negative arguments,
> 0x1p-1074 is the correctly rounded answer in rounds-upward mode.
>
ok
> > > BTW under asm, we may also want to switch to the faster acos
> > > implementation.
> > ok i'll add it
> > (for double it is known to work, but i havent tested it for
> > the long double case)
>
> Even if we can't switch ld we could still switch it for the
> float/double versions and keep our original asm for ld80.
>
ok
> All of the nextafter stuff looks overly complicated. Shouldn't these
> functions all just be (essentially):
>
> if (x>y) rep_x--;
> else if (x<y) rep_x++;
>
> modulo a little handling for sign and ±0?
>
yes
and that's what is done, but under/overflow flags are
also handled and for double two uint32_t is used instead
of one uint64_t ++
(and comparision is done using int instead of float compare)
it can be a bit simpler
> > 99 ulp want: 0 got: 8000000000000000 round: m acos arg0: 0x1p+0 arg1: 0x0p+0 want: 0x0p+0 got: -0x0p+0
> > 99 ulp want: 8000000000000000 got: 8000000000000001 round: p atan arg0: -0x1p-1074 arg1: 0x0p+0 want: -0x0p+0 got: -0x1p-1074
> > 99 ulp want: 0 got: 1 round: m atan arg0: 0x1p-1074 arg1: 0x0p+0 want: 0x0p+0 got: 0x1p-1074
> > 99 ulp want: 0 got: 1 round: z atan arg0: 0x1p-1074 arg1: 0x0p+0 want: 0x0p+0 got: 0x1p-1074
> > 99 ulp want: 8000000000000000 got: 8000000000000001 round: z atan arg0: -0x1p-1074 arg1: 0x0p+0 want: -0x0p+0 got: -0x1p-1074
>
> IMO this is 1ulp not >99ulp. For denormals ulp is not x * 0x1p-52 but
> the actual last place of significance.
>
yes it is 1ulp, but it's a huge relative error, same for sign differences
> > all: 69734 fail: 12787 failbad: 107 failepic: 72
>
> Like your choice of naming. :-)
:)
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2012-05-01 18:33 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-05-01 0:05 math todo Szabolcs Nagy
2012-05-01 1:19 ` Rich Felker
2012-05-01 9:14 ` Szabolcs Nagy
2012-05-01 14:44 ` Rich Felker
2012-05-01 18:33 ` Szabolcs Nagy
Code repositories for project(s) associated with this public inbox
https://git.vuxu.org/mirror/musl/
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).