mailing list of musl libc
* Using float_t and double_t in math functions
@ 2013-05-09  1:43 Rich Felker
  2013-05-09 13:21 ` Szabolcs Nagy
  0 siblings, 1 reply; 5+ messages in thread
From: Rich Felker @ 2013-05-09  1:43 UTC (permalink / raw)
  To: musl

Hi all,

Today I've been doing some experimenting on the relative math
performance of musl and glibc. After eliminating a lot of bogus
results (the gcc 4.4 on my test machine (x86) was causing musl's
configure to use -ffloat-store, which kills performance) things mostly
look good. Aside from sqrt (which is more costly on musl because
glibc's violates the requirement of correct rounding), everything I'm
testing seems faster, in some cases up to five times faster.

While debugging the slowdown from -ffloat-store, one thing I ran
across is that a lot of functions end up performing store/load pairs
to drop excess precision when storing intermediate results. The
situation is much worse with -ffloat-store, but persists with modern
gcc because of -fexcess-precision=standard, which is implied anyway by
-std=c99.
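
A minimal sketch of the kind of code that is affected, assuming an x87 target built with -fexcess-precision=standard (the function name dot3_double is hypothetical, not from musl). Each assignment to a double variable has to discard excess precision, which the compiler implements as a store to memory followed by a reload:

```c
/* Hypothetical example, not musl code: a 3-element dot product whose
 * temporary is declared as plain double.
 *
 * On x87 with -fexcess-precision=standard, every assignment to t must
 * round the 80-bit register value to double, so each statement below
 * costs a store/load pair.  On targets without excess precision (for
 * example x86 with SSE2 math) nothing extra is generated. */
double dot3_double(const double *a, const double *b)
{
    double t = a[0] * b[0];   /* rounded to double here */
    t = t + a[1] * b[1];      /* and here */
    t = t + a[2] * b[2];      /* and here */
    return t;
}
```

Calling dot3_double on {1,2,3} and {4,5,6} gives 32 on any target; only the generated code differs.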

As far as I can tell, in most of the affected code, keeping excess
precision does not hurt the accuracy of the result, and it might even
improve the results. Thus, nsz and I discussed (on IRC) the
possibility of changing intermediate variables in functions that can
accept excess precision from float and double to float_t and double_t.
This would not affect the generated code at all on machines without
excess precision, but on x86 (without SSE) it eliminates all the
costly store/load pairs. As an example (on my test machine), it
dropped the cost of sinf(0.25) from 180 cycles to 130 cycles (glibc
takes 140 cycles, the main difference apparently being that glibc's
math library updates errno).
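
As a rough illustration of the proposed change (a sketch only: sinf_kernel_sketch and its Taylor coefficients are made up for this example and are not musl's actual kernel), the intermediates are declared as double_t so they may stay in whatever format the target evaluates in:

```c
#include <math.h>   /* float_t, double_t */

/* Hypothetical small-argument kernel, NOT musl's code: approximates
 * sin(x) for small |x| with the odd Taylor polynomial up to x^7.
 *
 * z and p are double_t, so on x87 they can remain in 80-bit registers
 * with no store/load between steps.  Where double_t is just double
 * (FLT_EVAL_METHOD == 0) the generated code is unchanged. */
float sinf_kernel_sketch(float x)
{
    static const double S1 = -1.0 / 6.0;
    static const double S2 =  1.0 / 120.0;
    static const double S3 = -1.0 / 5040.0;

    double_t z = (double_t)x * x;          /* x^2, excess precision allowed */
    double_t p = S1 + z * (S2 + z * S3);   /* polynomial in z */

    /* The conversion to the float return type is the one place where
     * rounding to the final precision is required. */
    return x + x * z * p;
}
```

Extra precision in z and p can only make the polynomial value closer to the exact one, which is why this class of temporaries looks safe to change.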

Unless there are objections, I think we should change float and double
intermediate variables in the implementations of math functions to
float_t and double_t, except where it's actually important to avoid
excess precision. Comments?

Rich



* Re: Using float_t and double_t in math functions
  2013-05-09  1:43 Using float_t and double_t in math functions Rich Felker
@ 2013-05-09 13:21 ` Szabolcs Nagy
  2013-05-09 14:57   ` Rich Felker
  2013-05-09 16:02   ` Rich Felker
  0 siblings, 2 replies; 5+ messages in thread
From: Szabolcs Nagy @ 2013-05-09 13:21 UTC (permalink / raw)
  To: musl

* Rich Felker <dalias@aerifal.cx> [2013-05-08 21:43:27 -0400]:
> As far as I can tell, in most of the affected code, keeping excess
> precision does not hurt the accuracy of the result, and it might even
> improve the results. Thus, nsz and I discussed (on IRC) the
> possibility of changing intermediate variables in functions that can
> accept excess precision from float and double to float_t and double_t.
> This would not affect the generated code at all on machines without
> excess precision, but on x86 (without SSE) it eliminates all the
> costly store/load pairs. As an example (on my test machine), it

ie. it is only for i386 (without sse)
(which is not a trendy platform nowadays)
but there it improves performance and
code size a bit so it is worth doing

at the same time all the STRICT_ASSIGN macros
can be removed (already a noop) which were
there to enforce store with the right precision
on i386 when musl is compiled without -ffloat-store,
but i don't think that should be supported
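
for readers who have not seen it, a macro of this kind can be sketched roughly as follows (an approximation in the style of the FreeBSD-derived code, under the hypothetical name STRICT_ASSIGN_SKETCH; not necessarily the exact definition in the tree):

```c
/* Sketch only: force rval to be rounded to `type` before it lands in
 * lval, even if the compiler would otherwise keep excess precision.
 * The volatile temporary forces a real store and reload.  For types
 * at least as wide as long double there is no excess precision to
 * drop, so a plain assignment is enough. */
#define STRICT_ASSIGN_SKETCH(type, lval, rval) do {   \
    volatile type strict_tmp_;                        \
    if (sizeof(type) >= sizeof(long double)) {        \
        (lval) = (rval);                              \
    } else {                                          \
        strict_tmp_ = (rval);                         \
        (lval) = strict_tmp_;                         \
    }                                                 \
} while (0)

/* Example use: y receives a + b rounded to double. */
double strict_sum(double a, double b)
{
    double y;
    STRICT_ASSIGN_SKETCH(double, y, a + b);
    return y;
}
```

with a compiler that already rounds on assignment (as C99 requires) the volatile detour buys nothing, which is why the macro can go.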


btw the other ugly macro that remains is
FORCE_EVAL to force evaluation of floating-point
expressions for their side-effect, which should
eventually be

#define FORCE_EVAL(expr) do{ \
_Pragma("STDC FENV_ACCESS ON") \
expr; \
} while(0)

but no compiler supports this that i know of
so now we have volatile hacks with unnecessary
stores
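
a sketch of the volatile workaround, with hypothetical names (FORCE_EVAL_SKETCH, tiny_sin_sketch) and simplified to double only; it assumes the compiler does not discard an expression whose value is stored to a volatile object:

```c
/* Sketch only: evaluate expr for its floating-point side effects
 * (exception flags) by storing the result to a volatile temporary.
 * The store itself is the unnecessary cost mentioned above. */
#define FORCE_EVAL_SKETCH(expr) do {   \
    volatile double force_eval_tmp_;   \
    force_eval_tmp_ = (expr);          \
} while (0)

/* Typical use: for tiny |x|, sin(x) rounds to x, but the result is
 * inexact for nonzero x, so an operation that raises the inexact flag
 * is evaluated and its value thrown away. */
double tiny_sin_sketch(double x)
{
    FORCE_EVAL_SKETCH(x + 0x1p120);   /* inexact when x is tiny and nonzero */
    return x;
}
```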

there are a few more 'volatile' in the code
but all should be possible to clean up
(fma and fmaf are probably exceptions similar
to FORCE_EVAL)



* Re: Using float_t and double_t in math functions
  2013-05-09 13:21 ` Szabolcs Nagy
@ 2013-05-09 14:57   ` Rich Felker
  2013-05-09 19:14     ` Szabolcs Nagy
  2013-05-09 16:02   ` Rich Felker
  1 sibling, 1 reply; 5+ messages in thread
From: Rich Felker @ 2013-05-09 14:57 UTC (permalink / raw)
  To: musl

On Thu, May 09, 2013 at 03:21:57PM +0200, Szabolcs Nagy wrote:
> * Rich Felker <dalias@aerifal.cx> [2013-05-08 21:43:27 -0400]:
> > As far as I can tell, in most of the affected code, keeping excess
> > precision does not hurt the accuracy of the result, and it might even
> > improve the results. Thus, nsz and I discussed (on IRC) the
> > possibility of changing intermediate variables in functions that can
> > accept excess precision from float and double to float_t and double_t.
> > This would not affect the generated code at all on machines without
> > excess precision, but on x86 (without SSE) it eliminates all the
> > costly store/load pairs. As an example (on my test machine), it
> 
> ie. it is only for i386 (without sse)
> (which is not a trendy platform nowadays)

Most distros are still using either i486 or original i686 (no SSE1,
much less SSE2) as their x86 target. Of course musl users can opt not
to (and compile musl with -mfpmath=sse -msse2) but for universal
static binaries a more baseline target may be preferable.

> but there it improves performance and
> code size a bit so it is worth doing

Do you want to do it, or do you want me to? I don't mind but you're
more familiar with the code and probably better aware of where it's
okay to change. (BTW, it's probably not safe to change arg-reduction
code, right?)

> at the same time all the STRICT_ASSIGN macros
> can be removed (already a noop) which were
> there to enforce store with the right precision
> on i386 when musl is compiled without -ffloat-store,
> but i don't think that should be supported

Agreed. I was only vaguely aware they were still around.

> btw the other ugly macro that remains is
> FORCE_EVAL to force evaluation of floating-point
> expressions for their side-effect, which should
> eventually be
> 
> #define FORCE_EVAL(expr) do{ \
> _Pragma("STDC FENV_ACCESS ON") \
> expr; \
> } while(0)
> 
> but no compiler supports this that i know of
> so now we have volatile hacks with unnecessary
> stores

I wonder if there's a way to use the result without storing it...
Probably not anything sane. Passing to a function would be more
costly, I think, and would still be a store on stack-based archs
anyway.

Rich



* Re: Using float_t and double_t in math functions
  2013-05-09 13:21 ` Szabolcs Nagy
  2013-05-09 14:57   ` Rich Felker
@ 2013-05-09 16:02   ` Rich Felker
  1 sibling, 0 replies; 5+ messages in thread
From: Rich Felker @ 2013-05-09 16:02 UTC (permalink / raw)
  To: musl

On Thu, May 09, 2013 at 03:21:57PM +0200, Szabolcs Nagy wrote:
> * Rich Felker <dalias@aerifal.cx> [2013-05-08 21:43:27 -0400]:
> > As far as I can tell, in most of the affected code, keeping excess
> > precision does not hurt the accuracy of the result, and it might even
> > improve the results. Thus, nsz and I discussed (on IRC) the
> > possibility of changing intermediate variables in functions that can
> > accept excess precision from float and double to float_t and double_t.
> > This would not affect the generated code at all on machines without
> > excess precision, but on x86 (without SSE) it eliminates all the
> > costly store/load pairs. As an example (on my test machine), it
> 
> ie. it is only for i386 (without sse)
> (which is not a trendy platform nowadays)
> but there it improves performance and
> code size a bit so it is worth doing

By the way, part of the reason I think we should make the change where
it doesn't hurt (and probably helps) accuracy is so we're not telling
people:

"Yes, some math functions in musl are slower than glibc because we're
taking extra care to make sure they give you less-accurate results."

:-)

In practice it's very few that are slower. I think most will just go
from being 2-3 times as fast as glibc to 3-5 times as fast as glibc.

Rich



* Re: Using float_t and double_t in math functions
  2013-05-09 14:57   ` Rich Felker
@ 2013-05-09 19:14     ` Szabolcs Nagy
  0 siblings, 0 replies; 5+ messages in thread
From: Szabolcs Nagy @ 2013-05-09 19:14 UTC (permalink / raw)
  To: musl

* Rich Felker <dalias@aerifal.cx> [2013-05-09 10:57:14 -0400]:
> 
> Do you want to do it, or do you want me to? I don't mind but you're
> more familiar with the code and probably better aware of where it's
> okay to change. (BTW, it's probably not safe to change arg-reduction
> code, right?)
> 

i will do it, but it will take some time. there are some simple
cases like sinf where it should clearly work (polynomial evals,
most temporaries)

but in other cases the excess precision can actually hurt
(correctly rounded operations, double-double representation,
code where over/underflow flags should be raised, ...)
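
one concrete case where the temporaries must stay plain double is the error-free addition used for double-double arithmetic. the following is a sketch with hypothetical names (dd_sketch, fast_two_sum_sketch), assuming |a| >= |b| and a compiler that rounds on assignment to double:

```c
/* Hypothetical double-double pair: value is hi + lo. */
typedef struct {
    double hi;
    double lo;
} dd_sketch;

/* Fast2Sum: for |a| >= |b|, returns hi = fl(a + b) and lo = the exact
 * rounding error, so that hi + lo == a + b exactly.
 *
 * This only works if s really is a + b rounded to double.  If s were
 * a double_t holding excess precision, t would absorb part of b and
 * lo would no longer be the error of the double result. */
dd_sketch fast_two_sum_sketch(double a, double b)
{
    dd_sketch r;
    double s = a + b;   /* must be rounded to double */
    double t = s - a;   /* the part of b that made it into s */
    r.hi = s;
    r.lo = b - t;       /* the part of b that was rounded away */
    return r;
}
```

for example, with a = 1.0 and b = 0x1p-60 the sum rounds to 1.0 and the whole of b is recovered in lo.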

and there are some non-trivial cases
e.g. acosh(x) for large x is ln(2*x), or, to avoid overflow in 2*x:

 log(x) + 0.693147180559945309417232121458176568;

which could be rewritten as

 log(x) + (double_t)0.693147180559945309417232121458176568L;

so gcc recognizes the const and uses fldln2 on i386
(which probably saves 1-2 bytes)
(the double rounding is not an issue for special consts)

so i think this will need to be done with
extensive testing and inspection of the generated code



Thread overview: 5+ messages
2013-05-09  1:43 Using float_t and double_t in math functions Rich Felker
2013-05-09 13:21 ` Szabolcs Nagy
2013-05-09 14:57   ` Rich Felker
2013-05-09 19:14     ` Szabolcs Nagy
2013-05-09 16:02   ` Rich Felker

Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/musl/
