math patches for moving bare asm to C inline asm

mailing list of musl libc
 help / color / mirror / code / Atom feed

* math patches for moving bare asm to C inline asm
@ 2020-01-05 16:35 Alexander Monakov
  2020-01-05 16:36 ` [PATCH] math: move x86_64 fabs, fabsf to C with " Alexander Monakov
                   ` (11 more replies)
  0 siblings, 12 replies; 55+ messages in thread
From: Alexander Monakov @ 2020-01-05 16:35 UTC (permalink / raw)
  To: musl

Hi,

in a recent thread Rich reiterated that it would be nice to move
from numerous .s files under src/math/$arch to C files with inline
assembly. I'd like to help with that, but at the same time I'd
like to avoid getting again into a situation when I hear how nice
it would be to get something done, and provide patches only to get
(some of) them applied 1.5 years later :)

So to get going I intend to post patches one by one in reply to this thread,
starting from rather trivial updates to x86_64 math files in order to
get feedback about policies and formatting.

I don't expect any such patches applied before the next release, but
I'd like to ask for no huge delays on high-level questions.

Thanks.
Alexander

^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH] math: move x86_64 fabs, fabsf to C with inline asm
  2020-01-05 16:35 math patches for moving bare asm to C inline asm Alexander Monakov
@ 2020-01-05 16:36 ` Alexander Monakov
  2020-01-05 20:05   ` Rich Felker
  2020-01-06  8:40 ` [PATCH] math: move more x86-family fabs functions to C Alexander Monakov
                   ` (10 subsequent siblings)
  11 siblings, 1 reply; 55+ messages in thread
From: Alexander Monakov @ 2020-01-05 16:36 UTC (permalink / raw)
  To: musl; +Cc: Alexander Monakov

[-- Attachment #1: Type: text/plain, Size: 636 bytes --]

---

Questions:

Why are there amd64-specific fabs implementations in the first place?
(Only) because GCC generated poor code for the generic C version?

Do annotations for mask manipulation in the patch help? Any way to make
them less ambiguous?


 src/math/x86_64/fabs.c  | 10 ++++++++++
 src/math/x86_64/fabs.s  |  9 ---------
 src/math/x86_64/fabsf.c | 10 ++++++++++
 src/math/x86_64/fabsf.s |  7 -------
 4 files changed, 20 insertions(+), 16 deletions(-)
 create mode 100644 src/math/x86_64/fabs.c
 delete mode 100644 src/math/x86_64/fabs.s
 create mode 100644 src/math/x86_64/fabsf.c
 delete mode 100644 src/math/x86_64/fabsf.s


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-math-move-x86_64-fabs-fabsf-to-C-with-inline-asm.patch --]
[-- Type: text/x-patch; name="0001-math-move-x86_64-fabs-fabsf-to-C-with-inline-asm.patch", Size: 1420 bytes --]

diff --git a/src/math/x86_64/fabs.c b/src/math/x86_64/fabs.c
new file mode 100644
index 00000000..16562477
--- /dev/null
+++ b/src/math/x86_64/fabs.c
@@ -0,0 +1,10 @@
+#include <math.h>
+
+double fabs(double x)
+{
+	double t;
+	__asm__ ("pcmpeqd %0, %0" : "=x"(t));          // t = ~0
+	__asm__ ("psrlq   $1, %0" : "+x"(t));          // t >>= 1
+	__asm__ ("andps   %1, %0" : "+x"(x) : "x"(t)); // x &= t
+	return x;
+}
diff --git a/src/math/x86_64/fabs.s b/src/math/x86_64/fabs.s
deleted file mode 100644
index 5715005e..00000000
--- a/src/math/x86_64/fabs.s
+++ /dev/null
@@ -1,9 +0,0 @@
-.global fabs
-.type fabs,@function
-fabs:
-	xor %eax,%eax
-	dec %rax
-	shr %rax
-	movq %rax,%xmm1
-	andpd %xmm1,%xmm0
-	ret
diff --git a/src/math/x86_64/fabsf.c b/src/math/x86_64/fabsf.c
new file mode 100644
index 00000000..36ea7481
--- /dev/null
+++ b/src/math/x86_64/fabsf.c
@@ -0,0 +1,10 @@
+#include <math.h>
+
+float fabsf(float x)
+{
+	float t;
+	__asm__ ("pcmpeqd %0, %0" : "=x"(t));          // t = ~0
+	__asm__ ("psrld   $1, %0" : "+x"(t));          // t >>= 1
+	__asm__ ("andps   %1, %0" : "+x"(x) : "x"(t)); // x &= t
+	return x;
+}
diff --git a/src/math/x86_64/fabsf.s b/src/math/x86_64/fabsf.s
deleted file mode 100644
index 501a1f17..00000000
--- a/src/math/x86_64/fabsf.s
+++ /dev/null
@@ -1,7 +0,0 @@
-.global fabsf
-.type fabsf,@function
-fabsf:
-	mov $0x7fffffff,%eax
-	movq %rax,%xmm1
-	andps %xmm1,%xmm0
-	ret

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH] math: move x86_64 fabs, fabsf to C with inline asm
  2020-01-05 16:36 ` [PATCH] math: move x86_64 fabs, fabsf to C with " Alexander Monakov
@ 2020-01-05 20:05   ` Rich Felker
  2020-01-05 21:32     ` Alexander Monakov
  0 siblings, 1 reply; 55+ messages in thread
From: Rich Felker @ 2020-01-05 20:05 UTC (permalink / raw)
  To: musl

On Sun, Jan 05, 2020 at 07:36:39PM +0300, Alexander Monakov wrote:
> ---
> 
> Questions:
> 
> Why are there amd64-specific fabs implementations in the first place?
> (Only) because GCC generated poor code for the generic C version?

I think so. It generates:

0000000000000000 <fabs>:
   0:   66 48 0f 7e c0          movq   %xmm0,%rax
   5:   48 ba ff ff ff ff ff    movabs $0x7fffffffffffffff,%rdx
   c:   ff ff 7f
   f:   48 21 d0                and    %rdx,%rax
  12:   48 89 44 24 f8          mov    %rax,-0x8(%rsp)
  17:   f2 0f 10 44 24 f8       movsd  -0x8(%rsp),%xmm0
  1d:   c3                      retq

> Do annotations for mask manipulation in the patch help? Any way to make
> them less ambiguous?

I think so. I like how you did individual asm statements with
dependency relationship between them so compiler could even schedule
them if it likes. I wonder if you could just write 0x7fffffffffffffff
as an operand and have the compiler load it, though.

Rich


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH] math: move x86_64 fabs, fabsf to C with inline asm
  2020-01-05 20:05   ` Rich Felker
@ 2020-01-05 21:32     ` Alexander Monakov
  2020-01-05 22:43       ` Rich Felker
  0 siblings, 1 reply; 55+ messages in thread
From: Alexander Monakov @ 2020-01-05 21:32 UTC (permalink / raw)
  To: musl

On Sun, 5 Jan 2020, Rich Felker wrote:

> On Sun, Jan 05, 2020 at 07:36:39PM +0300, Alexander Monakov wrote:
> > ---
> > 
> > Questions:
> > 
> > Why are there amd64-specific fabs implementations in the first place?
> > (Only) because GCC generated poor code for the generic C version?
> 
> I think so. It generates:
[snip]

*nod* In my eyes that's a missed optimization, but one that is probably not
going to be fully fixed anytime soon, although for the particular case of
generic fabs gcc-9 has improved:

        movq    %xmm0, %rax
        btrq    $63, %rax
        movq    %rax, %xmm0			

On Aarch64 GCC seems to do better with float bit manipulations (can emit code
that does them on vector registers directly without copying to/from general
registers). On x86 LLVM compiles fabs well, but not copysign.

(ideally the language would allow to express bit manipulations of floats
directly, then compilers probably would have better support as well)

FWIW GCC bugreport is https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93039
but I'm not holding my breath.

By this logic, specialized implementations of copysign are also desirable,
right?  (2 instructions longer than fabs, except for long double)

> > Do annotations for mask manipulation in the patch help? Any way to make
> > them less ambiguous?
> 
> I think so. I like how you did individual asm statements with
> dependency relationship between them so compiler could even schedule
> them if it likes. I wonder if you could just write 0x7fffffffffffffff
> as an operand and have the compiler load it, though.

In this case the mask is so simple that building it with pcmpeq-psrl is cheaper
than loading from memory or moving from a general register. So not using an
immediate is intentional.

Alexander

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH] math: move x86_64 fabs, fabsf to C with inline asm
  2020-01-05 21:32     ` Alexander Monakov
@ 2020-01-05 22:43       ` Rich Felker
  2020-01-06  8:17         ` Alexander Monakov
  0 siblings, 1 reply; 55+ messages in thread
From: Rich Felker @ 2020-01-05 22:43 UTC (permalink / raw)
  To: musl

On Mon, Jan 06, 2020 at 12:32:38AM +0300, Alexander Monakov wrote:
> 
> 
> On Sun, 5 Jan 2020, Rich Felker wrote:
> 
> > On Sun, Jan 05, 2020 at 07:36:39PM +0300, Alexander Monakov wrote:
> > > ---
> > > 
> > > Questions:
> > > 
> > > Why are there amd64-specific fabs implementations in the first place?
> > > (Only) because GCC generated poor code for the generic C version?
> > 
> > I think so. It generates:
> [snip]
> 
> *nod* In my eyes that's a missed optimization, but one that is probably not
> going to be fully fixed anytime soon, although for the particular case of
> generic fabs gcc-9 has improved:
> 
>         movq    %xmm0, %rax
>         btrq    $63, %rax
>         movq    %rax, %xmm0			
> 
> On Aarch64 GCC seems to do better with float bit manipulations (can emit code
> that does them on vector registers directly without copying to/from general
> registers). On x86 LLVM compiles fabs well, but not copysign.
> 
> (ideally the language would allow to express bit manipulations of floats
> directly, then compilers probably would have better support as well)
> 
> FWIW GCC bugreport is https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93039
> but I'm not holding my breath.
> 
> By this logic, specialized implementations of copysign are also desirable,
> right?  (2 instructions longer than fabs, except for long double)

I'm not sure if "this logic" carries over. fabs is a common operation
(ideally compiler would inline it anyway in the caller, though).
copysign not so much.

Really I'm not even sure it makes sense to have the asm here at all
for fabs either, but perhaps with the gratuitous stack access in the
older-GCC version it does...?

> > > Do annotations for mask manipulation in the patch help? Any way to make
> > > them less ambiguous?
> > 
> > I think so. I like how you did individual asm statements with
> > dependency relationship between them so compiler could even schedule
> > them if it likes. I wonder if you could just write 0x7fffffffffffffff
> > as an operand and have the compiler load it, though.
> 
> In this case the mask is so simple that building it with pcmpeq-psrl is cheaper
> than loading from memory or moving from a general register. So not using an
> immediate is intentional.

OK, I was figuring the compiler might be able to generate it easily
with vector insns if there were no "non-vector" arithmetic/bitwise ops
involved in the use of the result, but that's probably expecting too
much...

Rich


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH] math: move x86_64 fabs, fabsf to C with inline asm
  2020-01-05 22:43       ` Rich Felker
@ 2020-01-06  8:17         ` Alexander Monakov
  0 siblings, 0 replies; 55+ messages in thread
From: Alexander Monakov @ 2020-01-06  8:17 UTC (permalink / raw)
  To: musl

On Sun, 5 Jan 2020, Rich Felker wrote:

> I'm not sure if "this logic" carries over. fabs is a common operation
> (ideally compiler would inline it anyway in the caller, though).
> copysign not so much.

Indeed, in practice we want compilers to recognize and inline simple math
functions, especially on x86 where all float registers are call-clobbered.
This makes it moot, except maybe for musl-internal uses like in floatscan.c,
but there the right solution is probably to use __builtin_ forms if available.

(to be clear: I think musl may not implement math functions in terms of
math builtins because it might result in a circular dependency, but using
builtins for complex math and higher-level functions like in floatscan.c
should be fair game)

For functions like sqrt GCC can be changed to default to -fno-math-errno
on *-musl targets, as progress on flipping it globally seems slow.

> Really I'm not even sure it makes sense to have the asm here at all
> for fabs either, but perhaps with the gratuitous stack access in the
> older-GCC version it does...?

That's why I'm asking, the situation just looks ambiguous to me, I can't
deduce the intent and I could imagine arguments both ways. I think I'd
prefer a more inclusive playground. Please come up with some policy or
guidelines if possible.

Alexander

^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH] math: move more x86-family fabs functions to C
  2020-01-05 16:35 math patches for moving bare asm to C inline asm Alexander Monakov
  2020-01-05 16:36 ` [PATCH] math: move x86_64 fabs, fabsf to C with " Alexander Monakov
@ 2020-01-06  8:40 ` Alexander Monakov
  2020-03-21 17:06   ` [musl] " Rich Felker
  2020-01-06 16:50 ` [PATCH] math: move trivial x86-family sqrt " Alexander Monakov
                   ` (9 subsequent siblings)
  11 siblings, 1 reply; 55+ messages in thread
From: Alexander Monakov @ 2020-01-06  8:40 UTC (permalink / raw)
  To: musl

[-- Attachment #1: Type: text/plain, Size: 811 bytes --]

---

Note I'm saving some effort by not touching x32 files, presumably it's
easier to bulk-update them after the dust settles.

 src/math/i386/fabs.c    | 7 +++++++
 src/math/i386/fabs.s    | 6 ------
 src/math/i386/fabsf.c   | 7 +++++++
 src/math/i386/fabsf.s   | 6 ------
 src/math/i386/fabsl.c   | 7 +++++++
 src/math/i386/fabsl.s   | 6 ------
 src/math/x86_64/fabsl.c | 7 +++++++
 src/math/x86_64/fabsl.s | 6 ------
 8 files changed, 28 insertions(+), 24 deletions(-)
 create mode 100644 src/math/i386/fabs.c
 delete mode 100644 src/math/i386/fabs.s
 create mode 100644 src/math/i386/fabsf.c
 delete mode 100644 src/math/i386/fabsf.s
 create mode 100644 src/math/i386/fabsl.c
 delete mode 100644 src/math/i386/fabsl.s
 create mode 100644 src/math/x86_64/fabsl.c
 delete mode 100644 src/math/x86_64/fabsl.s


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0002-math-move-more-x86-family-fabs-functions-to-C.patch --]
[-- Type: text/x-patch; name="0002-math-move-more-x86-family-fabs-functions-to-C.patch", Size: 2007 bytes --]

diff --git a/src/math/i386/fabs.c b/src/math/i386/fabs.c
new file mode 100644
index 00000000..39672786
--- /dev/null
+++ b/src/math/i386/fabs.c
@@ -0,0 +1,7 @@
+#include <math.h>
+
+double fabs(double x)
+{
+	__asm__ ("fabs" : "+t"(x));
+	return x;
+}
diff --git a/src/math/i386/fabs.s b/src/math/i386/fabs.s
deleted file mode 100644
index d66ea9a1..00000000
--- a/src/math/i386/fabs.s
+++ /dev/null
@@ -1,6 +0,0 @@
-.global fabs
-.type fabs,@function
-fabs:
-	fldl 4(%esp)
-	fabs
-	ret
diff --git a/src/math/i386/fabsf.c b/src/math/i386/fabsf.c
new file mode 100644
index 00000000..d07be321
--- /dev/null
+++ b/src/math/i386/fabsf.c
@@ -0,0 +1,7 @@
+#include <math.h>
+
+float fabs(float x)
+{
+	__asm__ ("fabs" : "+t"(x));
+	return x;
+}
diff --git a/src/math/i386/fabsf.s b/src/math/i386/fabsf.s
deleted file mode 100644
index a981c422..00000000
--- a/src/math/i386/fabsf.s
+++ /dev/null
@@ -1,6 +0,0 @@
-.global fabsf
-.type fabsf,@function
-fabsf:
-	flds 4(%esp)
-	fabs
-	ret
diff --git a/src/math/i386/fabsl.c b/src/math/i386/fabsl.c
new file mode 100644
index 00000000..cc1c9ed9
--- /dev/null
+++ b/src/math/i386/fabsl.c
@@ -0,0 +1,7 @@
+#include <math.h>
+
+long double fabsl(long double x)
+{
+	__asm__ ("fabs" : "+t"(x));
+	return x;
+}
diff --git a/src/math/i386/fabsl.s b/src/math/i386/fabsl.s
deleted file mode 100644
index ceef9e4c..00000000
--- a/src/math/i386/fabsl.s
+++ /dev/null
@@ -1,6 +0,0 @@
-.global fabsl
-.type fabsl,@function
-fabsl:
-	fldt 4(%esp)
-	fabs
-	ret
diff --git a/src/math/x86_64/fabsl.c b/src/math/x86_64/fabsl.c
new file mode 100644
index 00000000..cc1c9ed9
--- /dev/null
+++ b/src/math/x86_64/fabsl.c
@@ -0,0 +1,7 @@
+#include <math.h>
+
+long double fabsl(long double x)
+{
+	__asm__ ("fabs" : "+t"(x));
+	return x;
+}
diff --git a/src/math/x86_64/fabsl.s b/src/math/x86_64/fabsl.s
deleted file mode 100644
index 4e7ab525..00000000
--- a/src/math/x86_64/fabsl.s
+++ /dev/null
@@ -1,6 +0,0 @@
-.global fabsl
-.type fabsl,@function
-fabsl:
-	fldt 8(%rsp)
-	fabs
-	ret

^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH] math: move trivial x86-family sqrt functions to C
  2020-01-05 16:35 math patches for moving bare asm to C inline asm Alexander Monakov
  2020-01-05 16:36 ` [PATCH] math: move x86_64 fabs, fabsf to C with " Alexander Monakov
  2020-01-06  8:40 ` [PATCH] math: move more x86-family fabs functions to C Alexander Monakov
@ 2020-01-06 16:50 ` Alexander Monakov
  2020-01-06 17:43 ` [PATCH] math: move i386 sqrtf " Alexander Monakov
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 55+ messages in thread
From: Alexander Monakov @ 2020-01-06 16:50 UTC (permalink / raw)
  To: musl

[-- Attachment #1: Type: text/plain, Size: 754 bytes --]

---
This does not touch i386 sqrt and sqrtf, which are non-trivial.

 src/math/i386/sqrtl.c   | 7 +++++++
 src/math/i386/sqrtl.s   | 5 -----
 src/math/x86_64/sqrt.c  | 7 +++++++
 src/math/x86_64/sqrt.s  | 4 ----
 src/math/x86_64/sqrtf.c | 7 +++++++
 src/math/x86_64/sqrtf.s | 4 ----
 src/math/x86_64/sqrtl.c | 7 +++++++
 src/math/x86_64/sqrtl.s | 5 -----
 8 files changed, 28 insertions(+), 18 deletions(-)
 create mode 100644 src/math/i386/sqrtl.c
 delete mode 100644 src/math/i386/sqrtl.s
 create mode 100644 src/math/x86_64/sqrt.c
 delete mode 100644 src/math/x86_64/sqrt.s
 create mode 100644 src/math/x86_64/sqrtf.c
 delete mode 100644 src/math/x86_64/sqrtf.s
 create mode 100644 src/math/x86_64/sqrtl.c
 delete mode 100644 src/math/x86_64/sqrtl.s


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0003-math-move-trivial-x86-family-sqrt-functions-to-C.patch --]
[-- Type: text/x-patch; name="0003-math-move-trivial-x86-family-sqrt-functions-to-C.patch", Size: 2065 bytes --]

diff --git a/src/math/i386/sqrtl.c b/src/math/i386/sqrtl.c
new file mode 100644
index 00000000..864cfcc4
--- /dev/null
+++ b/src/math/i386/sqrtl.c
@@ -0,0 +1,7 @@
+#include <math.h>
+
+long double sqrtl(long double x)
+{
+	__asm__ ("fsqrt" : "+t"(x));
+	return x;
+}
diff --git a/src/math/i386/sqrtl.s b/src/math/i386/sqrtl.s
deleted file mode 100644
index e0d42616..00000000
--- a/src/math/i386/sqrtl.s
+++ /dev/null
@@ -1,5 +0,0 @@
-.global sqrtl
-.type sqrtl,@function
-sqrtl:	fldt 4(%esp)
-	fsqrt
-	ret
diff --git a/src/math/x86_64/sqrt.c b/src/math/x86_64/sqrt.c
new file mode 100644
index 00000000..657e09e3
--- /dev/null
+++ b/src/math/x86_64/sqrt.c
@@ -0,0 +1,7 @@
+#include <math.h>
+
+double sqrt(double x)
+{
+	__asm__ ("sqrtsd %1, %0" : "=x"(x) : "x"(x));
+	return x;
+}
diff --git a/src/math/x86_64/sqrt.s b/src/math/x86_64/sqrt.s
deleted file mode 100644
index d3c609f9..00000000
--- a/src/math/x86_64/sqrt.s
+++ /dev/null
@@ -1,4 +0,0 @@
-.global sqrt
-.type sqrt,@function
-sqrt:	sqrtsd %xmm0, %xmm0
-	ret
diff --git a/src/math/x86_64/sqrtf.c b/src/math/x86_64/sqrtf.c
new file mode 100644
index 00000000..720baec6
--- /dev/null
+++ b/src/math/x86_64/sqrtf.c
@@ -0,0 +1,7 @@
+#include <math.h>
+
+float sqrtf(float x)
+{
+	__asm__ ("sqrtss %1, %0" : "=x"(x) : "x"(x));
+	return x;
+}
diff --git a/src/math/x86_64/sqrtf.s b/src/math/x86_64/sqrtf.s
deleted file mode 100644
index eec48c60..00000000
--- a/src/math/x86_64/sqrtf.s
+++ /dev/null
@@ -1,4 +0,0 @@
-.global sqrtf
-.type sqrtf,@function
-sqrtf:  sqrtss %xmm0, %xmm0
-	ret
diff --git a/src/math/x86_64/sqrtl.c b/src/math/x86_64/sqrtl.c
new file mode 100644
index 00000000..864cfcc4
--- /dev/null
+++ b/src/math/x86_64/sqrtl.c
@@ -0,0 +1,7 @@
+#include <math.h>
+
+long double sqrtl(long double x)
+{
+	__asm__ ("fsqrt" : "+t"(x));
+	return x;
+}
diff --git a/src/math/x86_64/sqrtl.s b/src/math/x86_64/sqrtl.s
deleted file mode 100644
index 23cd687d..00000000
--- a/src/math/x86_64/sqrtl.s
+++ /dev/null
@@ -1,5 +0,0 @@
-.global sqrtl
-.type sqrtl,@function
-sqrtl:	fldt 8(%rsp)
-	fsqrt
-	ret

^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH] math: move i386 sqrtf to C
  2020-01-05 16:35 math patches for moving bare asm to C inline asm Alexander Monakov
                   ` (2 preceding siblings ...)
  2020-01-06 16:50 ` [PATCH] math: move trivial x86-family sqrt " Alexander Monakov
@ 2020-01-06 17:43 ` Alexander Monakov
  2020-01-06 18:32   ` Pascal Cuoq
  2020-01-09 15:55   ` Alexander Monakov
  2020-01-07 13:06 ` [PATCH] math: move i386 sqrt " Alexander Monakov
                   ` (7 subsequent siblings)
  11 siblings, 2 replies; 55+ messages in thread
From: Alexander Monakov @ 2020-01-06 17:43 UTC (permalink / raw)
  To: musl

[-- Attachment #1: Type: text/plain, Size: 460 bytes --]

---
Yay, first not entirely trivial function.

Apparently that double rounding yields the correct result here was
suggested by Stephen Canon in https://stackoverflow.com/a/9689746/4755075
Should his contribution be mentioned somehow in the source code?

 src/math/i386/sqrtf.c | 8 ++++++++
 src/math/i386/sqrtf.s | 7 -------
 2 files changed, 8 insertions(+), 7 deletions(-)
 create mode 100644 src/math/i386/sqrtf.c
 delete mode 100644 src/math/i386/sqrtf.s


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0004-math-move-i386-sqrtf-to-C.patch --]
[-- Type: text/x-patch; name="0004-math-move-i386-sqrtf-to-C.patch", Size: 559 bytes --]

diff --git a/src/math/i386/sqrtf.c b/src/math/i386/sqrtf.c
new file mode 100644
index 00000000..2bcc7754
--- /dev/null
+++ b/src/math/i386/sqrtf.c
@@ -0,0 +1,8 @@
+#include <math.h>
+
+float sqrtf(float x)
+{
+	long double t;
+	__asm__ ("fsqrt" : "=t"(t) : "0"(x));
+	return (float)t;
+}
diff --git a/src/math/i386/sqrtf.s b/src/math/i386/sqrtf.s
deleted file mode 100644
index 9e944f45..00000000
--- a/src/math/i386/sqrtf.s
+++ /dev/null
@@ -1,7 +0,0 @@
-.global sqrtf
-.type sqrtf,@function
-sqrtf:	flds 4(%esp)
-	fsqrt
-	fstps 4(%esp)
-	flds 4(%esp)
-	ret

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH] math: move i386 sqrtf to C
  2020-01-06 17:43 ` [PATCH] math: move i386 sqrtf " Alexander Monakov
@ 2020-01-06 18:32   ` Pascal Cuoq
  2020-01-09 15:55   ` Alexander Monakov
  1 sibling, 0 replies; 55+ messages in thread
From: Pascal Cuoq @ 2020-01-06 18:32 UTC (permalink / raw)
  To: musl

> On 6 Jan 2020, at 12:44, Alexander Monakov <amonakov@ispras.ru> wrote:
> 
> Apparently that double rounding yields the correct result here was
> suggested by Stephen Canon in https://stackoverflow.com/a/9689746/4755075
> Should his contribution be mentioned somehow in the source code?

You should rather credit Figueroa, for instance in the article “when is double rounding innocuous?”. This would be more of a primary source, and Figueroa wrote demonstrations.

Posting from phone, I apologize if I got something wrong. I'm pretty confident Stephen was implicitly citing Figueroa's results though.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH] math: move i386 sqrt to C
  2020-01-05 16:35 math patches for moving bare asm to C inline asm Alexander Monakov
                   ` (3 preceding siblings ...)
  2020-01-06 17:43 ` [PATCH] math: move i386 sqrtf " Alexander Monakov
@ 2020-01-07 13:06 ` Alexander Monakov
  2020-01-08  7:26   ` Rich Felker
  2020-03-21 17:53   ` [musl] " Rich Felker
  2020-01-11 15:06 ` [PATCH] math: move x86_64 (l)lrint(f) functions " Alexander Monakov
                   ` (6 subsequent siblings)
  11 siblings, 2 replies; 55+ messages in thread
From: Alexander Monakov @ 2020-01-07 13:06 UTC (permalink / raw)
  To: musl

[-- Attachment #1: Type: text/plain, Size: 733 bytes --]

---
Since union ldshape does not have a dedicated field for 32 least significant
bits of the x87 long double mantissa, keeping the original approach with

    ux.i.m -= (fpsr & 0x200) - 0x100;

would lead to a 64-bit subtraction that is not trivial for the compiler to
optimize to 32-bit subtraction as done in the original assembly. Therefore
I have elected to change the approach and use

    ux.i.m ^= (fpsr & 0x200) + 0x200;

which is easier to optimize to a 32-bit rather than 64-bit xor.

Thoughts?

 src/math/i386/sqrt.c | 15 +++++++++++++++
 src/math/i386/sqrt.s | 21 ---------------------
 2 files changed, 15 insertions(+), 21 deletions(-)
 create mode 100644 src/math/i386/sqrt.c
 delete mode 100644 src/math/i386/sqrt.s


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0005-math-move-i386-sqrt-to-C.patch --]
[-- Type: text/x-patch; name="0005-math-move-i386-sqrt-to-C.patch", Size: 1103 bytes --]

diff --git a/src/math/i386/sqrt.c b/src/math/i386/sqrt.c
new file mode 100644
index 00000000..619df056
--- /dev/null
+++ b/src/math/i386/sqrt.c
@@ -0,0 +1,15 @@
+#include "libm.h"
+
+double sqrt(double x)
+{
+	union ldshape ux;
+	unsigned fpsr;
+	__asm__ ("fsqrt; fnstsw %%ax": "=t"(ux.f), "=a"(fpsr) : "0"(x));
+	if ((ux.i.m & 0x7ff) != 0x400)
+		return (double)ux.f;
+	/* Rounding to double would have encountered an exact halfway case.
+	   Adjust mantissa downwards if fsqrt rounded up, else upwards.
+	   (result of fsqrt could not have been exact) */
+	ux.i.m ^= (fpsr & 0x200) + 0x200;
+	return (double)ux.f;
+}
diff --git a/src/math/i386/sqrt.s b/src/math/i386/sqrt.s
deleted file mode 100644
index 57837e25..00000000
--- a/src/math/i386/sqrt.s
+++ /dev/null
@@ -1,21 +0,0 @@
-.global sqrt
-.type sqrt,@function
-sqrt:	fldl 4(%esp)
-	fsqrt
-	fnstsw %ax
-	sub $12,%esp
-	fld %st(0)
-	fstpt (%esp)
-	mov (%esp),%ecx
-	and $0x7ff,%ecx
-	cmp $0x400,%ecx
-	jnz 1f
-	and $0x200,%eax
-	sub $0x100,%eax
-	sub %eax,(%esp)
-	fstp %st(0)
-	fldt (%esp)
-1:	add $12,%esp
-	fstpl 4(%esp)
-	fldl 4(%esp)
-	ret

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH] math: move i386 sqrt to C
  2020-01-07 13:06 ` [PATCH] math: move i386 sqrt " Alexander Monakov
@ 2020-01-08  7:26   ` Rich Felker
  2020-03-21 17:53   ` [musl] " Rich Felker
  1 sibling, 0 replies; 55+ messages in thread
From: Rich Felker @ 2020-01-08  7:26 UTC (permalink / raw)
  To: musl

On Tue, Jan 07, 2020 at 04:06:05PM +0300, Alexander Monakov wrote:
> ---
> Since union ldshape does not have a dedicated field for 32 least significant
> bits of the x87 long double mantissa, keeping the original approach with
> 
>     ux.i.m -= (fpsr & 0x200) - 0x100;
> 
> would lead to a 64-bit subtraction that is not trivial for the compiler to
> optimize to 32-bit subtraction as done in the original assembly. Therefore
> I have elected to change the approach and use
> 
>     ux.i.m ^= (fpsr & 0x200) + 0x200;
> 
> which is easier to optimize to a 32-bit rather than 64-bit xor.
> 
> Thoughts?

I think it looks like a good change. Ideally the compiler would see,
since the branch was taken where the low bits are 0x400, that
subtracting a value less than 0x400 can't borrow from the high 32
bits. But I don't think current compilers do such range/value
analysis, and it doesn't seem like there are anny downsides to your
approach anyway.

Rich


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH] math: move i386 sqrtf to C
  2020-01-06 17:43 ` [PATCH] math: move i386 sqrtf " Alexander Monakov
  2020-01-06 18:32   ` Pascal Cuoq
@ 2020-01-09 15:55   ` Alexander Monakov
  2020-01-09 17:00     ` Rich Felker
  1 sibling, 1 reply; 55+ messages in thread
From: Alexander Monakov @ 2020-01-09 15:55 UTC (permalink / raw)
  To: musl

I should have asked earlier, but - is everyone happy with this style of
expressing removal of excess precision, or should I use eval_as_float?

+{
+       long double t;
[...]
+       return (float)t;
+}

Alexander


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH] math: move i386 sqrtf to C
  2020-01-09 15:55   ` Alexander Monakov
@ 2020-01-09 17:00     ` Rich Felker
  2020-01-09 21:00       ` Szabolcs Nagy
  0 siblings, 1 reply; 55+ messages in thread
From: Rich Felker @ 2020-01-09 17:00 UTC (permalink / raw)
  To: musl

On Thu, Jan 09, 2020 at 06:55:11PM +0300, Alexander Monakov wrote:
> I should have asked earlier, but - is everyone happy with this style of
> expressing removal of excess precision, or should I use eval_as_float?
> 
> +{
> +       long double t;
> [...]
> +       return (float)t;
> +}

musl requires C99+ with conforming floating point behavior so I think
this is okay, and I'd prefer to avoid having dependency on libm.h
machinery that's not strictly needed in src/math/$(ARCH) since it
increases the cost of changing that machinery (requires looking at
arch-specific files). My recollection was that eval_as_float is mostly
an artifact of nsz's math functions supporting and being compatible
with somewhat non-conforming compilers.

If we want to ensure correct rounding (important for sqrt[f]) even on
broken compilers (some ppl use gcc 3.x, and pcc may be broken too?)
perhaps we should just do the store from asm too?

Note that eval_as_float only helps if -ffloat-store is used, which is
a nasty hack and also nonconforming, arguably worse than the behavior
without it, so we should probably drop use of that as a fallback, and
use fp_barrier[f] instead if needed.

Rich

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH] math: move i386 sqrtf to C
  2020-01-09 17:00     ` Rich Felker
@ 2020-01-09 21:00       ` Szabolcs Nagy
  2020-01-09 22:00         ` Rich Felker
  2020-01-14 17:59         ` [musl] " Alexander Monakov
  0 siblings, 2 replies; 55+ messages in thread
From: Szabolcs Nagy @ 2020-01-09 21:00 UTC (permalink / raw)
  To: musl

* Rich Felker <dalias@libc.org> [2020-01-09 12:00:02 -0500]:
> On Thu, Jan 09, 2020 at 06:55:11PM +0300, Alexander Monakov wrote:
> > I should have asked earlier, but - is everyone happy with this style of
> > expressing removal of excess precision, or should I use eval_as_float?
> > 
> > +{
> > +       long double t;
> > [...]
> > +       return (float)t;
> > +}
> 
> musl requires C99+ with conforming floating point behavior so I think
> this is okay, and I'd prefer to avoid having dependency on libm.h
> machinery that's not strictly needed in src/math/$(ARCH) since it
> increases the cost of changing that machinery (requires looking at
> arch-specific files). My recollection was that eval_as_float is mostly
> an artifact of nsz's math functions supporting and being compatible
> with somewhat non-conforming compilers.

it's an artifact of glibc using -fexcess-precision=fast,
so the compiler can do clever transformations (without
relying on float_t and double_t), and then they annotate
where the excess precision really has to be dropped
(using inline asm).

i wanted to upstream that code into glibc as well as other
math libraries and c99 standard excess precision handling
is not the default in most environments (e.g. gcc default
-std=gnu* or iso c++) and actually the annotation is rarely
needed in math functions: only exact arithmetic tricks
need it or at return from public api functions (like in
this case) to force a final rounding for side-effects
or for sane call abi.

so for me it made sense to add explicit annotations which
document where this issue is really relevant and allows
using the code in more environments by just redefining
the eval_as_float etc calls. (on systems where this is
relevant results can be different compared to other targets
anyway, so allowing more variance depending on different
excess-precision settings is not that big deal.)

about dropping excess precision at return:

c99 required cast at return, but that was considered to be
a bug so in c11 it's no longer required: return now must
round. of course in non-standard mode return continues to
keep excess precision, which seems to require to always
annotate returns, but in most math functions all relevant
fenv side-effects of the final rounding would need special
handling anyway to set errno (assuming you still care about
errno: glibc does so my code also cares about it), so
overflow etc can't happen in the normal return paths, all
such cases are directed to special code that needs special
annotations anyway, so return annotation rarely comes up
(other than with correctly rounded functions like sqrt).

> 
> If we want to ensure correct rounding (important for sqrt[f]) even on
> broken compilers (some ppl use gcc 3.x, and pcc may be broken too?)
> perhaps we should just do the store from asm too?

that would be a bit safer, but then correct compiler would
store twice i think. it's hard to get excited about this
issue: it only matters on m68k and i386 which are not the
main targets for new float code (and old code had to deal
with this and bigger brokenness already).

> 
> Note that eval_as_float only helps if -ffloat-store is used, which is
> a nasty hack and also nonconforming, arguably worse than the behavior
> without it, so we should probably drop use of that as a fallback, and
> use fp_barrier[f] instead if needed.

i think -ffloat-store almost always drops excess precision
including returns and assignments, so with that no
annotation is needed. but yes the way the annotation is
defined now is not useful against broken compilers or
non-standard excess precision setting, in glibc the
annotation is defined differently (with inline asm).

btw, i'm not against the cast, but in my optimized-routines
code i will keep using my annotations.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH] math: move i386 sqrtf to C
  2020-01-09 21:00       ` Szabolcs Nagy
@ 2020-01-09 22:00         ` Rich Felker
  2020-01-09 23:18           ` Szabolcs Nagy
  2020-01-14 17:59         ` [musl] " Alexander Monakov
  1 sibling, 1 reply; 55+ messages in thread
From: Rich Felker @ 2020-01-09 22:00 UTC (permalink / raw)
  To: musl

On Thu, Jan 09, 2020 at 10:00:06PM +0100, Szabolcs Nagy wrote:
> > If we want to ensure correct rounding (important for sqrt[f]) even on
> > broken compilers (some ppl use gcc 3.x, and pcc may be broken too?)
> > perhaps we should just do the store from asm too?
> 
> that would be a bit safer, but then correct compiler would
> store twice i think.

If you did something like:

	float y = expr_with_excess_precision;
	__asm__( "" : "+m"(y));
	return y;

then I think you'd get just one store and one load, as intended. It
seems to work as intended here. Oddly though my local gcc (7.3) is
gratuitously pushing/popping a single gpr to align stack to 8 (but not
16) despite being a leaf function. No idea why.

> it's hard to get excited about this
> issue: it only matters on m68k and i386 which are not the
> main targets for new float code (and old code had to deal
> with this and bigger brokenness already).

Indeed, but context of present thread is getting rid of the i386 asm
files so it's relevant here.

> > Note that eval_as_float only helps if -ffloat-store is used, which is
> > a nasty hack and also nonconforming, arguably worse than the behavior
> > without it, so we should probably drop use of that as a fallback, and
> > use fp_barrier[f] instead if needed.
> 
> i think -ffloat-store almost always drops excess precision
> including returns and assignments, so with that no
> annotation is needed. but yes the way the annotation is
> defined now is not useful against broken compilers or
> non-standard excess precision setting, in glibc the
> annotation is defined differently (with inline asm).

I was thinking in the context of wanting to remove from configure the:

|| { test "$ARCH" = i386 && tryflag CFLAGS_C99FSE -ffloat-store ; }

which is probably doing more harm than good. Do you know if there are
things that'd break if we did that? I think eval_as_float should
probably be defined as fp_barrierf to make it safe in your code,
conditional on FLT_EVAL_METHOD>0 (and likewise >1 for eval_as_double).

Rich


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH] math: move i386 sqrtf to C
  2020-01-09 22:00         ` Rich Felker
@ 2020-01-09 23:18           ` Szabolcs Nagy
  2020-01-10  2:07             ` Rich Felker
  0 siblings, 1 reply; 55+ messages in thread
From: Szabolcs Nagy @ 2020-01-09 23:18 UTC (permalink / raw)
  To: musl

* Rich Felker <dalias@libc.org> [2020-01-09 17:00:14 -0500]:
> On Thu, Jan 09, 2020 at 10:00:06PM +0100, Szabolcs Nagy wrote:
> > > Note that eval_as_float only helps if -ffloat-store is used, which is
> > > a nasty hack and also nonconforming, arguably worse than the behavior
> > > without it, so we should probably drop use of that as a fallback, and
> > > use fp_barrier[f] instead if needed.
> > 
> > i think -ffloat-store almost always drops excess precision
> > including returns and assignments, so with that no
> > annotation is needed. but yes the way the annotation is
> > defined now is not useful against broken compilers or
> > non-standard excess precision setting, in glibc the
> > annotation is defined differently (with inline asm).
> 
> I was thinking in the context of wanting to remove from configure the:
> 
> || { test "$ARCH" = i386 && tryflag CFLAGS_C99FSE -ffloat-store ; }
> 
> which is probably doing more harm than good. Do you know if there are
> things that'd break if we did that? I think eval_as_float should
> probably be defined as fp_barrierf to make it safe in your code,
> conditional on FLT_EVAL_METHOD>0 (and likewise >1 for eval_as_double).

i think -fexcess-precision=standard was introduced in
gcc 4.5 and to get reliable behaviour before that we
needed -ffloat-store. the freebsd math code had
volatile hacks to avoid -ffloat-store but those were
incomplete (there were only a few bugs e.g. see commit
c4359e01303da2755fe7e8033826b132eb3659b1 freebsd sets
x87 precision to double so for them some of these were
not issues in practice).

since we had -ffloat-store i turned off the volatile
hacks in commit 6d3f1a39c14b12026df84f386875b094e3652990
and later completely removed the annotations in commit
9b0fcb441a44456c7b071c7cdaf90403f81ec05a

on new compilers -fexcess-precision=standard is used,
but that turned out to do too many stores on the fdlibm
code (which is why glibc kept using =fast), so in commit
e216951f509b71da193da2fc63e25b998740d58b i started using
float_t and double_t to get fast code in standard mode.
(of course this made things worse for -ffloat-store).

i think we would need to add back the old annotations
to make old compilers safe without -ffloat-store.
(fdlibm often raises fenv exceptions via a final rounding
before return, those could be often handled more cleanly
by __math_oflow etc helpers, but since it was not designed
for inline errno handling some normal return paths can
raise fp exceptions too and thus need eval_as_* annotation).

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH] math: move i386 sqrtf to C
  2020-01-09 23:18           ` Szabolcs Nagy
@ 2020-01-10  2:07             ` Rich Felker
  2020-01-10  9:17               ` Szabolcs Nagy
  0 siblings, 1 reply; 55+ messages in thread
From: Rich Felker @ 2020-01-10  2:07 UTC (permalink / raw)
  To: musl

On Fri, Jan 10, 2020 at 12:18:58AM +0100, Szabolcs Nagy wrote:
> * Rich Felker <dalias@libc.org> [2020-01-09 17:00:14 -0500]:
> > On Thu, Jan 09, 2020 at 10:00:06PM +0100, Szabolcs Nagy wrote:
> > > > Note that eval_as_float only helps if -ffloat-store is used, which is
> > > > a nasty hack and also nonconforming, arguably worse than the behavior
> > > > without it, so we should probably drop use of that as a fallback, and
> > > > use fp_barrier[f] instead if needed.
> > > 
> > > i think -ffloat-store almost always drops excess precision
> > > including returns and assignments, so with that no
> > > annotation is needed. but yes the way the annotation is
> > > defined now is not useful against broken compilers or
> > > non-standard excess precision setting, in glibc the
> > > annotation is defined differently (with inline asm).
> > 
> > I was thinking in the context of wanting to remove from configure the:
> > 
> > || { test "$ARCH" = i386 && tryflag CFLAGS_C99FSE -ffloat-store ; }
> > 
> > which is probably doing more harm than good. Do you know if there are
> > things that'd break if we did that? I think eval_as_float should
> > probably be defined as fp_barrierf to make it safe in your code,
> > conditional on FLT_EVAL_METHOD>0 (and likewise >1 for eval_as_double).
> 
> i think -fexcess-precision=standard was introduced in
> gcc 4.5 and to get reliable behaviour before that we
> needed -ffloat-store.

I don't think the behavior was "reliable" with -ffloat-store; it's
wrong with respect to the defined meaning of FLT_EVAL_METHOD.

> since we had -ffloat-store i turned off the volatile
> hacks in commit 6d3f1a39c14b12026df84f386875b094e3652990
> and later completely removed the annotations in commit
> 9b0fcb441a44456c7b071c7cdaf90403f81ec05a

Thanks for these references.

> on new compilers -fexcess-precision=standard is used,
> but that turned out to do too many stores on the fdlibm
> code (which is why glibc kept using =fast), so in commit
> e216951f509b71da193da2fc63e25b998740d58b i started using
> float_t and double_t to get fast code in standard mode.
> (of course this made things worse for -ffloat-store).

This was the right thing to do, and I think it largely but not
entirely eliminates the need for caring about how the compiler handles
this, except in a few cases. It could probably be eliminated in more.
For example the argument reduction code cited above could use the
right constants for double_t rather than double to avoid the need to
store/load to drop excess precision.

> i think we would need to add back the old annotations
> to make old compilers safe without -ffloat-store.
> (fdlibm often raises fenv exceptions via a final rounding
> before return, those could be often handled more cleanly
> by __math_oflow etc helpers, but since it was not designed
> for inline errno handling some normal return paths can
> raise fp exceptions too and thus need eval_as_* annotation).

I think I asked you before, but from a standpoint of fenv stuff, I'm
confused why the eval_as_* things are useful at all; it looks like you
would need fp_barrier* to ensure they're actually evaluated (e.g. in
the presence of LTO with a compiler that doesn't honor fenv right).

But I think it's also useful to distinguish between possibility of
wrong exceptions being raised, which is a rather minor issue since
some widely-used compilers don't even support fenv reasonably at at
all, and the possibility of wrong values being returned for functions
where the result is required to be correctly rounded. I would deem it
a serious problem for sqrt[f] or fma[f] to return the wrong value when
compiled with gcc3 or pcc. I don't think I would particularly care if
exceptions failed to be raised properly when compiled with gcc3 or
pcc, though. So I probably would like to ensure that, whatever code we
end up with in i386 sqrt[f].c, it it ends up working even if the
compiler does not handle excess precision correctly.

Rich

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH] math: move i386 sqrtf to C
  2020-01-10  2:07             ` Rich Felker
@ 2020-01-10  9:17               ` Szabolcs Nagy
  0 siblings, 0 replies; 55+ messages in thread
From: Szabolcs Nagy @ 2020-01-10  9:17 UTC (permalink / raw)
  To: musl

* Rich Felker <dalias@libc.org> [2020-01-09 21:07:47 -0500]:
> On Fri, Jan 10, 2020 at 12:18:58AM +0100, Szabolcs Nagy wrote:
> > i think -fexcess-precision=standard was introduced in
> > gcc 4.5 and to get reliable behaviour before that we
> > needed -ffloat-store.
> 
> I don't think the behavior was "reliable" with -ffloat-store; it's
> wrong with respect to the defined meaning of FLT_EVAL_METHOD.

well reliable in the sense that the results are more
consistent with other targets that have FLT_EVAL_METHOD==0.

> > i think we would need to add back the old annotations
> > to make old compilers safe without -ffloat-store.
> > (fdlibm often raises fenv exceptions via a final rounding
> > before return, those could be often handled more cleanly
> > by __math_oflow etc helpers, but since it was not designed
> > for inline errno handling some normal return paths can
> > raise fp exceptions too and thus need eval_as_* annotation).
> 
> I think I asked you before, but from a standpoint of fenv stuff, I'm
> confused why the eval_as_* things are useful at all; it looks like you
> would need fp_barrier* to ensure they're actually evaluated (e.g. in
> the presence of LTO with a compiler that doesn't honor fenv right).

if you mean the current definition of eval_as_* in musl,
then those are not useful (except in the middle of some
arithmetic expression or assignment to double_t etc
where c99 normally would not drop excess precision)

their point is to annotate where we need to drop excess
precision so we can define it appropriately for specific
targets (and allow all other places to use excess prec).

e.g. in case of 'return 0x1p999*0x1p999' to raise overflow
and return inf, we need fp barrier to avoid const folds and
eval_as_double to drop excess precision so it becomes:

  return eval_as_double(fp_barrier(0x1p999) * 0x1p999);

in principle fp_barrier need not drop excess precision so

  return fp_barrier(fp_barrier(0x1p999) * 0x1p999);

would not be enough, but i ended up using float and double
in fp_barrier instead of float_t and double_t so now it
drops excess precision too (may be a mistake, but we almost
always want to drop excess prec when forcing an evaluation).
separate eval_as_double is still useful since on most targets
it is a nop while fp_barrier is not a nop. (but e.g. i386
could define eval_as_double as fp_barrier)

the bigger problem with fp_barrier is that it just hides
a value (behind volatile or asm volatile) but it should be
forcing the mul operation, e.g. currently

  if (cond)
    return eval_as_double(fp_barrier(0x1p999) * 0x1p999);

may be transformed to

  if (cond)
    x = fp_barrier(0x1p999);
  y = x * 0x1p999;
  if (cond)
    return eval_as_double(y);

by a compiler that assumes mul has no side-effects and
then the mul is evaluated unconditionally (but such
transformation is unlikely so in practice the barrier
works).

i don't think there is a trick that allows us to force
the evaluation of an individual fp op, so for now i'm
happy with fp_barrier and eval_as_*.

> 
> But I think it's also useful to distinguish between possibility of
> wrong exceptions being raised, which is a rather minor issue since
> some widely-used compilers don't even support fenv reasonably at at
> all, and the possibility of wrong values being returned for functions
> where the result is required to be correctly rounded. I would deem it
> a serious problem for sqrt[f] or fma[f] to return the wrong value when
> compiled with gcc3 or pcc. I don't think I would particularly care if
> exceptions failed to be raised properly when compiled with gcc3 or
> pcc, though. So I probably would like to ensure that, whatever code we
> end up with in i386 sqrt[f].c, it it ends up working even if the
> compiler does not handle excess precision correctly.

note that rounding down to double to force an overflow
usually also forces an infinity and without the force
the result would be >DBL_MAX finite which then can cause
problems, just like sqrt with excess prec.

so most eval_as_double has the same importance: if any
of them is missed the result may be wrong.

i have fp_force_eval which has no return value and is
only used to force side-effects, if you don't care about
fenv then that may be defined as nop.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH] math: move x86_64 (l)lrint(f) functions to C
  2020-01-05 16:35 math patches for moving bare asm to C inline asm Alexander Monakov
                   ` (4 preceding siblings ...)
  2020-01-07 13:06 ` [PATCH] math: move i386 sqrt " Alexander Monakov
@ 2020-01-11 15:06 ` Alexander Monakov
  2020-01-11 15:23 ` [PATCH] math: move more x86-family lrint " Alexander Monakov
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 55+ messages in thread
From: Alexander Monakov @ 2020-01-11 15:06 UTC (permalink / raw)
  To: musl

[-- Attachment #1: Type: text/plain, Size: 757 bytes --]

---
Some more trivial conversions.

 src/math/x86_64/llrint.c  | 8 ++++++++
 src/math/x86_64/llrint.s  | 5 -----
 src/math/x86_64/llrintf.c | 8 ++++++++
 src/math/x86_64/llrintf.s | 5 -----
 src/math/x86_64/lrint.c   | 8 ++++++++
 src/math/x86_64/lrint.s   | 5 -----
 src/math/x86_64/lrintf.c  | 8 ++++++++
 src/math/x86_64/lrintf.s  | 5 -----
 8 files changed, 32 insertions(+), 20 deletions(-)
 create mode 100644 src/math/x86_64/llrint.c
 delete mode 100644 src/math/x86_64/llrint.s
 create mode 100644 src/math/x86_64/llrintf.c
 delete mode 100644 src/math/x86_64/llrintf.s
 create mode 100644 src/math/x86_64/lrint.c
 delete mode 100644 src/math/x86_64/lrint.s
 create mode 100644 src/math/x86_64/lrintf.c
 delete mode 100644 src/math/x86_64/lrintf.s


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0006-math-move-x86_64-l-lrint-f-functions-to-C.patch --]
[-- Type: text/x-patch; name="0006-math-move-x86_64-l-lrint-f-functions-to-C.patch", Size: 2206 bytes --]

diff --git a/src/math/x86_64/llrint.c b/src/math/x86_64/llrint.c
new file mode 100644
index 00000000..dd38a722
--- /dev/null
+++ b/src/math/x86_64/llrint.c
@@ -0,0 +1,8 @@
+#include <math.h>
+
+long long llrint(double x)
+{
+	long long r;
+	__asm__ ("cvtsd2si %1, %0" : "=r"(r) : "x"(x));
+	return r;
+}
diff --git a/src/math/x86_64/llrint.s b/src/math/x86_64/llrint.s
deleted file mode 100644
index bf476498..00000000
--- a/src/math/x86_64/llrint.s
+++ /dev/null
@@ -1,5 +0,0 @@
-.global llrint
-.type llrint,@function
-llrint:
-	cvtsd2si %xmm0,%rax
-	ret
diff --git a/src/math/x86_64/llrintf.c b/src/math/x86_64/llrintf.c
new file mode 100644
index 00000000..fc8625e8
--- /dev/null
+++ b/src/math/x86_64/llrintf.c
@@ -0,0 +1,8 @@
+#include <math.h>
+
+long long llrintf(float x)
+{
+	long long r;
+	__asm__ ("cvtss2si %1, %0" : "=r"(r) : "x"(x));
+	return r;
+}
diff --git a/src/math/x86_64/llrintf.s b/src/math/x86_64/llrintf.s
deleted file mode 100644
index d7204ac0..00000000
--- a/src/math/x86_64/llrintf.s
+++ /dev/null
@@ -1,5 +0,0 @@
-.global llrintf
-.type llrintf,@function
-llrintf:
-	cvtss2si %xmm0,%rax
-	ret
diff --git a/src/math/x86_64/lrint.c b/src/math/x86_64/lrint.c
new file mode 100644
index 00000000..a742fec6
--- /dev/null
+++ b/src/math/x86_64/lrint.c
@@ -0,0 +1,8 @@
+#include <math.h>
+
+long lrint(double x)
+{
+	long r;
+	__asm__ ("cvtsd2si %1, %0" : "=r"(r) : "x"(x));
+	return r;
+}
diff --git a/src/math/x86_64/lrint.s b/src/math/x86_64/lrint.s
deleted file mode 100644
index 15fc2454..00000000
--- a/src/math/x86_64/lrint.s
+++ /dev/null
@@ -1,5 +0,0 @@
-.global lrint
-.type lrint,@function
-lrint:
-	cvtsd2si %xmm0,%rax
-	ret
diff --git a/src/math/x86_64/lrintf.c b/src/math/x86_64/lrintf.c
new file mode 100644
index 00000000..2ba5639d
--- /dev/null
+++ b/src/math/x86_64/lrintf.c
@@ -0,0 +1,8 @@
+#include <math.h>
+
+long lrintf(float x)
+{
+	long r;
+	__asm__ ("cvtss2si %1, %0" : "=r"(r) : "x"(x));
+	return r;
+}
diff --git a/src/math/x86_64/lrintf.s b/src/math/x86_64/lrintf.s
deleted file mode 100644
index 488423d2..00000000
--- a/src/math/x86_64/lrintf.s
+++ /dev/null
@@ -1,5 +0,0 @@
-.global lrintf
-.type lrintf,@function
-lrintf:
-	cvtss2si %xmm0,%rax
-	ret

^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH] math: move more x86-family lrint functions to C
  2020-01-05 16:35 math patches for moving bare asm to C inline asm Alexander Monakov
                   ` (5 preceding siblings ...)
  2020-01-11 15:06 ` [PATCH] math: move x86_64 (l)lrint(f) functions " Alexander Monakov
@ 2020-01-11 15:23 ` Alexander Monakov
  2020-01-11 16:07   ` Rich Felker
  2020-01-14 11:54 ` [musl] [PATCH] math: move x86-family rint " Alexander Monakov
                   ` (4 subsequent siblings)
  11 siblings, 1 reply; 55+ messages in thread
From: Alexander Monakov @ 2020-01-11 15:23 UTC (permalink / raw)
  To: musl

[-- Attachment #1: Type: text/plain, Size: 1579 bytes --]

---
It was news to me that GCC inline asm conventions allow to specify
effect on x87 register stack. Here a clobber tells the compiler
that the asm pops the topmost element off the stack.

 src/math/i386/llrint.c    | 8 ++++++++
 src/math/i386/llrint.s    | 8 --------
 src/math/i386/llrintf.c   | 8 ++++++++
 src/math/i386/llrintf.s   | 9 ---------
 src/math/i386/llrintl.c   | 8 ++++++++
 src/math/i386/llrintl.s   | 8 --------
 src/math/i386/lrint.c     | 8 ++++++++
 src/math/i386/lrint.s     | 7 -------
 src/math/i386/lrintf.c    | 8 ++++++++
 src/math/i386/lrintf.s    | 7 -------
 src/math/i386/lrintl.c    | 8 ++++++++
 src/math/i386/lrintl.s    | 7 -------
 src/math/x86_64/llrintl.c | 8 ++++++++
 src/math/x86_64/llrintl.s | 7 -------
 src/math/x86_64/lrintl.c  | 8 ++++++++
 src/math/x86_64/lrintl.s  | 7 -------
 16 files changed, 64 insertions(+), 60 deletions(-)
 create mode 100644 src/math/i386/llrint.c
 delete mode 100644 src/math/i386/llrint.s
 create mode 100644 src/math/i386/llrintf.c
 delete mode 100644 src/math/i386/llrintf.s
 create mode 100644 src/math/i386/llrintl.c
 delete mode 100644 src/math/i386/llrintl.s
 create mode 100644 src/math/i386/lrint.c
 delete mode 100644 src/math/i386/lrint.s
 create mode 100644 src/math/i386/lrintf.c
 delete mode 100644 src/math/i386/lrintf.s
 create mode 100644 src/math/i386/lrintl.c
 delete mode 100644 src/math/i386/lrintl.s
 create mode 100644 src/math/x86_64/llrintl.c
 delete mode 100644 src/math/x86_64/llrintl.s
 create mode 100644 src/math/x86_64/lrintl.c
 delete mode 100644 src/math/x86_64/lrintl.s


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0007-math-move-more-x86-family-lrint-functions-to-C.patch --]
[-- Type: text/x-patch; name="0007-math-move-more-x86-family-lrint-functions-to-C.patch", Size: 4690 bytes --]

diff --git a/src/math/i386/llrint.c b/src/math/i386/llrint.c
new file mode 100644
index 00000000..aa400817
--- /dev/null
+++ b/src/math/i386/llrint.c
@@ -0,0 +1,8 @@
+#include <math.h>
+
+long long llrint(double x)
+{
+	long long r;
+	__asm__ ("fistpll %0" : "=m"(r) : "t"(x) : "st");
+	return r;
+}
diff --git a/src/math/i386/llrint.s b/src/math/i386/llrint.s
deleted file mode 100644
index 8e89cd91..00000000
--- a/src/math/i386/llrint.s
+++ /dev/null
@@ -1,8 +0,0 @@
-.global llrint
-.type llrint,@function
-llrint:
-	fldl 4(%esp)
-	fistpll 4(%esp)
-	mov 4(%esp),%eax
-	mov 8(%esp),%edx
-	ret
diff --git a/src/math/i386/llrintf.c b/src/math/i386/llrintf.c
new file mode 100644
index 00000000..c41a317b
--- /dev/null
+++ b/src/math/i386/llrintf.c
@@ -0,0 +1,8 @@
+#include <math.h>
+
+long long llrintf(float x)
+{
+	long long r;
+	__asm__ ("fistpll %0" : "=m"(r) : "t"(x) : "st");
+	return r;
+}
diff --git a/src/math/i386/llrintf.s b/src/math/i386/llrintf.s
deleted file mode 100644
index aa850c6c..00000000
--- a/src/math/i386/llrintf.s
+++ /dev/null
@@ -1,9 +0,0 @@
-.global llrintf
-.type llrintf,@function
-llrintf:
-	sub $8,%esp
-	flds 12(%esp)
-	fistpll (%esp)
-	pop %eax
-	pop %edx
-	ret
diff --git a/src/math/i386/llrintl.c b/src/math/i386/llrintl.c
new file mode 100644
index 00000000..c439ef28
--- /dev/null
+++ b/src/math/i386/llrintl.c
@@ -0,0 +1,8 @@
+#include <math.h>
+
+long long llrintl(long double x)
+{
+	long long r;
+	__asm__ ("fistpll %0" : "=m"(r) : "t"(x) : "st");
+	return r;
+}
diff --git a/src/math/i386/llrintl.s b/src/math/i386/llrintl.s
deleted file mode 100644
index 1cfb56f1..00000000
--- a/src/math/i386/llrintl.s
+++ /dev/null
@@ -1,8 +0,0 @@
-.global llrintl
-.type llrintl,@function
-llrintl:
-	fldt 4(%esp)
-	fistpll 4(%esp)
-	mov 4(%esp),%eax
-	mov 8(%esp),%edx
-	ret
diff --git a/src/math/i386/lrint.c b/src/math/i386/lrint.c
new file mode 100644
index 00000000..f16b8a9c
--- /dev/null
+++ b/src/math/i386/lrint.c
@@ -0,0 +1,8 @@
+#include <math.h>
+
+long lrint(double x)
+{
+	long r;
+	__asm__ ("fistpll %0" : "=m"(r) : "t"(x) : "st");
+	return r;
+}
diff --git a/src/math/i386/lrint.s b/src/math/i386/lrint.s
deleted file mode 100644
index 02b83d9f..00000000
--- a/src/math/i386/lrint.s
+++ /dev/null
@@ -1,7 +0,0 @@
-.global lrint
-.type lrint,@function
-lrint:
-	fldl 4(%esp)
-	fistpl 4(%esp)
-	mov 4(%esp),%eax
-	ret
diff --git a/src/math/i386/lrintf.c b/src/math/i386/lrintf.c
new file mode 100644
index 00000000..932d353e
--- /dev/null
+++ b/src/math/i386/lrintf.c
@@ -0,0 +1,8 @@
+#include <math.h>
+
+long lrintf(float x)
+{
+	long r;
+	__asm__ ("fistpll %0" : "=m"(r) : "t"(x) : "st");
+	return r;
+}
diff --git a/src/math/i386/lrintf.s b/src/math/i386/lrintf.s
deleted file mode 100644
index 907aac29..00000000
--- a/src/math/i386/lrintf.s
+++ /dev/null
@@ -1,7 +0,0 @@
-.global lrintf
-.type lrintf,@function
-lrintf:
-	flds 4(%esp)
-	fistpl 4(%esp)
-	mov 4(%esp),%eax
-	ret
diff --git a/src/math/i386/lrintl.c b/src/math/i386/lrintl.c
new file mode 100644
index 00000000..068e2e4d
--- /dev/null
+++ b/src/math/i386/lrintl.c
@@ -0,0 +1,8 @@
+#include <math.h>
+
+long lrintl(long double x)
+{
+	long r;
+	__asm__ ("fistpll %0" : "=m"(r) : "t"(x) : "st");
+	return r;
+}
diff --git a/src/math/i386/lrintl.s b/src/math/i386/lrintl.s
deleted file mode 100644
index 3ae05aac..00000000
--- a/src/math/i386/lrintl.s
+++ /dev/null
@@ -1,7 +0,0 @@
-.global lrintl
-.type lrintl,@function
-lrintl:
-	fldt 4(%esp)
-	fistpl 4(%esp)
-	mov 4(%esp),%eax
-	ret
diff --git a/src/math/x86_64/llrintl.c b/src/math/x86_64/llrintl.c
new file mode 100644
index 00000000..c439ef28
--- /dev/null
+++ b/src/math/x86_64/llrintl.c
@@ -0,0 +1,8 @@
+#include <math.h>
+
+long long llrintl(long double x)
+{
+	long long r;
+	__asm__ ("fistpll %0" : "=m"(r) : "t"(x) : "st");
+	return r;
+}
diff --git a/src/math/x86_64/llrintl.s b/src/math/x86_64/llrintl.s
deleted file mode 100644
index 1ec0817d..00000000
--- a/src/math/x86_64/llrintl.s
+++ /dev/null
@@ -1,7 +0,0 @@
-.global llrintl
-.type llrintl,@function
-llrintl:
-	fldt 8(%rsp)
-	fistpll 8(%rsp)
-	mov 8(%rsp),%rax
-	ret
diff --git a/src/math/x86_64/lrintl.c b/src/math/x86_64/lrintl.c
new file mode 100644
index 00000000..068e2e4d
--- /dev/null
+++ b/src/math/x86_64/lrintl.c
@@ -0,0 +1,8 @@
+#include <math.h>
+
+long lrintl(long double x)
+{
+	long r;
+	__asm__ ("fistpll %0" : "=m"(r) : "t"(x) : "st");
+	return r;
+}
diff --git a/src/math/x86_64/lrintl.s b/src/math/x86_64/lrintl.s
deleted file mode 100644
index d587b12b..00000000
--- a/src/math/x86_64/lrintl.s
+++ /dev/null
@@ -1,7 +0,0 @@
-.global lrintl
-.type lrintl,@function
-lrintl:
-	fldt 8(%rsp)
-	fistpll 8(%rsp)
-	mov 8(%rsp),%rax
-	ret

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH] math: move more x86-family lrint functions to C
  2020-01-11 15:23 ` [PATCH] math: move more x86-family lrint " Alexander Monakov
@ 2020-01-11 16:07   ` Rich Felker
  2020-01-11 16:22     ` Rich Felker
  0 siblings, 1 reply; 55+ messages in thread
From: Rich Felker @ 2020-01-11 16:07 UTC (permalink / raw)
  To: musl

On Sat, Jan 11, 2020 at 06:23:54PM +0300, Alexander Monakov wrote:
> ---
> It was news to me that GCC inline asm conventions allow to specify
> effect on x87 register stack. Here a clobber tells the compiler
> that the asm pops the topmost element off the stack.

Is this documented/reliable/supported by other compilers (clang, pcc,
old gcc)? It looks very nice and I hope we can do it; otherwise it
looks like a gratuitous instruction would be needed. But I want to
make sure it's safe.

Rich


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH] math: move more x86-family lrint functions to C
  2020-01-11 16:07   ` Rich Felker
@ 2020-01-11 16:22     ` Rich Felker
  0 siblings, 0 replies; 55+ messages in thread
From: Rich Felker @ 2020-01-11 16:22 UTC (permalink / raw)
  To: musl

On Sat, Jan 11, 2020 at 11:07:01AM -0500, Rich Felker wrote:
> On Sat, Jan 11, 2020 at 06:23:54PM +0300, Alexander Monakov wrote:
> > ---
> > It was news to me that GCC inline asm conventions allow to specify
> > effect on x87 register stack. Here a clobber tells the compiler
> > that the asm pops the topmost element off the stack.
> 
> Is this documented/reliable/supported by other compilers (clang, pcc,
> old gcc)? It looks very nice and I hope we can do it; otherwise it
> looks like a gratuitous instruction would be needed. But I want to
> make sure it's safe.

OK, I see it documented at the end of the page here:
https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html under 6.47.2.9
x86 Floating-Point asm Operands. So if other compilers don't honor
this they're not implementing it as documented.

I noticed also there (bullet point 5) that, if inline asm pushes
anything to x87 stack, it needs appropriate clobbers so the compiler
can ensure the stack is not full on entry. This would be easy to miss
because the stack will be empty in standalone functions, and it would
only come up if they get inlined via LTO.

Rich

^ permalink raw reply	[flat|nested] 55+ messages in thread

* [musl] [PATCH] math: move x86-family rint functions to C
  2020-01-05 16:35 math patches for moving bare asm to C inline asm Alexander Monakov
                   ` (6 preceding siblings ...)
  2020-01-11 15:23 ` [PATCH] math: move more x86-family lrint " Alexander Monakov
@ 2020-01-14 11:54 ` Alexander Monakov
  2020-01-14 18:17 ` [musl] Q: dealing with missing removal of excess precision Alexander Monakov
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 55+ messages in thread
From: Alexander Monakov @ 2020-01-14 11:54 UTC (permalink / raw)
  To: musl

[-- Attachment #1: Type: text/plain, Size: 724 bytes --]

---
One more batch of trivial rewrites.

 src/math/i386/rint.c    | 7 +++++++
 src/math/i386/rint.s    | 6 ------
 src/math/i386/rintf.c   | 7 +++++++
 src/math/i386/rintf.s   | 6 ------
 src/math/i386/rintl.c   | 7 +++++++
 src/math/i386/rintl.s   | 6 ------
 src/math/x86_64/rintl.c | 7 +++++++
 src/math/x86_64/rintl.s | 6 ------
 8 files changed, 28 insertions(+), 24 deletions(-)
 create mode 100644 src/math/i386/rint.c
 delete mode 100644 src/math/i386/rint.s
 create mode 100644 src/math/i386/rintf.c
 delete mode 100644 src/math/i386/rintf.s
 create mode 100644 src/math/i386/rintl.c
 delete mode 100644 src/math/i386/rintl.s
 create mode 100644 src/math/x86_64/rintl.c
 delete mode 100644 src/math/x86_64/rintl.s


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0008-math-move-x86-family-rint-functions-to-C.patch --]
[-- Type: text/x-patch; name="0008-math-move-x86-family-rint-functions-to-C.patch", Size: 2032 bytes --]

diff --git a/src/math/i386/rint.c b/src/math/i386/rint.c
new file mode 100644
index 00000000..a5276a60
--- /dev/null
+++ b/src/math/i386/rint.c
@@ -0,0 +1,7 @@
+#include <math.h>
+
+double rint(double x)
+{
+	__asm__ ("frndint" : "+t"(x));
+	return x;
+}
diff --git a/src/math/i386/rint.s b/src/math/i386/rint.s
deleted file mode 100644
index bb99a11c..00000000
--- a/src/math/i386/rint.s
+++ /dev/null
@@ -1,6 +0,0 @@
-.global rint
-.type rint,@function
-rint:
-	fldl 4(%esp)
-	frndint
-	ret
diff --git a/src/math/i386/rintf.c b/src/math/i386/rintf.c
new file mode 100644
index 00000000..bb4121a4
--- /dev/null
+++ b/src/math/i386/rintf.c
@@ -0,0 +1,7 @@
+#include <math.h>
+
+float rintf(float x)
+{
+	__asm__ ("frndint" : "+t"(x));
+	return x;
+}
diff --git a/src/math/i386/rintf.s b/src/math/i386/rintf.s
deleted file mode 100644
index bce4c5a6..00000000
--- a/src/math/i386/rintf.s
+++ /dev/null
@@ -1,6 +0,0 @@
-.global rintf
-.type rintf,@function
-rintf:
-	flds 4(%esp)
-	frndint
-	ret
diff --git a/src/math/i386/rintl.c b/src/math/i386/rintl.c
new file mode 100644
index 00000000..e1a92077
--- /dev/null
+++ b/src/math/i386/rintl.c
@@ -0,0 +1,7 @@
+#include <math.h>
+
+long double rintl(long double x)
+{
+	__asm__ ("frndint" : "+t"(x));
+	return x;
+}
diff --git a/src/math/i386/rintl.s b/src/math/i386/rintl.s
deleted file mode 100644
index cd2bf9a9..00000000
--- a/src/math/i386/rintl.s
+++ /dev/null
@@ -1,6 +0,0 @@
-.global rintl
-.type rintl,@function
-rintl:
-	fldt 4(%esp)
-	frndint
-	ret
diff --git a/src/math/x86_64/rintl.c b/src/math/x86_64/rintl.c
new file mode 100644
index 00000000..e1a92077
--- /dev/null
+++ b/src/math/x86_64/rintl.c
@@ -0,0 +1,7 @@
+#include <math.h>
+
+long double rintl(long double x)
+{
+	__asm__ ("frndint" : "+t"(x));
+	return x;
+}
diff --git a/src/math/x86_64/rintl.s b/src/math/x86_64/rintl.s
deleted file mode 100644
index 64e663cd..00000000
--- a/src/math/x86_64/rintl.s
+++ /dev/null
@@ -1,6 +0,0 @@
-.global rintl
-.type rintl,@function
-rintl:
-	fldt 8(%rsp)
-	frndint
-	ret

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [musl] [PATCH] math: move i386 sqrtf to C
  2020-01-09 21:00       ` Szabolcs Nagy
  2020-01-09 22:00         ` Rich Felker
@ 2020-01-14 17:59         ` Alexander Monakov
  2020-01-14 18:47           ` Szabolcs Nagy
  1 sibling, 1 reply; 55+ messages in thread
From: Alexander Monakov @ 2020-01-14 17:59 UTC (permalink / raw)
  To: musl

On Thu, 9 Jan 2020, Szabolcs Nagy wrote:

> c99 required cast at return, but that was considered to be
> a bug so in c11 it's no longer required: return now must
> round

This is the interpretation that GCC uses (return statements
imply removal of excess precision), but I wonder where the
standard actually says that: afaics N1570 still has the same
footnote in 6.8.6.4 that ends with "A cast may be used to
remove this extra range and precision."

Alexander

^ permalink raw reply	[flat|nested] 55+ messages in thread

* [musl] Q: dealing with missing removal of excess precision
  2020-01-05 16:35 math patches for moving bare asm to C inline asm Alexander Monakov
                   ` (7 preceding siblings ...)
  2020-01-14 11:54 ` [musl] [PATCH] math: move x86-family rint " Alexander Monakov
@ 2020-01-14 18:17 ` Alexander Monakov
  2020-01-14 18:50   ` Szabolcs Nagy
  2020-01-14 20:41 ` [musl] [PATCH] math: move x86-family remainder functions to C Alexander Monakov
                   ` (2 subsequent siblings)
  11 siblings, 1 reply; 55+ messages in thread
From: Alexander Monakov @ 2020-01-14 18:17 UTC (permalink / raw)
  To: musl


What to do with functions that return values with excess precision from asm?

Many of the remaining x87-based asm implementations do not remove excess
precision on return. This looks like a bug, but matches Glibc behavior.
Moving such functions to C would either introduce removal of excess precision,
or result in code that lies to the compiler by pretending that inline asm keeps
values within precision of float/double, where in reality it might not.

Alexander

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [musl] [PATCH] math: move i386 sqrtf to C
  2020-01-14 17:59         ` [musl] " Alexander Monakov
@ 2020-01-14 18:47           ` Szabolcs Nagy
  0 siblings, 0 replies; 55+ messages in thread
From: Szabolcs Nagy @ 2020-01-14 18:47 UTC (permalink / raw)
  To: musl

* Alexander Monakov <amonakov@ispras.ru> [2020-01-14 20:59:21 +0300]:
> On Thu, 9 Jan 2020, Szabolcs Nagy wrote:
> 
> > c99 required cast at return, but that was considered to be
> > a bug so in c11 it's no longer required: return now must
> > round
> 
> This is the interpretation that GCC uses (return statements
> imply removal of excess precision), but I wonder where the
> standard actually says that: afaics N1570 still has the same
> footnote in 6.8.6.4 that ends with "A cast may be used to
> remove this extra range and precision."

it's in annex f
http://port70.net/~nsz/c/c11/n1570.html#F.6

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [musl] Q: dealing with missing removal of excess precision
  2020-01-14 18:17 ` [musl] Q: dealing with missing removal of excess precision Alexander Monakov
@ 2020-01-14 18:50   ` Szabolcs Nagy
  2020-01-14 18:58     ` Rich Felker
  0 siblings, 1 reply; 55+ messages in thread
From: Szabolcs Nagy @ 2020-01-14 18:50 UTC (permalink / raw)
  To: musl

* Alexander Monakov <amonakov@ispras.ru> [2020-01-14 21:17:55 +0300]:
> What to do with functions that return values with excess precision from asm?
> 
> Many of the remaining x87-based asm implementations do not remove excess
> precision on return. This looks like a bug, but matches Glibc behavior.
> Moving such functions to C would either introduce removal of excess precision,
> or result in code that lies to the compiler by pretending that inline asm keeps
> values within precision of float/double, where in reality it might not.

the intention is to remove excess precision so it's good that's happening.

otoh it would be nice if there was a way to tell the compiler not to
remove it (e.g. in case the asm already took care of it) even in c99
standard mode.


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [musl] Q: dealing with missing removal of excess precision
  2020-01-14 18:50   ` Szabolcs Nagy
@ 2020-01-14 18:58     ` Rich Felker
  2020-01-14 19:53       ` Alexander Monakov
  0 siblings, 1 reply; 55+ messages in thread
From: Rich Felker @ 2020-01-14 18:58 UTC (permalink / raw)
  To: musl

On Tue, Jan 14, 2020 at 07:50:58PM +0100, Szabolcs Nagy wrote:
> * Alexander Monakov <amonakov@ispras.ru> [2020-01-14 21:17:55 +0300]:
> > What to do with functions that return values with excess precision from asm?
> > 
> > Many of the remaining x87-based asm implementations do not remove excess
> > precision on return. This looks like a bug, but matches Glibc behavior.
> > Moving such functions to C would either introduce removal of excess precision,
> > or result in code that lies to the compiler by pretending that inline asm keeps
> > values within precision of float/double, where in reality it might not.
> 
> the intention is to remove excess precision so it's good that's happening.

Agreed. It may be a slight performance regression but it's the right
thing to do anyway.

> otoh it would be nice if there was a way to tell the compiler not to
> remove it (e.g. in case the asm already took care of it) even in c99
> standard mode.

Perhaps this happens if the output constraint is tied to a float
rather than a long double?

Rich

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [musl] Q: dealing with missing removal of excess precision
  2020-01-14 18:58     ` Rich Felker
@ 2020-01-14 19:53       ` Alexander Monakov
  2020-02-06 14:51         ` Rich Felker
  0 siblings, 1 reply; 55+ messages in thread
From: Alexander Monakov @ 2020-01-14 19:53 UTC (permalink / raw)
  To: musl

On Tue, 14 Jan 2020, Rich Felker wrote:

> > otoh it would be nice if there was a way to tell the compiler not to
> > remove it (e.g. in case the asm already took care of it) even in c99
> > standard mode.
> 
> Perhaps this happens if the output constraint is tied to a float
> rather than a long double?

Precisely. That's what previously posted patches do, and they match
existing hand-tuned assembly.

I misspoke when saying that Glibc might return a value with excess precision.
I was looking at fmod-like functions and missed a slightly subtle point that
fprem does not introduce excess precision. So I don't actually have any
example where Glibc might misbehave in that regard.

Alexander

^ permalink raw reply	[flat|nested] 55+ messages in thread

* [musl] [PATCH] math: move x86-family remainder functions to C
  2020-01-05 16:35 math patches for moving bare asm to C inline asm Alexander Monakov
                   ` (8 preceding siblings ...)
  2020-01-14 18:17 ` [musl] Q: dealing with missing removal of excess precision Alexander Monakov
@ 2020-01-14 20:41 ` Alexander Monakov
  2020-01-15  6:54   ` Szabolcs Nagy
  2020-01-15 15:44 ` [musl] [PATCH] math: move x86-family fmod " Alexander Monakov
  2020-01-16 21:00 ` [musl] [PATCH] math: add x86_64 remquol Alexander Monakov
  11 siblings, 1 reply; 55+ messages in thread
From: Alexander Monakov @ 2020-01-14 20:41 UTC (permalink / raw)
  To: musl

[-- Attachment #1: Type: text/plain, Size: 970 bytes --]

---
For an unclear reason commit 5d01ab4ac64b913c537e91f7c01d5c8e910151da
did not add long double aliases dreml and finitel. This patch
preserves the oddity.

 src/math/i386/remainder.c    | 12 ++++++++++++
 src/math/i386/remainder.s    | 14 --------------
 src/math/i386/remainderf.c   | 12 ++++++++++++
 src/math/i386/remainderf.s   | 14 --------------
 src/math/i386/remainderl.c   |  9 +++++++++
 src/math/i386/remainderl.s   | 11 -----------
 src/math/x86_64/remainderl.c |  9 +++++++++
 src/math/x86_64/remainderl.s | 11 -----------
 8 files changed, 42 insertions(+), 50 deletions(-)
 create mode 100644 src/math/i386/remainder.c
 delete mode 100644 src/math/i386/remainder.s
 create mode 100644 src/math/i386/remainderf.c
 delete mode 100644 src/math/i386/remainderf.s
 create mode 100644 src/math/i386/remainderl.c
 delete mode 100644 src/math/i386/remainderl.s
 create mode 100644 src/math/x86_64/remainderl.c
 delete mode 100644 src/math/x86_64/remainderl.s


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0009-math-move-x86-family-remainder-functions-to-C.patch --]
[-- Type: text/x-patch; name="0009-math-move-x86-family-remainder-functions-to-C.patch", Size: 3124 bytes --]

diff --git a/src/math/i386/remainder.c b/src/math/i386/remainder.c
new file mode 100644
index 00000000..c083df90
--- /dev/null
+++ b/src/math/i386/remainder.c
@@ -0,0 +1,12 @@
+#include <math.h>
+
+double remainder(double x, double y)
+{
+	unsigned short fpsr;
+	// fprem1 does not introduce excess precision into x
+	do __asm__ ("fprem1; fnstsw %%ax" : "+t"(x), "=a"(fpsr) : "u"(y));
+	while (fpsr & 0x400);
+	return x;
+}
+
+weak_alias(remainder, drem);
diff --git a/src/math/i386/remainder.s b/src/math/i386/remainder.s
deleted file mode 100644
index ab1da95d..00000000
--- a/src/math/i386/remainder.s
+++ /dev/null
@@ -1,14 +0,0 @@
-.global remainder
-.type remainder,@function
-remainder:
-.weak drem
-.type drem,@function
-drem:
-	fldl 12(%esp)
-	fldl 4(%esp)
-1:	fprem1
-	fnstsw %ax
-	sahf
-	jp 1b
-	fstp %st(1)
-	ret
diff --git a/src/math/i386/remainderf.c b/src/math/i386/remainderf.c
new file mode 100644
index 00000000..280207d2
--- /dev/null
+++ b/src/math/i386/remainderf.c
@@ -0,0 +1,12 @@
+#include <math.h>
+
+float remainderf(float x, float y)
+{
+	unsigned short fpsr;
+	// fprem1 does not introduce excess precision into x
+	do __asm__ ("fprem1; fnstsw %%ax" : "+t"(x), "=a"(fpsr) : "u"(y));
+	while (fpsr & 0x400);
+	return x;
+}
+
+weak_alias(remainderf, dremf);
diff --git a/src/math/i386/remainderf.s b/src/math/i386/remainderf.s
deleted file mode 100644
index 6a7378a3..00000000
--- a/src/math/i386/remainderf.s
+++ /dev/null
@@ -1,14 +0,0 @@
-.global remainderf
-.type remainderf,@function
-remainderf:
-.weak dremf
-.type dremf,@function
-dremf:
-	flds 8(%esp)
-	flds 4(%esp)
-1:	fprem1
-	fnstsw %ax
-	sahf
-	jp 1b
-	fstp %st(1)
-	ret
diff --git a/src/math/i386/remainderl.c b/src/math/i386/remainderl.c
new file mode 100644
index 00000000..8cf75071
--- /dev/null
+++ b/src/math/i386/remainderl.c
@@ -0,0 +1,9 @@
+#include <math.h>
+
+long double remainderl(long double x, long double y)
+{
+	unsigned short fpsr;
+	do __asm__ ("fprem1; fnstsw %%ax" : "+t"(x), "=a"(fpsr) : "u"(y));
+	while (fpsr & 0x400);
+	return x;
+}
diff --git a/src/math/i386/remainderl.s b/src/math/i386/remainderl.s
deleted file mode 100644
index b41518ed..00000000
--- a/src/math/i386/remainderl.s
+++ /dev/null
@@ -1,11 +0,0 @@
-.global remainderl
-.type remainderl,@function
-remainderl:
-	fldt 16(%esp)
-	fldt 4(%esp)
-1:	fprem1
-	fnstsw %ax
-	sahf
-	jp 1b
-	fstp %st(1)
-	ret
diff --git a/src/math/x86_64/remainderl.c b/src/math/x86_64/remainderl.c
new file mode 100644
index 00000000..8cf75071
--- /dev/null
+++ b/src/math/x86_64/remainderl.c
@@ -0,0 +1,9 @@
+#include <math.h>
+
+long double remainderl(long double x, long double y)
+{
+	unsigned short fpsr;
+	do __asm__ ("fprem1; fnstsw %%ax" : "+t"(x), "=a"(fpsr) : "u"(y));
+	while (fpsr & 0x400);
+	return x;
+}
diff --git a/src/math/x86_64/remainderl.s b/src/math/x86_64/remainderl.s
deleted file mode 100644
index cb3857b4..00000000
--- a/src/math/x86_64/remainderl.s
+++ /dev/null
@@ -1,11 +0,0 @@
-.global remainderl
-.type remainderl,@function
-remainderl:
-	fldt 24(%rsp)
-	fldt 8(%rsp)
-1:	fprem1
-	fnstsw %ax
-	testb $4,%ah
-	jnz 1b
-	fstp %st(1)
-	ret

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [musl] [PATCH] math: move x86-family remainder functions to C
  2020-01-14 20:41 ` [musl] [PATCH] math: move x86-family remainder functions to C Alexander Monakov
@ 2020-01-15  6:54   ` Szabolcs Nagy
  0 siblings, 0 replies; 55+ messages in thread
From: Szabolcs Nagy @ 2020-01-15  6:54 UTC (permalink / raw)
  To: musl

* Alexander Monakov <amonakov@ispras.ru> [2020-01-14 23:41:19 +0300]:
> For an unclear reason commit 5d01ab4ac64b913c537e91f7c01d5c8e910151da
> did not add long double aliases dreml and finitel. This patch
> preserves the oddity.

see math.h

it does not declare dreml and finitel, those
are gnu only extensions, not present in bsds.
(they were obsolete before long double appeared)

^ permalink raw reply	[flat|nested] 55+ messages in thread

* [musl] [PATCH] math: move x86-family fmod functions to C
  2020-01-05 16:35 math patches for moving bare asm to C inline asm Alexander Monakov
                   ` (9 preceding siblings ...)
  2020-01-14 20:41 ` [musl] [PATCH] math: move x86-family remainder functions to C Alexander Monakov
@ 2020-01-15 15:44 ` Alexander Monakov
  2020-01-16 21:00 ` [musl] [PATCH] math: add x86_64 remquol Alexander Monakov
  11 siblings, 0 replies; 55+ messages in thread
From: Alexander Monakov @ 2020-01-15 15:44 UTC (permalink / raw)
  To: musl

[-- Attachment #1: Type: text/plain, Size: 806 bytes --]

---
Exactly like remainder functions, but with fprem instruction instead of fprem1.

 src/math/i386/fmod.c    | 10 ++++++++++
 src/math/i386/fmod.s    | 11 -----------
 src/math/i386/fmodf.c   | 10 ++++++++++
 src/math/i386/fmodf.s   | 11 -----------
 src/math/i386/fmodl.c   |  9 +++++++++
 src/math/i386/fmodl.s   | 11 -----------
 src/math/x86_64/fmodl.c |  9 +++++++++
 src/math/x86_64/fmodl.s | 11 -----------
 8 files changed, 38 insertions(+), 44 deletions(-)
 create mode 100644 src/math/i386/fmod.c
 delete mode 100644 src/math/i386/fmod.s
 create mode 100644 src/math/i386/fmodf.c
 delete mode 100644 src/math/i386/fmodf.s
 create mode 100644 src/math/i386/fmodl.c
 delete mode 100644 src/math/i386/fmodl.s
 create mode 100644 src/math/x86_64/fmodl.c
 delete mode 100644 src/math/x86_64/fmodl.s


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0010-math-move-x86-family-fmod-functions-to-C.patch --]
[-- Type: text/x-patch; name="0010-math-move-x86-family-fmod-functions-to-C.patch", Size: 2763 bytes --]

diff --git a/src/math/i386/fmod.c b/src/math/i386/fmod.c
new file mode 100644
index 00000000..ea0c58d9
--- /dev/null
+++ b/src/math/i386/fmod.c
@@ -0,0 +1,10 @@
+#include <math.h>
+
+double fmod(double x, double y)
+{
+	unsigned short fpsr;
+	// fprem does not introduce excess precision into x
+	do __asm__ ("fprem; fnstsw %%ax" : "+t"(x), "=a"(fpsr) : "u"(y));
+	while (fpsr & 0x400);
+	return x;
+}
diff --git a/src/math/i386/fmod.s b/src/math/i386/fmod.s
deleted file mode 100644
index 2113b3c5..00000000
--- a/src/math/i386/fmod.s
+++ /dev/null
@@ -1,11 +0,0 @@
-.global fmod
-.type fmod,@function
-fmod:
-	fldl 12(%esp)
-	fldl 4(%esp)
-1:	fprem
-	fnstsw %ax
-	sahf
-	jp 1b
-	fstp %st(1)
-	ret
diff --git a/src/math/i386/fmodf.c b/src/math/i386/fmodf.c
new file mode 100644
index 00000000..90b56ab0
--- /dev/null
+++ b/src/math/i386/fmodf.c
@@ -0,0 +1,10 @@
+#include <math.h>
+
+float fmodf(float x, float y)
+{
+	unsigned short fpsr;
+	// fprem does not introduce excess precision into x
+	do __asm__ ("fprem; fnstsw %%ax" : "+t"(x), "=a"(fpsr) : "u"(y));
+	while (fpsr & 0x400);
+	return x;
+}
diff --git a/src/math/i386/fmodf.s b/src/math/i386/fmodf.s
deleted file mode 100644
index e04e2a56..00000000
--- a/src/math/i386/fmodf.s
+++ /dev/null
@@ -1,11 +0,0 @@
-.global fmodf
-.type fmodf,@function
-fmodf:
-	flds 8(%esp)
-	flds 4(%esp)
-1:	fprem
-	fnstsw %ax
-	sahf
-	jp 1b
-	fstp %st(1)
-	ret
diff --git a/src/math/i386/fmodl.c b/src/math/i386/fmodl.c
new file mode 100644
index 00000000..3daeab06
--- /dev/null
+++ b/src/math/i386/fmodl.c
@@ -0,0 +1,9 @@
+#include <math.h>
+
+long double fmodl(long double x, long double y)
+{
+	unsigned short fpsr;
+	do __asm__ ("fprem; fnstsw %%ax" : "+t"(x), "=a"(fpsr) : "u"(y));
+	while (fpsr & 0x400);
+	return x;
+}
diff --git a/src/math/i386/fmodl.s b/src/math/i386/fmodl.s
deleted file mode 100644
index 0cb3fe9b..00000000
--- a/src/math/i386/fmodl.s
+++ /dev/null
@@ -1,11 +0,0 @@
-.global fmodl
-.type fmodl,@function
-fmodl:
-	fldt 16(%esp)
-	fldt 4(%esp)
-1:	fprem
-	fnstsw %ax
-	sahf
-	jp 1b
-	fstp %st(1)
-	ret
diff --git a/src/math/x86_64/fmodl.c b/src/math/x86_64/fmodl.c
new file mode 100644
index 00000000..3daeab06
--- /dev/null
+++ b/src/math/x86_64/fmodl.c
@@ -0,0 +1,9 @@
+#include <math.h>
+
+long double fmodl(long double x, long double y)
+{
+	unsigned short fpsr;
+	do __asm__ ("fprem; fnstsw %%ax" : "+t"(x), "=a"(fpsr) : "u"(y));
+	while (fpsr & 0x400);
+	return x;
+}
diff --git a/src/math/x86_64/fmodl.s b/src/math/x86_64/fmodl.s
deleted file mode 100644
index ea07b402..00000000
--- a/src/math/x86_64/fmodl.s
+++ /dev/null
@@ -1,11 +0,0 @@
-.global fmodl
-.type fmodl,@function
-fmodl:
-	fldt 24(%rsp)
-	fldt 8(%rsp)
-1:	fprem
-	fnstsw %ax
-	testb $4,%ah
-	jnz 1b
-	fstp %st(1)
-	ret

^ permalink raw reply	[flat|nested] 55+ messages in thread

* [musl] [PATCH] math: add x86_64 remquol
  2020-01-05 16:35 math patches for moving bare asm to C inline asm Alexander Monakov
                   ` (10 preceding siblings ...)
  2020-01-15 15:44 ` [musl] [PATCH] math: move x86-family fmod " Alexander Monakov
@ 2020-01-16 21:00 ` Alexander Monakov
  11 siblings, 0 replies; 55+ messages in thread
From: Alexander Monakov @ 2020-01-16 21:00 UTC (permalink / raw)
  To: musl

[-- Attachment #1: Type: text/plain, Size: 319 bytes --]

---
So proud of this one <3

(this is not a rewrite as x86_64 remquol.s does not exist, but that
looks unintentional and i386 remquo versions need similar rewrites anyway)

 src/math/x86_64/remquol.c | 32 ++++++++++++++++++++++++++++++++
 1 file changed, 32 insertions(+)
 create mode 100644 src/math/x86_64/remquol.c


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0011-math-add-x86_64-remquol.patch --]
[-- Type: text/x-patch; name="0011-math-add-x86_64-remquol.patch", Size: 1358 bytes --]

diff --git a/src/math/x86_64/remquol.c b/src/math/x86_64/remquol.c
new file mode 100644
index 00000000..60eef089
--- /dev/null
+++ b/src/math/x86_64/remquol.c
@@ -0,0 +1,32 @@
+#include <math.h>
+
+long double remquol(long double x, long double y, int *quo)
+{
+	signed char *cx = (void *)&x, *cy = (void *)&y;
+	/* By ensuring that addresses of x and y cannot be discarded,
+	 * this empty asm guides GCC into representing extraction of
+	 * their sign bits as memory loads rather than making x and y
+	 * not-address-taken internally and using bitfield operations,
+	 * which in the end wouldn't work out, as extraction from FPU
+	 * registers needs to go through memory anyway. This way GCC
+	 * should manage to use incoming stack slots without spills. */
+	__asm__ ("" :: "X"(cx), "X"(cy));
+
+	long double t = x;
+	unsigned fpsr;
+	do __asm__ ("fprem1; fnstsw %%ax" : "+t"(t), "=a"(fpsr) : "u"(y));
+	while (fpsr & 0x400);
+	/* C0, C1, C3 flags in x87 status word carry low bits of quotient:
+	 * 15 14 13 12 11 10  9  8
+	 *  . C3  .  .  . C2 C1 C0
+	 *  . b1  .  .  .  0 b0 b2 */
+	unsigned char i = fpsr >> 8;
+	i = i>>4 | i<<4;
+	/* i[5:2] is now {b0 b2 ? b1}. Retrieve {0 b2 b1 b0} via
+	 * in-register table lookup. */
+	unsigned qbits = 0x7575313164642020 >> (i & 60);
+	qbits &= 7;
+
+	*quo = (cx[9]^cy[9]) < 0 ? -qbits : qbits;
+	return t;
+}

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [musl] Q: dealing with missing removal of excess precision
  2020-01-14 19:53       ` Alexander Monakov
@ 2020-02-06 14:51         ` Rich Felker
  2020-02-06 17:15           ` Alexander Monakov
  0 siblings, 1 reply; 55+ messages in thread
From: Rich Felker @ 2020-02-06 14:51 UTC (permalink / raw)
  To: musl

On Tue, Jan 14, 2020 at 10:53:41PM +0300, Alexander Monakov wrote:
> On Tue, 14 Jan 2020, Rich Felker wrote:
> 
> > > otoh it would be nice if there was a way to tell the compiler not to
> > > remove it (e.g. in case the asm already took care of it) even in c99
> > > standard mode.
> > 
> > Perhaps this happens if the output constraint is tied to a float
> > rather than a long double?
> 
> Precisely. That's what previously posted patches do, and they match
> existing hand-tuned assembly.
> 
> I misspoke when saying that Glibc might return a value with excess precision.
> I was looking at fmod-like functions and missed a slightly subtle point that
> fprem does not introduce excess precision. So I don't actually have any
> example where Glibc might misbehave in that regard.

I think I might like to go ahead and apply these patches now, or at
least some of them -- the ones fixing excess precision -- rather
waiting, because I got a report of a nasty bug stemming from excess
precision of the inverse trig functions:

https://github.com/OSGeo/PROJ/issues/1906

The problem is that, by returning excess precision, these functions
violate hard constraints on their range (either entire range, or range
for a given input domain) -- invariants like acos(x)<=M_PI or
implications like atan2(y,x)>=M_PI_2 => "(x,y) outside first quadrant"
fail to hold.

Arguably this is just a 1ulp error issue, but I don't think we have
any actual inaccuracies of that degree at "important" angles where the
result is not the correctly rounded one, even with the generic C
implementations. Rather the problem is stemming purely from wrongly
retaining excess precision and the fact that the LD80 approximation of
pi is greater than the double approximation of pi.

If writing and testing the remaining i386 functions before release is
not practical, I wonder if just removing the asm for now, and adding
back the new code in next release cycle would be a good idea. Or I
could just leave it, but I don't like making a release with "known
bugs of consequence" like this.

Rich

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [musl] Q: dealing with missing removal of excess precision
  2020-02-06 14:51         ` Rich Felker
@ 2020-02-06 17:15           ` Alexander Monakov
  2020-02-06 17:46             ` Rich Felker
  2020-02-22 19:59             ` Rich Felker
  0 siblings, 2 replies; 55+ messages in thread
From: Alexander Monakov @ 2020-02-06 17:15 UTC (permalink / raw)
  To: musl

On Thu, 6 Feb 2020, Rich Felker wrote:

> I think I might like to go ahead and apply these patches now, or at
> least some of them -- the ones fixing excess precision -- rather
> waiting, because I got a report of a nasty bug stemming from excess
> precision of the inverse trig functions:

That might be exactly the empty set of patches, as I did not yet post
any for functions that might return with excess precision.

Be advised that I found bugs in my patches, so given that no one so far
has pointed them out on the mailing list indicates that either nobody
bothered to review, or people are keeping the findings to themselves.

> If writing and testing the remaining i386 functions before release is
> not practical, I wonder if just removing the asm for now, and adding
> back the new code in next release cycle would be a good idea. Or I
> could just leave it, but I don't like making a release with "known
> bugs of consequence" like this.

I think fixing excess precision in inverse trig functions might be
easier than removing the asm entirely.

Alexander

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [musl] Q: dealing with missing removal of excess precision
  2020-02-06 17:15           ` Alexander Monakov
@ 2020-02-06 17:46             ` Rich Felker
  2020-02-06 19:03               ` Rich Felker
  2020-02-22 19:59             ` Rich Felker
  1 sibling, 1 reply; 55+ messages in thread
From: Rich Felker @ 2020-02-06 17:46 UTC (permalink / raw)
  To: musl

On Thu, Feb 06, 2020 at 08:15:30PM +0300, Alexander Monakov wrote:
> On Thu, 6 Feb 2020, Rich Felker wrote:
> 
> > I think I might like to go ahead and apply these patches now, or at
> > least some of them -- the ones fixing excess precision -- rather
> > waiting, because I got a report of a nasty bug stemming from excess
> > precision of the inverse trig functions:
> 
> That might be exactly the empty set of patches, as I did not yet post
> any for functions that might return with excess precision.

Indeed, I just discovered that...

> Be advised that I found bugs in my patches, so given that no one so far
> has pointed them out on the mailing list indicates that either nobody
> bothered to review, or people are keeping the findings to themselves.

I think it's just that I was planning to do further review after
release rather than before since I'm trying to get the release out..

> > If writing and testing the remaining i386 functions before release is
> > not practical, I wonder if just removing the asm for now, and adding
> > back the new code in next release cycle would be a good idea. Or I
> > could just leave it, but I don't like making a release with "known
> > bugs of consequence" like this.
> 
> I think fixing excess precision in inverse trig functions might be
> easier than removing the asm entirely.

Yes, what I'm looking at right now is fixing inverse trig and log
functions and removing the exp asm (since the exp logic is way too
messy for me to feel comfortable modifying right now) and possibly
re-adding it later as inline asm with the flow control in C.

Rich

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [musl] Q: dealing with missing removal of excess precision
  2020-02-06 17:46             ` Rich Felker
@ 2020-02-06 19:03               ` Rich Felker
  2020-02-06 20:02                 ` Rich Felker
  0 siblings, 1 reply; 55+ messages in thread
From: Rich Felker @ 2020-02-06 19:03 UTC (permalink / raw)
  To: musl

On Thu, Feb 06, 2020 at 12:46:08PM -0500, Rich Felker wrote:
> On Thu, Feb 06, 2020 at 08:15:30PM +0300, Alexander Monakov wrote:
> > On Thu, 6 Feb 2020, Rich Felker wrote:
> > 
> > > I think I might like to go ahead and apply these patches now, or at
> > > least some of them -- the ones fixing excess precision -- rather
> > > waiting, because I got a report of a nasty bug stemming from excess
> > > precision of the inverse trig functions:
> > 
> > That might be exactly the empty set of patches, as I did not yet post
> > any for functions that might return with excess precision.
> 
> Indeed, I just discovered that...
> 
> > Be advised that I found bugs in my patches, so given that no one so far
> > has pointed them out on the mailing list indicates that either nobody
> > bothered to review, or people are keeping the findings to themselves.
> 
> I think it's just that I was planning to do further review after
> release rather than before since I'm trying to get the release out..
> 
> > > If writing and testing the remaining i386 functions before release is
> > > not practical, I wonder if just removing the asm for now, and adding
> > > back the new code in next release cycle would be a good idea. Or I
> > > could just leave it, but I don't like making a release with "known
> > > bugs of consequence" like this.
> > 
> > I think fixing excess precision in inverse trig functions might be
> > easier than removing the asm entirely.
> 
> Yes, what I'm looking at right now is fixing inverse trig and log
> functions and removing the exp asm (since the exp logic is way too
> messy for me to feel comfortable modifying right now) and possibly
> re-adding it later as inline asm with the flow control in C.

FWIW nsz's new C exp seems considerably faster than the existing 386
asm on my box (Atom S1260) (6.7s vs >8s for summing exp(x) from
x=-2..2 stepping 0x1p-24). Test program attached in case anyone else
wants to try it.

So I think just removing exp*.s is the right approach for now. The
long double ones should actually be left, and that raises the issue
that expm1l is wrongly using the exp code rather than expl code nsz
added long ago in a8f73bb1a685dd7d67669c6f6ceb255cfa967790. I won't
try to fix this yet but will just move the files around so we can rm
the float/double ones and use the C for them without getting rid of
the ld asm.

Rich

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [musl] Q: dealing with missing removal of excess precision
  2020-02-06 19:03               ` Rich Felker
@ 2020-02-06 20:02                 ` Rich Felker
  2020-02-06 22:08                   ` Szabolcs Nagy
  0 siblings, 1 reply; 55+ messages in thread
From: Rich Felker @ 2020-02-06 20:02 UTC (permalink / raw)
  To: musl

[-- Attachment #1: Type: text/plain, Size: 2549 bytes --]

On Thu, Feb 06, 2020 at 02:03:44PM -0500, Rich Felker wrote:
> On Thu, Feb 06, 2020 at 12:46:08PM -0500, Rich Felker wrote:
> > On Thu, Feb 06, 2020 at 08:15:30PM +0300, Alexander Monakov wrote:
> > > On Thu, 6 Feb 2020, Rich Felker wrote:
> > > 
> > > > I think I might like to go ahead and apply these patches now, or at
> > > > least some of them -- the ones fixing excess precision -- rather
> > > > waiting, because I got a report of a nasty bug stemming from excess
> > > > precision of the inverse trig functions:
> > > 
> > > That might be exactly the empty set of patches, as I did not yet post
> > > any for functions that might return with excess precision.
> > 
> > Indeed, I just discovered that...
> > 
> > > Be advised that I found bugs in my patches, so given that no one so far
> > > has pointed them out on the mailing list indicates that either nobody
> > > bothered to review, or people are keeping the findings to themselves.
> > 
> > I think it's just that I was planning to do further review after
> > release rather than before since I'm trying to get the release out..
> > 
> > > > If writing and testing the remaining i386 functions before release is
> > > > not practical, I wonder if just removing the asm for now, and adding
> > > > back the new code in next release cycle would be a good idea. Or I
> > > > could just leave it, but I don't like making a release with "known
> > > > bugs of consequence" like this.
> > > 
> > > I think fixing excess precision in inverse trig functions might be
> > > easier than removing the asm entirely.
> > 
> > Yes, what I'm looking at right now is fixing inverse trig and log
> > functions and removing the exp asm (since the exp logic is way too
> > messy for me to feel comfortable modifying right now) and possibly
> > re-adding it later as inline asm with the flow control in C.
> 
> FWIW nsz's new C exp seems considerably faster than the existing 386
> asm on my box (Atom S1260) (6.7s vs >8s for summing exp(x) from
> x=-2..2 stepping 0x1p-24). Test program attached in case anyone else
> wants to try it.
> 
> So I think just removing exp*.s is the right approach for now. The
> long double ones should actually be left, and that raises the issue
> that expm1l is wrongly using the exp code rather than expl code nsz
> added long ago in a8f73bb1a685dd7d67669c6f6ceb255cfa967790. I won't
> try to fix this yet but will just move the files around so we can rm
> the float/double ones and use the C for them without getting rid of
> the ld asm.

Attachment was missing.

Rich

[-- Attachment #2: exptime.c --]
[-- Type: text/plain, Size: 116 bytes --]

#include <math.h>

int main()
{
	double x, a=0;
	for (x=-2; x<2; x+=0x1p-24)
		a += exp(x);
	return (long long)a;
}

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [musl] Q: dealing with missing removal of excess precision
  2020-02-06 20:02                 ` Rich Felker
@ 2020-02-06 22:08                   ` Szabolcs Nagy
  0 siblings, 0 replies; 55+ messages in thread
From: Szabolcs Nagy @ 2020-02-06 22:08 UTC (permalink / raw)
  To: musl

* Rich Felker <dalias@libc.org> [2020-02-06 15:02:58 -0500]:
> #include <math.h>
> 
> int main()
> {
> 	double x, a=0;
> 	for (x=-2; x<2; x+=0x1p-24)
> 		a += exp(x);
> 	return (long long)a;
> }

same here on my old core i5 x86_64 laptop

asm: 3.13s
c  : 2.67s

optimized-routines benchmark is

asm:
      exp rthruput:   44.18 ns/elem   44176510 ns in [-9.9 9.9]
      exp  latency:   48.32 ns/call   48317455 ns in [-9.9 9.9]
     expf rthruput:   28.59 ns/elem   28586714 ns in [-9.9 9.9]
     expf  latency:   44.54 ns/call   44536368 ns in [-9.9 9.9]
     exp2 rthruput:   42.63 ns/elem   42634692 ns in [-9.9 9.9]
     exp2  latency:   46.54 ns/call   46543915 ns in [-9.9 9.9]
    exp2f rthruput:   27.75 ns/elem   27751888 ns in [-9.9 9.9]
    exp2f  latency:   42.91 ns/call   42908855 ns in [-9.9 9.9]
c:
      exp rthruput:   34.74 ns/elem   34735750 ns in [-9.9 9.9]
      exp  latency:   38.88 ns/call   38879834 ns in [-9.9 9.9]
     expf rthruput:   14.88 ns/elem   14878824 ns in [-9.9 9.9]
     expf  latency:   26.74 ns/call   26740464 ns in [-9.9 9.9]
     exp2 rthruput:   31.62 ns/elem   31622777 ns in [-9.9 9.9]
     exp2  latency:   35.76 ns/call   35760590 ns in [-9.9 9.9]
    exp2f rthruput:   14.87 ns/elem   14869128 ns in [-9.9 9.9]
    exp2f  latency:   24.80 ns/call   24795727 ns in [-9.9 9.9]

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [musl] Q: dealing with missing removal of excess precision
  2020-02-06 17:15           ` Alexander Monakov
  2020-02-06 17:46             ` Rich Felker
@ 2020-02-22 19:59             ` Rich Felker
  2020-02-22 20:21               ` Alexander Monakov
  1 sibling, 1 reply; 55+ messages in thread
From: Rich Felker @ 2020-02-22 19:59 UTC (permalink / raw)
  To: musl

On Thu, Feb 06, 2020 at 08:15:30PM +0300, Alexander Monakov wrote:
> On Thu, 6 Feb 2020, Rich Felker wrote:
> 
> > I think I might like to go ahead and apply these patches now, or at
> > least some of them -- the ones fixing excess precision -- rather
> > waiting, because I got a report of a nasty bug stemming from excess
> > precision of the inverse trig functions:
> 
> That might be exactly the empty set of patches, as I did not yet post
> any for functions that might return with excess precision.
> 
> Be advised that I found bugs in my patches, so given that no one so far
> has pointed them out on the mailing list indicates that either nobody
> bothered to review, or people are keeping the findings to themselves.

Can you comment on these bugs? Now that the 1.2.0 release is out I'd
like to follow up with reviewing/merging these.

Rich

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [musl] Q: dealing with missing removal of excess precision
  2020-02-22 19:59             ` Rich Felker
@ 2020-02-22 20:21               ` Alexander Monakov
  2020-02-23  0:19                 ` Rich Felker
  0 siblings, 1 reply; 55+ messages in thread
From: Alexander Monakov @ 2020-02-22 20:21 UTC (permalink / raw)
  To: musl

On Sat, 22 Feb 2020, Rich Felker wrote:

> Can you comment on these bugs? Now that the 1.2.0 release is out I'd
> like to follow up with reviewing/merging these.

Sure, but before I do that I'd like to understand better how the review
process is going to work.

In particular, wouldn't it make more sense to wait until after you make
your round of reviews, so that your examination is not biased by what
I reveal?

Alexander

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [musl] Q: dealing with missing removal of excess precision
  2020-02-22 20:21               ` Alexander Monakov
@ 2020-02-23  0:19                 ` Rich Felker
  2020-02-23 16:14                   ` Alexander Monakov
  0 siblings, 1 reply; 55+ messages in thread
From: Rich Felker @ 2020-02-23  0:19 UTC (permalink / raw)
  To: musl

On Sat, Feb 22, 2020 at 11:21:58PM +0300, Alexander Monakov wrote:
> On Sat, 22 Feb 2020, Rich Felker wrote:
> 
> > Can you comment on these bugs? Now that the 1.2.0 release is out I'd
> > like to follow up with reviewing/merging these.
> 
> Sure, but before I do that I'd like to understand better how the review
> process is going to work.
> 
> In particular, wouldn't it make more sense to wait until after you make
> your round of reviews, so that your examination is not biased by what
> I reveal?

I'm not well acquainted with SSE, and only so-so with x87, so pretty
much I'm reading them for higher-level issues with tooling
compatibility (like the concerns I already raised and looked up and
seem to have resolved about x87 constraints and non-GCC compilers) and
logic, then planning to apply and test them. I think being aware of
non-obvious mistake modes that have already been found would be a lot
more useful than staring at things, especially if the bugs you've
found are in subtleties of the insn behavior or constraint behavior.

Rich

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [musl] Q: dealing with missing removal of excess precision
  2020-02-23  0:19                 ` Rich Felker
@ 2020-02-23 16:14                   ` Alexander Monakov
  2020-03-20 18:12                     ` Rich Felker
  0 siblings, 1 reply; 55+ messages in thread
From: Alexander Monakov @ 2020-02-23 16:14 UTC (permalink / raw)
  To: musl

On Sat, 22 Feb 2020, Rich Felker wrote:

> I'm not well acquainted with SSE, and only so-so with x87, so pretty
> much I'm reading them for higher-level issues with tooling
> compatibility (like the concerns I already raised and looked up and
> seem to have resolved about x87 constraints and non-GCC compilers) and
> logic, then planning to apply and test them. I think being aware of
> non-obvious mistake modes that have already been found would be a lot
> more useful than staring at things, especially if the bugs you've
> found are in subtleties of the insn behavior or constraint behavior.

Okay, thanks. I found two issues:

1. i386 lrint* functions mistakenly used fistpll instead of fistpl (I
posted the fix for x32 asm after noticing my own mistake).

2. Some functions bind a 32-bit lvalue as output for fnstsw %ax, which
as the operand says writes only 16 bits. They should be changed to either
use a 16-bit lvalue, or a zero-initialized 32-bit lvalue with "+a" constraint.

Plus, not bugs, but still worth mentioning:

3. The new remquol in C could alternatively be implemented by using fxam
to extract sign bits instead of loading them from stack slots. The current
approach makes sense given the ABI, but an implementation aiming for better
code after inlining could choose to use fxam instead of forcing a spill.

4. I did not manage to find a copy of Figueroa's "When is double rounding
innocuous", but I could cite e.g. "Innocuous Double Rounding of Basic
Arithmetic Operations" by Pierre Roux instead (in i386/sqrtf.c).

If you like you can fetch a Git tree with my patches from 

    https://git.sr.ht/~amonakov/musl

(issue 1 is already corrected in that repo)

Thanks.
Alexander

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [musl] Q: dealing with missing removal of excess precision
  2020-02-23 16:14                   ` Alexander Monakov
@ 2020-03-20 18:12                     ` Rich Felker
  2020-03-22  1:19                       ` Rich Felker
  0 siblings, 1 reply; 55+ messages in thread
From: Rich Felker @ 2020-03-20 18:12 UTC (permalink / raw)
  To: musl

On Sun, Feb 23, 2020 at 07:14:08PM +0300, Alexander Monakov wrote:
> On Sat, 22 Feb 2020, Rich Felker wrote:
> 
> > I'm not well acquainted with SSE, and only so-so with x87, so pretty
> > much I'm reading them for higher-level issues with tooling
> > compatibility (like the concerns I already raised and looked up and
> > seem to have resolved about x87 constraints and non-GCC compilers) and
> > logic, then planning to apply and test them. I think being aware of
> > non-obvious mistake modes that have already been found would be a lot
> > more useful than staring at things, especially if the bugs you've
> > found are in subtleties of the insn behavior or constraint behavior.
> 
> Okay, thanks. I found two issues:
> 
> 1. i386 lrint* functions mistakenly used fistpll instead of fistpl (I
> posted the fix for x32 asm after noticing my own mistake).
> 
> 2. Some functions bind a 32-bit lvalue as output for fnstsw %ax, which
> as the operand says writes only 16 bits. They should be changed to either
> use a 16-bit lvalue, or a zero-initialized 32-bit lvalue with "+a" constraint.
> 
> Plus, not bugs, but still worth mentioning:
> 
> 3. The new remquol in C could alternatively be implemented by using fxam
> to extract sign bits instead of loading them from stack slots. The current
> approach makes sense given the ABI, but an implementation aiming for better
> code after inlining could choose to use fxam instead of forcing a spill.
> 
> 4. I did not manage to find a copy of Figueroa's "When is double rounding
> innocuous", but I could cite e.g. "Innocuous Double Rounding of Basic
> Arithmetic Operations" by Pierre Roux instead (in i386/sqrtf.c).
> 
> If you like you can fetch a Git tree with my patches from 
> 
>     https://git.sr.ht/~amonakov/musl
> 
> (issue 1 is already corrected in that repo)

Hi. I'm trying to catch up on this and other patches after being sick
and not getting much done for a while. Is there anything else I should
be aware of before going forward with these?

One minor cosmetic change I'd like to make to commit names if you
don't object is changing "x86-family" to "x87-family" for the commits
that are "i386 + long double functions on x86_64" since I found it
confusing when there's one commit for "x86 family" then a separate one
for x86_64 versions of the same function. Let me know if you think of
any alternative that might be more clear than "x87"

Rich

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [musl] [PATCH] math: move more x86-family fabs functions to C
  2020-01-06  8:40 ` [PATCH] math: move more x86-family fabs functions to C Alexander Monakov
@ 2020-03-21 17:06   ` Rich Felker
  0 siblings, 0 replies; 55+ messages in thread
From: Rich Felker @ 2020-03-21 17:06 UTC (permalink / raw)
  To: musl

On Mon, Jan 06, 2020 at 11:40:30AM +0300, Alexander Monakov wrote:
> diff --git a/src/math/i386/fabsf.c b/src/math/i386/fabsf.c
> new file mode 100644
> index 00000000..d07be321
> --- /dev/null
> +++ b/src/math/i386/fabsf.c
> @@ -0,0 +1,7 @@
> +#include <math.h>
> +
> +float fabs(float x)
         ^^^^
> +{
> +	__asm__ ("fabs" : "+t"(x));
> +	return x;
> +}

Should be fabsf -- found while smoketesting. Otherwise things built
ok for i386. Will followup with more results soon.

Rich

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [musl] [PATCH] math: move i386 sqrt to C
  2020-01-07 13:06 ` [PATCH] math: move i386 sqrt " Alexander Monakov
  2020-01-08  7:26   ` Rich Felker
@ 2020-03-21 17:53   ` Rich Felker
  2020-03-21 17:57     ` Rich Felker
  1 sibling, 1 reply; 55+ messages in thread
From: Rich Felker @ 2020-03-21 17:53 UTC (permalink / raw)
  To: musl

On Tue, Jan 07, 2020 at 04:06:05PM +0300, Alexander Monakov wrote:
> ---
> Since union ldshape does not have a dedicated field for 32 least significant
> bits of the x87 long double mantissa, keeping the original approach with
> 
>     ux.i.m -= (fpsr & 0x200) - 0x100;
> 
> would lead to a 64-bit subtraction that is not trivial for the compiler to
> optimize to 32-bit subtraction as done in the original assembly. Therefore
> I have elected to change the approach and use
> 
>     ux.i.m ^= (fpsr & 0x200) + 0x200;
> 
> which is easier to optimize to a 32-bit rather than 64-bit xor.
> 
> Thoughts?

I'm getting test failures with sqrt and this seems to be the culprit
-- I don't think it's equivalent. The original version could offset
the value by +0x100 or -0x100 before rounding, and offsets in the
opposite direction of the rounding that already occurred. Your version
can only offset it by +0x200 or -0x400.

The (well, one) particular failing case is:

src/math/ucb/sqrt.h:49: RU sqrt(0x1.fffffffffffffp+1023) want 0x1p+512
got 0x1.fffffffffffffp+511 ulperr -0.250 = -0x1p-1 + 0x1p-2

Here the mantissa is

fffffffffffffc00

and offset by -0x400 yields:

fffffffffffff800

which has exactly 53 bits and therefore does not round up like it
should.

I still like your approach better if there's a way to salvage it. Do
you see one?

Rich

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [musl] [PATCH] math: move i386 sqrt to C
  2020-03-21 17:53   ` [musl] " Rich Felker
@ 2020-03-21 17:57     ` Rich Felker
  2020-03-21 20:30       ` Alexander Monakov
  0 siblings, 1 reply; 55+ messages in thread
From: Rich Felker @ 2020-03-21 17:57 UTC (permalink / raw)
  To: musl

On Sat, Mar 21, 2020 at 01:53:51PM -0400, Rich Felker wrote:
> On Tue, Jan 07, 2020 at 04:06:05PM +0300, Alexander Monakov wrote:
> > ---
> > Since union ldshape does not have a dedicated field for 32 least significant
> > bits of the x87 long double mantissa, keeping the original approach with
> > 
> >     ux.i.m -= (fpsr & 0x200) - 0x100;
> > 
> > would lead to a 64-bit subtraction that is not trivial for the compiler to
> > optimize to 32-bit subtraction as done in the original assembly. Therefore
> > I have elected to change the approach and use
> > 
> >     ux.i.m ^= (fpsr & 0x200) + 0x200;
> > 
> > which is easier to optimize to a 32-bit rather than 64-bit xor.
> > 
> > Thoughts?
> 
> I'm getting test failures with sqrt and this seems to be the culprit
> -- I don't think it's equivalent. The original version could offset
> the value by +0x100 or -0x100 before rounding, and offsets in the
> opposite direction of the rounding that already occurred. Your version
> can only offset it by +0x200 or -0x400.
> 
> The (well, one) particular failing case is:
> 
> src/math/ucb/sqrt.h:49: RU sqrt(0x1.fffffffffffffp+1023) want 0x1p+512
> got 0x1.fffffffffffffp+511 ulperr -0.250 = -0x1p-1 + 0x1p-2
> 
> Here the mantissa is
> 
> fffffffffffffc00
> 
> and offset by -0x400 yields:
> 
> fffffffffffff800
> 
> which has exactly 53 bits and therefore does not round up like it
> should.
> 
> I still like your approach better if there's a way to salvage it. Do
> you see one?

And, I think I do. Changing it to:

    ux.i.m ^= (fpsr & 0x200) + 0x300;

yields an offset of +0x300 (^0x300) or -0x300 (^0x500). This looks
like it should work theoretically, and indeed it passes libc-test.

Rich

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [musl] [PATCH] math: move i386 sqrt to C
  2020-03-21 17:57     ` Rich Felker
@ 2020-03-21 20:30       ` Alexander Monakov
  0 siblings, 0 replies; 55+ messages in thread
From: Alexander Monakov @ 2020-03-21 20:30 UTC (permalink / raw)
  To: musl

On Sat, 21 Mar 2020, Rich Felker wrote:

> On Sat, Mar 21, 2020 at 01:53:51PM -0400, Rich Felker wrote:
> > On Tue, Jan 07, 2020 at 04:06:05PM +0300, Alexander Monakov wrote:
> > > ---
> > > Since union ldshape does not have a dedicated field for 32 least significant
> > > bits of the x87 long double mantissa, keeping the original approach with
> > > 
> > >     ux.i.m -= (fpsr & 0x200) - 0x100;
> > > 
> > > would lead to a 64-bit subtraction that is not trivial for the compiler to
> > > optimize to 32-bit subtraction as done in the original assembly. Therefore
> > > I have elected to change the approach and use
> > > 
> > >     ux.i.m ^= (fpsr & 0x200) + 0x200;
> > > 
> > > which is easier to optimize to a 32-bit rather than 64-bit xor.
> > > 
> > > Thoughts?
> > 
> > I'm getting test failures with sqrt and this seems to be the culprit
> > -- I don't think it's equivalent. The original version could offset
> > the value by +0x100 or -0x100 before rounding, and offsets in the
> > opposite direction of the rounding that already occurred. Your version
> > can only offset it by +0x200 or -0x400.
> > 
> > The (well, one) particular failing case is:
> > 
> > src/math/ucb/sqrt.h:49: RU sqrt(0x1.fffffffffffffp+1023) want 0x1p+512
> > got 0x1.fffffffffffffp+511 ulperr -0.250 = -0x1p-1 + 0x1p-2
> > 
> > Here the mantissa is
> > 
> > fffffffffffffc00
> > 
> > and offset by -0x400 yields:
> > 
> > fffffffffffff800
> > 
> > which has exactly 53 bits and therefore does not round up like it
> > should.
> > 
> > I still like your approach better if there's a way to salvage it. Do
> > you see one?
> 
> And, I think I do. Changing it to:
> 
>     ux.i.m ^= (fpsr & 0x200) + 0x300;
> 
> yields an offset of +0x300 (^0x300) or -0x300 (^0x500). This looks
> like it should work theoretically, and indeed it passes libc-test.

Indeed, I was considering only the default (to-nearest) rounding mode
and did not notice the problem for upwards rounding mode.

I think your change solves this nicely.

Thanks
Alexander

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [musl] Q: dealing with missing removal of excess precision
  2020-03-20 18:12                     ` Rich Felker
@ 2020-03-22  1:19                       ` Rich Felker
  2020-03-22 17:40                         ` Alexander Monakov
  0 siblings, 1 reply; 55+ messages in thread
From: Rich Felker @ 2020-03-22  1:19 UTC (permalink / raw)
  To: musl

[-- Attachment #1: Type: text/plain, Size: 2760 bytes --]

On Fri, Mar 20, 2020 at 02:12:50PM -0400, Rich Felker wrote:
> On Sun, Feb 23, 2020 at 07:14:08PM +0300, Alexander Monakov wrote:
> > On Sat, 22 Feb 2020, Rich Felker wrote:
> > 
> > > I'm not well acquainted with SSE, and only so-so with x87, so pretty
> > > much I'm reading them for higher-level issues with tooling
> > > compatibility (like the concerns I already raised and looked up and
> > > seem to have resolved about x87 constraints and non-GCC compilers) and
> > > logic, then planning to apply and test them. I think being aware of
> > > non-obvious mistake modes that have already been found would be a lot
> > > more useful than staring at things, especially if the bugs you've
> > > found are in subtleties of the insn behavior or constraint behavior.
> > 
> > Okay, thanks. I found two issues:
> > 
> > 1. i386 lrint* functions mistakenly used fistpll instead of fistpl (I
> > posted the fix for x32 asm after noticing my own mistake).
> > 
> > 2. Some functions bind a 32-bit lvalue as output for fnstsw %ax, which
> > as the operand says writes only 16 bits. They should be changed to either
> > use a 16-bit lvalue, or a zero-initialized 32-bit lvalue with "+a" constraint.
> > 
> > Plus, not bugs, but still worth mentioning:
> > 
> > 3. The new remquol in C could alternatively be implemented by using fxam
> > to extract sign bits instead of loading them from stack slots. The current
> > approach makes sense given the ABI, but an implementation aiming for better
> > code after inlining could choose to use fxam instead of forcing a spill.
> > 
> > 4. I did not manage to find a copy of Figueroa's "When is double rounding
> > innocuous", but I could cite e.g. "Innocuous Double Rounding of Basic
> > Arithmetic Operations" by Pierre Roux instead (in i386/sqrtf.c).
> > 
> > If you like you can fetch a Git tree with my patches from 
> > 
> >     https://git.sr.ht/~amonakov/musl
> > 
> > (issue 1 is already corrected in that repo)
> 
> Hi. I'm trying to catch up on this and other patches after being sick
> and not getting much done for a while. Is there anything else I should
> be aware of before going forward with these?
> 
> One minor cosmetic change I'd like to make to commit names if you
> don't object is changing "x86-family" to "x87-family" for the commits
> that are "i386 + long double functions on x86_64" since I found it
> confusing when there's one commit for "x86 family" then a separate one
> for x86_64 versions of the same function. Let me know if you think of
> any alternative that might be more clear than "x87"

With the fixups mentioned (included in attached patches), i386 and
x86_64 are passing libc-test and seem fine. OK if I merge them
(rebasing the fixups in as fixups)? With s/x86/x87/ naming?

Rich

[-- Attachment #2: 0001-fabsf-i386-fixup.patch --]
[-- Type: text/plain, Size: 552 bytes --]

From 3f99f4595aa3ce528008ee22d65512768376a9a6 Mon Sep 17 00:00:00 2001
From: Rich Felker <dalias@aerifal.cx>
Date: Sat, 21 Mar 2020 16:23:51 -0400
Subject: [PATCH 1/3] fabsf i386 fixup

---
 src/math/i386/fabsf.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/math/i386/fabsf.c b/src/math/i386/fabsf.c
index d07be321..d882eee3 100644
--- a/src/math/i386/fabsf.c
+++ b/src/math/i386/fabsf.c
@@ -1,6 +1,6 @@
 #include <math.h>
 
-float fabs(float x)
+float fabsf(float x)
 {
 	__asm__ ("fabs" : "+t"(x));
 	return x;
-- 
2.21.0


[-- Attachment #3: 0002-sqrt-i386-fixup.patch --]
[-- Type: text/plain, Size: 747 bytes --]

From 7fd1bc3b8ecf61d758498edd7f4a75a079ee8df5 Mon Sep 17 00:00:00 2001
From: Rich Felker <dalias@aerifal.cx>
Date: Sat, 21 Mar 2020 16:24:04 -0400
Subject: [PATCH 2/3] sqrt i386 fixup

---
 src/math/i386/sqrt.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/math/i386/sqrt.c b/src/math/i386/sqrt.c
index 619df056..934fbcca 100644
--- a/src/math/i386/sqrt.c
+++ b/src/math/i386/sqrt.c
@@ -10,6 +10,6 @@ double sqrt(double x)
 	/* Rounding to double would have encountered an exact halfway case.
 	   Adjust mantissa downwards if fsqrt rounded up, else upwards.
 	   (result of fsqrt could not have been exact) */
-	ux.i.m ^= (fpsr & 0x200) + 0x200;
+	ux.i.m ^= (fpsr & 0x200) + 0x300;
 	return (double)ux.f;
 }
-- 
2.21.0


[-- Attachment #4: 0003-sqrt-i386-fixup-2.patch --]
[-- Type: text/plain, Size: 649 bytes --]

From b60ddebd251ff28559b48aa758026da8fbbab3f7 Mon Sep 17 00:00:00 2001
From: Rich Felker <dalias@aerifal.cx>
Date: Sat, 21 Mar 2020 16:57:38 -0400
Subject: [PATCH 3/3] sqrt i386 fixup 2

---
 src/math/i386/sqrt.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/math/i386/sqrt.c b/src/math/i386/sqrt.c
index 934fbcca..a34f4ce8 100644
--- a/src/math/i386/sqrt.c
+++ b/src/math/i386/sqrt.c
@@ -3,7 +3,7 @@
 double sqrt(double x)
 {
 	union ldshape ux;
-	unsigned fpsr;
+	unsigned short fpsr;
 	__asm__ ("fsqrt; fnstsw %%ax": "=t"(ux.f), "=a"(fpsr) : "0"(x));
 	if ((ux.i.m & 0x7ff) != 0x400)
 		return (double)ux.f;
-- 
2.21.0


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [musl] Q: dealing with missing removal of excess precision
  2020-03-22  1:19                       ` Rich Felker
@ 2020-03-22 17:40                         ` Alexander Monakov
  2020-03-22 17:53                           ` Rich Felker
  0 siblings, 1 reply; 55+ messages in thread
From: Alexander Monakov @ 2020-03-22 17:40 UTC (permalink / raw)
  To: musl

On Sat, 21 Mar 2020, Rich Felker wrote:

> With the fixups mentioned (included in attached patches), i386 and
> x86_64 are passing libc-test and seem fine. OK if I merge them
> (rebasing the fixups in as fixups)? With s/x86/x87/ naming?

Ack for x87 naming, fabsf fix and sqrt +0x300 fix.

Nack for sqrt 'unsigned short' fix, I recommend to consider
'unsigned fpsr = 0' and "+a" constraint, the resulting assembly is
much better (GCC doesn't seem to do a good job optimizing the 'unsigned short'
variant at all).

I also would like to see i386/sqrtf.c to explain why double rounding
is safe, either as a comment (my preference) or in the commit message.
I would write something like the following:

  The long double result has sufficient precision so that second rounding
  to float still keeps the returned value correctly rounded, see
  Pierre Roux, "Innocuous Double Rounding of Basic Arithmetic Operations".

(this paper elaborates the earlier [currently paywalled] paper by Figueroa)

Actually, may I ask why the initial commit did not mention that it relies
on this nontrivial property?

Thanks.
Alexander

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [musl] Q: dealing with missing removal of excess precision
  2020-03-22 17:40                         ` Alexander Monakov
@ 2020-03-22 17:53                           ` Rich Felker
  2020-03-22 18:51                             ` Alexander Monakov
  0 siblings, 1 reply; 55+ messages in thread
From: Rich Felker @ 2020-03-22 17:53 UTC (permalink / raw)
  To: musl

On Sun, Mar 22, 2020 at 08:40:19PM +0300, Alexander Monakov wrote:
> On Sat, 21 Mar 2020, Rich Felker wrote:
> 
> > With the fixups mentioned (included in attached patches), i386 and
> > x86_64 are passing libc-test and seem fine. OK if I merge them
> > (rebasing the fixups in as fixups)? With s/x86/x87/ naming?
> 
> Ack for x87 naming, fabsf fix and sqrt +0x300 fix.
> 
> Nack for sqrt 'unsigned short' fix, I recommend to consider
> 'unsigned fpsr = 0' and "+a" constraint, the resulting assembly is
> much better (GCC doesn't seem to do a good job optimizing the 'unsigned short'
> variant at all).

Sorry, I forgot to disassemble and check after making that change, and
indeed it is very bad.

How about just leaving it as-is? The value is masked such that upper
bits don't matter, and "whatever the register happened to contain
before" is a valid albeit ugly output from inline asm -- it doesn't
admit any compiler transformations that would cause inconsistent value
to be observed or other problems.

> I also would like to see i386/sqrtf.c to explain why double rounding
> is safe, either as a comment (my preference) or in the commit message.
> I would write something like the following:
> 
>   The long double result has sufficient precision so that second rounding
>   to float still keeps the returned value correctly rounded, see
>   Pierre Roux, "Innocuous Double Rounding of Basic Arithmetic Operations".

I'm fine with adding that as a comment.

> (this paper elaborates the earlier [currently paywalled] paper by Figueroa)
> 
> Actually, may I ask why the initial commit did not mention that it relies
> on this nontrivial property?

The need for fixup of double was only realized later, in commit
809556e60a3359f646946879dd94c4760e5b8e84. It was discussed at the time
that no action was needed for single, but it seems since there was no
change there wasn't any mention of it in log.

Rich

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [musl] Q: dealing with missing removal of excess precision
  2020-03-22 17:53                           ` Rich Felker
@ 2020-03-22 18:51                             ` Alexander Monakov
  2020-03-22 19:10                               ` Rich Felker
  0 siblings, 1 reply; 55+ messages in thread
From: Alexander Monakov @ 2020-03-22 18:51 UTC (permalink / raw)
  To: musl

On Sun, 22 Mar 2020, Rich Felker wrote:

> > Nack for sqrt 'unsigned short' fix, I recommend to consider
> > 'unsigned fpsr = 0' and "+a" constraint, the resulting assembly is
> > much better (GCC doesn't seem to do a good job optimizing the 'unsigned short'
> > variant at all).
> 
> Sorry, I forgot to disassemble and check after making that change, and
> indeed it is very bad.
> 
> How about just leaving it as-is? The value is masked such that upper
> bits don't matter, and "whatever the register happened to contain
> before" is a valid albeit ugly output from inline asm -- it doesn't
> admit any compiler transformations that would cause inconsistent value
> to be observed or other problems.

Yes, I guess it's acceptable.

> > Actually, may I ask why the initial commit did not mention that it relies
> > on this nontrivial property?
> 
> The need for fixup of double was only realized later, in commit
> 809556e60a3359f646946879dd94c4760e5b8e84. It was discussed at the time
> that no action was needed for single, but it seems since there was no
> change there wasn't any mention of it in log.

Are you sure? single-precision sqrtf received a change just two days
prior to that in commit e0a54e6725 ("correct rounding for i387 sqrtf function")
which is the same day as when Stephen Canon supplied his answer in
https://stackoverflow.com/questions/9678224/is-there-any-way-to-get-correct-rounding-with-the-i387-fsqrt-instruction
(and also the same day you asked the question)

Alexander

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [musl] Q: dealing with missing removal of excess precision
  2020-03-22 18:51                             ` Alexander Monakov
@ 2020-03-22 19:10                               ` Rich Felker
  2020-03-22 19:46                                 ` Alexander Monakov
  0 siblings, 1 reply; 55+ messages in thread
From: Rich Felker @ 2020-03-22 19:10 UTC (permalink / raw)
  To: musl

On Sun, Mar 22, 2020 at 09:51:09PM +0300, Alexander Monakov wrote:
> On Sun, 22 Mar 2020, Rich Felker wrote:
> 
> > > Nack for sqrt 'unsigned short' fix, I recommend to consider
> > > 'unsigned fpsr = 0' and "+a" constraint, the resulting assembly is
> > > much better (GCC doesn't seem to do a good job optimizing the 'unsigned short'
> > > variant at all).
> > 
> > Sorry, I forgot to disassemble and check after making that change, and
> > indeed it is very bad.
> > 
> > How about just leaving it as-is? The value is masked such that upper
> > bits don't matter, and "whatever the register happened to contain
> > before" is a valid albeit ugly output from inline asm -- it doesn't
> > admit any compiler transformations that would cause inconsistent value
> > to be observed or other problems.
> 
> Yes, I guess it's acceptable.
> 
> > > Actually, may I ask why the initial commit did not mention that it relies
> > > on this nontrivial property?
> > 
> > The need for fixup of double was only realized later, in commit
> > 809556e60a3359f646946879dd94c4760e5b8e84. It was discussed at the time
> > that no action was needed for single, but it seems since there was no
> > change there wasn't any mention of it in log.
> 
> Are you sure? single-precision sqrtf received a change just two days
> prior to that in commit e0a54e6725 ("correct rounding for i387 sqrtf function")
> which is the same day as when Stephen Canon supplied his answer in
> https://stackoverflow.com/questions/9678224/is-there-any-way-to-get-correct-rounding-with-the-i387-fsqrt-instruction
> (and also the same day you asked the question)

I just dug through the old gcc and glibc bz's it was mentioned on
(52593 and 14032 resp.) and didn't find anything, but I'm pretty sure
it was known in 2012 that it was a non-issue for single-precision.
What I didn't understand then was that the callee was responsible for
dropping the excess precision; I wrongly believed that "something" in
the caller would make it not-matter/get collapsed down to nominal
precision and rounded correctly. Commit e0a54e6725 and related commits
were fixing that mistake, which I'd since recognized was wrong but
hadn't yet recognized the urgency of fixing until I started seeing how
bad it could break the compiler.

Rich

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [musl] Q: dealing with missing removal of excess precision
  2020-03-22 19:10                               ` Rich Felker
@ 2020-03-22 19:46                                 ` Alexander Monakov
  0 siblings, 0 replies; 55+ messages in thread
From: Alexander Monakov @ 2020-03-22 19:46 UTC (permalink / raw)
  To: musl

On Sun, 22 Mar 2020, Rich Felker wrote:

> > Are you sure? single-precision sqrtf received a change just two days
> > prior to that in commit e0a54e6725 ("correct rounding for i387 sqrtf function")
> > which is the same day as when Stephen Canon supplied his answer in
> > https://stackoverflow.com/questions/9678224/is-there-any-way-to-get-correct-rounding-with-the-i387-fsqrt-instruction
> > (and also the same day you asked the question)
> 
> I just dug through the old gcc and glibc bz's it was mentioned on
> (52593 and 14032 resp.) and didn't find anything, but I'm pretty sure
> it was known in 2012 that it was a non-issue for single-precision.
> What I didn't understand then was that the callee was responsible for
> dropping the excess precision; I wrongly believed that "something" in
> the caller would make it not-matter/get collapsed down to nominal
> precision and rounded correctly. Commit e0a54e6725 and related commits
> were fixing that mistake, which I'd since recognized was wrong but
> hadn't yet recognized the urgency of fixing until I started seeing how
> bad it could break the compiler.

Yes, that stackoverflow thread is also from 2012. Reading the question,
it's clear that you are asking about both single and double precision
(no indication you're aware that double-rounding would be safe for floats),
then Stephen Canon answers, and soon after the aforementioned commit appears.

Anyhow - thanks for the extra historical background.

Alexander

^ permalink raw reply	[flat|nested] 55+ messages in thread

end of thread, other threads:[~2020-03-22 19:46 UTC | newest]

Thread overview: 55+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-01-05 16:35 math patches for moving bare asm to C inline asm Alexander Monakov
2020-01-05 16:36 ` [PATCH] math: move x86_64 fabs, fabsf to C with " Alexander Monakov
2020-01-05 20:05   ` Rich Felker
2020-01-05 21:32     ` Alexander Monakov
2020-01-05 22:43       ` Rich Felker
2020-01-06  8:17         ` Alexander Monakov
2020-01-06  8:40 ` [PATCH] math: move more x86-family fabs functions to C Alexander Monakov
2020-03-21 17:06   ` [musl] " Rich Felker
2020-01-06 16:50 ` [PATCH] math: move trivial x86-family sqrt " Alexander Monakov
2020-01-06 17:43 ` [PATCH] math: move i386 sqrtf " Alexander Monakov
2020-01-06 18:32   ` Pascal Cuoq
2020-01-09 15:55   ` Alexander Monakov
2020-01-09 17:00     ` Rich Felker
2020-01-09 21:00       ` Szabolcs Nagy
2020-01-09 22:00         ` Rich Felker
2020-01-09 23:18           ` Szabolcs Nagy
2020-01-10  2:07             ` Rich Felker
2020-01-10  9:17               ` Szabolcs Nagy
2020-01-14 17:59         ` [musl] " Alexander Monakov
2020-01-14 18:47           ` Szabolcs Nagy
2020-01-07 13:06 ` [PATCH] math: move i386 sqrt " Alexander Monakov
2020-01-08  7:26   ` Rich Felker
2020-03-21 17:53   ` [musl] " Rich Felker
2020-03-21 17:57     ` Rich Felker
2020-03-21 20:30       ` Alexander Monakov
2020-01-11 15:06 ` [PATCH] math: move x86_64 (l)lrint(f) functions " Alexander Monakov
2020-01-11 15:23 ` [PATCH] math: move more x86-family lrint " Alexander Monakov
2020-01-11 16:07   ` Rich Felker
2020-01-11 16:22     ` Rich Felker
2020-01-14 11:54 ` [musl] [PATCH] math: move x86-family rint " Alexander Monakov
2020-01-14 18:17 ` [musl] Q: dealing with missing removal of excess precision Alexander Monakov
2020-01-14 18:50   ` Szabolcs Nagy
2020-01-14 18:58     ` Rich Felker
2020-01-14 19:53       ` Alexander Monakov
2020-02-06 14:51         ` Rich Felker
2020-02-06 17:15           ` Alexander Monakov
2020-02-06 17:46             ` Rich Felker
2020-02-06 19:03               ` Rich Felker
2020-02-06 20:02                 ` Rich Felker
2020-02-06 22:08                   ` Szabolcs Nagy
2020-02-22 19:59             ` Rich Felker
2020-02-22 20:21               ` Alexander Monakov
2020-02-23  0:19                 ` Rich Felker
2020-02-23 16:14                   ` Alexander Monakov
2020-03-20 18:12                     ` Rich Felker
2020-03-22  1:19                       ` Rich Felker
2020-03-22 17:40                         ` Alexander Monakov
2020-03-22 17:53                           ` Rich Felker
2020-03-22 18:51                             ` Alexander Monakov
2020-03-22 19:10                               ` Rich Felker
2020-03-22 19:46                                 ` Alexander Monakov
2020-01-14 20:41 ` [musl] [PATCH] math: move x86-family remainder functions to C Alexander Monakov
2020-01-15  6:54   ` Szabolcs Nagy
2020-01-15 15:44 ` [musl] [PATCH] math: move x86-family fmod " Alexander Monakov
2020-01-16 21:00 ` [musl] [PATCH] math: add x86_64 remquol Alexander Monakov

Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/musl/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).