mailing list of musl libc
 help / color / Atom feed
* math patches for moving bare asm to C inline asm
@ 2020-01-05 16:35 Alexander Monakov
  2020-01-05 16:36 ` [PATCH] math: move x86_64 fabs, fabsf to C with " Alexander Monakov
                   ` (11 more replies)
  0 siblings, 12 replies; 34+ messages in thread
From: Alexander Monakov @ 2020-01-05 16:35 UTC (permalink / raw)
  To: musl

Hi,

in a recent thread Rich reiterated that it would be nice to move
from numerous .s files under src/math/$arch to C files with inline
assembly. I'd like to help with that, but at the same time I'd
like to avoid getting again into a situation when I hear how nice
it would be to get something done, and provide patches only to get
(some of) them applied 1.5 years later :)

So to get going I intend to post patches one by one in reply to this thread,
starting from rather trivial updates to x86_64 math files in order to
get feedback about policies and formatting.

I don't expect any such patches applied before the next release, but
I'd like to ask for no huge delays on high-level questions.

Thanks.
Alexander


^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH] math: move x86_64 fabs, fabsf to C with inline asm
  2020-01-05 16:35 math patches for moving bare asm to C inline asm Alexander Monakov
@ 2020-01-05 16:36 ` " Alexander Monakov
  2020-01-05 20:05   ` Rich Felker
  2020-01-06  8:40 ` [PATCH] math: move more x86-family fabs functions to C Alexander Monakov
                   ` (10 subsequent siblings)
  11 siblings, 1 reply; 34+ messages in thread
From: Alexander Monakov @ 2020-01-05 16:36 UTC (permalink / raw)
  To: musl; +Cc: Alexander Monakov

[-- Attachment #1: Type: text/plain, Size: 636 bytes --]

---

Questions:

Why are there amd64-specific fabs implementations in the first place?
(Only) because GCC generated poor code for the generic C version?

Do annotations for mask manipulation in the patch help? Any way to make
them less ambiguous?


 src/math/x86_64/fabs.c  | 10 ++++++++++
 src/math/x86_64/fabs.s  |  9 ---------
 src/math/x86_64/fabsf.c | 10 ++++++++++
 src/math/x86_64/fabsf.s |  7 -------
 4 files changed, 20 insertions(+), 16 deletions(-)
 create mode 100644 src/math/x86_64/fabs.c
 delete mode 100644 src/math/x86_64/fabs.s
 create mode 100644 src/math/x86_64/fabsf.c
 delete mode 100644 src/math/x86_64/fabsf.s


[-- Warning: decoded text below may be mangled --]
[-- Attachment #2: 0001-math-move-x86_64-fabs-fabsf-to-C-with-inline-asm.patch --]
[-- Type: text/x-patch; name="0001-math-move-x86_64-fabs-fabsf-to-C-with-inline-asm.patch", Size: 1420 bytes --]

diff --git a/src/math/x86_64/fabs.c b/src/math/x86_64/fabs.c
new file mode 100644
index 00000000..16562477
--- /dev/null
+++ b/src/math/x86_64/fabs.c
@@ -0,0 +1,10 @@
+#include <math.h>
+
+double fabs(double x)
+{
+	double t;
+	__asm__ ("pcmpeqd %0, %0" : "=x"(t));          // t = ~0
+	__asm__ ("psrlq   $1, %0" : "+x"(t));          // t >>= 1
+	__asm__ ("andps   %1, %0" : "+x"(x) : "x"(t)); // x &= t
+	return x;
+}
diff --git a/src/math/x86_64/fabs.s b/src/math/x86_64/fabs.s
deleted file mode 100644
index 5715005e..00000000
--- a/src/math/x86_64/fabs.s
+++ /dev/null
@@ -1,9 +0,0 @@
-.global fabs
-.type fabs,@function
-fabs:
-	xor %eax,%eax
-	dec %rax
-	shr %rax
-	movq %rax,%xmm1
-	andpd %xmm1,%xmm0
-	ret
diff --git a/src/math/x86_64/fabsf.c b/src/math/x86_64/fabsf.c
new file mode 100644
index 00000000..36ea7481
--- /dev/null
+++ b/src/math/x86_64/fabsf.c
@@ -0,0 +1,10 @@
+#include <math.h>
+
+float fabsf(float x)
+{
+	float t;
+	__asm__ ("pcmpeqd %0, %0" : "=x"(t));          // t = ~0
+	__asm__ ("psrld   $1, %0" : "+x"(t));          // t >>= 1
+	__asm__ ("andps   %1, %0" : "+x"(x) : "x"(t)); // x &= t
+	return x;
+}
diff --git a/src/math/x86_64/fabsf.s b/src/math/x86_64/fabsf.s
deleted file mode 100644
index 501a1f17..00000000
--- a/src/math/x86_64/fabsf.s
+++ /dev/null
@@ -1,7 +0,0 @@
-.global fabsf
-.type fabsf,@function
-fabsf:
-	mov $0x7fffffff,%eax
-	movq %rax,%xmm1
-	andps %xmm1,%xmm0
-	ret

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] math: move x86_64 fabs, fabsf to C with inline asm
  2020-01-05 16:36 ` [PATCH] math: move x86_64 fabs, fabsf to C with " Alexander Monakov
@ 2020-01-05 20:05   ` Rich Felker
  2020-01-05 21:32     ` Alexander Monakov
  0 siblings, 1 reply; 34+ messages in thread
From: Rich Felker @ 2020-01-05 20:05 UTC (permalink / raw)
  To: musl

On Sun, Jan 05, 2020 at 07:36:39PM +0300, Alexander Monakov wrote:
> ---
> 
> Questions:
> 
> Why are there amd64-specific fabs implementations in the first place?
> (Only) because GCC generated poor code for the generic C version?

I think so. It generates:

0000000000000000 <fabs>:
   0:   66 48 0f 7e c0          movq   %xmm0,%rax
   5:   48 ba ff ff ff ff ff    movabs $0x7fffffffffffffff,%rdx
   c:   ff ff 7f
   f:   48 21 d0                and    %rdx,%rax
  12:   48 89 44 24 f8          mov    %rax,-0x8(%rsp)
  17:   f2 0f 10 44 24 f8       movsd  -0x8(%rsp),%xmm0
  1d:   c3                      retq

> Do annotations for mask manipulation in the patch help? Any way to make
> them less ambiguous?

I think so. I like how you did individual asm statements with
dependency relationship between them so compiler could even schedule
them if it likes. I wonder if you could just write 0x7fffffffffffffff
as an operand and have the compiler load it, though.

Rich


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] math: move x86_64 fabs, fabsf to C with inline asm
  2020-01-05 20:05   ` Rich Felker
@ 2020-01-05 21:32     ` Alexander Monakov
  2020-01-05 22:43       ` Rich Felker
  0 siblings, 1 reply; 34+ messages in thread
From: Alexander Monakov @ 2020-01-05 21:32 UTC (permalink / raw)
  To: musl



On Sun, 5 Jan 2020, Rich Felker wrote:

> On Sun, Jan 05, 2020 at 07:36:39PM +0300, Alexander Monakov wrote:
> > ---
> > 
> > Questions:
> > 
> > Why are there amd64-specific fabs implementations in the first place?
> > (Only) because GCC generated poor code for the generic C version?
> 
> I think so. It generates:
[snip]

*nod* In my eyes that's a missed optimization, but one that is probably not
going to be fully fixed anytime soon, although for the particular case of
generic fabs gcc-9 has improved:

        movq    %xmm0, %rax
        btrq    $63, %rax
        movq    %rax, %xmm0			

On Aarch64 GCC seems to do better with float bit manipulations (can emit code
that does them on vector registers directly without copying to/from general
registers). On x86 LLVM compiles fabs well, but not copysign.

(ideally the language would allow to express bit manipulations of floats
directly, then compilers probably would have better support as well)

FWIW GCC bugreport is https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93039
but I'm not holding my breath.

By this logic, specialized implementations of copysign are also desirable,
right?  (2 instructions longer than fabs, except for long double)

> > Do annotations for mask manipulation in the patch help? Any way to make
> > them less ambiguous?
> 
> I think so. I like how you did individual asm statements with
> dependency relationship between them so compiler could even schedule
> them if it likes. I wonder if you could just write 0x7fffffffffffffff
> as an operand and have the compiler load it, though.

In this case the mask is so simple that building it with pcmpeq-psrl is cheaper
than loading from memory or moving from a general register. So not using an
immediate is intentional.

Alexander


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] math: move x86_64 fabs, fabsf to C with inline asm
  2020-01-05 21:32     ` Alexander Monakov
@ 2020-01-05 22:43       ` Rich Felker
  2020-01-06  8:17         ` Alexander Monakov
  0 siblings, 1 reply; 34+ messages in thread
From: Rich Felker @ 2020-01-05 22:43 UTC (permalink / raw)
  To: musl

On Mon, Jan 06, 2020 at 12:32:38AM +0300, Alexander Monakov wrote:
> 
> 
> On Sun, 5 Jan 2020, Rich Felker wrote:
> 
> > On Sun, Jan 05, 2020 at 07:36:39PM +0300, Alexander Monakov wrote:
> > > ---
> > > 
> > > Questions:
> > > 
> > > Why are there amd64-specific fabs implementations in the first place?
> > > (Only) because GCC generated poor code for the generic C version?
> > 
> > I think so. It generates:
> [snip]
> 
> *nod* In my eyes that's a missed optimization, but one that is probably not
> going to be fully fixed anytime soon, although for the particular case of
> generic fabs gcc-9 has improved:
> 
>         movq    %xmm0, %rax
>         btrq    $63, %rax
>         movq    %rax, %xmm0			
> 
> On Aarch64 GCC seems to do better with float bit manipulations (can emit code
> that does them on vector registers directly without copying to/from general
> registers). On x86 LLVM compiles fabs well, but not copysign.
> 
> (ideally the language would allow to express bit manipulations of floats
> directly, then compilers probably would have better support as well)
> 
> FWIW GCC bugreport is https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93039
> but I'm not holding my breath.
> 
> By this logic, specialized implementations of copysign are also desirable,
> right?  (2 instructions longer than fabs, except for long double)

I'm not sure if "this logic" carries over. fabs is a common operation
(ideally compiler would inline it anyway in the caller, though).
copysign not so much.

Really I'm not even sure it makes sense to have the asm here at all
for fabs either, but perhaps with the gratuitous stack access in the
older-GCC version it does...?

> > > Do annotations for mask manipulation in the patch help? Any way to make
> > > them less ambiguous?
> > 
> > I think so. I like how you did individual asm statements with
> > dependency relationship between them so compiler could even schedule
> > them if it likes. I wonder if you could just write 0x7fffffffffffffff
> > as an operand and have the compiler load it, though.
> 
> In this case the mask is so simple that building it with pcmpeq-psrl is cheaper
> than loading from memory or moving from a general register. So not using an
> immediate is intentional.

OK, I was figuring the compiler might be able to generate it easily
with vector insns if there were no "non-vector" arithmetic/bitwise ops
involved in the use of the result, but that's probably expecting too
much...

Rich


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] math: move x86_64 fabs, fabsf to C with inline asm
  2020-01-05 22:43       ` Rich Felker
@ 2020-01-06  8:17         ` Alexander Monakov
  0 siblings, 0 replies; 34+ messages in thread
From: Alexander Monakov @ 2020-01-06  8:17 UTC (permalink / raw)
  To: musl

On Sun, 5 Jan 2020, Rich Felker wrote:

> I'm not sure if "this logic" carries over. fabs is a common operation
> (ideally compiler would inline it anyway in the caller, though).
> copysign not so much.

Indeed, in practice we want compilers to recognize and inline simple math
functions, especially on x86 where all float registers are call-clobbered.
This makes it moot, except maybe for musl-internal uses like in floatscan.c,
but there the right solution is probably to use __builtin_ forms if available.

(to be clear: I think musl may not implement math functions in terms of
math builtins because it might result in a circular dependency, but using
builtins for complex math and higher-level functions like in floatscan.c
should be fair game)

For functions like sqrt GCC can be changed to default to -fno-math-errno
on *-musl targets, as progress on flipping it globally seems slow.

> Really I'm not even sure it makes sense to have the asm here at all
> for fabs either, but perhaps with the gratuitous stack access in the
> older-GCC version it does...?

That's why I'm asking, the situation just looks ambiguous to me, I can't
deduce the intent and I could imagine arguments both ways. I think I'd
prefer a more inclusive playground. Please come up with some policy or
guidelines if possible.

Alexander


^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH] math: move more x86-family fabs functions to C
  2020-01-05 16:35 math patches for moving bare asm to C inline asm Alexander Monakov
  2020-01-05 16:36 ` [PATCH] math: move x86_64 fabs, fabsf to C with " Alexander Monakov
@ 2020-01-06  8:40 ` Alexander Monakov
  2020-01-06 16:50 ` [PATCH] math: move trivial x86-family sqrt " Alexander Monakov
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 34+ messages in thread
From: Alexander Monakov @ 2020-01-06  8:40 UTC (permalink / raw)
  To: musl

[-- Attachment #1: Type: text/plain, Size: 811 bytes --]

---

Note I'm saving some effort by not touching x32 files, presumably it's
easier to bulk-update them after the dust settles.

 src/math/i386/fabs.c    | 7 +++++++
 src/math/i386/fabs.s    | 6 ------
 src/math/i386/fabsf.c   | 7 +++++++
 src/math/i386/fabsf.s   | 6 ------
 src/math/i386/fabsl.c   | 7 +++++++
 src/math/i386/fabsl.s   | 6 ------
 src/math/x86_64/fabsl.c | 7 +++++++
 src/math/x86_64/fabsl.s | 6 ------
 8 files changed, 28 insertions(+), 24 deletions(-)
 create mode 100644 src/math/i386/fabs.c
 delete mode 100644 src/math/i386/fabs.s
 create mode 100644 src/math/i386/fabsf.c
 delete mode 100644 src/math/i386/fabsf.s
 create mode 100644 src/math/i386/fabsl.c
 delete mode 100644 src/math/i386/fabsl.s
 create mode 100644 src/math/x86_64/fabsl.c
 delete mode 100644 src/math/x86_64/fabsl.s


[-- Warning: decoded text below may be mangled --]
[-- Attachment #2: 0002-math-move-more-x86-family-fabs-functions-to-C.patch --]
[-- Type: text/x-patch; name="0002-math-move-more-x86-family-fabs-functions-to-C.patch", Size: 2007 bytes --]

diff --git a/src/math/i386/fabs.c b/src/math/i386/fabs.c
new file mode 100644
index 00000000..39672786
--- /dev/null
+++ b/src/math/i386/fabs.c
@@ -0,0 +1,7 @@
+#include <math.h>
+
+double fabs(double x)
+{
+	__asm__ ("fabs" : "+t"(x));
+	return x;
+}
diff --git a/src/math/i386/fabs.s b/src/math/i386/fabs.s
deleted file mode 100644
index d66ea9a1..00000000
--- a/src/math/i386/fabs.s
+++ /dev/null
@@ -1,6 +0,0 @@
-.global fabs
-.type fabs,@function
-fabs:
-	fldl 4(%esp)
-	fabs
-	ret
diff --git a/src/math/i386/fabsf.c b/src/math/i386/fabsf.c
new file mode 100644
index 00000000..d07be321
--- /dev/null
+++ b/src/math/i386/fabsf.c
@@ -0,0 +1,7 @@
+#include <math.h>
+
+float fabs(float x)
+{
+	__asm__ ("fabs" : "+t"(x));
+	return x;
+}
diff --git a/src/math/i386/fabsf.s b/src/math/i386/fabsf.s
deleted file mode 100644
index a981c422..00000000
--- a/src/math/i386/fabsf.s
+++ /dev/null
@@ -1,6 +0,0 @@
-.global fabsf
-.type fabsf,@function
-fabsf:
-	flds 4(%esp)
-	fabs
-	ret
diff --git a/src/math/i386/fabsl.c b/src/math/i386/fabsl.c
new file mode 100644
index 00000000..cc1c9ed9
--- /dev/null
+++ b/src/math/i386/fabsl.c
@@ -0,0 +1,7 @@
+#include <math.h>
+
+long double fabsl(long double x)
+{
+	__asm__ ("fabs" : "+t"(x));
+	return x;
+}
diff --git a/src/math/i386/fabsl.s b/src/math/i386/fabsl.s
deleted file mode 100644
index ceef9e4c..00000000
--- a/src/math/i386/fabsl.s
+++ /dev/null
@@ -1,6 +0,0 @@
-.global fabsl
-.type fabsl,@function
-fabsl:
-	fldt 4(%esp)
-	fabs
-	ret
diff --git a/src/math/x86_64/fabsl.c b/src/math/x86_64/fabsl.c
new file mode 100644
index 00000000..cc1c9ed9
--- /dev/null
+++ b/src/math/x86_64/fabsl.c
@@ -0,0 +1,7 @@
+#include <math.h>
+
+long double fabsl(long double x)
+{
+	__asm__ ("fabs" : "+t"(x));
+	return x;
+}
diff --git a/src/math/x86_64/fabsl.s b/src/math/x86_64/fabsl.s
deleted file mode 100644
index 4e7ab525..00000000
--- a/src/math/x86_64/fabsl.s
+++ /dev/null
@@ -1,6 +0,0 @@
-.global fabsl
-.type fabsl,@function
-fabsl:
-	fldt 8(%rsp)
-	fabs
-	ret

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH] math: move trivial x86-family sqrt functions to C
  2020-01-05 16:35 math patches for moving bare asm to C inline asm Alexander Monakov
  2020-01-05 16:36 ` [PATCH] math: move x86_64 fabs, fabsf to C with " Alexander Monakov
  2020-01-06  8:40 ` [PATCH] math: move more x86-family fabs functions to C Alexander Monakov
@ 2020-01-06 16:50 ` " Alexander Monakov
  2020-01-06 17:43 ` [PATCH] math: move i386 sqrtf " Alexander Monakov
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 34+ messages in thread
From: Alexander Monakov @ 2020-01-06 16:50 UTC (permalink / raw)
  To: musl

[-- Attachment #1: Type: text/plain, Size: 754 bytes --]

---
This does not touch i386 sqrt and sqrtf, which are non-trivial.

 src/math/i386/sqrtl.c   | 7 +++++++
 src/math/i386/sqrtl.s   | 5 -----
 src/math/x86_64/sqrt.c  | 7 +++++++
 src/math/x86_64/sqrt.s  | 4 ----
 src/math/x86_64/sqrtf.c | 7 +++++++
 src/math/x86_64/sqrtf.s | 4 ----
 src/math/x86_64/sqrtl.c | 7 +++++++
 src/math/x86_64/sqrtl.s | 5 -----
 8 files changed, 28 insertions(+), 18 deletions(-)
 create mode 100644 src/math/i386/sqrtl.c
 delete mode 100644 src/math/i386/sqrtl.s
 create mode 100644 src/math/x86_64/sqrt.c
 delete mode 100644 src/math/x86_64/sqrt.s
 create mode 100644 src/math/x86_64/sqrtf.c
 delete mode 100644 src/math/x86_64/sqrtf.s
 create mode 100644 src/math/x86_64/sqrtl.c
 delete mode 100644 src/math/x86_64/sqrtl.s


[-- Warning: decoded text below may be mangled --]
[-- Attachment #2: 0003-math-move-trivial-x86-family-sqrt-functions-to-C.patch --]
[-- Type: text/x-patch; name="0003-math-move-trivial-x86-family-sqrt-functions-to-C.patch", Size: 2065 bytes --]

diff --git a/src/math/i386/sqrtl.c b/src/math/i386/sqrtl.c
new file mode 100644
index 00000000..864cfcc4
--- /dev/null
+++ b/src/math/i386/sqrtl.c
@@ -0,0 +1,7 @@
+#include <math.h>
+
+long double sqrtl(long double x)
+{
+	__asm__ ("fsqrt" : "+t"(x));
+	return x;
+}
diff --git a/src/math/i386/sqrtl.s b/src/math/i386/sqrtl.s
deleted file mode 100644
index e0d42616..00000000
--- a/src/math/i386/sqrtl.s
+++ /dev/null
@@ -1,5 +0,0 @@
-.global sqrtl
-.type sqrtl,@function
-sqrtl:	fldt 4(%esp)
-	fsqrt
-	ret
diff --git a/src/math/x86_64/sqrt.c b/src/math/x86_64/sqrt.c
new file mode 100644
index 00000000..657e09e3
--- /dev/null
+++ b/src/math/x86_64/sqrt.c
@@ -0,0 +1,7 @@
+#include <math.h>
+
+double sqrt(double x)
+{
+	__asm__ ("sqrtsd %1, %0" : "=x"(x) : "x"(x));
+	return x;
+}
diff --git a/src/math/x86_64/sqrt.s b/src/math/x86_64/sqrt.s
deleted file mode 100644
index d3c609f9..00000000
--- a/src/math/x86_64/sqrt.s
+++ /dev/null
@@ -1,4 +0,0 @@
-.global sqrt
-.type sqrt,@function
-sqrt:	sqrtsd %xmm0, %xmm0
-	ret
diff --git a/src/math/x86_64/sqrtf.c b/src/math/x86_64/sqrtf.c
new file mode 100644
index 00000000..720baec6
--- /dev/null
+++ b/src/math/x86_64/sqrtf.c
@@ -0,0 +1,7 @@
+#include <math.h>
+
+float sqrtf(float x)
+{
+	__asm__ ("sqrtss %1, %0" : "=x"(x) : "x"(x));
+	return x;
+}
diff --git a/src/math/x86_64/sqrtf.s b/src/math/x86_64/sqrtf.s
deleted file mode 100644
index eec48c60..00000000
--- a/src/math/x86_64/sqrtf.s
+++ /dev/null
@@ -1,4 +0,0 @@
-.global sqrtf
-.type sqrtf,@function
-sqrtf:  sqrtss %xmm0, %xmm0
-	ret
diff --git a/src/math/x86_64/sqrtl.c b/src/math/x86_64/sqrtl.c
new file mode 100644
index 00000000..864cfcc4
--- /dev/null
+++ b/src/math/x86_64/sqrtl.c
@@ -0,0 +1,7 @@
+#include <math.h>
+
+long double sqrtl(long double x)
+{
+	__asm__ ("fsqrt" : "+t"(x));
+	return x;
+}
diff --git a/src/math/x86_64/sqrtl.s b/src/math/x86_64/sqrtl.s
deleted file mode 100644
index 23cd687d..00000000
--- a/src/math/x86_64/sqrtl.s
+++ /dev/null
@@ -1,5 +0,0 @@
-.global sqrtl
-.type sqrtl,@function
-sqrtl:	fldt 8(%rsp)
-	fsqrt
-	ret

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH] math: move i386 sqrtf to C
  2020-01-05 16:35 math patches for moving bare asm to C inline asm Alexander Monakov
                   ` (2 preceding siblings ...)
  2020-01-06 16:50 ` [PATCH] math: move trivial x86-family sqrt " Alexander Monakov
@ 2020-01-06 17:43 ` " Alexander Monakov
  2020-01-06 18:32   ` Pascal Cuoq
  2020-01-09 15:55   ` Alexander Monakov
  2020-01-07 13:06 ` [PATCH] math: move i386 sqrt " Alexander Monakov
                   ` (7 subsequent siblings)
  11 siblings, 2 replies; 34+ messages in thread
From: Alexander Monakov @ 2020-01-06 17:43 UTC (permalink / raw)
  To: musl

[-- Attachment #1: Type: text/plain, Size: 460 bytes --]

---
Yay, first not entirely trivial function.

Apparently that double rounding yields the correct result here was
suggested by Stephen Canon in https://stackoverflow.com/a/9689746/4755075
Should his contribution be mentioned somehow in the source code?

 src/math/i386/sqrtf.c | 8 ++++++++
 src/math/i386/sqrtf.s | 7 -------
 2 files changed, 8 insertions(+), 7 deletions(-)
 create mode 100644 src/math/i386/sqrtf.c
 delete mode 100644 src/math/i386/sqrtf.s


[-- Warning: decoded text below may be mangled --]
[-- Attachment #2: 0004-math-move-i386-sqrtf-to-C.patch --]
[-- Type: text/x-patch; name="0004-math-move-i386-sqrtf-to-C.patch", Size: 559 bytes --]

diff --git a/src/math/i386/sqrtf.c b/src/math/i386/sqrtf.c
new file mode 100644
index 00000000..2bcc7754
--- /dev/null
+++ b/src/math/i386/sqrtf.c
@@ -0,0 +1,8 @@
+#include <math.h>
+
+float sqrtf(float x)
+{
+	long double t;
+	__asm__ ("fsqrt" : "=t"(t) : "0"(x));
+	return (float)t;
+}
diff --git a/src/math/i386/sqrtf.s b/src/math/i386/sqrtf.s
deleted file mode 100644
index 9e944f45..00000000
--- a/src/math/i386/sqrtf.s
+++ /dev/null
@@ -1,7 +0,0 @@
-.global sqrtf
-.type sqrtf,@function
-sqrtf:	flds 4(%esp)
-	fsqrt
-	fstps 4(%esp)
-	flds 4(%esp)
-	ret

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] math: move i386 sqrtf to C
  2020-01-06 17:43 ` [PATCH] math: move i386 sqrtf " Alexander Monakov
@ 2020-01-06 18:32   ` Pascal Cuoq
  2020-01-09 15:55   ` Alexander Monakov
  1 sibling, 0 replies; 34+ messages in thread
From: Pascal Cuoq @ 2020-01-06 18:32 UTC (permalink / raw)
  To: musl



> On 6 Jan 2020, at 12:44, Alexander Monakov <amonakov@ispras.ru> wrote:
> 
> Apparently that double rounding yields the correct result here was
> suggested by Stephen Canon in https://stackoverflow.com/a/9689746/4755075
> Should his contribution be mentioned somehow in the source code?

You should rather credit Figueroa, for instance in the article “when is double rounding innocuous?”. This would be more of a primary source, and Figueroa wrote demonstrations.

Posting from phone, I apologize if I got something wrong. I'm pretty confident Stephen was implicitly citing Figueroa's results though.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH] math: move i386 sqrt to C
  2020-01-05 16:35 math patches for moving bare asm to C inline asm Alexander Monakov
                   ` (3 preceding siblings ...)
  2020-01-06 17:43 ` [PATCH] math: move i386 sqrtf " Alexander Monakov
@ 2020-01-07 13:06 ` " Alexander Monakov
  2020-01-08  7:26   ` Rich Felker
  2020-01-11 15:06 ` [PATCH] math: move x86_64 (l)lrint(f) functions " Alexander Monakov
                   ` (6 subsequent siblings)
  11 siblings, 1 reply; 34+ messages in thread
From: Alexander Monakov @ 2020-01-07 13:06 UTC (permalink / raw)
  To: musl

[-- Attachment #1: Type: text/plain, Size: 733 bytes --]

---
Since union ldshape does not have a dedicated field for 32 least significant
bits of the x87 long double mantissa, keeping the original approach with

    ux.i.m -= (fpsr & 0x200) - 0x100;

would lead to a 64-bit subtraction that is not trivial for the compiler to
optimize to 32-bit subtraction as done in the original assembly. Therefore
I have elected to change the approach and use

    ux.i.m ^= (fpsr & 0x200) + 0x200;

which is easier to optimize to a 32-bit rather than 64-bit xor.

Thoughts?

 src/math/i386/sqrt.c | 15 +++++++++++++++
 src/math/i386/sqrt.s | 21 ---------------------
 2 files changed, 15 insertions(+), 21 deletions(-)
 create mode 100644 src/math/i386/sqrt.c
 delete mode 100644 src/math/i386/sqrt.s


[-- Warning: decoded text below may be mangled --]
[-- Attachment #2: 0005-math-move-i386-sqrt-to-C.patch --]
[-- Type: text/x-patch; name="0005-math-move-i386-sqrt-to-C.patch", Size: 1103 bytes --]

diff --git a/src/math/i386/sqrt.c b/src/math/i386/sqrt.c
new file mode 100644
index 00000000..619df056
--- /dev/null
+++ b/src/math/i386/sqrt.c
@@ -0,0 +1,15 @@
+#include "libm.h"
+
+double sqrt(double x)
+{
+	union ldshape ux;
+	unsigned fpsr;
+	__asm__ ("fsqrt; fnstsw %%ax": "=t"(ux.f), "=a"(fpsr) : "0"(x));
+	if ((ux.i.m & 0x7ff) != 0x400)
+		return (double)ux.f;
+	/* Rounding to double would have encountered an exact halfway case.
+	   Adjust mantissa downwards if fsqrt rounded up, else upwards.
+	   (result of fsqrt could not have been exact) */
+	ux.i.m ^= (fpsr & 0x200) + 0x200;
+	return (double)ux.f;
+}
diff --git a/src/math/i386/sqrt.s b/src/math/i386/sqrt.s
deleted file mode 100644
index 57837e25..00000000
--- a/src/math/i386/sqrt.s
+++ /dev/null
@@ -1,21 +0,0 @@
-.global sqrt
-.type sqrt,@function
-sqrt:	fldl 4(%esp)
-	fsqrt
-	fnstsw %ax
-	sub $12,%esp
-	fld %st(0)
-	fstpt (%esp)
-	mov (%esp),%ecx
-	and $0x7ff,%ecx
-	cmp $0x400,%ecx
-	jnz 1f
-	and $0x200,%eax
-	sub $0x100,%eax
-	sub %eax,(%esp)
-	fstp %st(0)
-	fldt (%esp)
-1:	add $12,%esp
-	fstpl 4(%esp)
-	fldl 4(%esp)
-	ret

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] math: move i386 sqrt to C
  2020-01-07 13:06 ` [PATCH] math: move i386 sqrt " Alexander Monakov
@ 2020-01-08  7:26   ` Rich Felker
  0 siblings, 0 replies; 34+ messages in thread
From: Rich Felker @ 2020-01-08  7:26 UTC (permalink / raw)
  To: musl

On Tue, Jan 07, 2020 at 04:06:05PM +0300, Alexander Monakov wrote:
> ---
> Since union ldshape does not have a dedicated field for 32 least significant
> bits of the x87 long double mantissa, keeping the original approach with
> 
>     ux.i.m -= (fpsr & 0x200) - 0x100;
> 
> would lead to a 64-bit subtraction that is not trivial for the compiler to
> optimize to 32-bit subtraction as done in the original assembly. Therefore
> I have elected to change the approach and use
> 
>     ux.i.m ^= (fpsr & 0x200) + 0x200;
> 
> which is easier to optimize to a 32-bit rather than 64-bit xor.
> 
> Thoughts?

I think it looks like a good change. Ideally the compiler would see,
since the branch was taken where the low bits are 0x400, that
subtracting a value less than 0x400 can't borrow from the high 32
bits. But I don't think current compilers do such range/value
analysis, and it doesn't seem like there are anny downsides to your
approach anyway.

Rich


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] math: move i386 sqrtf to C
  2020-01-06 17:43 ` [PATCH] math: move i386 sqrtf " Alexander Monakov
  2020-01-06 18:32   ` Pascal Cuoq
@ 2020-01-09 15:55   ` Alexander Monakov
  2020-01-09 17:00     ` Rich Felker
  1 sibling, 1 reply; 34+ messages in thread
From: Alexander Monakov @ 2020-01-09 15:55 UTC (permalink / raw)
  To: musl

I should have asked earlier, but - is everyone happy with this style of
expressing removal of excess precision, or should I use eval_as_float?

+{
+       long double t;
[...]
+       return (float)t;
+}

Alexander


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] math: move i386 sqrtf to C
  2020-01-09 15:55   ` Alexander Monakov
@ 2020-01-09 17:00     ` Rich Felker
  2020-01-09 21:00       ` Szabolcs Nagy
  0 siblings, 1 reply; 34+ messages in thread
From: Rich Felker @ 2020-01-09 17:00 UTC (permalink / raw)
  To: musl

On Thu, Jan 09, 2020 at 06:55:11PM +0300, Alexander Monakov wrote:
> I should have asked earlier, but - is everyone happy with this style of
> expressing removal of excess precision, or should I use eval_as_float?
> 
> +{
> +       long double t;
> [...]
> +       return (float)t;
> +}

musl requires C99+ with conforming floating point behavior so I think
this is okay, and I'd prefer to avoid having dependency on libm.h
machinery that's not strictly needed in src/math/$(ARCH) since it
increases the cost of changing that machinery (requires looking at
arch-specific files). My recollection was that eval_as_float is mostly
an artifact of nsz's math functions supporting and being compatible
with somewhat non-conforming compilers.

If we want to ensure correct rounding (important for sqrt[f]) even on
broken compilers (some ppl use gcc 3.x, and pcc may be broken too?)
perhaps we should just do the store from asm too?

Note that eval_as_float only helps if -ffloat-store is used, which is
a nasty hack and also nonconforming, arguably worse than the behavior
without it, so we should probably drop use of that as a fallback, and
use fp_barrier[f] instead if needed.

Rich


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] math: move i386 sqrtf to C
  2020-01-09 17:00     ` Rich Felker
@ 2020-01-09 21:00       ` Szabolcs Nagy
  2020-01-09 22:00         ` Rich Felker
  2020-01-14 17:59         ` [musl] " Alexander Monakov
  0 siblings, 2 replies; 34+ messages in thread
From: Szabolcs Nagy @ 2020-01-09 21:00 UTC (permalink / raw)
  To: musl

* Rich Felker <dalias@libc.org> [2020-01-09 12:00:02 -0500]:
> On Thu, Jan 09, 2020 at 06:55:11PM +0300, Alexander Monakov wrote:
> > I should have asked earlier, but - is everyone happy with this style of
> > expressing removal of excess precision, or should I use eval_as_float?
> > 
> > +{
> > +       long double t;
> > [...]
> > +       return (float)t;
> > +}
> 
> musl requires C99+ with conforming floating point behavior so I think
> this is okay, and I'd prefer to avoid having dependency on libm.h
> machinery that's not strictly needed in src/math/$(ARCH) since it
> increases the cost of changing that machinery (requires looking at
> arch-specific files). My recollection was that eval_as_float is mostly
> an artifact of nsz's math functions supporting and being compatible
> with somewhat non-conforming compilers.

it's an artifact of glibc using -fexcess-precision=fast,
so the compiler can do clever transformations (without
relying on float_t and double_t), and then they annotate
where the excess precision really has to be dropped
(using inline asm).

i wanted to upstream that code into glibc as well as other
math libraries and c99 standard excess precision handling
is not the default in most environments (e.g. gcc default
-std=gnu* or iso c++) and actually the annotation is rarely
needed in math functions: only exact arithmetic tricks
need it or at return from public api functions (like in
this case) to force a final rounding for side-effects
or for sane call abi.

so for me it made sense to add explicit annotations which
document where this issue is really relevant and allows
using the code in more environments by just redefining
the eval_as_float etc calls. (on systems where this is
relevant results can be different compared to other targets
anyway, so allowing more variance depending on different
excess-precision settings is not that big deal.)

about dropping excess precision at return:

c99 required cast at return, but that was considered to be
a bug so in c11 it's no longer required: return now must
round. of course in non-standard mode return continues to
keep excess precision, which seems to require to always
annotate returns, but in most math functions all relevant
fenv side-effects of the final rounding would need special
handling anyway to set errno (assuming you still care about
errno: glibc does so my code also cares about it), so
overflow etc can't happen in the normal return paths, all
such cases are directed to special code that needs special
annotations anyway, so return annotation rarely comes up
(other than with correctly rounded functions like sqrt).

> 
> If we want to ensure correct rounding (important for sqrt[f]) even on
> broken compilers (some ppl use gcc 3.x, and pcc may be broken too?)
> perhaps we should just do the store from asm too?

that would be a bit safer, but then correct compiler would
store twice i think. it's hard to get excited about this
issue: it only matters on m68k and i386 which are not the
main targets for new float code (and old code had to deal
with this and bigger brokenness already).

> 
> Note that eval_as_float only helps if -ffloat-store is used, which is
> a nasty hack and also nonconforming, arguably worse than the behavior
> without it, so we should probably drop use of that as a fallback, and
> use fp_barrier[f] instead if needed.

i think -ffloat-store almost always drops excess precision
including returns and assignments, so with that no
annotation is needed. but yes the way the annotation is
defined now is not useful against broken compilers or
non-standard excess precision setting, in glibc the
annotation is defined differently (with inline asm).

btw, i'm not against the cast, but in my optimized-routines
code i will keep using my annotations.


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] math: move i386 sqrtf to C
  2020-01-09 21:00       ` Szabolcs Nagy
@ 2020-01-09 22:00         ` Rich Felker
  2020-01-09 23:18           ` Szabolcs Nagy
  2020-01-14 17:59         ` [musl] " Alexander Monakov
  1 sibling, 1 reply; 34+ messages in thread
From: Rich Felker @ 2020-01-09 22:00 UTC (permalink / raw)
  To: musl

On Thu, Jan 09, 2020 at 10:00:06PM +0100, Szabolcs Nagy wrote:
> > If we want to ensure correct rounding (important for sqrt[f]) even on
> > broken compilers (some ppl use gcc 3.x, and pcc may be broken too?)
> > perhaps we should just do the store from asm too?
> 
> that would be a bit safer, but then correct compiler would
> store twice i think.

If you did something like:

	float y = expr_with_excess_precision;
	__asm__( "" : "+m"(y));
	return y;

then I think you'd get just one store and one load, as intended. It
seems to work as intended here. Oddly though my local gcc (7.3) is
gratuitously pushing/popping a single gpr to align stack to 8 (but not
16) despite being a leaf function. No idea why.

> it's hard to get excited about this
> issue: it only matters on m68k and i386 which are not the
> main targets for new float code (and old code had to deal
> with this and bigger brokenness already).

Indeed, but context of present thread is getting rid of the i386 asm
files so it's relevant here.

> > Note that eval_as_float only helps if -ffloat-store is used, which is
> > a nasty hack and also nonconforming, arguably worse than the behavior
> > without it, so we should probably drop use of that as a fallback, and
> > use fp_barrier[f] instead if needed.
> 
> i think -ffloat-store almost always drops excess precision
> including returns and assignments, so with that no
> annotation is needed. but yes the way the annotation is
> defined now is not useful against broken compilers or
> non-standard excess precision setting, in glibc the
> annotation is defined differently (with inline asm).

I was thinking in the context of wanting to remove from configure the:

|| { test "$ARCH" = i386 && tryflag CFLAGS_C99FSE -ffloat-store ; }

which is probably doing more harm than good. Do you know if there are
things that'd break if we did that? I think eval_as_float should
probably be defined as fp_barrierf to make it safe in your code,
conditional on FLT_EVAL_METHOD>0 (and likewise >1 for eval_as_double).

Rich


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] math: move i386 sqrtf to C
  2020-01-09 22:00         ` Rich Felker
@ 2020-01-09 23:18           ` Szabolcs Nagy
  2020-01-10  2:07             ` Rich Felker
  0 siblings, 1 reply; 34+ messages in thread
From: Szabolcs Nagy @ 2020-01-09 23:18 UTC (permalink / raw)
  To: musl

* Rich Felker <dalias@libc.org> [2020-01-09 17:00:14 -0500]:
> On Thu, Jan 09, 2020 at 10:00:06PM +0100, Szabolcs Nagy wrote:
> > > Note that eval_as_float only helps if -ffloat-store is used, which is
> > > a nasty hack and also nonconforming, arguably worse than the behavior
> > > without it, so we should probably drop use of that as a fallback, and
> > > use fp_barrier[f] instead if needed.
> > 
> > i think -ffloat-store almost always drops excess precision
> > including returns and assignments, so with that no
> > annotation is needed. but yes the way the annotation is
> > defined now is not useful against broken compilers or
> > non-standard excess precision setting, in glibc the
> > annotation is defined differently (with inline asm).
> 
> I was thinking in the context of wanting to remove from configure the:
> 
> || { test "$ARCH" = i386 && tryflag CFLAGS_C99FSE -ffloat-store ; }
> 
> which is probably doing more harm than good. Do you know if there are
> things that'd break if we did that? I think eval_as_float should
> probably be defined as fp_barrierf to make it safe in your code,
> conditional on FLT_EVAL_METHOD>0 (and likewise >1 for eval_as_double).

i think -fexcess-precision=standard was introduced in
gcc 4.5 and to get reliable behaviour before that we
needed -ffloat-store. the freebsd math code had
volatile hacks to avoid -ffloat-store but those were
incomplete (there were only a few bugs e.g. see commit
c4359e01303da2755fe7e8033826b132eb3659b1 freebsd sets
x87 precision to double so for them some of these were
not issues in practice).

since we had -ffloat-store i turned off the volatile
hacks in commit 6d3f1a39c14b12026df84f386875b094e3652990
and later completely removed the annotations in commit
9b0fcb441a44456c7b071c7cdaf90403f81ec05a

on new compilers -fexcess-precision=standard is used,
but that turned out to do too many stores on the fdlibm
code (which is why glibc kept using =fast), so in commit
e216951f509b71da193da2fc63e25b998740d58b i started using
float_t and double_t to get fast code in standard mode.
(of course this made things worse for -ffloat-store).

i think we would need to add back the old annotations
to make old compilers safe without -ffloat-store.
(fdlibm often raises fenv exceptions via a final rounding
before return, those could be often handled more cleanly
by __math_oflow etc helpers, but since it was not designed
for inline errno handling some normal return paths can
raise fp exceptions too and thus need eval_as_* annotation).


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] math: move i386 sqrtf to C
  2020-01-09 23:18           ` Szabolcs Nagy
@ 2020-01-10  2:07             ` Rich Felker
  2020-01-10  9:17               ` Szabolcs Nagy
  0 siblings, 1 reply; 34+ messages in thread
From: Rich Felker @ 2020-01-10  2:07 UTC (permalink / raw)
  To: musl

On Fri, Jan 10, 2020 at 12:18:58AM +0100, Szabolcs Nagy wrote:
> * Rich Felker <dalias@libc.org> [2020-01-09 17:00:14 -0500]:
> > On Thu, Jan 09, 2020 at 10:00:06PM +0100, Szabolcs Nagy wrote:
> > > > Note that eval_as_float only helps if -ffloat-store is used, which is
> > > > a nasty hack and also nonconforming, arguably worse than the behavior
> > > > without it, so we should probably drop use of that as a fallback, and
> > > > use fp_barrier[f] instead if needed.
> > > 
> > > i think -ffloat-store almost always drops excess precision
> > > including returns and assignments, so with that no
> > > annotation is needed. but yes the way the annotation is
> > > defined now is not useful against broken compilers or
> > > non-standard excess precision setting, in glibc the
> > > annotation is defined differently (with inline asm).
> > 
> > I was thinking in the context of wanting to remove from configure the:
> > 
> > || { test "$ARCH" = i386 && tryflag CFLAGS_C99FSE -ffloat-store ; }
> > 
> > which is probably doing more harm than good. Do you know if there are
> > things that'd break if we did that? I think eval_as_float should
> > probably be defined as fp_barrierf to make it safe in your code,
> > conditional on FLT_EVAL_METHOD>0 (and likewise >1 for eval_as_double).
> 
> i think -fexcess-precision=standard was introduced in
> gcc 4.5 and to get reliable behaviour before that we
> needed -ffloat-store.

I don't think the behavior was "reliable" with -ffloat-store; it's
wrong with respect to the defined meaning of FLT_EVAL_METHOD.

> since we had -ffloat-store i turned off the volatile
> hacks in commit 6d3f1a39c14b12026df84f386875b094e3652990
> and later completely removed the annotations in commit
> 9b0fcb441a44456c7b071c7cdaf90403f81ec05a

Thanks for these references.

> on new compilers -fexcess-precision=standard is used,
> but that turned out to do too many stores on the fdlibm
> code (which is why glibc kept using =fast), so in commit
> e216951f509b71da193da2fc63e25b998740d58b i started using
> float_t and double_t to get fast code in standard mode.
> (of course this made things worse for -ffloat-store).

This was the right thing to do, and I think it largely but not
entirely eliminates the need for caring about how the compiler handles
this, except in a few cases. It could probably be eliminated in more.
For example the argument reduction code cited above could use the
right constants for double_t rather than double to avoid the need to
store/load to drop excess precision.

> i think we would need to add back the old annotations
> to make old compilers safe without -ffloat-store.
> (fdlibm often raises fenv exceptions via a final rounding
> before return, those could be often handled more cleanly
> by __math_oflow etc helpers, but since it was not designed
> for inline errno handling some normal return paths can
> raise fp exceptions too and thus need eval_as_* annotation).

I think I asked you before, but from a standpoint of fenv stuff, I'm
confused why the eval_as_* things are useful at all; it looks like you
would need fp_barrier* to ensure they're actually evaluated (e.g. in
the presence of LTO with a compiler that doesn't honor fenv right).

But I think it's also useful to distinguish between possibility of
wrong exceptions being raised, which is a rather minor issue since
some widely-used compilers don't even support fenv reasonably at at
all, and the possibility of wrong values being returned for functions
where the result is required to be correctly rounded. I would deem it
a serious problem for sqrt[f] or fma[f] to return the wrong value when
compiled with gcc3 or pcc. I don't think I would particularly care if
exceptions failed to be raised properly when compiled with gcc3 or
pcc, though. So I probably would like to ensure that, whatever code we
end up with in i386 sqrt[f].c, it it ends up working even if the
compiler does not handle excess precision correctly.

Rich


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] math: move i386 sqrtf to C
  2020-01-10  2:07             ` Rich Felker
@ 2020-01-10  9:17               ` Szabolcs Nagy
  0 siblings, 0 replies; 34+ messages in thread
From: Szabolcs Nagy @ 2020-01-10  9:17 UTC (permalink / raw)
  To: musl

* Rich Felker <dalias@libc.org> [2020-01-09 21:07:47 -0500]:
> On Fri, Jan 10, 2020 at 12:18:58AM +0100, Szabolcs Nagy wrote:
> > i think -fexcess-precision=standard was introduced in
> > gcc 4.5 and to get reliable behaviour before that we
> > needed -ffloat-store.
> 
> I don't think the behavior was "reliable" with -ffloat-store; it's
> wrong with respect to the defined meaning of FLT_EVAL_METHOD.

well reliable in the sense that the results are more
consistent with other targets that have FLT_EVAL_METHOD==0.

> > i think we would need to add back the old annotations
> > to make old compilers safe without -ffloat-store.
> > (fdlibm often raises fenv exceptions via a final rounding
> > before return, those could be often handled more cleanly
> > by __math_oflow etc helpers, but since it was not designed
> > for inline errno handling some normal return paths can
> > raise fp exceptions too and thus need eval_as_* annotation).
> 
> I think I asked you before, but from a standpoint of fenv stuff, I'm
> confused why the eval_as_* things are useful at all; it looks like you
> would need fp_barrier* to ensure they're actually evaluated (e.g. in
> the presence of LTO with a compiler that doesn't honor fenv right).

if you mean the current definition of eval_as_* in musl,
then those are not useful (except in the middle of some
arithmetic expression or assignment to double_t etc
where c99 normally would not drop excess precision)

their point is to annotate where we need to drop excess
precision so we can define it appropriately for specific
targets (and allow all other places to use excess prec).

e.g. in case of 'return 0x1p999*0x1p999' to raise overflow
and return inf, we need fp barrier to avoid const folds and
eval_as_double to drop excess precision so it becomes:

  return eval_as_double(fp_barrier(0x1p999) * 0x1p999);

in principle fp_barrier need not drop excess precision so

  return fp_barrier(fp_barrier(0x1p999) * 0x1p999);

would not be enough, but i ended up using float and double
in fp_barrier instead of float_t and double_t so now it
drops excess precision too (may be a mistake, but we almost
always want to drop excess prec when forcing an evaluation).
separate eval_as_double is still useful since on most targets
it is a nop while fp_barrier is not a nop. (but e.g. i386
could define eval_as_double as fp_barrier)

the bigger problem with fp_barrier is that it just hides
a value (behind volatile or asm volatile) but it should be
forcing the mul operation, e.g. currently

  if (cond)
    return eval_as_double(fp_barrier(0x1p999) * 0x1p999);

may be transformed to

  if (cond)
    x = fp_barrier(0x1p999);
  y = x * 0x1p999;
  if (cond)
    return eval_as_double(y);

by a compiler that assumes mul has no side-effects and
then the mul is evaluated unconditionally (but such
transformation is unlikely so in practice the barrier
works).

i don't think there is a trick that allows us to force
the evaluation of an individual fp op, so for now i'm
happy with fp_barrier and eval_as_*.

> 
> But I think it's also useful to distinguish between possibility of
> wrong exceptions being raised, which is a rather minor issue since
> some widely-used compilers don't even support fenv reasonably at at
> all, and the possibility of wrong values being returned for functions
> where the result is required to be correctly rounded. I would deem it
> a serious problem for sqrt[f] or fma[f] to return the wrong value when
> compiled with gcc3 or pcc. I don't think I would particularly care if
> exceptions failed to be raised properly when compiled with gcc3 or
> pcc, though. So I probably would like to ensure that, whatever code we
> end up with in i386 sqrt[f].c, it it ends up working even if the
> compiler does not handle excess precision correctly.

note that rounding down to double to force an overflow
usually also forces an infinity and without the force
the result would be >DBL_MAX finite which then can cause
problems, just like sqrt with excess prec.

so most eval_as_double has the same importance: if any
of them is missed the result may be wrong.

i have fp_force_eval which has no return value and is
only used to force side-effects, if you don't care about
fenv then that may be defined as nop.


^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH] math: move x86_64 (l)lrint(f) functions to C
  2020-01-05 16:35 math patches for moving bare asm to C inline asm Alexander Monakov
                   ` (4 preceding siblings ...)
  2020-01-07 13:06 ` [PATCH] math: move i386 sqrt " Alexander Monakov
@ 2020-01-11 15:06 ` " Alexander Monakov
  2020-01-11 15:23 ` [PATCH] math: move more x86-family lrint " Alexander Monakov
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 34+ messages in thread
From: Alexander Monakov @ 2020-01-11 15:06 UTC (permalink / raw)
  To: musl

[-- Attachment #1: Type: text/plain, Size: 757 bytes --]

---
Some more trivial conversions.

 src/math/x86_64/llrint.c  | 8 ++++++++
 src/math/x86_64/llrint.s  | 5 -----
 src/math/x86_64/llrintf.c | 8 ++++++++
 src/math/x86_64/llrintf.s | 5 -----
 src/math/x86_64/lrint.c   | 8 ++++++++
 src/math/x86_64/lrint.s   | 5 -----
 src/math/x86_64/lrintf.c  | 8 ++++++++
 src/math/x86_64/lrintf.s  | 5 -----
 8 files changed, 32 insertions(+), 20 deletions(-)
 create mode 100644 src/math/x86_64/llrint.c
 delete mode 100644 src/math/x86_64/llrint.s
 create mode 100644 src/math/x86_64/llrintf.c
 delete mode 100644 src/math/x86_64/llrintf.s
 create mode 100644 src/math/x86_64/lrint.c
 delete mode 100644 src/math/x86_64/lrint.s
 create mode 100644 src/math/x86_64/lrintf.c
 delete mode 100644 src/math/x86_64/lrintf.s


[-- Warning: decoded text below may be mangled --]
[-- Attachment #2: 0006-math-move-x86_64-l-lrint-f-functions-to-C.patch --]
[-- Type: text/x-patch; name="0006-math-move-x86_64-l-lrint-f-functions-to-C.patch", Size: 2206 bytes --]

diff --git a/src/math/x86_64/llrint.c b/src/math/x86_64/llrint.c
new file mode 100644
index 00000000..dd38a722
--- /dev/null
+++ b/src/math/x86_64/llrint.c
@@ -0,0 +1,8 @@
+#include <math.h>
+
+long long llrint(double x)
+{
+	long long r;
+	__asm__ ("cvtsd2si %1, %0" : "=r"(r) : "x"(x));
+	return r;
+}
diff --git a/src/math/x86_64/llrint.s b/src/math/x86_64/llrint.s
deleted file mode 100644
index bf476498..00000000
--- a/src/math/x86_64/llrint.s
+++ /dev/null
@@ -1,5 +0,0 @@
-.global llrint
-.type llrint,@function
-llrint:
-	cvtsd2si %xmm0,%rax
-	ret
diff --git a/src/math/x86_64/llrintf.c b/src/math/x86_64/llrintf.c
new file mode 100644
index 00000000..fc8625e8
--- /dev/null
+++ b/src/math/x86_64/llrintf.c
@@ -0,0 +1,8 @@
+#include <math.h>
+
+long long llrintf(float x)
+{
+	long long r;
+	__asm__ ("cvtss2si %1, %0" : "=r"(r) : "x"(x));
+	return r;
+}
diff --git a/src/math/x86_64/llrintf.s b/src/math/x86_64/llrintf.s
deleted file mode 100644
index d7204ac0..00000000
--- a/src/math/x86_64/llrintf.s
+++ /dev/null
@@ -1,5 +0,0 @@
-.global llrintf
-.type llrintf,@function
-llrintf:
-	cvtss2si %xmm0,%rax
-	ret
diff --git a/src/math/x86_64/lrint.c b/src/math/x86_64/lrint.c
new file mode 100644
index 00000000..a742fec6
--- /dev/null
+++ b/src/math/x86_64/lrint.c
@@ -0,0 +1,8 @@
+#include <math.h>
+
+long lrint(double x)
+{
+	long r;
+	__asm__ ("cvtsd2si %1, %0" : "=r"(r) : "x"(x));
+	return r;
+}
diff --git a/src/math/x86_64/lrint.s b/src/math/x86_64/lrint.s
deleted file mode 100644
index 15fc2454..00000000
--- a/src/math/x86_64/lrint.s
+++ /dev/null
@@ -1,5 +0,0 @@
-.global lrint
-.type lrint,@function
-lrint:
-	cvtsd2si %xmm0,%rax
-	ret
diff --git a/src/math/x86_64/lrintf.c b/src/math/x86_64/lrintf.c
new file mode 100644
index 00000000..2ba5639d
--- /dev/null
+++ b/src/math/x86_64/lrintf.c
@@ -0,0 +1,8 @@
+#include <math.h>
+
+long lrintf(float x)
+{
+	long r;
+	__asm__ ("cvtss2si %1, %0" : "=r"(r) : "x"(x));
+	return r;
+}
diff --git a/src/math/x86_64/lrintf.s b/src/math/x86_64/lrintf.s
deleted file mode 100644
index 488423d2..00000000
--- a/src/math/x86_64/lrintf.s
+++ /dev/null
@@ -1,5 +0,0 @@
-.global lrintf
-.type lrintf,@function
-lrintf:
-	cvtss2si %xmm0,%rax
-	ret

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH] math: move more x86-family lrint functions to C
  2020-01-05 16:35 math patches for moving bare asm to C inline asm Alexander Monakov
                   ` (5 preceding siblings ...)
  2020-01-11 15:06 ` [PATCH] math: move x86_64 (l)lrint(f) functions " Alexander Monakov
@ 2020-01-11 15:23 ` " Alexander Monakov
  2020-01-11 16:07   ` Rich Felker
  2020-01-14 11:54 ` [musl] [PATCH] math: move x86-family rint " Alexander Monakov
                   ` (4 subsequent siblings)
  11 siblings, 1 reply; 34+ messages in thread
From: Alexander Monakov @ 2020-01-11 15:23 UTC (permalink / raw)
  To: musl

[-- Attachment #1: Type: text/plain, Size: 1579 bytes --]

---
It was news to me that GCC inline asm conventions allow to specify
effect on x87 register stack. Here a clobber tells the compiler
that the asm pops the topmost element off the stack.

 src/math/i386/llrint.c    | 8 ++++++++
 src/math/i386/llrint.s    | 8 --------
 src/math/i386/llrintf.c   | 8 ++++++++
 src/math/i386/llrintf.s   | 9 ---------
 src/math/i386/llrintl.c   | 8 ++++++++
 src/math/i386/llrintl.s   | 8 --------
 src/math/i386/lrint.c     | 8 ++++++++
 src/math/i386/lrint.s     | 7 -------
 src/math/i386/lrintf.c    | 8 ++++++++
 src/math/i386/lrintf.s    | 7 -------
 src/math/i386/lrintl.c    | 8 ++++++++
 src/math/i386/lrintl.s    | 7 -------
 src/math/x86_64/llrintl.c | 8 ++++++++
 src/math/x86_64/llrintl.s | 7 -------
 src/math/x86_64/lrintl.c  | 8 ++++++++
 src/math/x86_64/lrintl.s  | 7 -------
 16 files changed, 64 insertions(+), 60 deletions(-)
 create mode 100644 src/math/i386/llrint.c
 delete mode 100644 src/math/i386/llrint.s
 create mode 100644 src/math/i386/llrintf.c
 delete mode 100644 src/math/i386/llrintf.s
 create mode 100644 src/math/i386/llrintl.c
 delete mode 100644 src/math/i386/llrintl.s
 create mode 100644 src/math/i386/lrint.c
 delete mode 100644 src/math/i386/lrint.s
 create mode 100644 src/math/i386/lrintf.c
 delete mode 100644 src/math/i386/lrintf.s
 create mode 100644 src/math/i386/lrintl.c
 delete mode 100644 src/math/i386/lrintl.s
 create mode 100644 src/math/x86_64/llrintl.c
 delete mode 100644 src/math/x86_64/llrintl.s
 create mode 100644 src/math/x86_64/lrintl.c
 delete mode 100644 src/math/x86_64/lrintl.s


[-- Warning: decoded text below may be mangled --]
[-- Attachment #2: 0007-math-move-more-x86-family-lrint-functions-to-C.patch --]
[-- Type: text/x-patch; name="0007-math-move-more-x86-family-lrint-functions-to-C.patch", Size: 4690 bytes --]

diff --git a/src/math/i386/llrint.c b/src/math/i386/llrint.c
new file mode 100644
index 00000000..aa400817
--- /dev/null
+++ b/src/math/i386/llrint.c
@@ -0,0 +1,8 @@
+#include <math.h>
+
+long long llrint(double x)
+{
+	long long r;
+	__asm__ ("fistpll %0" : "=m"(r) : "t"(x) : "st");
+	return r;
+}
diff --git a/src/math/i386/llrint.s b/src/math/i386/llrint.s
deleted file mode 100644
index 8e89cd91..00000000
--- a/src/math/i386/llrint.s
+++ /dev/null
@@ -1,8 +0,0 @@
-.global llrint
-.type llrint,@function
-llrint:
-	fldl 4(%esp)
-	fistpll 4(%esp)
-	mov 4(%esp),%eax
-	mov 8(%esp),%edx
-	ret
diff --git a/src/math/i386/llrintf.c b/src/math/i386/llrintf.c
new file mode 100644
index 00000000..c41a317b
--- /dev/null
+++ b/src/math/i386/llrintf.c
@@ -0,0 +1,8 @@
+#include <math.h>
+
+long long llrintf(float x)
+{
+	long long r;
+	__asm__ ("fistpll %0" : "=m"(r) : "t"(x) : "st");
+	return r;
+}
diff --git a/src/math/i386/llrintf.s b/src/math/i386/llrintf.s
deleted file mode 100644
index aa850c6c..00000000
--- a/src/math/i386/llrintf.s
+++ /dev/null
@@ -1,9 +0,0 @@
-.global llrintf
-.type llrintf,@function
-llrintf:
-	sub $8,%esp
-	flds 12(%esp)
-	fistpll (%esp)
-	pop %eax
-	pop %edx
-	ret
diff --git a/src/math/i386/llrintl.c b/src/math/i386/llrintl.c
new file mode 100644
index 00000000..c439ef28
--- /dev/null
+++ b/src/math/i386/llrintl.c
@@ -0,0 +1,8 @@
+#include <math.h>
+
+long long llrintl(long double x)
+{
+	long long r;
+	__asm__ ("fistpll %0" : "=m"(r) : "t"(x) : "st");
+	return r;
+}
diff --git a/src/math/i386/llrintl.s b/src/math/i386/llrintl.s
deleted file mode 100644
index 1cfb56f1..00000000
--- a/src/math/i386/llrintl.s
+++ /dev/null
@@ -1,8 +0,0 @@
-.global llrintl
-.type llrintl,@function
-llrintl:
-	fldt 4(%esp)
-	fistpll 4(%esp)
-	mov 4(%esp),%eax
-	mov 8(%esp),%edx
-	ret
diff --git a/src/math/i386/lrint.c b/src/math/i386/lrint.c
new file mode 100644
index 00000000..f16b8a9c
--- /dev/null
+++ b/src/math/i386/lrint.c
@@ -0,0 +1,8 @@
+#include <math.h>
+
+long lrint(double x)
+{
+	long r;
+	__asm__ ("fistpll %0" : "=m"(r) : "t"(x) : "st");
+	return r;
+}
diff --git a/src/math/i386/lrint.s b/src/math/i386/lrint.s
deleted file mode 100644
index 02b83d9f..00000000
--- a/src/math/i386/lrint.s
+++ /dev/null
@@ -1,7 +0,0 @@
-.global lrint
-.type lrint,@function
-lrint:
-	fldl 4(%esp)
-	fistpl 4(%esp)
-	mov 4(%esp),%eax
-	ret
diff --git a/src/math/i386/lrintf.c b/src/math/i386/lrintf.c
new file mode 100644
index 00000000..932d353e
--- /dev/null
+++ b/src/math/i386/lrintf.c
@@ -0,0 +1,8 @@
+#include <math.h>
+
+long lrintf(float x)
+{
+	long r;
+	__asm__ ("fistpll %0" : "=m"(r) : "t"(x) : "st");
+	return r;
+}
diff --git a/src/math/i386/lrintf.s b/src/math/i386/lrintf.s
deleted file mode 100644
index 907aac29..00000000
--- a/src/math/i386/lrintf.s
+++ /dev/null
@@ -1,7 +0,0 @@
-.global lrintf
-.type lrintf,@function
-lrintf:
-	flds 4(%esp)
-	fistpl 4(%esp)
-	mov 4(%esp),%eax
-	ret
diff --git a/src/math/i386/lrintl.c b/src/math/i386/lrintl.c
new file mode 100644
index 00000000..068e2e4d
--- /dev/null
+++ b/src/math/i386/lrintl.c
@@ -0,0 +1,8 @@
+#include <math.h>
+
+long lrintl(long double x)
+{
+	long r;
+	__asm__ ("fistpll %0" : "=m"(r) : "t"(x) : "st");
+	return r;
+}
diff --git a/src/math/i386/lrintl.s b/src/math/i386/lrintl.s
deleted file mode 100644
index 3ae05aac..00000000
--- a/src/math/i386/lrintl.s
+++ /dev/null
@@ -1,7 +0,0 @@
-.global lrintl
-.type lrintl,@function
-lrintl:
-	fldt 4(%esp)
-	fistpl 4(%esp)
-	mov 4(%esp),%eax
-	ret
diff --git a/src/math/x86_64/llrintl.c b/src/math/x86_64/llrintl.c
new file mode 100644
index 00000000..c439ef28
--- /dev/null
+++ b/src/math/x86_64/llrintl.c
@@ -0,0 +1,8 @@
+#include <math.h>
+
+long long llrintl(long double x)
+{
+	long long r;
+	__asm__ ("fistpll %0" : "=m"(r) : "t"(x) : "st");
+	return r;
+}
diff --git a/src/math/x86_64/llrintl.s b/src/math/x86_64/llrintl.s
deleted file mode 100644
index 1ec0817d..00000000
--- a/src/math/x86_64/llrintl.s
+++ /dev/null
@@ -1,7 +0,0 @@
-.global llrintl
-.type llrintl,@function
-llrintl:
-	fldt 8(%rsp)
-	fistpll 8(%rsp)
-	mov 8(%rsp),%rax
-	ret
diff --git a/src/math/x86_64/lrintl.c b/src/math/x86_64/lrintl.c
new file mode 100644
index 00000000..068e2e4d
--- /dev/null
+++ b/src/math/x86_64/lrintl.c
@@ -0,0 +1,8 @@
+#include <math.h>
+
+long lrintl(long double x)
+{
+	long r;
+	__asm__ ("fistpll %0" : "=m"(r) : "t"(x) : "st");
+	return r;
+}
diff --git a/src/math/x86_64/lrintl.s b/src/math/x86_64/lrintl.s
deleted file mode 100644
index d587b12b..00000000
--- a/src/math/x86_64/lrintl.s
+++ /dev/null
@@ -1,7 +0,0 @@
-.global lrintl
-.type lrintl,@function
-lrintl:
-	fldt 8(%rsp)
-	fistpll 8(%rsp)
-	mov 8(%rsp),%rax
-	ret

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] math: move more x86-family lrint functions to C
  2020-01-11 15:23 ` [PATCH] math: move more x86-family lrint " Alexander Monakov
@ 2020-01-11 16:07   ` Rich Felker
  2020-01-11 16:22     ` Rich Felker
  0 siblings, 1 reply; 34+ messages in thread
From: Rich Felker @ 2020-01-11 16:07 UTC (permalink / raw)
  To: musl

On Sat, Jan 11, 2020 at 06:23:54PM +0300, Alexander Monakov wrote:
> ---
> It was news to me that GCC inline asm conventions allow to specify
> effect on x87 register stack. Here a clobber tells the compiler
> that the asm pops the topmost element off the stack.

Is this documented/reliable/supported by other compilers (clang, pcc,
old gcc)? It looks very nice and I hope we can do it; otherwise it
looks like a gratuitous instruction would be needed. But I want to
make sure it's safe.

Rich


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] math: move more x86-family lrint functions to C
  2020-01-11 16:07   ` Rich Felker
@ 2020-01-11 16:22     ` Rich Felker
  0 siblings, 0 replies; 34+ messages in thread
From: Rich Felker @ 2020-01-11 16:22 UTC (permalink / raw)
  To: musl

On Sat, Jan 11, 2020 at 11:07:01AM -0500, Rich Felker wrote:
> On Sat, Jan 11, 2020 at 06:23:54PM +0300, Alexander Monakov wrote:
> > ---
> > It was news to me that GCC inline asm conventions allow to specify
> > effect on x87 register stack. Here a clobber tells the compiler
> > that the asm pops the topmost element off the stack.
> 
> Is this documented/reliable/supported by other compilers (clang, pcc,
> old gcc)? It looks very nice and I hope we can do it; otherwise it
> looks like a gratuitous instruction would be needed. But I want to
> make sure it's safe.

OK, I see it documented at the end of the page here:
https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html under 6.47.2.9
x86 Floating-Point asm Operands. So if other compilers don't honor
this they're not implementing it as documented.

I noticed also there (bullet point 5) that, if inline asm pushes
anything to x87 stack, it needs appropriate clobbers so the compiler
can ensure the stack is not full on entry. This would be easy to miss
because the stack will be empty in standalone functions, and it would
only come up if they get inlined via LTO.

Rich


^ permalink raw reply	[flat|nested] 34+ messages in thread

* [musl] [PATCH] math: move x86-family rint functions to C
  2020-01-05 16:35 math patches for moving bare asm to C inline asm Alexander Monakov
                   ` (6 preceding siblings ...)
  2020-01-11 15:23 ` [PATCH] math: move more x86-family lrint " Alexander Monakov
@ 2020-01-14 11:54 ` " Alexander Monakov
  2020-01-14 18:17 ` [musl] Q: dealing with missing removal of excess precision Alexander Monakov
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 34+ messages in thread
From: Alexander Monakov @ 2020-01-14 11:54 UTC (permalink / raw)
  To: musl

[-- Attachment #1: Type: text/plain, Size: 724 bytes --]

---
One more batch of trivial rewrites.

 src/math/i386/rint.c    | 7 +++++++
 src/math/i386/rint.s    | 6 ------
 src/math/i386/rintf.c   | 7 +++++++
 src/math/i386/rintf.s   | 6 ------
 src/math/i386/rintl.c   | 7 +++++++
 src/math/i386/rintl.s   | 6 ------
 src/math/x86_64/rintl.c | 7 +++++++
 src/math/x86_64/rintl.s | 6 ------
 8 files changed, 28 insertions(+), 24 deletions(-)
 create mode 100644 src/math/i386/rint.c
 delete mode 100644 src/math/i386/rint.s
 create mode 100644 src/math/i386/rintf.c
 delete mode 100644 src/math/i386/rintf.s
 create mode 100644 src/math/i386/rintl.c
 delete mode 100644 src/math/i386/rintl.s
 create mode 100644 src/math/x86_64/rintl.c
 delete mode 100644 src/math/x86_64/rintl.s


[-- Warning: decoded text below may be mangled --]
[-- Attachment #2: 0008-math-move-x86-family-rint-functions-to-C.patch --]
[-- Type: text/x-patch; name="0008-math-move-x86-family-rint-functions-to-C.patch", Size: 2032 bytes --]

diff --git a/src/math/i386/rint.c b/src/math/i386/rint.c
new file mode 100644
index 00000000..a5276a60
--- /dev/null
+++ b/src/math/i386/rint.c
@@ -0,0 +1,7 @@
+#include <math.h>
+
+double rint(double x)
+{
+	__asm__ ("frndint" : "+t"(x));
+	return x;
+}
diff --git a/src/math/i386/rint.s b/src/math/i386/rint.s
deleted file mode 100644
index bb99a11c..00000000
--- a/src/math/i386/rint.s
+++ /dev/null
@@ -1,6 +0,0 @@
-.global rint
-.type rint,@function
-rint:
-	fldl 4(%esp)
-	frndint
-	ret
diff --git a/src/math/i386/rintf.c b/src/math/i386/rintf.c
new file mode 100644
index 00000000..bb4121a4
--- /dev/null
+++ b/src/math/i386/rintf.c
@@ -0,0 +1,7 @@
+#include <math.h>
+
+float rintf(float x)
+{
+	__asm__ ("frndint" : "+t"(x));
+	return x;
+}
diff --git a/src/math/i386/rintf.s b/src/math/i386/rintf.s
deleted file mode 100644
index bce4c5a6..00000000
--- a/src/math/i386/rintf.s
+++ /dev/null
@@ -1,6 +0,0 @@
-.global rintf
-.type rintf,@function
-rintf:
-	flds 4(%esp)
-	frndint
-	ret
diff --git a/src/math/i386/rintl.c b/src/math/i386/rintl.c
new file mode 100644
index 00000000..e1a92077
--- /dev/null
+++ b/src/math/i386/rintl.c
@@ -0,0 +1,7 @@
+#include <math.h>
+
+long double rintl(long double x)
+{
+	__asm__ ("frndint" : "+t"(x));
+	return x;
+}
diff --git a/src/math/i386/rintl.s b/src/math/i386/rintl.s
deleted file mode 100644
index cd2bf9a9..00000000
--- a/src/math/i386/rintl.s
+++ /dev/null
@@ -1,6 +0,0 @@
-.global rintl
-.type rintl,@function
-rintl:
-	fldt 4(%esp)
-	frndint
-	ret
diff --git a/src/math/x86_64/rintl.c b/src/math/x86_64/rintl.c
new file mode 100644
index 00000000..e1a92077
--- /dev/null
+++ b/src/math/x86_64/rintl.c
@@ -0,0 +1,7 @@
+#include <math.h>
+
+long double rintl(long double x)
+{
+	__asm__ ("frndint" : "+t"(x));
+	return x;
+}
diff --git a/src/math/x86_64/rintl.s b/src/math/x86_64/rintl.s
deleted file mode 100644
index 64e663cd..00000000
--- a/src/math/x86_64/rintl.s
+++ /dev/null
@@ -1,6 +0,0 @@
-.global rintl
-.type rintl,@function
-rintl:
-	fldt 8(%rsp)
-	frndint
-	ret

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [musl] [PATCH] math: move i386 sqrtf to C
  2020-01-09 21:00       ` Szabolcs Nagy
  2020-01-09 22:00         ` Rich Felker
@ 2020-01-14 17:59         ` " Alexander Monakov
  2020-01-14 18:47           ` Szabolcs Nagy
  1 sibling, 1 reply; 34+ messages in thread
From: Alexander Monakov @ 2020-01-14 17:59 UTC (permalink / raw)
  To: musl

On Thu, 9 Jan 2020, Szabolcs Nagy wrote:

> c99 required cast at return, but that was considered to be
> a bug so in c11 it's no longer required: return now must
> round

This is the interpretation that GCC uses (return statements
imply removal of excess precision), but I wonder where the
standard actually says that: afaics N1570 still has the same
footnote in 6.8.6.4 that ends with "A cast may be used to
remove this extra range and precision."

Alexander

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [musl] Q: dealing with missing removal of excess precision
  2020-01-05 16:35 math patches for moving bare asm to C inline asm Alexander Monakov
                   ` (7 preceding siblings ...)
  2020-01-14 11:54 ` [musl] [PATCH] math: move x86-family rint " Alexander Monakov
@ 2020-01-14 18:17 ` Alexander Monakov
  2020-01-14 18:50   ` Szabolcs Nagy
  2020-01-14 20:41 ` [musl] [PATCH] math: move x86-family remainder functions to C Alexander Monakov
                   ` (2 subsequent siblings)
  11 siblings, 1 reply; 34+ messages in thread
From: Alexander Monakov @ 2020-01-14 18:17 UTC (permalink / raw)
  To: musl


What to do with functions that return values with excess precision from asm?

Many of the remaining x87-based asm implementations do not remove excess
precision on return. This looks like a bug, but matches Glibc behavior.
Moving such functions to C would either introduce removal of excess precision,
or result in code that lies to the compiler by pretending that inline asm keeps
values within precision of float/double, where in reality it might not.

Alexander

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [musl] [PATCH] math: move i386 sqrtf to C
  2020-01-14 17:59         ` [musl] " Alexander Monakov
@ 2020-01-14 18:47           ` Szabolcs Nagy
  0 siblings, 0 replies; 34+ messages in thread
From: Szabolcs Nagy @ 2020-01-14 18:47 UTC (permalink / raw)
  To: musl

* Alexander Monakov <amonakov@ispras.ru> [2020-01-14 20:59:21 +0300]:
> On Thu, 9 Jan 2020, Szabolcs Nagy wrote:
> 
> > c99 required cast at return, but that was considered to be
> > a bug so in c11 it's no longer required: return now must
> > round
> 
> This is the interpretation that GCC uses (return statements
> imply removal of excess precision), but I wonder where the
> standard actually says that: afaics N1570 still has the same
> footnote in 6.8.6.4 that ends with "A cast may be used to
> remove this extra range and precision."

it's in annex f
http://port70.net/~nsz/c/c11/n1570.html#F.6

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [musl] Q: dealing with missing removal of excess precision
  2020-01-14 18:17 ` [musl] Q: dealing with missing removal of excess precision Alexander Monakov
@ 2020-01-14 18:50   ` Szabolcs Nagy
  2020-01-14 18:58     ` Rich Felker
  0 siblings, 1 reply; 34+ messages in thread
From: Szabolcs Nagy @ 2020-01-14 18:50 UTC (permalink / raw)
  To: musl

* Alexander Monakov <amonakov@ispras.ru> [2020-01-14 21:17:55 +0300]:
> What to do with functions that return values with excess precision from asm?
> 
> Many of the remaining x87-based asm implementations do not remove excess
> precision on return. This looks like a bug, but matches Glibc behavior.
> Moving such functions to C would either introduce removal of excess precision,
> or result in code that lies to the compiler by pretending that inline asm keeps
> values within precision of float/double, where in reality it might not.

the intention is to remove excess precision so it's good that's happening.

otoh it would be nice if there was a way to tell the compiler not to
remove it (e.g. in case the asm already took care of it) even in c99
standard mode.


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [musl] Q: dealing with missing removal of excess precision
  2020-01-14 18:50   ` Szabolcs Nagy
@ 2020-01-14 18:58     ` Rich Felker
  2020-01-14 19:53       ` Alexander Monakov
  0 siblings, 1 reply; 34+ messages in thread
From: Rich Felker @ 2020-01-14 18:58 UTC (permalink / raw)
  To: musl

On Tue, Jan 14, 2020 at 07:50:58PM +0100, Szabolcs Nagy wrote:
> * Alexander Monakov <amonakov@ispras.ru> [2020-01-14 21:17:55 +0300]:
> > What to do with functions that return values with excess precision from asm?
> > 
> > Many of the remaining x87-based asm implementations do not remove excess
> > precision on return. This looks like a bug, but matches Glibc behavior.
> > Moving such functions to C would either introduce removal of excess precision,
> > or result in code that lies to the compiler by pretending that inline asm keeps
> > values within precision of float/double, where in reality it might not.
> 
> the intention is to remove excess precision so it's good that's happening.

Agreed. It may be a slight performance regression but it's the right
thing to do anyway.

> otoh it would be nice if there was a way to tell the compiler not to
> remove it (e.g. in case the asm already took care of it) even in c99
> standard mode.

Perhaps this happens if the output constraint is tied to a float
rather than a long double?

Rich

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [musl] Q: dealing with missing removal of excess precision
  2020-01-14 18:58     ` Rich Felker
@ 2020-01-14 19:53       ` Alexander Monakov
  0 siblings, 0 replies; 34+ messages in thread
From: Alexander Monakov @ 2020-01-14 19:53 UTC (permalink / raw)
  To: musl

On Tue, 14 Jan 2020, Rich Felker wrote:

> > otoh it would be nice if there was a way to tell the compiler not to
> > remove it (e.g. in case the asm already took care of it) even in c99
> > standard mode.
> 
> Perhaps this happens if the output constraint is tied to a float
> rather than a long double?

Precisely. That's what previously posted patches do, and they match
existing hand-tuned assembly.

I misspoke when saying that Glibc might return a value with excess precision.
I was looking at fmod-like functions and missed a slightly subtle point that
fprem does not introduce excess precision. So I don't actually have any
example where Glibc might misbehave in that regard.

Alexander

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [musl] [PATCH] math: move x86-family remainder functions to C
  2020-01-05 16:35 math patches for moving bare asm to C inline asm Alexander Monakov
                   ` (8 preceding siblings ...)
  2020-01-14 18:17 ` [musl] Q: dealing with missing removal of excess precision Alexander Monakov
@ 2020-01-14 20:41 ` Alexander Monakov
  2020-01-15  6:54   ` Szabolcs Nagy
  2020-01-15 15:44 ` [musl] [PATCH] math: move x86-family fmod " Alexander Monakov
  2020-01-16 21:00 ` [musl] [PATCH] math: add x86_64 remquol Alexander Monakov
  11 siblings, 1 reply; 34+ messages in thread
From: Alexander Monakov @ 2020-01-14 20:41 UTC (permalink / raw)
  To: musl

[-- Attachment #1: Type: text/plain, Size: 970 bytes --]

---
For an unclear reason commit 5d01ab4ac64b913c537e91f7c01d5c8e910151da
did not add long double aliases dreml and finitel. This patch
preserves the oddity.

 src/math/i386/remainder.c    | 12 ++++++++++++
 src/math/i386/remainder.s    | 14 --------------
 src/math/i386/remainderf.c   | 12 ++++++++++++
 src/math/i386/remainderf.s   | 14 --------------
 src/math/i386/remainderl.c   |  9 +++++++++
 src/math/i386/remainderl.s   | 11 -----------
 src/math/x86_64/remainderl.c |  9 +++++++++
 src/math/x86_64/remainderl.s | 11 -----------
 8 files changed, 42 insertions(+), 50 deletions(-)
 create mode 100644 src/math/i386/remainder.c
 delete mode 100644 src/math/i386/remainder.s
 create mode 100644 src/math/i386/remainderf.c
 delete mode 100644 src/math/i386/remainderf.s
 create mode 100644 src/math/i386/remainderl.c
 delete mode 100644 src/math/i386/remainderl.s
 create mode 100644 src/math/x86_64/remainderl.c
 delete mode 100644 src/math/x86_64/remainderl.s


[-- Warning: decoded text below may be mangled --]
[-- Attachment #2: 0009-math-move-x86-family-remainder-functions-to-C.patch --]
[-- Type: text/x-patch; name="0009-math-move-x86-family-remainder-functions-to-C.patch", Size: 3124 bytes --]

diff --git a/src/math/i386/remainder.c b/src/math/i386/remainder.c
new file mode 100644
index 00000000..c083df90
--- /dev/null
+++ b/src/math/i386/remainder.c
@@ -0,0 +1,12 @@
+#include <math.h>
+
+double remainder(double x, double y)
+{
+	unsigned short fpsr;
+	// fprem1 does not introduce excess precision into x
+	do __asm__ ("fprem1; fnstsw %%ax" : "+t"(x), "=a"(fpsr) : "u"(y));
+	while (fpsr & 0x400);
+	return x;
+}
+
+weak_alias(remainder, drem);
diff --git a/src/math/i386/remainder.s b/src/math/i386/remainder.s
deleted file mode 100644
index ab1da95d..00000000
--- a/src/math/i386/remainder.s
+++ /dev/null
@@ -1,14 +0,0 @@
-.global remainder
-.type remainder,@function
-remainder:
-.weak drem
-.type drem,@function
-drem:
-	fldl 12(%esp)
-	fldl 4(%esp)
-1:	fprem1
-	fnstsw %ax
-	sahf
-	jp 1b
-	fstp %st(1)
-	ret
diff --git a/src/math/i386/remainderf.c b/src/math/i386/remainderf.c
new file mode 100644
index 00000000..280207d2
--- /dev/null
+++ b/src/math/i386/remainderf.c
@@ -0,0 +1,12 @@
+#include <math.h>
+
+float remainderf(float x, float y)
+{
+	unsigned short fpsr;
+	// fprem1 does not introduce excess precision into x
+	do __asm__ ("fprem1; fnstsw %%ax" : "+t"(x), "=a"(fpsr) : "u"(y));
+	while (fpsr & 0x400);
+	return x;
+}
+
+weak_alias(remainderf, dremf);
diff --git a/src/math/i386/remainderf.s b/src/math/i386/remainderf.s
deleted file mode 100644
index 6a7378a3..00000000
--- a/src/math/i386/remainderf.s
+++ /dev/null
@@ -1,14 +0,0 @@
-.global remainderf
-.type remainderf,@function
-remainderf:
-.weak dremf
-.type dremf,@function
-dremf:
-	flds 8(%esp)
-	flds 4(%esp)
-1:	fprem1
-	fnstsw %ax
-	sahf
-	jp 1b
-	fstp %st(1)
-	ret
diff --git a/src/math/i386/remainderl.c b/src/math/i386/remainderl.c
new file mode 100644
index 00000000..8cf75071
--- /dev/null
+++ b/src/math/i386/remainderl.c
@@ -0,0 +1,9 @@
+#include <math.h>
+
+long double remainderl(long double x, long double y)
+{
+	unsigned short fpsr;
+	do __asm__ ("fprem1; fnstsw %%ax" : "+t"(x), "=a"(fpsr) : "u"(y));
+	while (fpsr & 0x400);
+	return x;
+}
diff --git a/src/math/i386/remainderl.s b/src/math/i386/remainderl.s
deleted file mode 100644
index b41518ed..00000000
--- a/src/math/i386/remainderl.s
+++ /dev/null
@@ -1,11 +0,0 @@
-.global remainderl
-.type remainderl,@function
-remainderl:
-	fldt 16(%esp)
-	fldt 4(%esp)
-1:	fprem1
-	fnstsw %ax
-	sahf
-	jp 1b
-	fstp %st(1)
-	ret
diff --git a/src/math/x86_64/remainderl.c b/src/math/x86_64/remainderl.c
new file mode 100644
index 00000000..8cf75071
--- /dev/null
+++ b/src/math/x86_64/remainderl.c
@@ -0,0 +1,9 @@
+#include <math.h>
+
+long double remainderl(long double x, long double y)
+{
+	unsigned short fpsr;
+	do __asm__ ("fprem1; fnstsw %%ax" : "+t"(x), "=a"(fpsr) : "u"(y));
+	while (fpsr & 0x400);
+	return x;
+}
diff --git a/src/math/x86_64/remainderl.s b/src/math/x86_64/remainderl.s
deleted file mode 100644
index cb3857b4..00000000
--- a/src/math/x86_64/remainderl.s
+++ /dev/null
@@ -1,11 +0,0 @@
-.global remainderl
-.type remainderl,@function
-remainderl:
-	fldt 24(%rsp)
-	fldt 8(%rsp)
-1:	fprem1
-	fnstsw %ax
-	testb $4,%ah
-	jnz 1b
-	fstp %st(1)
-	ret

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [musl] [PATCH] math: move x86-family remainder functions to C
  2020-01-14 20:41 ` [musl] [PATCH] math: move x86-family remainder functions to C Alexander Monakov
@ 2020-01-15  6:54   ` Szabolcs Nagy
  0 siblings, 0 replies; 34+ messages in thread
From: Szabolcs Nagy @ 2020-01-15  6:54 UTC (permalink / raw)
  To: musl

* Alexander Monakov <amonakov@ispras.ru> [2020-01-14 23:41:19 +0300]:
> For an unclear reason commit 5d01ab4ac64b913c537e91f7c01d5c8e910151da
> did not add long double aliases dreml and finitel. This patch
> preserves the oddity.

see math.h

it does not declare dreml and finitel, those
are gnu only extensions, not present in bsds.
(they were obsolete before long double appeared)

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [musl] [PATCH] math: move x86-family fmod functions to C
  2020-01-05 16:35 math patches for moving bare asm to C inline asm Alexander Monakov
                   ` (9 preceding siblings ...)
  2020-01-14 20:41 ` [musl] [PATCH] math: move x86-family remainder functions to C Alexander Monakov
@ 2020-01-15 15:44 ` " Alexander Monakov
  2020-01-16 21:00 ` [musl] [PATCH] math: add x86_64 remquol Alexander Monakov
  11 siblings, 0 replies; 34+ messages in thread
From: Alexander Monakov @ 2020-01-15 15:44 UTC (permalink / raw)
  To: musl

[-- Attachment #1: Type: text/plain, Size: 806 bytes --]

---
Exactly like remainder functions, but with fprem instruction instead of fprem1.

 src/math/i386/fmod.c    | 10 ++++++++++
 src/math/i386/fmod.s    | 11 -----------
 src/math/i386/fmodf.c   | 10 ++++++++++
 src/math/i386/fmodf.s   | 11 -----------
 src/math/i386/fmodl.c   |  9 +++++++++
 src/math/i386/fmodl.s   | 11 -----------
 src/math/x86_64/fmodl.c |  9 +++++++++
 src/math/x86_64/fmodl.s | 11 -----------
 8 files changed, 38 insertions(+), 44 deletions(-)
 create mode 100644 src/math/i386/fmod.c
 delete mode 100644 src/math/i386/fmod.s
 create mode 100644 src/math/i386/fmodf.c
 delete mode 100644 src/math/i386/fmodf.s
 create mode 100644 src/math/i386/fmodl.c
 delete mode 100644 src/math/i386/fmodl.s
 create mode 100644 src/math/x86_64/fmodl.c
 delete mode 100644 src/math/x86_64/fmodl.s


[-- Warning: decoded text below may be mangled --]
[-- Attachment #2: 0010-math-move-x86-family-fmod-functions-to-C.patch --]
[-- Type: text/x-patch; name="0010-math-move-x86-family-fmod-functions-to-C.patch", Size: 2763 bytes --]

diff --git a/src/math/i386/fmod.c b/src/math/i386/fmod.c
new file mode 100644
index 00000000..ea0c58d9
--- /dev/null
+++ b/src/math/i386/fmod.c
@@ -0,0 +1,10 @@
+#include <math.h>
+
+double fmod(double x, double y)
+{
+	unsigned short fpsr;
+	// fprem does not introduce excess precision into x
+	do __asm__ ("fprem; fnstsw %%ax" : "+t"(x), "=a"(fpsr) : "u"(y));
+	while (fpsr & 0x400);
+	return x;
+}
diff --git a/src/math/i386/fmod.s b/src/math/i386/fmod.s
deleted file mode 100644
index 2113b3c5..00000000
--- a/src/math/i386/fmod.s
+++ /dev/null
@@ -1,11 +0,0 @@
-.global fmod
-.type fmod,@function
-fmod:
-	fldl 12(%esp)
-	fldl 4(%esp)
-1:	fprem
-	fnstsw %ax
-	sahf
-	jp 1b
-	fstp %st(1)
-	ret
diff --git a/src/math/i386/fmodf.c b/src/math/i386/fmodf.c
new file mode 100644
index 00000000..90b56ab0
--- /dev/null
+++ b/src/math/i386/fmodf.c
@@ -0,0 +1,10 @@
+#include <math.h>
+
+float fmodf(float x, float y)
+{
+	unsigned short fpsr;
+	// fprem does not introduce excess precision into x
+	do __asm__ ("fprem; fnstsw %%ax" : "+t"(x), "=a"(fpsr) : "u"(y));
+	while (fpsr & 0x400);
+	return x;
+}
diff --git a/src/math/i386/fmodf.s b/src/math/i386/fmodf.s
deleted file mode 100644
index e04e2a56..00000000
--- a/src/math/i386/fmodf.s
+++ /dev/null
@@ -1,11 +0,0 @@
-.global fmodf
-.type fmodf,@function
-fmodf:
-	flds 8(%esp)
-	flds 4(%esp)
-1:	fprem
-	fnstsw %ax
-	sahf
-	jp 1b
-	fstp %st(1)
-	ret
diff --git a/src/math/i386/fmodl.c b/src/math/i386/fmodl.c
new file mode 100644
index 00000000..3daeab06
--- /dev/null
+++ b/src/math/i386/fmodl.c
@@ -0,0 +1,9 @@
+#include <math.h>
+
+long double fmodl(long double x, long double y)
+{
+	unsigned short fpsr;
+	do __asm__ ("fprem; fnstsw %%ax" : "+t"(x), "=a"(fpsr) : "u"(y));
+	while (fpsr & 0x400);
+	return x;
+}
diff --git a/src/math/i386/fmodl.s b/src/math/i386/fmodl.s
deleted file mode 100644
index 0cb3fe9b..00000000
--- a/src/math/i386/fmodl.s
+++ /dev/null
@@ -1,11 +0,0 @@
-.global fmodl
-.type fmodl,@function
-fmodl:
-	fldt 16(%esp)
-	fldt 4(%esp)
-1:	fprem
-	fnstsw %ax
-	sahf
-	jp 1b
-	fstp %st(1)
-	ret
diff --git a/src/math/x86_64/fmodl.c b/src/math/x86_64/fmodl.c
new file mode 100644
index 00000000..3daeab06
--- /dev/null
+++ b/src/math/x86_64/fmodl.c
@@ -0,0 +1,9 @@
+#include <math.h>
+
+long double fmodl(long double x, long double y)
+{
+	unsigned short fpsr;
+	do __asm__ ("fprem; fnstsw %%ax" : "+t"(x), "=a"(fpsr) : "u"(y));
+	while (fpsr & 0x400);
+	return x;
+}
diff --git a/src/math/x86_64/fmodl.s b/src/math/x86_64/fmodl.s
deleted file mode 100644
index ea07b402..00000000
--- a/src/math/x86_64/fmodl.s
+++ /dev/null
@@ -1,11 +0,0 @@
-.global fmodl
-.type fmodl,@function
-fmodl:
-	fldt 24(%rsp)
-	fldt 8(%rsp)
-1:	fprem
-	fnstsw %ax
-	testb $4,%ah
-	jnz 1b
-	fstp %st(1)
-	ret

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [musl] [PATCH] math: add x86_64 remquol
  2020-01-05 16:35 math patches for moving bare asm to C inline asm Alexander Monakov
                   ` (10 preceding siblings ...)
  2020-01-15 15:44 ` [musl] [PATCH] math: move x86-family fmod " Alexander Monakov
@ 2020-01-16 21:00 ` Alexander Monakov
  11 siblings, 0 replies; 34+ messages in thread
From: Alexander Monakov @ 2020-01-16 21:00 UTC (permalink / raw)
  To: musl

[-- Attachment #1: Type: text/plain, Size: 319 bytes --]

---
So proud of this one <3

(this is not a rewrite as x86_64 remquol.s does not exist, but that
looks unintentional and i386 remquo versions need similar rewrites anyway)

 src/math/x86_64/remquol.c | 32 ++++++++++++++++++++++++++++++++
 1 file changed, 32 insertions(+)
 create mode 100644 src/math/x86_64/remquol.c


[-- Warning: decoded text below may be mangled --]
[-- Attachment #2: 0011-math-add-x86_64-remquol.patch --]
[-- Type: text/x-patch; name="0011-math-add-x86_64-remquol.patch", Size: 1358 bytes --]

diff --git a/src/math/x86_64/remquol.c b/src/math/x86_64/remquol.c
new file mode 100644
index 00000000..60eef089
--- /dev/null
+++ b/src/math/x86_64/remquol.c
@@ -0,0 +1,32 @@
+#include <math.h>
+
+long double remquol(long double x, long double y, int *quo)
+{
+	signed char *cx = (void *)&x, *cy = (void *)&y;
+	/* By ensuring that addresses of x and y cannot be discarded,
+	 * this empty asm guides GCC into representing extraction of
+	 * their sign bits as memory loads rather than making x and y
+	 * not-address-taken internally and using bitfield operations,
+	 * which in the end wouldn't work out, as extraction from FPU
+	 * registers needs to go through memory anyway. This way GCC
+	 * should manage to use incoming stack slots without spills. */
+	__asm__ ("" :: "X"(cx), "X"(cy));
+
+	long double t = x;
+	unsigned fpsr;
+	do __asm__ ("fprem1; fnstsw %%ax" : "+t"(t), "=a"(fpsr) : "u"(y));
+	while (fpsr & 0x400);
+	/* C0, C1, C3 flags in x87 status word carry low bits of quotient:
+	 * 15 14 13 12 11 10  9  8
+	 *  . C3  .  .  . C2 C1 C0
+	 *  . b1  .  .  .  0 b0 b2 */
+	unsigned char i = fpsr >> 8;
+	i = i>>4 | i<<4;
+	/* i[5:2] is now {b0 b2 ? b1}. Retrieve {0 b2 b1 b0} via
+	 * in-register table lookup. */
+	unsigned qbits = 0x7575313164642020 >> (i & 60);
+	qbits &= 7;
+
+	*quo = (cx[9]^cy[9]) < 0 ? -qbits : qbits;
+	return t;
+}

^ permalink raw reply	[flat|nested] 34+ messages in thread

end of thread, back to index

Thread overview: 34+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-01-05 16:35 math patches for moving bare asm to C inline asm Alexander Monakov
2020-01-05 16:36 ` [PATCH] math: move x86_64 fabs, fabsf to C with " Alexander Monakov
2020-01-05 20:05   ` Rich Felker
2020-01-05 21:32     ` Alexander Monakov
2020-01-05 22:43       ` Rich Felker
2020-01-06  8:17         ` Alexander Monakov
2020-01-06  8:40 ` [PATCH] math: move more x86-family fabs functions to C Alexander Monakov
2020-01-06 16:50 ` [PATCH] math: move trivial x86-family sqrt " Alexander Monakov
2020-01-06 17:43 ` [PATCH] math: move i386 sqrtf " Alexander Monakov
2020-01-06 18:32   ` Pascal Cuoq
2020-01-09 15:55   ` Alexander Monakov
2020-01-09 17:00     ` Rich Felker
2020-01-09 21:00       ` Szabolcs Nagy
2020-01-09 22:00         ` Rich Felker
2020-01-09 23:18           ` Szabolcs Nagy
2020-01-10  2:07             ` Rich Felker
2020-01-10  9:17               ` Szabolcs Nagy
2020-01-14 17:59         ` [musl] " Alexander Monakov
2020-01-14 18:47           ` Szabolcs Nagy
2020-01-07 13:06 ` [PATCH] math: move i386 sqrt " Alexander Monakov
2020-01-08  7:26   ` Rich Felker
2020-01-11 15:06 ` [PATCH] math: move x86_64 (l)lrint(f) functions " Alexander Monakov
2020-01-11 15:23 ` [PATCH] math: move more x86-family lrint " Alexander Monakov
2020-01-11 16:07   ` Rich Felker
2020-01-11 16:22     ` Rich Felker
2020-01-14 11:54 ` [musl] [PATCH] math: move x86-family rint " Alexander Monakov
2020-01-14 18:17 ` [musl] Q: dealing with missing removal of excess precision Alexander Monakov
2020-01-14 18:50   ` Szabolcs Nagy
2020-01-14 18:58     ` Rich Felker
2020-01-14 19:53       ` Alexander Monakov
2020-01-14 20:41 ` [musl] [PATCH] math: move x86-family remainder functions to C Alexander Monakov
2020-01-15  6:54   ` Szabolcs Nagy
2020-01-15 15:44 ` [musl] [PATCH] math: move x86-family fmod " Alexander Monakov
2020-01-16 21:00 ` [musl] [PATCH] math: add x86_64 remquol Alexander Monakov

mailing list of musl libc

Archives are clonable: git clone --mirror http://inbox.vuxu.org/musl

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://inbox.vuxu.org/vuxu.archive.musl


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git