mailing list of musl libc

* [PATCH 0/5] add FP_FAST_FMA to math.h
@ 2018-09-23 15:09 Szabolcs Nagy
  2018-09-23 15:11 ` Szabolcs Nagy
  2018-09-26 20:52 ` Szabolcs Nagy
  0 siblings, 2 replies; 3+ messages in thread
From: Szabolcs Nagy @ 2018-09-23 15:09 UTC (permalink / raw)
  To: musl

lightly tested, generated code for new fma inline asm is the same as
with __builtin_fma().

doing runtime dispatch for fma on x86 is tricky (it requires cpuid
handling); on arm it's enough to check the VFPv4 hwcap.

on other targets fma is part of the base isa or not available at all.
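
a rough sketch of what the x86 side of such a dispatch could look like
(not part of this series). it assumes gcc or clang, where
__builtin_cpu_supports("fma") hides the cpuid handling; fma_fn, fma_hw
and select_hw_fma are made-up names, and the asm is the FMA3 form used
in patch 4:

```c
#include <stddef.h>

typedef double (*fma_fn)(double, double, double);

#if defined(__x86_64__) && defined(__GNUC__)

/* FMA3 form from patch 4: x = x*y + z with a single rounding.
   only safe to call after the runtime check below has succeeded. */
static double fma_hw(double x, double y, double z)
{
	__asm__ ("vfmadd132sd %1, %2, %0" : "+x" (x) : "x" (y), "x" (z));
	return x;
}

/* hypothetical helper: returns the hardware fma, or NULL when the
   running cpu does not report FMA3 support. */
fma_fn select_hw_fma(void)
{
	__builtin_cpu_init();
	return __builtin_cpu_supports("fma") ? fma_hw : NULL;
}

#else

/* other targets: no runtime selection in this sketch. */
fma_fn select_hw_fma(void)
{
	return NULL;
}

#endif
```

a caller would run select_hw_fma() once and fall back to the generic
fma when it returns NULL. doing this inside libc needs more care
(no compiler runtime helpers, initialization order), which is the
tricky part.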

(for various math code it's important to have FP_FAST_FMA
defined when hw fma is available: it allows a much more efficient
implementation. some such code uses gcc's __FP_FAST_FMA with
__builtin_fma, but that does not work with clang. bionic does not
seem to have a correct setting for FP_FAST_FMA at all. there are
other problematic platforms too, but i think musl should support it.

it's annoying that there is no way to tell using preprocessing
macros which standard functions are treated as builtins by the
compiler and then which builtins may get inlined.  users will
invent their own fast fma ifdef hacks which is bad.)
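
an illustration of the kind of user code that wants the macro (a sketch,
not taken from any real project): an exact product computed with one fma
when FP_FAST_FMA is defined, and with Dekker's splitting otherwise.
two_prod is a made-up name, and the fallback assumes plain double
evaluation (FLT_EVAL_METHOD == 0) and no overflow or underflow:

```c
#include <math.h>

/* hypothetical helper: *hi = x*y rounded to double, *lo = x*y - *hi */
void two_prod(double x, double y, double *hi, double *lo)
{
	*hi = x * y;
#ifdef FP_FAST_FMA
	/* one fused operation recovers the rounding error of x*y */
	*lo = fma(x, y, -*hi);
#else
	/* Dekker's product: split both inputs into 26-bit halves so
	   that every partial product below is exact in double */
	const double c = 134217729.0; /* 2^27 + 1 */
	double t, xh, xl, yh, yl;

	t = c * x; xh = t - (t - x); xl = x - xh;
	t = c * y; yh = t - (t - y); yl = y - yh;
	*lo = ((xh * yh - *hi) + xh * yl + xl * yh) + xl * yl;
#endif
}
```

the fallback costs well over a dozen floating-point operations against
two in the fma path, which is why picking the wrong branch matters.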

Szabolcs Nagy (5):
  s390x: add single instruction fma and fmaf
  powerpc: add single instruction fabs, fabsf, fma, fmaf, sqrt, sqrtf
  arm: add single instruction fma
  x86_64: add single instruction fma
  define FP_FAST_FMA and FP_FAST_FMAF when fma and fmaf can be inlined

 arch/aarch64/bits/math.h   |  2 ++
 arch/arm/bits/math.h       |  4 ++++
 arch/generic/bits/math.h   |  0
 arch/powerpc/bits/math.h   |  4 ++++
 arch/powerpc64/bits/math.h |  2 ++
 arch/s390x/bits/math.h     |  2 ++
 arch/x32/bits/math.h       |  4 ++++
 arch/x86_64/bits/math.h    |  4 ++++
 include/math.h             |  2 ++
 src/math/arm/fma.c         | 15 +++++++++++++++
 src/math/arm/fmaf.c        | 15 +++++++++++++++
 src/math/powerpc/fabs.c    | 15 +++++++++++++++
 src/math/powerpc/fabsf.c   | 15 +++++++++++++++
 src/math/powerpc/fma.c     | 15 +++++++++++++++
 src/math/powerpc/fmaf.c    | 15 +++++++++++++++
 src/math/powerpc/sqrt.c    | 15 +++++++++++++++
 src/math/powerpc/sqrtf.c   | 15 +++++++++++++++
 src/math/s390x/fma.c       |  7 +++++++
 src/math/s390x/fmaf.c      |  7 +++++++
 src/math/x32/fma.c         | 23 +++++++++++++++++++++++
 src/math/x32/fmaf.c        | 23 +++++++++++++++++++++++
 src/math/x86_64/fma.c      | 23 +++++++++++++++++++++++
 src/math/x86_64/fmaf.c     | 23 +++++++++++++++++++++++
 23 files changed, 250 insertions(+)
 create mode 100644 arch/aarch64/bits/math.h
 create mode 100644 arch/arm/bits/math.h
 create mode 100644 arch/generic/bits/math.h
 create mode 100644 arch/powerpc/bits/math.h
 create mode 100644 arch/powerpc64/bits/math.h
 create mode 100644 arch/s390x/bits/math.h
 create mode 100644 arch/x32/bits/math.h
 create mode 100644 arch/x86_64/bits/math.h
 create mode 100644 src/math/arm/fma.c
 create mode 100644 src/math/arm/fmaf.c
 create mode 100644 src/math/powerpc/fabs.c
 create mode 100644 src/math/powerpc/fabsf.c
 create mode 100644 src/math/powerpc/fma.c
 create mode 100644 src/math/powerpc/fmaf.c
 create mode 100644 src/math/powerpc/sqrt.c
 create mode 100644 src/math/powerpc/sqrtf.c
 create mode 100644 src/math/s390x/fma.c
 create mode 100644 src/math/s390x/fmaf.c
 create mode 100644 src/math/x32/fma.c
 create mode 100644 src/math/x32/fmaf.c
 create mode 100644 src/math/x86_64/fma.c
 create mode 100644 src/math/x86_64/fmaf.c

-- 
2.18.0


* Re: [PATCH 0/5] add FP_FAST_FMA to math.h
  2018-09-23 15:09 [PATCH 0/5] add FP_FAST_FMA to math.h Szabolcs Nagy
@ 2018-09-23 15:11 ` Szabolcs Nagy
  2018-09-26 20:52 ` Szabolcs Nagy
  1 sibling, 0 replies; 3+ messages in thread
From: Szabolcs Nagy @ 2018-09-23 15:11 UTC (permalink / raw)
  To: musl

[-- Attachment #1: Type: text/plain, Size: 30 bytes --]

i meant to attach the patches

[-- Attachment #2: 0001-s390x-add-single-instruction-fma-and-fmaf.patch --]
[-- Type: text/x-diff, Size: 1075 bytes --]

From 58cff58a39fe21d0ad55b572670b2ece0ed6c00e Mon Sep 17 00:00:00 2001
From: Szabolcs Nagy <nsz@port70.net>
Date: Thu, 13 Sep 2018 22:35:13 +0000
Subject: [PATCH 1/5] s390x: add single instruction fma and fmaf

These are available in the s390x baseline isa -march=z900.
---
 src/math/s390x/fma.c  | 7 +++++++
 src/math/s390x/fmaf.c | 7 +++++++
 2 files changed, 14 insertions(+)
 create mode 100644 src/math/s390x/fma.c
 create mode 100644 src/math/s390x/fmaf.c

diff --git a/src/math/s390x/fma.c b/src/math/s390x/fma.c
new file mode 100644
index 00000000..86da0e49
--- /dev/null
+++ b/src/math/s390x/fma.c
@@ -0,0 +1,7 @@
+#include <math.h>
+
+double fma(double x, double y, double z)
+{
+	__asm__ ("madbr %0, %1, %2" : "+f"(z) : "f"(x), "f"(y));
+	return z;
+}
diff --git a/src/math/s390x/fmaf.c b/src/math/s390x/fmaf.c
new file mode 100644
index 00000000..f1aec6ad
--- /dev/null
+++ b/src/math/s390x/fmaf.c
@@ -0,0 +1,7 @@
+#include <math.h>
+
+float fmaf(float x, float y, float z)
+{
+	__asm__ ("maebr %0, %1, %2" : "+f"(z) : "f"(x), "f"(y));
+	return z;
+}
-- 
2.18.0


[-- Attachment #3: 0002-powerpc-add-single-instruction-fabs-fabsf-fma-fmaf-s.patch --]
[-- Type: text/x-diff, Size: 3164 bytes --]

From 6adb9c5683f8e9846223d6c062bfcfa1f21a85fe Mon Sep 17 00:00:00 2001
From: Szabolcs Nagy <nsz@port70.net>
Date: Thu, 20 Sep 2018 23:14:11 +0000
Subject: [PATCH 2/5] powerpc: add single instruction fabs, fabsf, fma, fmaf,
 sqrt, sqrtf

These are only available on hard float targets, and sqrt is not available
in the base ISA, so a further check is used for it.
---
 src/math/powerpc/fabs.c  | 15 +++++++++++++++
 src/math/powerpc/fabsf.c | 15 +++++++++++++++
 src/math/powerpc/fma.c   | 15 +++++++++++++++
 src/math/powerpc/fmaf.c  | 15 +++++++++++++++
 src/math/powerpc/sqrt.c  | 15 +++++++++++++++
 src/math/powerpc/sqrtf.c | 15 +++++++++++++++
 6 files changed, 90 insertions(+)
 create mode 100644 src/math/powerpc/fabs.c
 create mode 100644 src/math/powerpc/fabsf.c
 create mode 100644 src/math/powerpc/fma.c
 create mode 100644 src/math/powerpc/fmaf.c
 create mode 100644 src/math/powerpc/sqrt.c
 create mode 100644 src/math/powerpc/sqrtf.c

diff --git a/src/math/powerpc/fabs.c b/src/math/powerpc/fabs.c
new file mode 100644
index 00000000..f6ec4433
--- /dev/null
+++ b/src/math/powerpc/fabs.c
@@ -0,0 +1,15 @@
+#include <math.h>
+
+#ifdef _SOFT_FLOAT
+
+#include "../fabs.c"
+
+#else
+
+double fabs(double x)
+{
+	__asm__ ("fabs %0, %1" : "=d"(x) : "d"(x));
+	return x;
+}
+
+#endif
diff --git a/src/math/powerpc/fabsf.c b/src/math/powerpc/fabsf.c
new file mode 100644
index 00000000..d88b5911
--- /dev/null
+++ b/src/math/powerpc/fabsf.c
@@ -0,0 +1,15 @@
+#include <math.h>
+
+#ifdef _SOFT_FLOAT
+
+#include "../fabsf.c"
+
+#else
+
+float fabsf(float x)
+{
+	__asm__ ("fabs %0, %1" : "=f"(x) : "f"(x));
+	return x;
+}
+
+#endif
diff --git a/src/math/powerpc/fma.c b/src/math/powerpc/fma.c
new file mode 100644
index 00000000..fd268f5f
--- /dev/null
+++ b/src/math/powerpc/fma.c
@@ -0,0 +1,15 @@
+#include <math.h>
+
+#ifdef _SOFT_FLOAT
+
+#include "../fma.c"
+
+#else
+
+double fma(double x, double y, double z)
+{
+	__asm__("fmadd %0, %1, %2, %3" : "=d"(x) : "d"(x), "d"(y), "d"(z));
+	return x;
+}
+
+#endif
diff --git a/src/math/powerpc/fmaf.c b/src/math/powerpc/fmaf.c
new file mode 100644
index 00000000..a99a2a3b
--- /dev/null
+++ b/src/math/powerpc/fmaf.c
@@ -0,0 +1,15 @@
+#include <math.h>
+
+#ifdef _SOFT_FLOAT
+
+#include "../fmaf.c"
+
+#else
+
+float fmaf(float x, float y, float z)
+{
+	__asm__("fmadds %0, %1, %2, %3" : "=f"(x) : "f"(x), "f"(y), "f"(z));
+	return x;
+}
+
+#endif
diff --git a/src/math/powerpc/sqrt.c b/src/math/powerpc/sqrt.c
new file mode 100644
index 00000000..8718dbd0
--- /dev/null
+++ b/src/math/powerpc/sqrt.c
@@ -0,0 +1,15 @@
+#include <math.h>
+
+#if !defined _SOFT_FLOAT && defined _ARCH_PPCSQ
+
+double sqrt(double x)
+{
+	__asm__ ("fsqrt %0, %1\n" : "=d" (x) : "d" (x));
+	return x;
+}
+
+#else
+
+#include "../sqrt.c"
+
+#endif
diff --git a/src/math/powerpc/sqrtf.c b/src/math/powerpc/sqrtf.c
new file mode 100644
index 00000000..3431b672
--- /dev/null
+++ b/src/math/powerpc/sqrtf.c
@@ -0,0 +1,15 @@
+#include <math.h>
+
+#if !defined _SOFT_FLOAT && defined _ARCH_PPCSQ
+
+float sqrtf(float x)
+{
+	__asm__ ("fsqrts %0, %1\n" : "=f" (x) : "f" (x));
+	return x;
+}
+
+#else
+
+#include "../sqrtf.c"
+
+#endif
-- 
2.18.0


[-- Attachment #4: 0003-arm-add-single-instruction-fma.patch --]
[-- Type: text/x-diff, Size: 1494 bytes --]

From 693d5d07dba7a368979c69bf30674a9a272880ab Mon Sep 17 00:00:00 2001
From: Szabolcs Nagy <nsz@port70.net>
Date: Sat, 22 Sep 2018 18:47:27 +0000
Subject: [PATCH 3/5] arm: add single instruction fma

vfma is available in vfpv4, feature test is __ARM_FEATURE_FMA && __ARM_FP&8
for double precision.  __ARM_FP&8 is set if double precision fp instructions
are available in the hardware and the float-abi allows it.  Old compilers
may not define it, but then __ARM_FEATURE_FMA won't be defined either.
---
 src/math/arm/fma.c  | 15 +++++++++++++++
 src/math/arm/fmaf.c | 15 +++++++++++++++
 2 files changed, 30 insertions(+)
 create mode 100644 src/math/arm/fma.c
 create mode 100644 src/math/arm/fmaf.c

diff --git a/src/math/arm/fma.c b/src/math/arm/fma.c
new file mode 100644
index 00000000..3e52c45a
--- /dev/null
+++ b/src/math/arm/fma.c
@@ -0,0 +1,15 @@
+#include <math.h>
+
+#if __ARM_FEATURE_FMA && __ARM_FP&8
+
+double fma(double x, double y, double z)
+{
+	__asm__ ("vfma.f64 %P0, %P1, %P2" : "+w"(z) : "w"(x), "w"(y));
+	return z;
+}
+
+#else
+
+#include "../fma.c"
+
+#endif
diff --git a/src/math/arm/fmaf.c b/src/math/arm/fmaf.c
new file mode 100644
index 00000000..54451c1f
--- /dev/null
+++ b/src/math/arm/fmaf.c
@@ -0,0 +1,15 @@
+#include <math.h>
+
+#if __ARM_FEATURE_FMA && __ARM_FP&4 && !BROKEN_VFP_ASM
+
+float fmaf(float x, float y, float z)
+{
+	__asm__ ("vfma.f32 %0, %1, %2" : "+t"(z) : "t"(x), "t"(y));
+	return z;
+}
+
+#else
+
+#include "../fmaf.c"
+
+#endif
-- 
2.18.0


[-- Attachment #5: 0004-x86_64-add-single-instruction-fma.patch --]
[-- Type: text/x-diff, Size: 2983 bytes --]

From 21ece1306d717665c03fc9a118fe17811b69ec89 Mon Sep 17 00:00:00 2001
From: Szabolcs Nagy <nsz@port70.net>
Date: Sat, 22 Sep 2018 21:43:42 +0000
Subject: [PATCH 4/5] x86_64: add single instruction fma

fma is only available on recent x86_64 cpus and it is much faster than
a software fma, so this should be done with a runtime check, however
that requires more changes, this patch just adds the code so it can be
tested when musl is compiled with -mfma or -mfma4.
---
 src/math/x32/fma.c     | 23 +++++++++++++++++++++++
 src/math/x32/fmaf.c    | 23 +++++++++++++++++++++++
 src/math/x86_64/fma.c  | 23 +++++++++++++++++++++++
 src/math/x86_64/fmaf.c | 23 +++++++++++++++++++++++
 4 files changed, 92 insertions(+)
 create mode 100644 src/math/x32/fma.c
 create mode 100644 src/math/x32/fmaf.c
 create mode 100644 src/math/x86_64/fma.c
 create mode 100644 src/math/x86_64/fmaf.c

diff --git a/src/math/x32/fma.c b/src/math/x32/fma.c
new file mode 100644
index 00000000..4dd53f2a
--- /dev/null
+++ b/src/math/x32/fma.c
@@ -0,0 +1,23 @@
+#include <math.h>
+
+#if __FMA__
+
+double fma(double x, double y, double z)
+{
+	__asm__ ("vfmadd132sd %1, %2, %0" : "+x" (x) : "x" (y), "x" (z));
+	return x;
+}
+
+#elif __FMA4__
+
+double fma(double x, double y, double z)
+{
+	__asm__ ("vfmaddsd %3, %2, %1, %0" : "=x" (x) : "x" (x), "x" (y), "x" (z));
+	return x;
+}
+
+#else
+
+#include "../fma.c"
+
+#endif
diff --git a/src/math/x32/fmaf.c b/src/math/x32/fmaf.c
new file mode 100644
index 00000000..30b971ff
--- /dev/null
+++ b/src/math/x32/fmaf.c
@@ -0,0 +1,23 @@
+#include <math.h>
+
+#if __FMA__
+
+float fmaf(float x, float y, float z)
+{
+	__asm__ ("vfmadd132ss %1, %2, %0" : "+x" (x) : "x" (y), "x" (z));
+	return x;
+}
+
+#elif __FMA4__
+
+float fmaf(float x, float y, float z)
+{
+	__asm__ ("vfmaddss %3, %2, %1, %0" : "=x" (x) : "x" (x), "x" (y), "x" (z));
+	return x;
+}
+
+#else
+
+#include "../fmaf.c"
+
+#endif
diff --git a/src/math/x86_64/fma.c b/src/math/x86_64/fma.c
new file mode 100644
index 00000000..4dd53f2a
--- /dev/null
+++ b/src/math/x86_64/fma.c
@@ -0,0 +1,23 @@
+#include <math.h>
+
+#if __FMA__
+
+double fma(double x, double y, double z)
+{
+	__asm__ ("vfmadd132sd %1, %2, %0" : "+x" (x) : "x" (y), "x" (z));
+	return x;
+}
+
+#elif __FMA4__
+
+double fma(double x, double y, double z)
+{
+	__asm__ ("vfmaddsd %3, %2, %1, %0" : "=x" (x) : "x" (x), "x" (y), "x" (z));
+	return x;
+}
+
+#else
+
+#include "../fma.c"
+
+#endif
diff --git a/src/math/x86_64/fmaf.c b/src/math/x86_64/fmaf.c
new file mode 100644
index 00000000..30b971ff
--- /dev/null
+++ b/src/math/x86_64/fmaf.c
@@ -0,0 +1,23 @@
+#include <math.h>
+
+#if __FMA__
+
+float fmaf(float x, float y, float z)
+{
+	__asm__ ("vfmadd132ss %1, %2, %0" : "+x" (x) : "x" (y), "x" (z));
+	return x;
+}
+
+#elif __FMA4__
+
+float fmaf(float x, float y, float z)
+{
+	__asm__ ("vfmaddss %3, %2, %1, %0" : "=x" (x) : "x" (x), "x" (y), "x" (z));
+	return x;
+}
+
+#else
+
+#include "../fmaf.c"
+
+#endif
-- 
2.18.0


[-- Attachment #6: 0005-define-FP_FAST_FMA-and-FP_FAST_FMAF-when-fma-and-fma.patch --]
[-- Type: text/x-diff, Size: 4196 bytes --]

From c11ba119569609cfe3f910891405452306bdf303 Mon Sep 17 00:00:00 2001
From: Szabolcs Nagy <nsz@port70.net>
Date: Sun, 19 Mar 2017 03:56:01 +0000
Subject: [PATCH 5/5] define FP_FAST_FMA and FP_FAST_FMAF when fma and fmaf can
 be inlined

FP_FAST_FMA can be defined if "the fma function generally executes about
as fast as, or faster than, a multiply and an add of double operands",
which can only be true if the fma call is inlined as an instruction.

gcc sets __FP_FAST_FMA if __builtin_fma is inlined as an instruction,
but that does not mean an fma call will be inlined (e.g. the macro is
still defined with -fno-builtin-fma), other compilers (clang) don't even
have such a macro, so there is no reliable way to tell when fma is inlined.

one approach is to define FP_FAST_FMA based on the libc implementation:
when it has a single instruction implementation, then the compiler should
also be able to do the inlining and in case that fails at least the libc
code is still fast (there is just an extern call overhead).

on aarch64, powerpc, powerpc64, s390x we can give this guarantee, but
on arm, x32 and x86_64 runtime checks would be needed to do the same.

for now arm, x32 and x86_64 set FP_FAST_FMA when the compiler should be
able to inline fma, but if that fails the libc code will be slow (unless
musl is built for an isa baseline that includes an fma instruction).
---
 arch/aarch64/bits/math.h   | 2 ++
 arch/arm/bits/math.h       | 4 ++++
 arch/generic/bits/math.h   | 0
 arch/powerpc/bits/math.h   | 4 ++++
 arch/powerpc64/bits/math.h | 2 ++
 arch/s390x/bits/math.h     | 2 ++
 arch/x32/bits/math.h       | 4 ++++
 arch/x86_64/bits/math.h    | 4 ++++
 include/math.h             | 2 ++
 9 files changed, 24 insertions(+)
 create mode 100644 arch/aarch64/bits/math.h
 create mode 100644 arch/arm/bits/math.h
 create mode 100644 arch/generic/bits/math.h
 create mode 100644 arch/powerpc/bits/math.h
 create mode 100644 arch/powerpc64/bits/math.h
 create mode 100644 arch/s390x/bits/math.h
 create mode 100644 arch/x32/bits/math.h
 create mode 100644 arch/x86_64/bits/math.h

diff --git a/arch/aarch64/bits/math.h b/arch/aarch64/bits/math.h
new file mode 100644
index 00000000..c7ec28c5
--- /dev/null
+++ b/arch/aarch64/bits/math.h
@@ -0,0 +1,2 @@
+#define FP_FAST_FMA 1
+#define FP_FAST_FMAF 1
diff --git a/arch/arm/bits/math.h b/arch/arm/bits/math.h
new file mode 100644
index 00000000..2fbf371c
--- /dev/null
+++ b/arch/arm/bits/math.h
@@ -0,0 +1,4 @@
+#if __ARM_FEATURE_FMA && (__ARM_FP&12) == 12
+#define FP_FAST_FMA 1
+#define FP_FAST_FMAF 1
+#endif
diff --git a/arch/generic/bits/math.h b/arch/generic/bits/math.h
new file mode 100644
index 00000000..e69de29b
diff --git a/arch/powerpc/bits/math.h b/arch/powerpc/bits/math.h
new file mode 100644
index 00000000..3913b15e
--- /dev/null
+++ b/arch/powerpc/bits/math.h
@@ -0,0 +1,4 @@
+#ifndef _SOFT_FLOAT
+#define FP_FAST_FMA 1
+#define FP_FAST_FMAF 1
+#endif
diff --git a/arch/powerpc64/bits/math.h b/arch/powerpc64/bits/math.h
new file mode 100644
index 00000000..c7ec28c5
--- /dev/null
+++ b/arch/powerpc64/bits/math.h
@@ -0,0 +1,2 @@
+#define FP_FAST_FMA 1
+#define FP_FAST_FMAF 1
diff --git a/arch/s390x/bits/math.h b/arch/s390x/bits/math.h
new file mode 100644
index 00000000..c7ec28c5
--- /dev/null
+++ b/arch/s390x/bits/math.h
@@ -0,0 +1,2 @@
+#define FP_FAST_FMA 1
+#define FP_FAST_FMAF 1
diff --git a/arch/x32/bits/math.h b/arch/x32/bits/math.h
new file mode 100644
index 00000000..c7569d6c
--- /dev/null
+++ b/arch/x32/bits/math.h
@@ -0,0 +1,4 @@
+#if __FMA__ || __FMA4__
+#define FP_FAST_FMA 1
+#define FP_FAST_FMAF 1
+#endif
diff --git a/arch/x86_64/bits/math.h b/arch/x86_64/bits/math.h
new file mode 100644
index 00000000..c7569d6c
--- /dev/null
+++ b/arch/x86_64/bits/math.h
@@ -0,0 +1,4 @@
+#if __FMA__ || __FMA4__
+#define FP_FAST_FMA 1
+#define FP_FAST_FMAF 1
+#endif
diff --git a/include/math.h b/include/math.h
index fea34686..58da26c2 100644
--- a/include/math.h
+++ b/include/math.h
@@ -11,6 +11,8 @@ extern "C" {
 #define __NEED_double_t
 #include <bits/alltypes.h>
 
+#include <bits/math.h>
+
 #if 100*__GNUC__+__GNUC_MINOR__ >= 303
 #define NAN       __builtin_nanf("")
 #define INFINITY  __builtin_inff()
-- 
2.18.0


* Re: [PATCH 0/5] add FP_FAST_FMA to math.h
  2018-09-23 15:09 [PATCH 0/5] add FP_FAST_FMA to math.h Szabolcs Nagy
  2018-09-23 15:11 ` Szabolcs Nagy
@ 2018-09-26 20:52 ` Szabolcs Nagy
  1 sibling, 0 replies; 3+ messages in thread
From: Szabolcs Nagy @ 2018-09-26 20:52 UTC (permalink / raw)
  To: musl

[-- Attachment #1: Type: text/plain, Size: 52 bytes --]

v2: fixed the arm patch to work around a clang bug.

[-- Attachment #2: 0001-s390x-add-single-instruction-fma-and-fmaf.patch --]
[-- Type: text/x-diff, Size: 1075 bytes --]

From 58cff58a39fe21d0ad55b572670b2ece0ed6c00e Mon Sep 17 00:00:00 2001
From: Szabolcs Nagy <nsz@port70.net>
Date: Thu, 13 Sep 2018 22:35:13 +0000
Subject: [PATCH 1/5] s390x: add single instruction fma and fmaf

These are available in the s390x baseline isa -march=z900.
---
 src/math/s390x/fma.c  | 7 +++++++
 src/math/s390x/fmaf.c | 7 +++++++
 2 files changed, 14 insertions(+)
 create mode 100644 src/math/s390x/fma.c
 create mode 100644 src/math/s390x/fmaf.c

diff --git a/src/math/s390x/fma.c b/src/math/s390x/fma.c
new file mode 100644
index 00000000..86da0e49
--- /dev/null
+++ b/src/math/s390x/fma.c
@@ -0,0 +1,7 @@
+#include <math.h>
+
+double fma(double x, double y, double z)
+{
+	__asm__ ("madbr %0, %1, %2" : "+f"(z) : "f"(x), "f"(y));
+	return z;
+}
diff --git a/src/math/s390x/fmaf.c b/src/math/s390x/fmaf.c
new file mode 100644
index 00000000..f1aec6ad
--- /dev/null
+++ b/src/math/s390x/fmaf.c
@@ -0,0 +1,7 @@
+#include <math.h>
+
+float fmaf(float x, float y, float z)
+{
+	__asm__ ("maebr %0, %1, %2" : "+f"(z) : "f"(x), "f"(y));
+	return z;
+}
-- 
2.18.0


[-- Attachment #3: 0002-powerpc-add-single-instruction-fabs-fabsf-fma-fmaf-s.patch --]
[-- Type: text/x-diff, Size: 3164 bytes --]

From 6adb9c5683f8e9846223d6c062bfcfa1f21a85fe Mon Sep 17 00:00:00 2001
From: Szabolcs Nagy <nsz@port70.net>
Date: Thu, 20 Sep 2018 23:14:11 +0000
Subject: [PATCH 2/5] powerpc: add single instruction fabs, fabsf, fma, fmaf,
 sqrt, sqrtf

These are only available on hard float targets, and sqrt is not available
in the base ISA, so a further check is used for it.
---
 src/math/powerpc/fabs.c  | 15 +++++++++++++++
 src/math/powerpc/fabsf.c | 15 +++++++++++++++
 src/math/powerpc/fma.c   | 15 +++++++++++++++
 src/math/powerpc/fmaf.c  | 15 +++++++++++++++
 src/math/powerpc/sqrt.c  | 15 +++++++++++++++
 src/math/powerpc/sqrtf.c | 15 +++++++++++++++
 6 files changed, 90 insertions(+)
 create mode 100644 src/math/powerpc/fabs.c
 create mode 100644 src/math/powerpc/fabsf.c
 create mode 100644 src/math/powerpc/fma.c
 create mode 100644 src/math/powerpc/fmaf.c
 create mode 100644 src/math/powerpc/sqrt.c
 create mode 100644 src/math/powerpc/sqrtf.c

diff --git a/src/math/powerpc/fabs.c b/src/math/powerpc/fabs.c
new file mode 100644
index 00000000..f6ec4433
--- /dev/null
+++ b/src/math/powerpc/fabs.c
@@ -0,0 +1,15 @@
+#include <math.h>
+
+#ifdef _SOFT_FLOAT
+
+#include "../fabs.c"
+
+#else
+
+double fabs(double x)
+{
+	__asm__ ("fabs %0, %1" : "=d"(x) : "d"(x));
+	return x;
+}
+
+#endif
diff --git a/src/math/powerpc/fabsf.c b/src/math/powerpc/fabsf.c
new file mode 100644
index 00000000..d88b5911
--- /dev/null
+++ b/src/math/powerpc/fabsf.c
@@ -0,0 +1,15 @@
+#include <math.h>
+
+#ifdef _SOFT_FLOAT
+
+#include "../fabsf.c"
+
+#else
+
+float fabsf(float x)
+{
+	__asm__ ("fabs %0, %1" : "=f"(x) : "f"(x));
+	return x;
+}
+
+#endif
diff --git a/src/math/powerpc/fma.c b/src/math/powerpc/fma.c
new file mode 100644
index 00000000..fd268f5f
--- /dev/null
+++ b/src/math/powerpc/fma.c
@@ -0,0 +1,15 @@
+#include <math.h>
+
+#ifdef _SOFT_FLOAT
+
+#include "../fma.c"
+
+#else
+
+double fma(double x, double y, double z)
+{
+	__asm__("fmadd %0, %1, %2, %3" : "=d"(x) : "d"(x), "d"(y), "d"(z));
+	return x;
+}
+
+#endif
diff --git a/src/math/powerpc/fmaf.c b/src/math/powerpc/fmaf.c
new file mode 100644
index 00000000..a99a2a3b
--- /dev/null
+++ b/src/math/powerpc/fmaf.c
@@ -0,0 +1,15 @@
+#include <math.h>
+
+#ifdef _SOFT_FLOAT
+
+#include "../fmaf.c"
+
+#else
+
+float fmaf(float x, float y, float z)
+{
+	__asm__("fmadds %0, %1, %2, %3" : "=f"(x) : "f"(x), "f"(y), "f"(z));
+	return x;
+}
+
+#endif
diff --git a/src/math/powerpc/sqrt.c b/src/math/powerpc/sqrt.c
new file mode 100644
index 00000000..8718dbd0
--- /dev/null
+++ b/src/math/powerpc/sqrt.c
@@ -0,0 +1,15 @@
+#include <math.h>
+
+#if !defined _SOFT_FLOAT && defined _ARCH_PPCSQ
+
+double sqrt(double x)
+{
+	__asm__ ("fsqrt %0, %1\n" : "=d" (x) : "d" (x));
+	return x;
+}
+
+#else
+
+#include "../sqrt.c"
+
+#endif
diff --git a/src/math/powerpc/sqrtf.c b/src/math/powerpc/sqrtf.c
new file mode 100644
index 00000000..3431b672
--- /dev/null
+++ b/src/math/powerpc/sqrtf.c
@@ -0,0 +1,15 @@
+#include <math.h>
+
+#if !defined _SOFT_FLOAT && defined _ARCH_PPCSQ
+
+float sqrtf(float x)
+{
+	__asm__ ("fsqrts %0, %1\n" : "=f" (x) : "f" (x));
+	return x;
+}
+
+#else
+
+#include "../sqrtf.c"
+
+#endif
-- 
2.18.0


[-- Attachment #4: 0003-arm-add-single-instruction-fma.patch --]
[-- Type: text/x-diff, Size: 1719 bytes --]

From e33ac3fd4a39416c4b681d610c7ad9737a279260 Mon Sep 17 00:00:00 2001
From: Szabolcs Nagy <nsz@port70.net>
Date: Sat, 22 Sep 2018 18:47:27 +0000
Subject: [PATCH 3/5] arm: add single instruction fma

vfma is available in the vfpv4 fpu and above, the ACLE standard feature
test for double precision hardware fma support is
  __ARM_FEATURE_FMA && __ARM_FP&8
we need further checks to work around clang bugs (fixed in clang >=7.0)
  && !__SOFTFP__
because __ARM_FP is defined even with -mfloat-abi=soft
  && !BROKEN_VFP_ASM
to disable the single precision code when inline asm handling is broken.

For runtime selection the HWCAP_ARM_VFPv4 hwcap flag can be used, but
that requires further work.
---
 src/math/arm/fma.c  | 15 +++++++++++++++
 src/math/arm/fmaf.c | 15 +++++++++++++++
 2 files changed, 30 insertions(+)
 create mode 100644 src/math/arm/fma.c
 create mode 100644 src/math/arm/fmaf.c

diff --git a/src/math/arm/fma.c b/src/math/arm/fma.c
new file mode 100644
index 00000000..2a9b8efa
--- /dev/null
+++ b/src/math/arm/fma.c
@@ -0,0 +1,15 @@
+#include <math.h>
+
+#if __ARM_FEATURE_FMA && __ARM_FP&8 && !__SOFTFP__
+
+double fma(double x, double y, double z)
+{
+	__asm__ ("vfma.f64 %P0, %P1, %P2" : "+w"(z) : "w"(x), "w"(y));
+	return z;
+}
+
+#else
+
+#include "../fma.c"
+
+#endif
diff --git a/src/math/arm/fmaf.c b/src/math/arm/fmaf.c
new file mode 100644
index 00000000..a1793d27
--- /dev/null
+++ b/src/math/arm/fmaf.c
@@ -0,0 +1,15 @@
+#include <math.h>
+
+#if __ARM_FEATURE_FMA && __ARM_FP&4 && !__SOFTFP__ && !BROKEN_VFP_ASM
+
+float fmaf(float x, float y, float z)
+{
+	__asm__ ("vfma.f32 %0, %1, %2" : "+t"(z) : "t"(x), "t"(y));
+	return z;
+}
+
+#else
+
+#include "../fmaf.c"
+
+#endif
-- 
2.18.0


[-- Attachment #5: 0004-x86_64-add-single-instruction-fma.patch --]
[-- Type: text/x-diff, Size: 2983 bytes --]

From 7a54c4fee1771cdc9de42445c813d9e7d43d272e Mon Sep 17 00:00:00 2001
From: Szabolcs Nagy <nsz@port70.net>
Date: Sat, 22 Sep 2018 21:43:42 +0000
Subject: [PATCH 4/5] x86_64: add single instruction fma

fma is only available on recent x86_64 cpus and it is much faster than
a software fma, so this should be done with a runtime check, however
that requires more changes, this patch just adds the code so it can be
tested when musl is compiled with -mfma or -mfma4.
---
 src/math/x32/fma.c     | 23 +++++++++++++++++++++++
 src/math/x32/fmaf.c    | 23 +++++++++++++++++++++++
 src/math/x86_64/fma.c  | 23 +++++++++++++++++++++++
 src/math/x86_64/fmaf.c | 23 +++++++++++++++++++++++
 4 files changed, 92 insertions(+)
 create mode 100644 src/math/x32/fma.c
 create mode 100644 src/math/x32/fmaf.c
 create mode 100644 src/math/x86_64/fma.c
 create mode 100644 src/math/x86_64/fmaf.c

diff --git a/src/math/x32/fma.c b/src/math/x32/fma.c
new file mode 100644
index 00000000..4dd53f2a
--- /dev/null
+++ b/src/math/x32/fma.c
@@ -0,0 +1,23 @@
+#include <math.h>
+
+#if __FMA__
+
+double fma(double x, double y, double z)
+{
+	__asm__ ("vfmadd132sd %1, %2, %0" : "+x" (x) : "x" (y), "x" (z));
+	return x;
+}
+
+#elif __FMA4__
+
+double fma(double x, double y, double z)
+{
+	__asm__ ("vfmaddsd %3, %2, %1, %0" : "=x" (x) : "x" (x), "x" (y), "x" (z));
+	return x;
+}
+
+#else
+
+#include "../fma.c"
+
+#endif
diff --git a/src/math/x32/fmaf.c b/src/math/x32/fmaf.c
new file mode 100644
index 00000000..30b971ff
--- /dev/null
+++ b/src/math/x32/fmaf.c
@@ -0,0 +1,23 @@
+#include <math.h>
+
+#if __FMA__
+
+float fmaf(float x, float y, float z)
+{
+	__asm__ ("vfmadd132ss %1, %2, %0" : "+x" (x) : "x" (y), "x" (z));
+	return x;
+}
+
+#elif __FMA4__
+
+float fmaf(float x, float y, float z)
+{
+	__asm__ ("vfmaddss %3, %2, %1, %0" : "=x" (x) : "x" (x), "x" (y), "x" (z));
+	return x;
+}
+
+#else
+
+#include "../fmaf.c"
+
+#endif
diff --git a/src/math/x86_64/fma.c b/src/math/x86_64/fma.c
new file mode 100644
index 00000000..4dd53f2a
--- /dev/null
+++ b/src/math/x86_64/fma.c
@@ -0,0 +1,23 @@
+#include <math.h>
+
+#if __FMA__
+
+double fma(double x, double y, double z)
+{
+	__asm__ ("vfmadd132sd %1, %2, %0" : "+x" (x) : "x" (y), "x" (z));
+	return x;
+}
+
+#elif __FMA4__
+
+double fma(double x, double y, double z)
+{
+	__asm__ ("vfmaddsd %3, %2, %1, %0" : "=x" (x) : "x" (x), "x" (y), "x" (z));
+	return x;
+}
+
+#else
+
+#include "../fma.c"
+
+#endif
diff --git a/src/math/x86_64/fmaf.c b/src/math/x86_64/fmaf.c
new file mode 100644
index 00000000..30b971ff
--- /dev/null
+++ b/src/math/x86_64/fmaf.c
@@ -0,0 +1,23 @@
+#include <math.h>
+
+#if __FMA__
+
+float fmaf(float x, float y, float z)
+{
+	__asm__ ("vfmadd132ss %1, %2, %0" : "+x" (x) : "x" (y), "x" (z));
+	return x;
+}
+
+#elif __FMA4__
+
+float fmaf(float x, float y, float z)
+{
+	__asm__ ("vfmaddss %3, %2, %1, %0" : "=x" (x) : "x" (x), "x" (y), "x" (z));
+	return x;
+}
+
+#else
+
+#include "../fmaf.c"
+
+#endif
-- 
2.18.0


[-- Attachment #6: 0005-define-FP_FAST_FMA-and-FP_FAST_FMAF-when-fma-and-fma.patch --]
[-- Type: text/x-diff, Size: 4264 bytes --]

From 03768f71f9ece09838c86828f710180d9e1803d6 Mon Sep 17 00:00:00 2001
From: Szabolcs Nagy <nsz@port70.net>
Date: Sun, 19 Mar 2017 03:56:01 +0000
Subject: [PATCH 5/5] define FP_FAST_FMA and FP_FAST_FMAF when fma and fmaf can
 be inlined

FP_FAST_FMA can be defined if "the fma function generally executes about
as fast as, or faster than, a multiply and an add of double operands",
which can only be true if the fma call is inlined as an instruction.

gcc sets __FP_FAST_FMA if __builtin_fma is inlined as an instruction,
but that does not mean an fma call will be inlined (e.g. the macro is
still defined with -fno-builtin-fma), other compilers (clang) don't even
have such a macro, so there is no reliable way to tell when fma is inlined.

one approach is to define FP_FAST_FMA based on the libc implementation:
when it has a single instruction implementation, then the compiler should
also be able to do the inlining and in case that fails at least the libc
code is still fast (there is just an extern call overhead).

on aarch64, powerpc, powerpc64, s390x we can give this guarantee, but
on arm, x32 and x86_64 runtime checks would be needed to do the same.

for now arm, x32 and x86_64 set FP_FAST_FMA when the compiler should be
able to inline fma, but if that fails the libc code will be slow (unless
musl is built for an isa baseline that includes an fma instruction).
---
 arch/aarch64/bits/math.h   | 2 ++
 arch/arm/bits/math.h       | 6 ++++++
 arch/generic/bits/math.h   | 0
 arch/powerpc/bits/math.h   | 4 ++++
 arch/powerpc64/bits/math.h | 2 ++
 arch/s390x/bits/math.h     | 2 ++
 arch/x32/bits/math.h       | 4 ++++
 arch/x86_64/bits/math.h    | 4 ++++
 include/math.h             | 2 ++
 9 files changed, 26 insertions(+)
 create mode 100644 arch/aarch64/bits/math.h
 create mode 100644 arch/arm/bits/math.h
 create mode 100644 arch/generic/bits/math.h
 create mode 100644 arch/powerpc/bits/math.h
 create mode 100644 arch/powerpc64/bits/math.h
 create mode 100644 arch/s390x/bits/math.h
 create mode 100644 arch/x32/bits/math.h
 create mode 100644 arch/x86_64/bits/math.h

diff --git a/arch/aarch64/bits/math.h b/arch/aarch64/bits/math.h
new file mode 100644
index 00000000..c7ec28c5
--- /dev/null
+++ b/arch/aarch64/bits/math.h
@@ -0,0 +1,2 @@
+#define FP_FAST_FMA 1
+#define FP_FAST_FMAF 1
diff --git a/arch/arm/bits/math.h b/arch/arm/bits/math.h
new file mode 100644
index 00000000..f87817f0
--- /dev/null
+++ b/arch/arm/bits/math.h
@@ -0,0 +1,6 @@
+#if __ARM_FEATURE_FMA && __ARM_FP&8 && !__SOFTFP__
+#define FP_FAST_FMA 1
+#endif
+#if __ARM_FEATURE_FMA && __ARM_FP&4 && !__SOFTFP__
+#define FP_FAST_FMAF 1
+#endif
diff --git a/arch/generic/bits/math.h b/arch/generic/bits/math.h
new file mode 100644
index 00000000..e69de29b
diff --git a/arch/powerpc/bits/math.h b/arch/powerpc/bits/math.h
new file mode 100644
index 00000000..3913b15e
--- /dev/null
+++ b/arch/powerpc/bits/math.h
@@ -0,0 +1,4 @@
+#ifndef _SOFT_FLOAT
+#define FP_FAST_FMA 1
+#define FP_FAST_FMAF 1
+#endif
diff --git a/arch/powerpc64/bits/math.h b/arch/powerpc64/bits/math.h
new file mode 100644
index 00000000..c7ec28c5
--- /dev/null
+++ b/arch/powerpc64/bits/math.h
@@ -0,0 +1,2 @@
+#define FP_FAST_FMA 1
+#define FP_FAST_FMAF 1
diff --git a/arch/s390x/bits/math.h b/arch/s390x/bits/math.h
new file mode 100644
index 00000000..c7ec28c5
--- /dev/null
+++ b/arch/s390x/bits/math.h
@@ -0,0 +1,2 @@
+#define FP_FAST_FMA 1
+#define FP_FAST_FMAF 1
diff --git a/arch/x32/bits/math.h b/arch/x32/bits/math.h
new file mode 100644
index 00000000..c7569d6c
--- /dev/null
+++ b/arch/x32/bits/math.h
@@ -0,0 +1,4 @@
+#if __FMA__ || __FMA4__
+#define FP_FAST_FMA 1
+#define FP_FAST_FMAF 1
+#endif
diff --git a/arch/x86_64/bits/math.h b/arch/x86_64/bits/math.h
new file mode 100644
index 00000000..c7569d6c
--- /dev/null
+++ b/arch/x86_64/bits/math.h
@@ -0,0 +1,4 @@
+#if __FMA__ || __FMA4__
+#define FP_FAST_FMA 1
+#define FP_FAST_FMAF 1
+#endif
diff --git a/include/math.h b/include/math.h
index fea34686..58da26c2 100644
--- a/include/math.h
+++ b/include/math.h
@@ -11,6 +11,8 @@ extern "C" {
 #define __NEED_double_t
 #include <bits/alltypes.h>
 
+#include <bits/math.h>
+
 #if 100*__GNUC__+__GNUC_MINOR__ >= 303
 #define NAN       __builtin_nanf("")
 #define INFINITY  __builtin_inff()
-- 
2.18.0

