mailing list of musl libc
* [PATCH] x86: optimize fp_arch.h
@ 2019-04-24 23:51 Szabolcs Nagy
  2019-04-25  2:01 ` Rich Felker
  0 siblings, 1 reply; 3+ messages in thread
From: Szabolcs Nagy @ 2019-04-24 23:51 UTC (permalink / raw)
  To: musl

[-- Attachment #1: Type: text/plain, Size: 26 bytes --]

tested on x86_64 and i386

[-- Attachment #2: 0001-x86-optimize-fp_arch.h.patch --]
[-- Type: text/x-diff, Size: 2784 bytes --]

From 5f97370ff3e94bea812ec123a31d7482965a3b1b Mon Sep 17 00:00:00 2001
From: Szabolcs Nagy <nsz@port70.net>
Date: Wed, 24 Apr 2019 23:29:05 +0000
Subject: [PATCH] x86: optimize fp_arch.h

Use fp register constraint instead of volatile store when sse2 math is
available, and use memory constraint when only x87 fpu is available.
---
 arch/i386/fp_arch.h   | 31 +++++++++++++++++++++++++++++++
 arch/x32/fp_arch.h    | 25 +++++++++++++++++++++++++
 arch/x86_64/fp_arch.h | 25 +++++++++++++++++++++++++
 3 files changed, 81 insertions(+)
 create mode 100644 arch/i386/fp_arch.h
 create mode 100644 arch/x32/fp_arch.h
 create mode 100644 arch/x86_64/fp_arch.h

diff --git a/arch/i386/fp_arch.h b/arch/i386/fp_arch.h
new file mode 100644
index 00000000..b4019de2
--- /dev/null
+++ b/arch/i386/fp_arch.h
@@ -0,0 +1,31 @@
+#ifdef __SSE2_MATH__
+#define FP_BARRIER(x) __asm__ __volatile__ ("" : "+x"(x))
+#else
+#define FP_BARRIER(x) __asm__ __volatile__ ("" : "+m"(x))
+#endif
+
+#define fp_barrierf fp_barrierf
+static inline float fp_barrierf(float x)
+{
+	FP_BARRIER(x);
+	return x;
+}
+
+#define fp_barrier fp_barrier
+static inline double fp_barrier(double x)
+{
+	FP_BARRIER(x);
+	return x;
+}
+
+#define fp_force_evalf fp_force_evalf
+static inline void fp_force_evalf(float x)
+{
+	FP_BARRIER(x);
+}
+
+#define fp_force_eval fp_force_eval
+static inline void fp_force_eval(double x)
+{
+	FP_BARRIER(x);
+}
diff --git a/arch/x32/fp_arch.h b/arch/x32/fp_arch.h
new file mode 100644
index 00000000..ff9b8311
--- /dev/null
+++ b/arch/x32/fp_arch.h
@@ -0,0 +1,25 @@
+#define fp_barrierf fp_barrierf
+static inline float fp_barrierf(float x)
+{
+	__asm__ __volatile__ ("" : "+x"(x));
+	return x;
+}
+
+#define fp_barrier fp_barrier
+static inline double fp_barrier(double x)
+{
+	__asm__ __volatile__ ("" : "+x"(x));
+	return x;
+}
+
+#define fp_force_evalf fp_force_evalf
+static inline void fp_force_evalf(float x)
+{
+	__asm__ __volatile__ ("" : "+x"(x));
+}
+
+#define fp_force_eval fp_force_eval
+static inline void fp_force_eval(double x)
+{
+	__asm__ __volatile__ ("" : "+x"(x));
+}
diff --git a/arch/x86_64/fp_arch.h b/arch/x86_64/fp_arch.h
new file mode 100644
index 00000000..ff9b8311
--- /dev/null
+++ b/arch/x86_64/fp_arch.h
@@ -0,0 +1,25 @@
+#define fp_barrierf fp_barrierf
+static inline float fp_barrierf(float x)
+{
+	__asm__ __volatile__ ("" : "+x"(x));
+	return x;
+}
+
+#define fp_barrier fp_barrier
+static inline double fp_barrier(double x)
+{
+	__asm__ __volatile__ ("" : "+x"(x));
+	return x;
+}
+
+#define fp_force_evalf fp_force_evalf
+static inline void fp_force_evalf(float x)
+{
+	__asm__ __volatile__ ("" : "+x"(x));
+}
+
+#define fp_force_eval fp_force_eval
+static inline void fp_force_eval(double x)
+{
+	__asm__ __volatile__ ("" : "+x"(x));
+}
-- 
2.21.0



* Re: [PATCH] x86: optimize fp_arch.h
  2019-04-24 23:51 [PATCH] x86: optimize fp_arch.h Szabolcs Nagy
@ 2019-04-25  2:01 ` Rich Felker
  2019-04-25  8:53   ` Szabolcs Nagy
  0 siblings, 1 reply; 3+ messages in thread
From: Rich Felker @ 2019-04-25  2:01 UTC (permalink / raw)
  To: musl

On Thu, Apr 25, 2019 at 01:51:06AM +0200, Szabolcs Nagy wrote:
> tested on x86_64 and i386

> [...]
> diff --git a/arch/i386/fp_arch.h b/arch/i386/fp_arch.h
> new file mode 100644
> index 00000000..b4019de2
> --- /dev/null
> +++ b/arch/i386/fp_arch.h
> @@ -0,0 +1,31 @@
> +#ifdef __SSE2_MATH__
> +#define FP_BARRIER(x) __asm__ __volatile__ ("" : "+x"(x))
> +#else
> +#define FP_BARRIER(x) __asm__ __volatile__ ("" : "+m"(x))
> +#endif

I guess for float and double you need the "m" constraint to ensure
that a broken compiler doesn't skip dropping of precision (although I
still wish we didn't bother with complexity to support that, and just
relied on cast working correctly), but at least for long double
couldn't we use an x87 register constraint to avoid the spill to
memory?

Rich



* Re: [PATCH] x86: optimize fp_arch.h
  2019-04-25  2:01 ` Rich Felker
@ 2019-04-25  8:53   ` Szabolcs Nagy
  0 siblings, 0 replies; 3+ messages in thread
From: Szabolcs Nagy @ 2019-04-25  8:53 UTC (permalink / raw)
  To: musl

* Rich Felker <dalias@libc.org> [2019-04-24 22:01:08 -0400]:
> On Thu, Apr 25, 2019 at 01:51:06AM +0200, Szabolcs Nagy wrote:
> > tested on x86_64 and i386
> 
> > [...]
> > diff --git a/arch/i386/fp_arch.h b/arch/i386/fp_arch.h
> > new file mode 100644
> > index 00000000..b4019de2
> > --- /dev/null
> > +++ b/arch/i386/fp_arch.h
> > @@ -0,0 +1,31 @@
> > +#ifdef __SSE2_MATH__
> > +#define FP_BARRIER(x) __asm__ __volatile__ ("" : "+x"(x))
> > +#else
> > +#define FP_BARRIER(x) __asm__ __volatile__ ("" : "+m"(x))
> > +#endif
> 
> I guess for float and double you need the "m" constraint to ensure
> that a broken compiler doesn't skip dropping of precision (although I
> still wish we didn't bother with complexity to support that, and just
> relied on cast working correctly), but at least for long double
> couldn't we use an x87 register constraint to avoid the spill to
> memory?

i think fp_barrier does not have to drop excess precision:
it is supposed to be an identity op that is hidden from
the compiler, e.g. to prevent constant folding or hoisting;
it is fp_force_eval that is used to force side-effects, and
those may only happen once the excess precision is dropped.

i think modern gcc drops excess precision at argument passing
in standard mode, so "+m" is not strictly needed, but it makes
the code behave the same in non-standard mode too.

and yes, the long double version could use "+t", maybe i should
add that. (the patch saves about 400 bytes of .text by avoiding
the volatile load/store overhead.)


