mailing list of musl libc
* [musl] [PATCH v4] math: add riscv64 round/roundf
@ 2024-06-13  8:07 Meng Zhuo
  2024-06-13 15:34 ` Markus Wichmann
From: Meng Zhuo @ 2024-06-13  8:07 UTC
  To: musl

---
v3 -> v4:
* add fabs(f) check to avoid overflow.
* add comment on -0 copysign

Thanks for review!
How can I implement the "single cmp+branch on the bit representation of x"?
I searched Google for bit hacks that do this. Could you give me some tips
or references?
---
 src/math/riscv64/round.c  | 21 +++++++++++++++++++++
 src/math/riscv64/roundf.c | 21 +++++++++++++++++++++
 2 files changed, 42 insertions(+)
 create mode 100644 src/math/riscv64/round.c
 create mode 100644 src/math/riscv64/roundf.c

diff --git a/src/math/riscv64/round.c b/src/math/riscv64/round.c
new file mode 100644
index 00000000..7d7ade9d
--- /dev/null
+++ b/src/math/riscv64/round.c
@@ -0,0 +1,21 @@
+#include <math.h>
+
+#if __riscv_flen >= 64
+
+double round(double x)
+{
+	if (!isfinite(x) || fabs(x) >= 0x1p52) return x;
+	double tmp;
+	long long n;
+	__asm__ ("fcvt.l.d %0, %1, rmm" : "=r"(n) : "f"(x));
+	__asm__ ("fcvt.d.l %0, %1" : "=f"(tmp) : "r"(n));
+	// copy the sign bit so a zero result keeps the sign of x, e.g. round(-0.0)
+	__asm__ ("fsgnj.d %0, %1, %2" : "=f"(x) : "f"(tmp), "f"(x));
+	return x;
+}
+
+#else
+
+#include "../round.c"
+
+#endif
diff --git a/src/math/riscv64/roundf.c b/src/math/riscv64/roundf.c
new file mode 100644
index 00000000..be588574
--- /dev/null
+++ b/src/math/riscv64/roundf.c
@@ -0,0 +1,21 @@
+#include <math.h>
+
+#if __riscv_flen >= 32
+
+float roundf(float x)
+{
+	if (!isfinite(x) || fabsf(x) >= 0x1p23) return x;
+	float tmp;
+	long n;
+	__asm__ ("fcvt.w.s %0, %1, rmm" : "=r"(n) : "f"(x));
+	__asm__ ("fcvt.s.w %0, %1" : "=f"(tmp) : "r"(n));
+	// copy the sign bit so a zero result keeps the sign of x, e.g. roundf(-0.0f)
+	__asm__ ("fsgnj.s %0, %1, %2" : "=f"(x) : "f"(tmp), "f"(x));
+	return x;
+}
+
+#else
+
+#include "../roundf.c"
+
+#endif
-- 
2.39.2
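
For readers without RISC-V hardware, the three steps of the patch (range
guard, convert to integer with ties away from zero, convert back and copy
the sign) can be modelled in portable C. This is only a sketch and not part
of the patch or of musl: round_model() and sign_inject() are hypothetical
names, the truncate-and-adjust step stands in for fcvt.l.d with rmm, and
sign_inject() stands in for fsgnj.d.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Models fsgnj.d: magnitude of mag, sign bit of sgn (same result as copysign). */
static double sign_inject(double mag, double sgn)
{
	uint64_t m, s;
	memcpy(&m, &mag, sizeof m);
	memcpy(&s, &sgn, sizeof s);
	m = (m & ~(1ULL << 63)) | (s & (1ULL << 63));
	memcpy(&mag, &m, sizeof mag);
	return mag;
}

/* Hypothetical portable model of the patch's round(); not musl code. */
static double round_model(double x)
{
	/* NaN, infinities and |x| >= 2^52 are returned unchanged. */
	if (!isfinite(x) || fabs(x) >= 0x1p52) return x;
	/* Models fcvt.l.d with rmm: nearest integer, ties away from zero. */
	long long n = (long long)x;   /* truncate toward zero */
	double frac = x - (double)n;  /* exact for |x| < 2^52 */
	if (frac >= 0.5) n++;
	else if (frac <= -0.5) n--;
	/* Models fcvt.d.l, then restores the sign that is lost when n == 0. */
	return sign_inject((double)n, x);
}

/* Spot checks: ties, a value just below 0.5, negative zero results, guard cases. */
static void check_round_model(void)
{
	assert(round_model(2.5) == 3.0);
	assert(round_model(-2.5) == -3.0);
	assert(round_model(0.49999999999999994) == 0.0);
	assert(round_model(-0.3) == 0.0 && signbit(round_model(-0.3)));
	assert(signbit(round_model(-0.0)));
	assert(round_model(0x1p52) == 0x1p52);
	assert(isnan(round_model(NAN)));
}
```

The -0.3 case shows that the sign copy matters for every input whose result
is zero, not only for -0.0 itself.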



* Re: [musl] [PATCH v4] math: add riscv64 round/roundf
  2024-06-13  8:07 [musl] [PATCH v4] math: add riscv64 round/roundf Meng Zhuo
@ 2024-06-13 15:34 ` Markus Wichmann
From: Markus Wichmann @ 2024-06-13 15:34 UTC
  To: musl; +Cc: Meng Zhuo

On Thu, Jun 13, 2024 at 04:07:17PM +0800, Meng Zhuo wrote:
> ---
> v3 -> v4:
> * add fabs(f) check to avoid overflow.
> * add comment on -0 copysign
>
> Thanks for review!
> How can I implement the "single cmp+branch on the bit representation of x"?
> I searched Google for bit hacks that do this. Could you give me some tips
> or references?
> ---

musl's libm.h contains the macro asuint64() to get the bit representation
of a double. You can use it in this case like this:

Original:
> +	if (!isfinite(x) || fabs(x) >= 0x1p52) return x;

All IEEE FP numbers have the sign bit as the MSB, followed by the exponent
and then the significand. Note that all values that are not finite
(infinities and NaNs) have an exponent field of all 1-bits, which is larger
than the exponent field of any finite number. So for this check you only
need to test whether the exponent is at least 52 once the bias is taken
into account, or in other words:

if (((asuint64(x) >> 52) & 0x7ff) >= 52+0x3ff) return x;

The exponent bias is always half the maximum value that fits in the
exponent field, rounded down. double has an 11-bit exponent field, so the
maximum is 0x7ff and the bias is 0x3ff. The mantissa does not matter for
the selection, so we can shift it away, and the mask drops the sign bit.
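
As a sketch of why the single integer compare can replace the original
guard (this is illustration only, not code from musl or from the patch):
bits_of_double() is a local stand-in for asuint64() so that the example
builds outside the musl tree, and the guard_fp/guard_bits names are made up.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Local stand-in for musl's asuint64(): the raw bits of a double. */
static uint64_t bits_of_double(double x)
{
	uint64_t u;
	memcpy(&u, &x, sizeof u);
	return u;
}

/* Guard as written in the patch: a finiteness test plus an FP compare. */
static int guard_fp(double x)
{
	return !isfinite(x) || fabs(x) >= 0x1p52;
}

/* Guard on the bit representation: one compare on the biased exponent. */
static int guard_bits(double x)
{
	return ((bits_of_double(x) >> 52) & 0x7ff) >= 52 + 0x3ff;
}

/* Spot-check that both guards agree near 2^52 and on special values. */
static void check_double_guards(void)
{
	const double samples[] = {
		0.0, -0.0, 0.5, -2.5, 0x1p-1074, 0x1p51,
		0x1.fffffffffffffp51, 0x1p52, -0x1p52, 0x1p53,
		1e300, INFINITY, -INFINITY, NAN,
	};
	for (size_t i = 0; i < sizeof samples / sizeof *samples; i++)
		assert(guard_fp(samples[i]) == guard_bits(samples[i]));
}
```

Infinities and NaNs have the exponent field 0x7ff, which is above the
threshold, so they take the same early return as large finite values.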

Adapting for float is left as an exercise for the reader.
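
One possible adaptation for float, as a sketch rather than a reviewed
patch: float has an 8-bit exponent field (maximum 0xff, bias 0x7f) and a
23-bit mantissa, so the threshold becomes 23 + 0x7f. bits_of_float() is a
local stand-in for musl's asuint(), and the guardf_fp/guardf_bits names are
made up for illustration.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Local stand-in for musl's asuint(): the raw bits of a float. */
static uint32_t bits_of_float(float x)
{
	uint32_t u;
	memcpy(&u, &x, sizeof u);
	return u;
}

/* Guard as written in the patch's roundf(). */
static int guardf_fp(float x)
{
	return !isfinite(x) || fabsf(x) >= 0x1p23f;
}

/* Same test as a single compare on the biased 8-bit exponent field. */
static int guardf_bits(float x)
{
	return ((bits_of_float(x) >> 23) & 0xff) >= 23 + 0x7f;
}

/* Spot-check that both guards agree near 2^23 and on special values. */
static void check_float_guards(void)
{
	const float samples[] = {
		0.0f, -0.0f, 0.5f, -2.5f, 0x1p-149f, 0x1p22f,
		0x1.fffffep22f, 0x1p23f, -0x1p23f, 0x1p24f,
		3e38f, INFINITY, -INFINITY, NAN,
	};
	for (size_t i = 0; i < sizeof samples / sizeof *samples; i++)
		assert(guardf_fp(samples[i]) == guardf_bits(samples[i]));
}
```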

Ciao,
Markus
