From mboxrd@z Thu Jan  1 00:00:00 1970
X-Msuck: nntp://news.gmane.org/gmane.linux.lib.musl.general/15101
Path: news.gmane.org!.POSTED.blaine.gmane.org!not-for-mail
From: Alexander Monakov <amonakov@ispras.ru>
Newsgroups: gmane.linux.lib.musl.general
Subject: [PATCH] math: move i386 sqrt to C
Date: Tue,  7 Jan 2020 16:06:05 +0300
Message-ID: <20200107130605.7618-1-amonakov@ispras.ru>
References: <alpine.LNX.2.20.13.2001051915090.31907@monopod.intra.ispras.ru>
Reply-To: musl@lists.openwall.com
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="------------2.11.0"
Injection-Info: blaine.gmane.org; posting-host="blaine.gmane.org:195.159.176.226";
	logging-data="259116"; mail-complaints-to="usenet@blaine.gmane.org"
To: musl@lists.openwall.com
Original-X-From: musl-return-15117-gllmg-musl=m.gmane.org@lists.openwall.com Tue Jan 07 14:06:20 2020
Return-path: <musl-return-15117-gllmg-musl=m.gmane.org@lists.openwall.com>
Envelope-to: gllmg-musl@m.gmane.org
Original-Received: from mother.openwall.net ([195.42.179.200])
	by blaine.gmane.org with smtp (Exim 4.89)
	(envelope-from <musl-return-15117-gllmg-musl=m.gmane.org@lists.openwall.com>)
	id 1iooYt-0012L6-Pr
	for gllmg-musl@m.gmane.org; Tue, 07 Jan 2020 14:06:19 +0100
Original-Received: (qmail 28482 invoked by uid 550); 7 Jan 2020 13:06:17 -0000
Mailing-List: contact musl-help@lists.openwall.com; run by ezmlm
Precedence: bulk
List-Post: <mailto:musl@lists.openwall.com>
List-Help: <mailto:musl-help@lists.openwall.com>
List-Unsubscribe: <mailto:musl-unsubscribe@lists.openwall.com>
List-Subscribe: <mailto:musl-subscribe@lists.openwall.com>
List-ID: <musl.lists.openwall.com>
Original-Received: (qmail 28450 invoked from network); 7 Jan 2020 13:06:16 -0000
X-Mailer: git-send-email 2.11.0
In-Reply-To: <alpine.LNX.2.20.13.2001051915090.31907@monopod.intra.ispras.ru>
Xref: news.gmane.org gmane.linux.lib.musl.general:15101
Archived-At: <http://permalink.gmane.org/gmane.linux.lib.musl.general/15101>

This is a multi-part message in MIME format.
--------------2.11.0
Content-Type: text/plain; charset=UTF-8; format=fixed
Content-Transfer-Encoding: 8bit

---
Since union ldshape does not have a dedicated field for the 32 least
significant bits of the x87 long double mantissa, keeping the original
approach with

    ux.i.m -= (fpsr & 0x200) - 0x100;

would lead to a 64-bit subtraction that is not trivial for the compiler to
optimize into the 32-bit subtraction done in the original assembly.
Therefore I have elected to change the approach and use

    ux.i.m ^= (fpsr & 0x200) + 0x200;

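which is easier to optimize to a 32-bit rather than a 64-bit xor. Since the
low 11 bits are known to be 0x400 at this point, it clears bit 10 if C1
(0x200 in the status word) is set, i.e. fsqrt rounded up, and sets bit 9
otherwise, so the mantissa still moves off the halfway point in the
required direction.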

Thoughts?

 src/math/i386/sqrt.c | 15 +++++++++++++++
 src/math/i386/sqrt.s | 21 ---------------------
 2 files changed, 15 insertions(+), 21 deletions(-)
 create mode 100644 src/math/i386/sqrt.c
 delete mode 100644 src/math/i386/sqrt.s


--------------2.11.0
Content-Type: text/x-patch; name="0005-math-move-i386-sqrt-to-C.patch"
Content-Transfer-Encoding: 8bit
Content-Disposition: inline; filename="0005-math-move-i386-sqrt-to-C.patch"

diff --git a/src/math/i386/sqrt.c b/src/math/i386/sqrt.c
new file mode 100644
index 00000000..619df056
--- /dev/null
+++ b/src/math/i386/sqrt.c
@@ -0,0 +1,15 @@
+#include "libm.h"
+
+double sqrt(double x)
+{
+	union ldshape ux;
+	unsigned fpsr;
+	__asm__ ("fsqrt; fnstsw %%ax": "=t"(ux.f), "=a"(fpsr) : "0"(x));
+	if ((ux.i.m & 0x7ff) != 0x400)
+		return (double)ux.f;
+	/* Rounding to double would have encountered an exact halfway case.
+	   Adjust mantissa downwards if fsqrt rounded up, else upwards.
+	   (result of fsqrt could not have been exact) */
+	ux.i.m ^= (fpsr & 0x200) + 0x200;
+	return (double)ux.f;
+}
diff --git a/src/math/i386/sqrt.s b/src/math/i386/sqrt.s
deleted file mode 100644
index 57837e25..00000000
--- a/src/math/i386/sqrt.s
+++ /dev/null
@@ -1,21 +0,0 @@
-.global sqrt
-.type sqrt,@function
-sqrt:	fldl 4(%esp)
-	fsqrt
-	fnstsw %ax
-	sub $12,%esp
-	fld %st(0)
-	fstpt (%esp)
-	mov (%esp),%ecx
-	and $0x7ff,%ecx
-	cmp $0x400,%ecx
-	jnz 1f
-	and $0x200,%eax
-	sub $0x100,%eax
-	sub %eax,(%esp)
-	fstp %st(0)
-	fldt (%esp)
-1:	add $12,%esp
-	fstpl 4(%esp)
-	fldl 4(%esp)
-	ret

--------------2.11.0--