From: "Stefan Kanthak"
To: "Szabolcs Nagy"
Reply-To: musl@lists.openwall.com
Date: Fri, 13 Aug 2021 14:04:51 +0200
Organization: Me, myself & IT
Subject: Re: [musl] [PATCH #2] Properly simplified nextafter()
References: <0C6AAAD55DA44C6189B2FF4F5FB2C3E7@H270> <20210810213455.GB37904@port70.net>
In-Reply-To: <20210810213455.GB37904@port70.net>

Szabolcs Nagy wrote on 2021-08-10 at 23:34:

>* Stefan Kanthak [2021-08-10 08:23:46 +0200]:
>>
>> has quite some superfluous statements:
>>
>> 1. there's absolutely no need for 2 uint64_t holding |x| and |y|;
>> 2. IEEE-754 specifies -0.0 == +0.0, so (x == y) is equivalent to
>>    (ax == 0) && (ay == 0): the latter 2 tests can be removed;
>
> you replaced 4 int cmps with 4 float cmps (among other things).
>
> it's target dependent if float compares are fast or not.

It's also target dependent whether the FP additions and multiplies
used to raise overflow/underflow are SLOOOWWW: how can you justify
them, especially for targets using soft-float?

| /* raise overflow if ux.f is infinite and x is finite */
| if (e == 0x7ff)
|         FORCE_EVAL(x+x);
| /* raise underflow if ux.f is subnormal or zero */
| if (e == 0)
|         FORCE_EVAL(x*x + ux.f*ux.f);

> (the i386 machine where i originally tested this preferred int
> cmp and float cmp was very slow in the subnormal range

The same still holds for the i386 FPU's fadd/fmul as well as for the
SSE addsd/addss/mulsd/mulss additions and multiplies!
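
The signed-zero point in the quoted item 2 can be illustrated with a
small sketch. It assumes IEEE-754 binary64 doubles; bits_of(),
magnitude_bits() and check_signed_zero() are hypothetical helper names
that appear neither in musl nor in the patch. It shows that +0.0 and
-0.0 compare equal as floating-point values although their bit patterns
differ, and that masking off the sign bit (as the unpatched code does
for ax and ay) yields 0 for both.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: reinterpret a double as its 64-bit pattern. */
static uint64_t bits_of(double d)
{
	uint64_t u;
	memcpy(&u, &d, sizeof u);
	return u;
}

/* Hypothetical helper: bit pattern of |d|, i.e. with the sign bit cleared. */
static uint64_t magnitude_bits(double d)
{
	return bits_of(d) & -1ULL/2;
}

/* Assumes IEEE-754 binary64 doubles. */
static void check_signed_zero(void)
{
	/* the float compare treats both zeros as equal ... */
	assert(0.0 == -0.0);
	/* ... while the integer compare of the raw bits does not */
	assert(bits_of(0.0) != bits_of(-0.0));
	/* after masking the sign bit, both zeros become the integer 0 */
	assert(magnitude_bits(0.0) == 0);
	assert(magnitude_bits(-0.0) == 0);
}
```

Calling check_signed_zero() once is enough; it aborts if any of the
assumptions does not hold on the target.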
Second version:

--- -/src/math/nextafter.c
+++ +/src/math/nextafter.c
@@ -10,13 +10,13 @@
 		return x + y;
 	if (ux.i == uy.i)
 		return y;
-	ax = ux.i & -1ULL/2;
-	ay = uy.i & -1ULL/2;
+	ax = ux.i << 2;
+	ay = uy.i << 2;
 	if (ax == 0) {
 		if (ay == 0)
 			return y;
 		ux.i = (uy.i & 1ULL<<63) | 1;
-	} else if (ax > ay || ((ux.i ^ uy.i) & 1ULL<<63))
+	} else if ((ax < ay) == ((int64_t) ux.i < 0))
 		ux.i--;
 	else
 		ux.i++;

For AMD64, GCC generates the following ABSOLUTELY HORRIBLE CRAP
(the original code compiles even worse):

0000000000000000 :
   0:   48 83 ec 38             sub    $0x38,%rsp
   4:   0f 29 74 24 20          movaps %xmm6,0x20(%rsp)
   9:   49 b8 ff ff ff ff ff    movabs $0x7fffffffffffffff,%r8
  10:   ff ff 7f
  13:   49 b9 00 00 00 00 00    movabs $0x7ff0000000000000,%r9
  1a:   00 f0 7f
  1d:   66 49 0f 7e c2          movq   %xmm0,%r10
  22:   66 48 0f 7e c2          movq   %xmm0,%rdx
  27:   66 48 0f 7e c8          movq   %xmm1,%rax
  2c:   4d 21 c2                and    %r8,%r10
  2f:   66 48 0f 7e c1          movq   %xmm0,%rcx
  34:   4d 39 ca                cmp    %r9,%r10
  37:   0f 87 83 00 00 00       ja     bb
  3d:   49 21 c0                and    %rax,%r8
  40:   66 49 0f 7e ca          movq   %xmm1,%r10
  45:   4d 39 c8                cmp    %r9,%r8
  48:   77 76                   ja     bb
  4a:   66 0f 28 f1             movapd %xmm1,%xmm6
  4e:   48 39 d0                cmp    %rdx,%rax
  51:   74 7b                   je     c9
  53:   66 49 0f 7e c0          movq   %xmm0,%r8
  58:   48 8d 04 85 00 00 00    lea    0x0(,%rax,4),%rax
  5f:   00
  60:   49 c1 e0 02             shl    $0x2,%r8
  64:   74 7a                   je     db
  66:   49 39 c0                cmp    %rax,%r8
  69:   66 49 0f 7e c0          movq   %xmm0,%r8
  6e:   48 8d 42 ff             lea    -0x1(%rdx),%rax
  72:   41 0f 93 c1             setae  %r9b
  76:   49 c1 e8 3f             shr    $0x3f,%r8
  7a:   48 83 c1 01             add    $0x1,%rcx
  7e:   45 38 c1                cmp    %r8b,%r9b
  81:   48 0f 44 c1             cmove  %rcx,%rax
  85:   48 89 c1                mov    %rax,%rcx
  88:   66 48 0f 6e f0          movq   %rax,%xmm6
  8d:   48 c1 e9 34             shr    $0x34,%rcx
  91:   81 e1 ff 07 00 00       and    $0x7ff,%ecx
  97:   81 f9 ff 07 00 00       cmp    $0x7ff,%ecx
  9d:   74 61                   je     ef
  9f:   85 c9                   test   %ecx,%ecx
  a1:   75 2b                   jne    c9
  a3:   66 48 0f 6e c2          movq   %rdx,%xmm0
  a8:   66 48 0f 6e c8          movq   %rax,%xmm1
  ad:   f2 0f 59 ce             mulsd  %xmm6,%xmm1
  b1:   f2 0f 59 c0             mulsd  %xmm0,%xmm0
  b5:   f2 0f 58 c1             addsd  %xmm1,%xmm0
  b9:   eb 0e                   jmp    c9
  bb:   66 48 0f 6e f2          movq   %rdx,%xmm6
  c0:   66 48 0f 6e d0          movq   %rax,%xmm2
  c5:   f2 0f 58 f2             addsd  %xmm2,%xmm6
  c9:   66 0f 28 c6             movapd %xmm6,%xmm0
  cd:   0f 28 74 24 20          movaps 0x20(%rsp),%xmm6
  d2:   48 83 c4 38             add    $0x38,%rsp
  d6:   c3                      retq
  d7:   48 85 c0                test   %rax,%rax
  da:   74 e9                   je     c9
  dc:   48 b8 00 00 00 00 00    movabs $0x8000000000000000,%rax
  e3:   00 00 80
  e6:   4c 21 d0                and    %r10,%rax
  ea:   48 83 c8 01             or     $0x1,%rax
  ed:   eb 8d                   jmp    85
  ef:   66 48 0f 6e c2          movq   %rdx,%xmm0
  f4:   f2 0f 58 c0             addsd  %xmm0,%xmm0
  f8:   eb be                   jmp    c9

How do you compare these 60 instructions/252 bytes to the code I posted
(23 instructions/72 bytes)?

Not amused by such HORRIBLE machine code!
Stefan
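
For experimenting with alternative conditions, the stepping logic on the
"-" side of the diff can be lifted into a standalone function. The
following is only a sketch under stated assumptions: IEEE-754 binary64
doubles, no NaN inputs, and no raising of overflow/underflow (the
FORCE_EVAL part is left out). The names bits_of(), from_bits() and
step_toward() are hypothetical and appear neither in musl nor in the
patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helpers: convert between a double and its bit pattern. */
static uint64_t bits_of(double d)
{
	uint64_t u;
	memcpy(&u, &d, sizeof u);
	return u;
}

static double from_bits(uint64_t u)
{
	double d;
	memcpy(&d, &u, sizeof d);
	return d;
}

/*
 * Sketch of the unpatched stepping logic from the diff context.
 * Assumptions: x and y are not NaN; FP exceptions are not raised.
 */
static double step_toward(double x, double y)
{
	const uint64_t sign = 1ULL << 63;
	uint64_t ux = bits_of(x), uy = bits_of(y);
	uint64_t ax = ux & ~sign;	/* bit pattern of |x| */
	uint64_t ay = uy & ~sign;	/* bit pattern of |y| */

	if (ux == uy)
		return y;
	if (ax == 0) {
		if (ay == 0)
			return y;	/* +0.0 and -0.0 compare equal */
		/* leave zero: smallest subnormal with the sign of y */
		ux = (uy & sign) | 1;
	} else if (ax > ay || ((ux ^ uy) & sign))
		ux--;			/* move toward zero */
	else
		ux++;			/* move away from zero */
	return from_bits(ux);
}
```

A candidate replacement for the else-if condition can then be compared
against step_toward() on inputs that differ in sign, in exponent, or
only in the mantissa, for example with asserts on the resulting bit
patterns.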