Re: [musl] [PATCH #2] Properly simplified nextafter()

mailing list of musl libc

From: "Stefan Kanthak" <stefan.kanthak@nexgo.de>
To: "Szabolcs Nagy" <nsz@port70.net>
Cc: <musl@lists.openwall.com>
Subject: Re: [musl] [PATCH #2] Properly simplified nextafter()
Date: Fri, 13 Aug 2021 14:04:51 +0200
Message-ID: <E2423D1F1F3848848AEA933048174858@H270>
In-Reply-To: <20210810213455.GB37904@port70.net>

[-- Attachment #1: Type: text/plain, Size: 5307 bytes --]

Szabolcs Nagy <nsz@port70.net> wrote on 2021-08-10 at 23:34:

>* Stefan Kanthak <stefan.kanthak@nexgo.de> [2021-08-10 08:23:46 +0200]:
>> <https://git.musl-libc.org/cgit/musl/plain/src/math/nextafter.c>
>> has quite some superfluous statements:
>> 
>> 1. there's absolutely no need for 2 uint64_t holding |x| and |y|;
>> 2. IEEE-754 specifies -0.0 == +0.0, so (x == y) is equivalent to
>>    (ax == 0) && (ay == 0): the latter 2 tests can be removed;
> 
> you replaced 4 int cmps with 4 float cmps (among other things).
> 
> it's target dependent if float compares are fast or not.

It's also target-dependent whether the FP additions and multiplications
used to raise overflow/underflow are SLOOOWWW: how can you justify
them, especially on targets that use soft-float?

|        /* raise overflow if ux.f is infinite and x is finite */
|        if (e == 0x7ff)
|                FORCE_EVAL(x+x);
|        /* raise underflow if ux.f is subnormal or zero */
|        if (e == 0)
|                FORCE_EVAL(x*x + ux.f*ux.f);

> (the i386 machine where i originally tested this preferred int
> cmp and float cmp was very slow in the subnormal range

The same still holds for additions and multiplications on i386, both
x87 FPU fadd/fmul and SSE addsd/addss/mulsd/mulss!
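Points 1 and 2 of the original message rest on two properties of IEEE 754 binary64 doubles: a single 64-bit integer, shifted left by one to discard the sign bit, orders magnitudes correctly (for non-NaN values), and -0.0 == +0.0 compares true even though the two encodings differ. A small sketch checking both, assuming binary64 doubles; `bits_of` and `magnitude_key` are hypothetical helper names.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: raw IEEE 754 binary64 encoding of d.
   memcpy is the portable way to reinterpret the bits. */
static uint64_t bits_of(double d)
{
    uint64_t u;
    memcpy(&u, &d, sizeof u);
    return u;
}

/* Hypothetical helper: shifting left by one drops only the sign bit,
   so +0.0 and -0.0 both map to 0 and, for non-NaN values, larger
   magnitudes map to larger keys. */
static uint64_t magnitude_key(double d)
{
    return bits_of(d) << 1;
}
```

For example, `-0.0 == 0.0` is true, `bits_of(-0.0)` is 0x8000000000000000 while `bits_of(0.0)` is 0, and `magnitude_key` maps both zeros to 0.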

Second version:

--- -/src/math/nextafter.c
+++ +/src/math/nextafter.c
@@ -10,13 +10,13 @@
                return x + y;
        if (ux.i == uy.i)
                return y;
-       ax = ux.i & -1ULL/2;
-       ay = uy.i & -1ULL/2;
+       ax = ux.i << 2;
+       ay = uy.i << 2;
        if (ax == 0) {
                if (ay == 0)
                        return y;
                ux.i = (uy.i & 1ULL<<63) | 1;
-       } else if (ax > ay || ((ux.i ^ uy.i) & 1ULL<<63))
+       } else if ((ax < ay) == ((int64_t) ux.i < 0))
                ux.i--;
        else
                ux.i++;
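For reference, a minimal standalone sketch of the same integer-stepping technique, modelled on the musl code being patched but with hypothetical names (`sketch_nextafter`, `bits_of`, `from_bits`), assuming IEEE 754 binary64 doubles and omitting the FORCE_EVAL overflow/underflow signalling. It takes the magnitude with `<< 1`, which drops only the sign bit and so preserves the ordering of magnitudes; a shift by 2 also drops the top exponent bit (2.0 is 0x4000000000000000, so its bits shifted left by 2 are 0). The direction test is the one from the original musl code: step toward zero in magnitude when |x| > |y| or the signs differ.

```c
#include <float.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helpers: reinterpret between a double and its binary64 bits. */
static uint64_t bits_of(double d)
{
    uint64_t u;
    memcpy(&u, &d, sizeof u);
    return u;
}

static double from_bits(uint64_t u)
{
    double d;
    memcpy(&d, &u, sizeof d);
    return d;
}

/* Sketch of nextafter(x, y) by stepping the integer representation.
   FP exception signalling (FORCE_EVAL in musl) is deliberately omitted. */
static double sketch_nextafter(double x, double y)
{
    uint64_t ux = bits_of(x), uy = bits_of(y);
    uint64_t ax, ay;

    if (isnan(x) || isnan(y))
        return x + y;               /* propagate NaN */
    if (ux == uy)
        return y;
    ax = ux << 1;                   /* |x| with the sign bit shifted out */
    ay = uy << 1;                   /* |y| likewise */
    if (ax == 0) {
        if (ay == 0)
            return y;               /* +0 vs -0: return y */
        ux = (uy & 1ULL << 63) | 1; /* smallest subnormal with y's sign */
    } else if (ax > ay || ((ux ^ uy) & 1ULL << 63))
        ux--;                       /* move toward zero in magnitude */
    else
        ux++;                       /* move away from zero in magnitude */
    return from_bits(ux);
}
```

For example, `sketch_nextafter(1.0, 2.0)` yields `1.0 + DBL_EPSILON`, and `sketch_nextafter(0.0, -1.0)` yields `-0x1p-1074`, the negative subnormal closest to zero.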

For AMD64, GCC generates the following ABSOLUTELY HORRIBLE CRAP
(the original code compiles even worse):

0000000000000000 <nextafter>:
   0:   48 83 ec 38             sub    $0x38,%rsp
   4:   0f 29 74 24 20          movaps %xmm6,0x20(%rsp)
   9:   49 b8 ff ff ff ff ff    movabs $0x7fffffffffffffff,%r8
  10:   ff ff 7f
  13:   49 b9 00 00 00 00 00    movabs $0x7ff0000000000000,%r9
  1a:   00 f0 7f
  1d:   66 49 0f 7e c2          movq   %xmm0,%r10
  22:   66 48 0f 7e c2          movq   %xmm0,%rdx
  27:   66 48 0f 7e c8          movq   %xmm1,%rax
  2c:   4d 21 c2                and    %r8,%r10
  2f:   66 48 0f 7e c1          movq   %xmm0,%rcx
  34:   4d 39 ca                cmp    %r9,%r10
  37:   0f 87 83 00 00 00       ja     bb <nextafter+0xbb>
  3d:   49 21 c0                and    %rax,%r8
  40:   66 49 0f 7e ca          movq   %xmm1,%r10
  45:   4d 39 c8                cmp    %r9,%r8
  48:   77 76                   ja     bb <nextafter+0xbb>
  4a:   66 0f 28 f1             movapd %xmm1,%xmm6
  4e:   48 39 d0                cmp    %rdx,%rax
  51:   74 7b                   je     c9 <nextafter+0xc9>
  53:   66 49 0f 7e c0          movq   %xmm0,%r8
  58:   48 8d 04 85 00 00 00    lea    0x0(,%rax,4),%rax
  5f:   00
  60:   49 c1 e0 02             shl    $0x2,%r8
  64:   74 7a                   je     d7 <nextafter+0xd7>
  66:   49 39 c0                cmp    %rax,%r8
  69:   66 49 0f 7e c0          movq   %xmm0,%r8
  6e:   48 8d 42 ff             lea    -0x1(%rdx),%rax
  72:   41 0f 93 c1             setae  %r9b
  76:   49 c1 e8 3f             shr    $0x3f,%r8
  7a:   48 83 c1 01             add    $0x1,%rcx
  7e:   45 38 c1                cmp    %r8b,%r9b
  81:   48 0f 44 c1             cmove  %rcx,%rax
  85:   48 89 c1                mov    %rax,%rcx
  88:   66 48 0f 6e f0          movq   %rax,%xmm6
  8d:   48 c1 e9 34             shr    $0x34,%rcx
  91:   81 e1 ff 07 00 00       and    $0x7ff,%ecx
  97:   81 f9 ff 07 00 00       cmp    $0x7ff,%ecx
  9d:   74 61                   je     ef <nextafter+0xef>
  9f:   85 c9                   test   %ecx,%ecx
  a1:   75 2b                   jne    c9 <nextafter+0xc9>
  a3:   66 48 0f 6e c2          movq   %rdx,%xmm0
  a8:   66 48 0f 6e c8          movq   %rax,%xmm1
  ad:   f2 0f 59 ce             mulsd  %xmm6,%xmm1
  b1:   f2 0f 59 c0             mulsd  %xmm0,%xmm0
  b5:   f2 0f 58 c1             addsd  %xmm1,%xmm0
  b9:   eb 0e                   jmp    c9 <nextafter+0xc9>
  bb:   66 48 0f 6e f2          movq   %rdx,%xmm6
  c0:   66 48 0f 6e d0          movq   %rax,%xmm2
  c5:   f2 0f 58 f2             addsd  %xmm2,%xmm6
  c9:   66 0f 28 c6             movapd %xmm6,%xmm0
  cd:   0f 28 74 24 20          movaps 0x20(%rsp),%xmm6
  d2:   48 83 c4 38             add    $0x38,%rsp
  d6:   c3                      retq
  d7:   48 85 c0                test   %rax,%rax
  da:   74 e9                   je     c9 <nextafter+0xc9>
  dc:   48 b8 00 00 00 00 00    movabs $0x8000000000000000,%rax
  e3:   00 00 80
  e6:   4c 21 d0                and    %r10,%rax
  e9:   48 83 c8 01             or     $0x1,%rax
  ed:   eb 8d                   jmp    85 <nextafter+0x85>
  ef:   66 48 0f 6e c2          movq   %rdx,%xmm0
  f4:   f2 0f 58 c0             addsd  %xmm0,%xmm0
  f8:   eb be                   jmp    c9 <nextafter+0xc9>

How do these 60 instructions/252 bytes compare to the code I posted
(23 instructions/72 bytes)?

Not amused by such HORRIBLE machine code!
Stefan

