mailing list of musl libc
 help / color / mirror / code / Atom feed
From: Rich Felker <dalias@libc.org>
To: Stefan Kanthak <stefan.kanthak@nexgo.de>
Cc: Szabolcs Nagy <nsz@port70.net>, musl@lists.openwall.com
Subject: Re: [musl] [PATCH #2] Properly simplified nextafter()
Date: Fri, 13 Aug 2021 11:59:17 -0400	[thread overview]
Message-ID: <20210813155917.GE13220@brightrain.aerifal.cx> (raw)
In-Reply-To: <E2423D1F1F3848848AEA933048174858@H270>

On Fri, Aug 13, 2021 at 02:04:51PM +0200, Stefan Kanthak wrote:
> Szabolcs Nagy <nsz@port70.net> wrote on 2021-08-10 at 23:34:
> 
> >* Stefan Kanthak <stefan.kanthak@nexgo.de> [2021-08-10 08:23:46 +0200]:
> >> <https://git.musl-libc.org/cgit/musl/plain/src/math/nextafter.c>
> >> has quite some superfluous statements:
> >> 
> >> 1. there's absolutely no need for 2 uint64_t holding |x| and |y|;
> >> 2. IEEE-754 specifies -0.0 == +0.0, so (x == y) is equivalent to
> >>    (ax == 0) && (ay == 0): the latter 2 tests can be removed;
> > 
> > you replaced 4 int cmps with 4 float cmps (among other things).
> > 
> > it's target dependent if float compares are fast or not.
> 
> It's also target dependent whether the FP additions and multiplies
> used to raise overflow/underflow are SLOOOWWW: how can you justify
> them, especially for targets using soft-float?

On a target with fenv, raising the exception flags is mandatory and
something must be done to make that happen. Performing the arithmetic
is far faster than prodding at the status flag registers explicitly.

On a target without fenv (all the existing soft float targets), it
should be valid for the compiler just not to emit them, since they
have no side effects. I think we could make FORCE_EVAL expand to
nothing on such targets (if the FE_* macros are not defined). This is
probably worth doing.

(Note that FORCE_EVAL only exists because compilers are buggy and
don't respect the fact that floating point expressions have side
effects even if the result is not used; with a properly working
compiler, it would not be needed, and non-fenv targets would
automatically omit the no-side-effect code.)

> |        /* raise overflow if ux.f is infinite and x is finite */
> |        if (e == 0x7ff)
> |                FORCE_EVAL(x+x);
> |        /* raise underflow if ux.f is subnormal or zero */
> |        if (e == 0)
> |                FORCE_EVAL(x*x + ux.f*ux.f);
> 
> > (the i386 machine where i originally tested this preferred int
> > cmp and float cmp was very slow in the subnormal range
> 
> This also and still holds for i386 FPU fadd/fmul as well as SSE
> addsd/addss/mulss/mulsd additions/multiplies!

It may be possible to reduce the number of such ops too; not sure. But
there's no way to eliminate them.

> Second version:
> 
> --- -/src/math/nextafter.c
> +++ +/src/math/nextafter.c
> @@ -10,13 +10,13 @@
>                 return x + y;
>         if (ux.i == uy.i)
>                 return y;
> -       ax = ux.i & -1ULL/2;
> -       ay = uy.i & -1ULL/2;
> +       ax = ux.i << 2;
> +       ay = uy.i << 2;
>         if (ax == 0) {
>                 if (ay == 0)
>                         return y;
>                 ux.i = (uy.i & 1ULL<<63) | 1;
> -       } else if (ax > ay || ((ux.i ^ uy.i) & 1ULL<<63))
> +       } else if ((ax < ay) == ((int64_t) ux.i < 0))
>                 ux.i--;
>         else
>                 ux.i++;
> 
> For AMD64, GCC generates the following ABSOLUTELY HORRIBLE CRAP
> (the original code compiles even worse):
> 
> 0000000000000000 <nextafter>:
>    0:   48 83 ec 38             sub    $0x38,%rsp
>    4:   0f 29 74 24 20          movaps %xmm6,0x20(%rsp)
>    9:   49 b8 ff ff ff ff ff    movabs $0x7fffffffffffffff,%r8
>   10:   ff ff 7f
>   13:   49 b9 00 00 00 00 00    movabs $0x7ff0000000000000,%r9
>   1a:   00 f0 7f
>   1d:   66 49 0f 7e c2          movq   %xmm0,%r10
>   22:   66 48 0f 7e c2          movq   %xmm0,%rdx
>   27:   66 48 0f 7e c8          movq   %xmm1,%rax
>   2c:   4d 21 c2                and    %r8,%r10
>   2f:   66 48 0f 7e c1          movq   %xmm0,%rcx
>   34:   4d 39 ca                cmp    %r9,%r10
>   37:   0f 87 83 00 00 00       ja     bb <nextafter+0xbb>
>   3d:   49 21 c0                and    %rax,%r8
>   40:   66 49 0f 7e ca          movq   %xmm1,%r10
>   45:   4d 39 c8                cmp    %r9,%r8
>   48:   77 76                   ja     bb <nextafter+0xbb>
>   4a:   66 0f 28 f1             movapd %xmm1,%xmm6
>   4e:   48 39 d0                cmp    %rdx,%rax
>   51:   74 7b                   je     c9 <nextafter+0xc9>
>   53:   66 49 0f 7e c0          movq   %xmm0,%r8
>   58:   48 8d 04 85 00 00 00    lea    0x0(,%rax,4),%rax
>   5f:   00
>   60:   49 c1 e0 02             shl    $0x2,%r8
>   64:   74 7a                   je     db <nextafter+0xd7>
>   66:   49 39 c0                cmp    %rax,%r8
>   69:   66 49 0f 7e c0          movq   %xmm0,%r8
>   6e:   48 8d 42 ff             lea    -0x1(%rdx),%rax
>   72:   41 0f 93 c1             setae  %r9b
>   76:   49 c1 e8 3f             shr    $0x3f,%r8
>   7a:   48 83 c1 01             add    $0x1,%rcx
>   7e:   45 38 c1                cmp    %r8b,%r9b
>   81:   48 0f 44 c1             cmove  %rcx,%rax
>   85:   48 89 c1                mov    %rax,%rcx
>   88:   66 48 0f 6e f0          movq   %rax,%xmm6
>   8d:   48 c1 e9 34             shr    $0x34,%rcx
>   91:   81 e1 ff 07 00 00       and    $0x7ff,%ecx
>   97:   81 f9 ff 07 00 00       cmp    $0x7ff,%ecx
>   9d:   74 61                   je     ef <nextafter+0xef>
>   9f:   85 c9                   test   %ecx,%ecx
>   a1:   75 2b                   jne    c9 <nextafter+0xc9>
>   a3:   66 48 0f 6e c2          movq   %rdx,%xmm0
>   a8:   66 48 0f 6e c8          movq   %rax,%xmm1
>   ad:   f2 0f 59 ce             mulsd  %xmm6,%xmm1
>   b1:   f2 0f 59 c0             mulsd  %xmm0,%xmm0
>   b5:   f2 0f 58 c1             addsd  %xmm1,%xmm0
>   b9:   eb 0e                   jmp    c9 <nextafter+0xc9>
>   bb:   66 48 0f 6e f2          movq   %rdx,%xmm6
>   c0:   66 48 0f 6e d0          movq   %rax,%xmm2
>   c5:   f2 0f 58 f2             addsd  %xmm2,%xmm6
>   c9:   66 0f 28 c6             movapd %xmm6,%xmm0
>   cd:   0f 28 74 24 20          movaps 0x20(%rsp),%xmm6
>   d2:   48 83 c4 38             add    $0x38,%rsp
>   d6:   c3                      retq
>   d7:   48 85 c0                test   %rax,%rax
>   da:   74 e9                   je     c9 <nextafter+0xc9>
>   dc:   48 b8 00 00 00 00 00    movabs $0x8000000000000000,%rax
>   e3:   00 00 80
>   e6:   4c 21 d0                and    %r10,%rax
>   ea:   48 83 c8 01             or     $0x1,%rax
>   ed:   eb 8d                   jmp    85 <nextafter+0x85>
>   ef:   66 48 0f 6e c2          movq   %rdx,%xmm0
>   f4:   f2 0f 58 c0             addsd  %xmm0,%xmm0
>   f8:   eb be                   jmp    c9 <nextafter+0xc9>
> 
> How do you compare these 60 instructions/252 bytes to the code I posted
> (23 instructions/72 bytes)?
> 
> not amused about such HORRIBLE machine code!

Then submit patched to GCC. We are not writing a libc in assembly
language just because of a compiler that can't generate sufficiently
good code.

Have you traced the different execution paths through the above to see
if the cases that matter (non-exceptional ones) are actually terribly
bad? I haven't trawled through it to do this, but that's the first
thing I'd do if I wanted to evaluate it. From there it's easier to see
what's actually going wrong (if anything) and use that to reason about
whether there are reasonable changes to be made to the C code that
would produce better output.

Rich

  reply	other threads:[~2021-08-13 15:59 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-08-10  6:23 [musl] [PATCH] " Stefan Kanthak
2021-08-10 21:34 ` Szabolcs Nagy
2021-08-10 22:53   ` Stefan Kanthak
2021-08-11  2:40     ` Rich Felker
2021-08-11 15:44       ` Stefan Kanthak
2021-08-11 16:09         ` Rich Felker
2021-08-11 16:50           ` Stefan Kanthak
2021-08-11 17:57             ` Rich Felker
2021-08-11 22:16               ` Szabolcs Nagy
2021-08-11 22:43                 ` Stefan Kanthak
2021-08-12  0:59                   ` Rich Felker
2021-08-11  8:23     ` Szabolcs Nagy
2021-08-13 12:04   ` [musl] [PATCH #2] " Stefan Kanthak
2021-08-13 15:59     ` Rich Felker [this message]
2021-08-13 18:30       ` Stefan Kanthak
2021-08-14  4:07         ` Damian McGuckin
2021-08-14 22:45           ` Szabolcs Nagy
2021-08-14 23:46     ` Szabolcs Nagy
2021-08-15  7:04       ` Stefan Kanthak
2021-08-15  7:46         ` Ariadne Conill
2021-08-15 13:59           ` Rich Felker
2021-08-15 14:57             ` Ariadne Conill
2021-08-15  8:24         ` Damian McGuckin
2021-08-15 14:03           ` Rich Felker
2021-08-15 15:10             ` Damian McGuckin
2021-08-15 14:56         ` Szabolcs Nagy
2021-08-15 15:19           ` Stefan Kanthak
2021-08-15 15:48             ` Rich Felker
2021-08-15 16:29               ` Stefan Kanthak
2021-08-15 16:49                 ` Rich Felker
2021-08-15 20:52                   ` Stefan Kanthak
2021-08-15 21:48                     ` Rich Felker
2021-08-15 15:52             ` Ariadne Conill
2021-08-15 16:09               ` Rich Felker

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210813155917.GE13220@brightrain.aerifal.cx \
    --to=dalias@libc.org \
    --cc=musl@lists.openwall.com \
    --cc=nsz@port70.net \
    --cc=stefan.kanthak@nexgo.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/musl/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).