Re: [musl] [PATCH] Properly simplified nextafter()


From: "Stefan Kanthak" <stefan.kanthak@nexgo.de>
To: "Rich Felker" <dalias@libc.org>
Cc: "Szabolcs Nagy" <nsz@port70.net>, <musl@lists.openwall.com>
Subject: Re: [musl] [PATCH] Properly simplified nextafter()
Date: Wed, 11 Aug 2021 17:44:28 +0200
Message-ID: <7143269BEC424DE6A3B0218C4268C4C8@H270>
In-Reply-To: <20210811024010.GA13220@brightrain.aerifal.cx>

Rich Felker <dalias@libc.org> wrote:

> On Wed, Aug 11, 2021 at 12:53:37AM +0200, Stefan Kanthak wrote:
>> Szabolcs Nagy <nsz@port70.net> wrote:
>>
>>>* Stefan Kanthak <stefan.kanthak@nexgo.de> [2021-08-10 08:23:46 +0200]:
>>>> <https://git.musl-libc.org/cgit/musl/plain/src/math/nextafter.c>
>>>> has quite some superfluous statements:
>>>>
>>>> 1. there's absolutely no need for 2 uint64_t holding |x| and |y|;
>>>> 2. IEEE-754 specifies -0.0 == +0.0, so (x == y) is equivalent to
>>>>    (ax == 0) && (ay == 0): the latter 2 tests can be removed;
>>>
>>> you replaced 4 int cmps with 4 float cmps (among other things).
>>
>> and hinted that the result of the second pair of comparisons is
>> already known from the first pair.
>>
>>> it's target dependent if float compares are fast or not.
>>
>> It's also target dependent whether the floating-point registers
>> can be accessed by integer instructions, or need to be copied:
>> some win, some lose!
>> Just let the compiler/optimizer do its job!
>
> The values have been copied already to perform isnan,

Not necessarily: the compiler may have inlined isnan() and performed
the test using, for example, FXAM, FUCOM or FUCOMI on i386, or
UCOMISD on AMD64, without copying the arguments.
I recommend inspecting the code GCC generates for AMD64, for example.

> so continuing to access them does not incur any further cost.

Non sequitur: see above.

[...]

>> 0. Doesn't musl provide target specific routines for targets with
>>    soft FP?
>
> No, quite the opposite. Targets with hard fp and native insns for
> particular ops have target-specific versions,

That's why I assumed that this may also be the case for soft FP.

> but in general musl strongly prefers use of common implementation
> across all targets when there is not an obvious [nearly-]single-insn
> candidate for a specialized version.

That's one of the reasons why I submitted this patch: FP hardware is
mainstream.

>> 1. If not: the compiler knows the target ABI and SHOULD generate
>>    the proper integer comparisons there.
>
> Here it would require the compiler to recognize that the nan case was
> already ruled out, and to special-case ±0 comparison on the
> representation. Of course this is possible in theory, but it's almost
> surely not happening now or any time soon. I'm pretty sure soft float
> targets just end up calling the libgcc function for floating point
> comparison if you do that.

|     if (isnan(x) || isnan(y))
|          return x + y;

The 4 instructions I mentioned above set flags for all cases: see
below.
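
As a rough C-level illustration of that point (a sketch, not musl code):
the C99 macro isunordered() from <math.h> is true exactly when at least
one operand is a NaN, so one unordered test can stand in for the
isnan(x) || isnan(y) pair, and the equal/less/greater outcomes follow
from ordinary comparisons of the same operands. The names classify and
enum relation are made up for this example, and whether a given compiler
emits a single UCOMISD/COMISD for it is an assumption to verify in the
generated code.

```c
#include <math.h>

/* Hypothetical result type for this sketch only. */
enum relation { REL_UNORDERED, REL_EQUAL, REL_LESS, REL_GREATER };

/* Classify a pair of doubles using floating-point comparisons only. */
static enum relation classify(double x, double y)
{
    if (isunordered(x, y))      /* true iff x or y is a NaN */
        return REL_UNORDERED;
    if (x == y)                 /* also true for -0.0 vs +0.0 */
        return REL_EQUAL;
    return x < y ? REL_LESS : REL_GREATER;
}
```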

>> The code is of course smaller ... but not as small and fast as a
>> proper i386 or AMD64 assembly implementation ... which I can
>> post upon request.
>
> Full asm functions are not wanted; it's something we're trying to get
> rid of in favor of just using very small/single-insn asm statements
> with proper constraints, where it's sufficiently beneficial to have
> asm at all. But I'm not even clear how you could make this function
> more efficient with asm. The overall logic would be exactly the same
> as the C. Maybe on x86_64 there'd be some SSE instructions to let you
> elide a few things?

No, just what the instruction set offers: 23 instructions in 72 bytes.

nextafter:
        comisd  xmm1, xmm0              # CF = (from > to)
        jp      .Lmxcsr                 # from or to INDEFINITE?
        je      .Lequal                 # from = to?
        sbb     rdx, rdx                # rdx = (from > to) ? -1 : 0
        movq    rcx, xmm0               # rcx = from
        mov     rax, rcx
        add     rax, rax                # CF = (from & -0.0)
        jz      .Lzero                  # from = ±0.0?
.Lstep:
        sbb     rax, rax                # rax = (from < 0.0) ? -1 : 0
        xor     rax, rdx                # rax = (from < 0.0) ^ (from > to) ? -1 : 0
        or      rax, 1                  # rax = (from < 0.0) ^ (from > to) ? -1 : 1
        add     rax, rcx                # rax = nextafter(from, to)
        movq    xmm0, rax               # xmm0 = nextafter(from, to)
        xorpd   xmm1, xmm1
.Lmxcsr:
        addsd   xmm0, xmm1              # set MXCSR flags
        ret
.Lequal:
        movsd   xmm0, xmm1              # xmm0 = to
        ret
.Lzero:
        movmskpd eax, xmm1              # rax = (to & -0.0) ? 0b?1 : 0b?0
        or      eax, 2                  # rax = (to & -0.0) ? 0b11 : 0b10
        ror     rax, 1                  # rax = (to & -0.0) ? 0x8000000000000001 : 1
        movq    xmm0, rax               # xmm0 = (to & -0.0) ? -0x1.0p-1074 : 0x1.0p-1074
        ret
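
For readers who prefer C, the control flow of the listing above can be
sketched roughly as follows. This is an illustration under assumptions,
not the proposed patch: the name nextafter_sketch is hypothetical, double
is assumed to be IEEE-754 binary64, and the final addsd that sets the
MXCSR flags is not reproduced, so the sketch raises no floating-point
exceptions for subnormal or infinite results.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(double) == sizeof(uint64_t), "binary64 double assumed");

/* Hypothetical name; mirrors the branches of the assembly listing. */
static double nextafter_sketch(double from, double to)
{
    uint64_t bits, tobits;

    if (from != from || to != to)   /* at least one NaN (jp .Lmxcsr) */
        return from + to;
    if (from == to)                 /* includes -0.0 == +0.0 (je .Lequal) */
        return to;

    memcpy(&bits, &from, sizeof bits);
    if ((bits << 1) == 0) {
        /* from is +-0.0 (.Lzero): smallest subnormal with the sign of to */
        memcpy(&tobits, &to, sizeof tobits);
        bits = (tobits & UINT64_C(0x8000000000000000)) | 1;
    } else {
        /* all-ones masks, like the two sbb instructions */
        uint64_t gt  = (from > to) ? ~UINT64_C(0) : 0;
        uint64_t neg = (bits >> 63) ? ~UINT64_C(0) : 0;
        /* step is -1 if exactly one mask is set, else +1 */
        bits += (gt ^ neg) | 1;
    }
    memcpy(&from, &bits, sizeof from);
    return from;
}
```

The step is applied to the raw bit pattern, so +1 moves away from zero
and -1 moves toward zero regardless of sign; XORing the two masks picks
the right direction without a branch.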

GCC generates at least 12 more instructions here, some of them longer,
including 2 movabs to load 0x8000000000000000 and 0x7FFFFFFFFFFFFFFF,
so its code is more than 50% fatter. It also mixes integer SSE and FP
SSE instructions, which incurs a 2-cycle penalty on many Intel CPUs, and
has WAY TOO MANY poorly predictable (un)conditional branches.

JFTR: it's almost always easy to beat the compiler!

Stefan
