From: Ariadne Conill <ariadne@dereferenced.org>
To: musl@lists.openwall.com
Cc: Szabolcs Nagy <nsz@port70.net>
Subject: Re: [musl] [PATCH #2] Properly simplified nextafter()
Date: Sun, 15 Aug 2021 02:46:58 -0500 (CDT)
Message-ID: <4272846f-eb89-2856-af9-38571037a924@dereferenced.org>
In-Reply-To: <367A4018B58A4E308E2A95404362CBFB@H270>
Hi,
On Sun, 15 Aug 2021, Stefan Kanthak wrote:
> Szabolcs Nagy <nsz@port70.net> wrote:
>
>> * Stefan Kanthak <stefan.kanthak@nexgo.de> [2021-08-13 14:04:51 +0200]:
>>> Szabolcs Nagy <nsz@port70.net> wrote on 2021-08-10 at 23:34:
>
>>>> (the i386 machine where i originally tested this preferred int
>>>> cmp and float cmp was very slow in the subnormal range
>>>
>>> This also and still holds for i386 FPU fadd/fmul as well as SSE
>>> addsd/addss/mulss/mulsd additions/multiplies!
>>
>> they are avoided in the common case, and only used to create
>> fenv side-effects.
>
> Unfortunately but for hard & SOFT-float, where no fenv exists, as
> Rich wrote.
My admittedly rudimentary understanding of how soft-float is implemented
in musl leads me to believe that this doesn't really matter that much.
>>> --- -/src/math/nextafter.c
>>> +++ +/src/math/nextafter.c
>>> @@ -10,13 +10,13 @@
>>> return x + y;
>>> if (ux.i == uy.i)
>>> return y;
>>> - ax = ux.i & -1ULL/2;
>>> - ay = uy.i & -1ULL/2;
>>> + ax = ux.i << 2;
>>> + ay = uy.i << 2;
>>
>> the << 2 looks wrong, the top bit of the exponent is lost.
>
> It IS wrong, but only in the post, not in the code I tested.
So... in other words, you are testing code that is different than the code
you are submitting?
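To make the objection to `<< 2` concrete: clearing the sign bit with a mask, or shifting it out with `<< 1`, both give keys that order doubles by magnitude, while `<< 2` also throws away the most significant exponent bit. A minimal sketch, assuming IEEE 754 binary64 `double`; the helper names are hypothetical and not part of musl.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Raw binary64 bit pattern of d (memcpy avoids strict-aliasing issues). */
static uint64_t bits_of(double d)
{
	uint64_t u;
	memcpy(&u, &d, sizeof u);
	return u;
}

/* musl-style magnitude key: clear the sign bit. */
static uint64_t key_mask(double d) { return bits_of(d) & -1ULL/2; }

/* Shift the sign bit out; exponent and mantissa move up one place. */
static uint64_t key_shl1(double d) { return bits_of(d) << 1; }

/* Shift by two: the top exponent bit is shifted out as well. */
static uint64_t key_shl2(double d) { return bits_of(d) << 2; }

/* 2.0 is 0x4000000000000000, 1.0 is 0x3ff0000000000000. */
static void check_keys(void)
{
	assert(key_mask(2.0) > key_mask(1.0));
	assert(key_shl1(2.0) > key_shl1(1.0));
	/* the only set bit of 2.0 (bit 62) is lost, so 2.0 sorts below 1.0 */
	assert(key_shl2(2.0) == 0);
	assert(key_shl2(1.0) == 0xffc0000000000000ULL);
}
```

With the `<< 2` key, every value with the top exponent bit set (magnitude 2.0 and up) collapses into the low end of the key space, so the `ax > ay` style comparison picks the wrong direction.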
>
>>> if (ax == 0) {
>>> if (ay == 0)
>>> return y;
>>> ux.i = (uy.i & 1ULL<<63) | 1;
>>> - } else if (ax > ay || ((ux.i ^ uy.i) & 1ULL<<63))
>>> + } else if ((ax < ay) == ((int64_t) ux.i < 0))
>>> ux.i--;
>>> else
>>> ux.i++;
>> ...
>>> How do you compare these 60 instructions/252 bytes to the code I posted
>>> (23 instructions/72 bytes)?
>>
>> you should benchmark, but the second best is to look
>> at the longest dependency chain in the hot path and
>> add up the instruction latencies.
>
> 1 billion calls to nextafter(), with random from, and to either 0 or +INF:
> run 1 against glibc,                            8.58 ns/call
> run 2 against musl original,                    3.59 ns/call
> run 3 against musl patched,                     0.52 ns/call
> run 4 the pure floating-point variant from
>       my initial post in this thread,           0.72 ns/call
> run 5 the assembly variant I posted,            0.28 ns/call
>
> Now hurry up and patch your slowmotion code!
And how do these benchmarks look on non-x86 architectures, like aarch64 or
riscv64?
I would rather have a portable math library with functions that cost 3.59
nsec per call than one where the portable bits are not exercised on x86.
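Numbers like these are only comparable if the same harness can be built on other architectures. A minimal portable sketch, assuming POSIX `clock_gettime` with `CLOCK_MONOTONIC`: it follows the described setup (random starting value, target either 0 or +INF), but the distribution of `from` (uniform in [0, 1)) is my assumption, and it times a self-contained copy of the original musl bit logic from the `-` lines of the diff, with the fenv side-effect code left out. `nextafter_bits`, `xorshift64` and `bench_ns_per_call` are hypothetical names. The PRNG cost is included in the timing, and results depend on compiler, flags and CPU, so no expected figures are given.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

/* Bit-level nextafter following the original musl logic quoted above,
   without the code that raises overflow/underflow exceptions. */
static double nextafter_bits(double x, double y)
{
	uint64_t ux, uy, ax, ay;
	memcpy(&ux, &x, sizeof ux);
	memcpy(&uy, &y, sizeof uy);
	if (isnan(x) || isnan(y))
		return x + y;
	if (ux == uy)
		return y;
	ax = ux & -1ULL/2;
	ay = uy & -1ULL/2;
	if (ax == 0) {
		if (ay == 0)
			return y;
		ux = (uy & 1ULL<<63) | 1;      /* smallest subnormal, sign of y */
	} else if (ax > ay || ((ux ^ uy) & 1ULL<<63))
		ux--;                          /* step toward zero */
	else
		ux++;                          /* step away from zero */
	memcpy(&x, &ux, sizeof x);
	return x;
}

/* Small xorshift PRNG so the harness does not depend on libc rand(). */
static uint64_t xorshift64(uint64_t *s)
{
	*s ^= *s << 13;
	*s ^= *s >> 7;
	*s ^= *s << 17;
	return *s;
}

/* Average nanoseconds per call over n calls; *sink keeps results live. */
static double bench_ns_per_call(uint64_t n, uint64_t seed, double *sink)
{
	struct timespec t0, t1;
	double acc = 0.0;
	assert(n > 0 && seed != 0);
	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (uint64_t i = 0; i < n; i++) {
		uint64_t r = xorshift64(&seed);
		/* from: uniform in [0, 1), built from the top 53 random bits */
		double from = (double)(r >> 11) * 0x1p-53;
		/* to: 0 or +INF, picked by the lowest random bit */
		double to = (r & 1) ? INFINITY : 0.0;
		acc += nextafter_bits(from, to);
	}
	clock_gettime(CLOCK_MONOTONIC, &t1);
	*sink = acc;
	double ns = (double)(t1.tv_sec - t0.tv_sec) * 1e9
	          + (double)(t1.tv_nsec - t0.tv_nsec);
	return ns / (double)n;
}
```

Building this once per target (x86_64, aarch64, riscv64) and per `-march` setting gives like-for-like figures; replacing `nextafter_bits` with the libm `nextafter()` in the loop compares the installed library instead.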
> Stefan
>
> PS: I cheated a very tiny little bit: the isnan() macro of musl patched is
>
> #ifdef PATCH
> #define isnan(x) ( \
> sizeof(x) == sizeof(float) ? (__FLOAT_BITS(x) << 1) > 0xff000000U : \
> sizeof(x) == sizeof(double) ? (__DOUBLE_BITS(x) << 1) > 0xffe0000000000000ULL : \
> __fpclassifyl(x) == FP_NAN)
> #else
> #define isnan(x) ( \
> sizeof(x) == sizeof(float) ? (__FLOAT_BITS(x) & 0x7fffffff) > 0x7f800000 : \
> sizeof(x) == sizeof(double) ? (__DOUBLE_BITS(x) & -1ULL>>1) > 0x7ffULL<<52 : \
> __fpclassifyl(x) == FP_NAN)
> #endif // PATCH
>
> PPS: and of course the log from the benchmarks...
>
> [stefan@rome ~]$ lscpu
> Architecture: x86_64
> CPU op-mode(s): 32-bit, 64-bit
> Byte Order: Little Endian
> CPU(s): 16
> On-line CPU(s) list: 0-15
> Thread(s) per core: 2
> Core(s) per socket: 8
> Socket(s): 1
> NUMA node(s): 1
> Vendor ID: AuthenticAMD
> CPU family: 23
> Model: 49
> Model name: AMD EPYC 7262 8-Core Processor
> Stepping: 0
> CPU MHz: 3194.323
> BogoMIPS: 6388.64
> Virtualization: AMD-V
> L1d cache: 32K
> L1i cache: 32K
> L2 cache: 512K
> L3 cache: 16384K
> ...
> [stefan@rome ~]$ gcc --version
> gcc (GCC) 8.3.1 20190311 (Red Hat 8.3.1-3)
> Copyright (C) 2018 Free Software Foundation, Inc.
> This is free software; see the source for copying conditions. There is NO
> warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
GCC 8 is quite old at this point; GCC 9 and 10 have considerably more
capable optimizers.
Indeed, on my Alpine x86_64 system with GCC 10.3.1, nextafter() compiles
to SSE2 instructions, and if I rebuild musl with `-march=znver2` it uses
AVX instructions instead, which seems more than sufficiently optimized
to me.
Speaking in a personal capacity only, I would rather musl's math routines
stay simple and to the point than go down the rabbit hole of manually
optimized routines, as glibc has done. That also includes hand-tuning
the C routines to get optimal behavior on some specific
microarchitecture. GCC already knows which optimizations are good for a
specific microarchitecture (that is the whole point of `-march` and
`-mtune`, after all); if something is missing, it should be fixed there.
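As an illustration of the kind of choice that can be left to the compiler: the two `isnan` formulations in the quoted PS test the same predicate for `float` and `double`. Shifting out the sign bit turns "masked bits greater than the +INF pattern" into "bits greater than the +INF pattern shifted left by one" (0x7f800000 << 1 == 0xff000000 for float, 0x7ffULL << 53 == 0xffe0000000000000 for double). A minimal sketch, assuming IEEE 754 binary32/binary64 and covering only those two arms; the function names are hypothetical.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

static uint32_t fbits(float f)  { uint32_t u; memcpy(&u, &f, sizeof u); return u; }
static uint64_t dbits(double d) { uint64_t u; memcpy(&u, &d, sizeof u); return u; }

/* Mask off the sign, then compare against the +INF bit pattern. */
static int isnan_mask_f(float f)  { return (fbits(f) & 0x7fffffffU) > 0x7f800000U; }
static int isnan_mask_d(double d) { return (dbits(d) & -1ULL>>1) > 0x7ffULL<<52; }

/* Shift the sign out, then compare against +INF shifted left by one. */
static int isnan_shl_f(float f)  { return (uint32_t)(fbits(f) << 1) > 0xff000000U; }
static int isnan_shl_d(double d) { return (dbits(d) << 1) > 0xffe0000000000000ULL; }

/* Check that both float forms agree on a strided sample of bit patterns. */
static int float_forms_agree(void)
{
	for (uint64_t u = 0; u <= 0xffffffffULL; u += 0x1001) {
		uint32_t b = (uint32_t)u;
		float f;
		memcpy(&f, &b, sizeof f);
		if (isnan_mask_f(f) != isnan_shl_f(f))
			return 0;
	}
	return 1;
}
```

Since both forms compute the same result, which one ends up faster is an instruction-selection question for the compiler and `-mtune`, not something the C source needs to encode.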
Ariadne
Thread overview: 34+ messages
2021-08-10 6:23 [musl] [PATCH] " Stefan Kanthak
2021-08-10 21:34 ` Szabolcs Nagy
2021-08-10 22:53 ` Stefan Kanthak
2021-08-11 2:40 ` Rich Felker
2021-08-11 15:44 ` Stefan Kanthak
2021-08-11 16:09 ` Rich Felker
2021-08-11 16:50 ` Stefan Kanthak
2021-08-11 17:57 ` Rich Felker
2021-08-11 22:16 ` Szabolcs Nagy
2021-08-11 22:43 ` Stefan Kanthak
2021-08-12 0:59 ` Rich Felker
2021-08-11 8:23 ` Szabolcs Nagy
2021-08-13 12:04 ` [musl] [PATCH #2] " Stefan Kanthak
2021-08-13 15:59 ` Rich Felker
2021-08-13 18:30 ` Stefan Kanthak
2021-08-14 4:07 ` Damian McGuckin
2021-08-14 22:45 ` Szabolcs Nagy
2021-08-14 23:46 ` Szabolcs Nagy
2021-08-15 7:04 ` Stefan Kanthak
2021-08-15 7:46 ` Ariadne Conill [this message]
2021-08-15 13:59 ` Rich Felker
2021-08-15 14:57 ` Ariadne Conill
2021-08-15 8:24 ` Damian McGuckin
2021-08-15 14:03 ` Rich Felker
2021-08-15 15:10 ` Damian McGuckin
2021-08-15 14:56 ` Szabolcs Nagy
2021-08-15 15:19 ` Stefan Kanthak
2021-08-15 15:48 ` Rich Felker
2021-08-15 16:29 ` Stefan Kanthak
2021-08-15 16:49 ` Rich Felker
2021-08-15 20:52 ` Stefan Kanthak
2021-08-15 21:48 ` Rich Felker
2021-08-15 15:52 ` Ariadne Conill
2021-08-15 16:09 ` Rich Felker
Code repositories for project(s) associated with this public inbox
https://git.vuxu.org/mirror/musl/