From: Rich Felker <dalias@libc.org>
To: Stefan Kanthak <stefan.kanthak@nexgo.de>
Cc: Szabolcs Nagy <nsz@port70.net>, musl@lists.openwall.com
Subject: Re: [musl] [PATCH #2] Properly simplified nextafter()
Date: Sun, 15 Aug 2021 11:48:44 -0400
Message-ID: <20210815154843.GH13220@brightrain.aerifal.cx>
In-Reply-To: <1F3569BD7D6E45889B7518DC9BE5004B@H270>
On Sun, Aug 15, 2021 at 05:19:05PM +0200, Stefan Kanthak wrote:
> Szabolcs Nagy <nsz@port70.net> wrote:
>
> > * Stefan Kanthak <stefan.kanthak@nexgo.de> [2021-08-15 09:04:55 +0200]:
> >> Szabolcs Nagy <nsz@port70.net> wrote:
> >>> you should benchmark, but the second best is to look
> >>> at the longest dependency chain in the hot path and
> >>> add up the instruction latencies.
> >>
> >> 1 billion calls to nextafter(), with random from, and to either 0 or +INF:
> >> run 1 against glibc,                         8.58 ns/call
> >> run 2 against musl original,                 3.59
> >> run 3 against musl patched,                  0.52
> >> run 4 the pure floating-point variant from   0.72
> >>       my initial post in this thread,
> >> run 5 the assembly variant I posted.         0.28 ns/call
> >
> > thanks for the numbers. it's not the best measurement
>
> IF YOU DON'T LIKE IT, PERFORM YOUR OWN MEASUREMENT!

The burden of performing a meaningful measurement is on the party who
says there's something that needs to be changed.

> > but shows some interesting effects.
>
> It clearly shows that musl's current implementation SUCKS, at least
> on AMD64.

Hardly. According to you it's faster than glibc, and looks
sufficiently fast never to be a bottleneck.

> >> PS: I cheated a very tiny little bit: the isnan() macro of musl patched is
> >>
> >> #ifdef PATCH
> >> #define isnan(x) ( \
> >> sizeof(x) == sizeof(float) ? (__FLOAT_BITS(x) << 1) > 0xff000000U : \
> >> sizeof(x) == sizeof(double) ? (__DOUBLE_BITS(x) << 1) > 0xffe0000000000000ULL : \
> >> __fpclassifyl(x) == FP_NAN)
> >> #else
> >> #define isnan(x) ( \
> >> sizeof(x) == sizeof(float) ? (__FLOAT_BITS(x) & 0x7fffffff) > 0x7f800000 : \
> >> sizeof(x) == sizeof(double) ? (__DOUBLE_BITS(x) & -1ULL>>1) > 0x7ffULL<<52 : \
> >> __fpclassifyl(x) == FP_NAN)
> >> #endif // PATCH
> >
> > i think on x86 this only changes an and to an add
> > (or nothing at all if the compiler is smart)
>
> BETTER THINK TWICE: where does the mask needed for the and come from?
> Does it need an extra register?
> How do you (for example) build it on ARM?
>
> > if this is measurable that's an uarch issue of your cpu.
>
> ARGH: it's not the and that makes the difference!
>
> JFTR: movabs $0x7ff0000000000000, %r*x is a 10 byte instruction
> I recommend to read Intel's and AMD's processor optimisation
> manuals and learn just a little bit!

If you have a general reason (one not tied to particular
microarchitectural considerations) why one form is preferred, please
state it from the beginning. I don't entirely understand your
argument here, since in both the original version and yours there's a
value on the RHS of the > operator that's in some sense nontrivial to
generate.

Ideally the compiler would be able to emit whichever form is preferred
for the target, since there's a clear transformation that can be made
in either direction for this kind of thing. But since that's presently
not the case, if there's a version that can be expected to be faster
on most targets, based on some reasoning beyond "what GCC happens to
do", we should use that.
Rich