mailing list of musl libc
 help / color / mirror / code / Atom feed
From: "Stefan Kanthak" <stefan.kanthak@nexgo.de>
To: "Rich Felker" <dalias@libc.org>
Cc: "Szabolcs Nagy" <nsz@port70.net>, <musl@lists.openwall.com>
Subject: Re: [PATCH] fmax(), fmaxf(), fmaxl(), fmin(), fminf(), fminl() simplified
Date: Wed, 11 Dec 2019 23:25:37 +0100	[thread overview]
Message-ID: <19C2F5D4C9574B8D9ADFCBE84CA0BC97@H270> (raw)
In-Reply-To: <20191211213039.GP1666@brightrain.aerifal.cx>

"Rich Felker" <dalias@libc.org> wrote:
> On Wed, Dec 11, 2019 at 10:17:09PM +0100, Stefan Kanthak wrote:

[...]

>> PS: the following is just a "Gedankenspiel", extending the idea to
>>     avoid transfers from/to SSE registers.
>>     On x86-64, functions like isunordered(), copysign() etc. may be
>>     implemented using SSE intrinsics _mm_*() as follows:
>> 
>> #include <immintrin.h>
>> 
>> int signbit(double argument)
>> {
>>     return /* 1 & */ _mm_movemask_pd(_mm_set_sd(argument));
>> }
> 
> This is just a missed optimization the compiler should be able to do
> without intrinsics, on any arch where floating point types are kept in
> vector registers that can also do integer/bitmask operations.

The catch here is but that the MOVMSKPD instruction generated from
_mm_movemask_pd() intrinsic yields its result in an integer register,
so there's no need to do integer/bitmask operations on vector
registers (and transfer them to an integer register afterwards).

>> uint32_t lrint(double argument)
>> {
>>     return _mm_cvtsd_si32(_mm_set_sd(argument));
>> }
> 
> This is already done (on x86_64 where it's valid). It's in an asm
> source file

This is exactly the cheating I address below: the prototype of the
assembler function matches the ABI, but not the C declaration.

> but should be converted to a C source file with __asm__
> and proper constraint, not intrinsics, because __asm__ is a compiler
> feature we require support for and intrinsics aren't (and also they
> have some really weird semantics with respect to how they interface
> with C aliasing rules).

That's why I introduced this only as a "Gedankenspiel"!

>> double copysign(double magnitude, double sign)
>> {
>>     return _mm_cvtsd_f64(_mm_or_pd(_mm_and_pd(_mm_set_sd(-0.0), _mm_set_sd(sign)),
>>                                    _mm_andnot_pd(_mm_set_sd(-0.0), _mm_set_sd(magnitude))));
>> }
> 
> I don't think we have one like this for x86_64, but ideally the C
> would compile to something like it. (See above about missed
> optimization.)

Compilers typically emit superfluous PXOR/XORPD instructions here to
clear the upper lane(s) of the vector registers, although _mm_*_sd()
and _mm_*_ss() don't touch the upper lanes (so invalid values can't
raise exceptions), and the bitmask operations _mm_*_pd() don't raise
exceptions on SNANs, subnormals etc.

>>     Although the arguments and results are all held in SSE registers,
>>     there's no way to use them directly; it's but necessary to
>>     transfer them using _mm_set_sd() and _mm_cvtsd_f64(), which may
>>     result in superfluous instructions emitted by the compiler.
> 
> I don't see why you say that.

Just insert "in plain C" after "there's no way to use them directly"

> They should be used in-place if possible just by virtue of how the
> compiler's IR works.

See above: most often XORPD or another instruction to clear/set the
upper lane(s) is emitted.

> Certainly for the __asm__ form they will be used in-place.

Right. But that's the inline form of cheating.-)

>>     If you but cheat and "hide" these functions from the compiler
>>     by placing them in a library, you can implement them as follows:
>> 
>> __m128d fmin(__m128d x, __m128d y)
>> {
>>     __m128d mask = _mm_cmp_sd(x, x, _CMP_ORD_Q);
>> 
>>     return _mm_or_pd(_mm_and_pd(mask, _mm_min_sd(y, x)),
>>                      _mm_andnot_pd(mask, y));
>> }
> 
> Yes, this kind of thing (hacks with declaring functions with wrong
> type to achieve an ABI result) is not something we really do in musl.
> But it shouldn't be needed here.

Remember that this is just a "Gedankenspiel".

Stefan


  reply	other threads:[~2019-12-11 22:25 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-12-11  9:55 Stefan Kanthak
2019-12-11 10:49 ` Szabolcs Nagy
2019-12-11 12:33   ` Stefan Kanthak
2019-12-11 13:16     ` Szabolcs Nagy
2019-12-11 13:25       ` Rich Felker
2019-12-11 21:17       ` Stefan Kanthak
2019-12-11 21:30         ` Rich Felker
2019-12-11 22:25           ` Stefan Kanthak [this message]
2019-12-11 22:14         ` Damian McGuckin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=19C2F5D4C9574B8D9ADFCBE84CA0BC97@H270 \
    --to=stefan.kanthak@nexgo.de \
    --cc=dalias@libc.org \
    --cc=musl@lists.openwall.com \
    --cc=nsz@port70.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/musl/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).