From: "Stefan Kanthak" <stefan.kanthak@nexgo.de>
To: "Szabolcs Nagy" <nsz@port70.net>, <musl@lists.openwall.com>
Subject: Re: [PATCH] fmax(), fmaxf(), fmaxl(), fmin(), fminf(), fminl() simplified
Date: Wed, 11 Dec 2019 22:17:09 +0100
Message-ID: <438FA5B9D0E8497FA998E44E99C7CC8B@H270>
In-Reply-To: <20191211131659.GQ23985@port70.net>
"Szabolcs Nagy" <nsz@port70.net> wrote:
>* Stefan Kanthak <stefan.kanthak@nexgo.de> [2019-12-11 13:33:44 +0100]:
>> "Szabolcs Nagy" <nsz@port70.net> wrote:
>> >* Stefan Kanthak <stefan.kanthak@nexgo.de> [2019-12-11 10:55:29 +0100]:
>> > these two are not equivalent for snan input, but we dont care
>> > about snan, nor the compiler by default, so the compiler can
>> > optimize one to the other (although musl uses explicit int
>> > arithmetics instead of __builtin_isnan so it's a bit harder).
>>
>> The latter behaviour was my reason to use (x != x) here: I attempt
>> to replace as many function calls as possible with "normal" code,
>> and also try to avoid transfers to/from FPU/SSE registers to/from
>> integer registers if that does not result in faster/shorter code.
>
> why not just change the definition of isnan then?
Because I did not want to introduce such a global change; so far my
patches are only local (peephole) optimisations.
> #if __GNUC__ > xxx
> #define isnan(x) sizeof(x)==sizeof(float) ? __builtin_isnanf(x) : ...
This is better than my proposed change, as it also avoids the side
effect of (x != x), which can raise exceptions, and gets rid of the
explicit transfer to integer registers, which can hurt performance.
The macros isinf(), isnormal(), isfinite(), signbit() should of
course be implemented in a similar way too, and the (internal only?)
functions __FLOAT_BITS() and __DOUBLE_BITS() removed completely!
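
A minimal sketch of that idea, assuming GCC or Clang, whose
__builtin_isnan(), __builtin_isinf(), __builtin_isfinite() and
__builtin_isnormal() accept any floating-point type. The my_* names
are hypothetical and only chosen to avoid clashing with <math.h>; this
is not musl's actual definition.

```c
/* Classification macros built on compiler builtins (GCC/Clang assumed).
   The my_* names are hypothetical; they are not part of musl. */
#define my_isnan(x)    __builtin_isnan(x)
#define my_isinf(x)    __builtin_isinf(x)
#define my_isfinite(x) __builtin_isfinite(x)
#define my_isnormal(x) __builtin_isnormal(x)

/* Dispatch on the argument size, as in the quoted isnan() proposal;
   the result is nonzero if the sign bit is set. */
#define my_signbit(x) ( \
    sizeof(x) == sizeof(float)  ? __builtin_signbitf(x) : \
    sizeof(x) == sizeof(double) ? __builtin_signbit(x)  : \
                                  __builtin_signbitl(x) )
```

None of these macros needs __FLOAT_BITS() or __DOUBLE_BITS(), so the
compiler is free to keep the operand in a floating-point register.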
regards
Stefan

PS: the following is just a "Gedankenspiel" (thought experiment),
    extending the idea of avoiding transfers from/to SSE registers.

On x86-64, functions like isunordered(), copysign() etc. may be
implemented using SSE intrinsics _mm_*() as follows:

#include <immintrin.h>

int signbit(double argument)
{
    return /* 1 & */ _mm_movemask_pd(_mm_set_sd(argument));
}

int isunordered(double a, double b)
{
#if 0
    return _mm_comieq_sd(_mm_cmp_sd(_mm_set_sd(a), _mm_set_sd(b), _CMP_ORD_Q), _mm_set_sd(0.0));
#elif 0
    return _mm_comineq_sd(_mm_set_sd(a), _mm_set_sd(a))
        || _mm_comineq_sd(_mm_set_sd(b), _mm_set_sd(b));
#else
    return /* 1 & */ _mm_movemask_pd(_mm_cmp_sd(_mm_set_sd(a), _mm_set_sd(b), _CMP_UNORD_Q));
#endif
}

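/* lrint() returns long and llrint() returns long long (both signed);
   _mm_cvtsd_si32() matches a 32-bit long (e.g. on Windows); where
   long is 64 bits wide, use _mm_cvtsd_si64() for lrint() too */
long lrint(double argument)
{
    return _mm_cvtsd_si32(_mm_set_sd(argument));
}

long long llrint(double argument)
{
    return _mm_cvtsd_si64(_mm_set_sd(argument));
}
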
double copysign(double magnitude, double sign)
{
    return _mm_cvtsd_f64(_mm_or_pd(_mm_and_pd(_mm_set_sd(-0.0), _mm_set_sd(sign)),
                                   _mm_andnot_pd(_mm_set_sd(-0.0), _mm_set_sd(magnitude))));
}

double fdim(double x, double y)
{
    return _mm_cvtsd_f64(_mm_and_pd(_mm_cmp_sd(_mm_set_sd(x), _mm_set_sd(y), _CMP_NLE_US),
                                    _mm_sub_sd(_mm_set_sd(x), _mm_set_sd(y))));
}

double fmax(double x, double y)
{
    __m128d mask = _mm_cmp_sd(_mm_set_sd(x), _mm_set_sd(x), _CMP_ORD_Q);

    return _mm_cvtsd_f64(_mm_or_pd(_mm_and_pd(mask, _mm_max_sd(_mm_set_sd(y), _mm_set_sd(x))),
                                   _mm_andnot_pd(mask, _mm_set_sd(y))));
}

double fmin(double x, double y)
{
    __m128d mask = _mm_cmp_sd(_mm_set_sd(x), _mm_set_sd(x), _CMP_ORD_Q);

    return _mm_cvtsd_f64(_mm_or_pd(_mm_and_pd(mask, _mm_min_sd(_mm_set_sd(y), _mm_set_sd(x))),
                                   _mm_andnot_pd(mask, _mm_set_sd(y))));
}
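
With GCC and Clang, _mm_cmp_sd() and the _CMP_* predicates are only
available when AVX is enabled (e.g. with -mavx). A sketch of the same
technique restricted to plain SSE2, using _mm_cmpord_sd() and
_mm_cmpunord_sd() from <emmintrin.h>, could look as follows; the
sse2_* names are hypothetical and only avoid a clash with <math.h>.

```c
#include <emmintrin.h>  /* SSE2 intrinsics only */

/* Hypothetical SSE2-only variant of the fmin() shown above. */
static double sse2_fmin(double x, double y)
{
    __m128d vx   = _mm_set_sd(x);
    __m128d vy   = _mm_set_sd(y);
    __m128d mask = _mm_cmpord_sd(vx, vx);   /* all ones if x is not NaN */
    __m128d min  = _mm_min_sd(vy, vx);      /* yields x if y is NaN */

    /* select min(x, y) if x is ordered, else y */
    return _mm_cvtsd_f64(_mm_or_pd(_mm_and_pd(mask, min),
                                   _mm_andnot_pd(mask, vy)));
}

/* Hypothetical SSE2-only variant of isunordered(): 1 if a or b is NaN. */
static int sse2_isunordered(double a, double b)
{
    /* the upper lane is zero, so only bit 0 of the mask can be set */
    return _mm_movemask_pd(_mm_cmpunord_sd(_mm_set_sd(a), _mm_set_sd(b)));
}
```

fmax() follows the same pattern with _mm_max_sd() in place of
_mm_min_sd().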

Although the arguments and results are all held in SSE registers,
there is no way to use them directly in C; they have to be
transferred using _mm_set_sd() and _mm_cvtsd_f64(), which may
result in superfluous instructions emitted by the compiler.

If you cheat, however, and "hide" these functions from the compiler
by placing them in a library, you can implement them as follows:

__m128d fmin(__m128d x, __m128d y)
{
    __m128d mask = _mm_cmp_sd(x, x, _CMP_ORD_Q);

    return _mm_or_pd(_mm_and_pd(mask, _mm_min_sd(y, x)),
                     _mm_andnot_pd(mask, y));
}

        .code                   ; Intel syntax
fmin    proc    public
        movsd   xmm2, xmm0      ; xmm2 = x
        cmpsd   xmm2, xmm0, 7   ; xmm2 = isnan(x) ? 0 : -1
        movsd   xmm3, xmm2
        andnpd  xmm3, xmm1      ; xmm3 = isnan(x) ? y : 0.0
        minsd   xmm1, xmm0      ; xmm1 = (y < x) ? y : x
                                ;      = min(x, y)
        andpd   xmm2, xmm1      ; xmm2 = isnan(x) ? 0.0 : min(x, y)
        orpd    xmm2, xmm3      ; xmm2 = isnan(x) ? y : min(x, y)
        movsd   xmm0, xmm2      ; xmm0 = fmin(x, y)
        ret
fmin    endp
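
For checking such a routine, the selection logic of the listing can be
written down as a scalar reference. This is only a sketch of the
intended semantics; fmin_ref is a hypothetical name, and the handling
of signed zeros is not considered here.

```c
/* Hypothetical scalar reference for the fmin listing above. */
static double fmin_ref(double x, double y)
{
    if (x != x)                 /* x is NaN: the mask is zero, y passes through */
        return y;
    return (y < x) ? y : x;     /* like minsd xmm1, xmm0: yields x if y is NaN */
}
```

If exactly one argument is a NaN, the other one is returned; if both
are NaNs, the result is a NaN.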