From: "Stefan Kanthak"
Subject: Re: [PATCH] fmax(), fmaxf(), fmaxl(), fmin(), fminf(), fminl() simplified
Date: Wed, 11 Dec 2019 22:17:09 +0100
Organization: Me, myself & IT
Message-ID: <438FA5B9D0E8497FA998E44E99C7CC8B@H270>
References: <557979287957451E9255CCBC4CD7CBE5@H270> <20191211104955.GN23985@port70.net> <5BF8FB2FE1AA418393E6091F7F8AFC14@H270> <20191211131659.GQ23985@port70.net>
In-Reply-To: <20191211131659.GQ23985@port70.net>
Reply-To: musl@lists.openwall.com
To: "Szabolcs Nagy"

"Szabolcs Nagy" wrote:

>* Stefan Kanthak [2019-12-11 13:33:44 +0100]:
>> "Szabolcs Nagy" wrote:
>> >* Stefan Kanthak [2019-12-11 10:55:29 +0100]:
>> > these two are not equivalent for snan input, but we don't care
>> > about snan, nor does the compiler by default, so the compiler can
>> > optimize one to the other (although musl uses explicit int
>> > arithmetics instead of __builtin_isnan, so it's a bit harder).
>>
>> The latter behaviour was my reason to use (x != x) here: I attempt
>> to replace as many function calls as possible with "normal" code,
>> and I also try to avoid transfers between FPU/SSE registers and
>> integer registers where they don't result in faster/shorter code.
>
> why not just change the definition of isnan then?

Because I did not want to introduce such a global change; so far my
patches have been just local (peephole) optimisations.

> #if __GNUC__ > xxx
> #define isnan(x) sizeof(x)==sizeof(float) ? __builtin_isnanf(x) : ...

This is better than my proposed change, as it also avoids the side
effect of (x != x), which can raise exceptions, and gets rid of the
explicit transfer to integer registers, which can hurt performance.

The macros isinf(), isnormal(), isfinite() and signbit() should of
course be implemented in a similar way too, and the (internal only?)
functions __FLOAT_BITS() and __DOUBLE_BITS() removed completely!
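Spelled out, such definitions could look like the following sketch
(the version guard is hypothetical, like the "xxx" above, and it
assumes the type-generic __builtin_isnan() family that current GCC
and clang provide; signbit() falls back to the per-type built-ins):

#if defined(__GNUC__) && __GNUC__ >= 4  /* hypothetical cut-off */
#define isnan(x)    __builtin_isnan(x)  /* type-generic */
#define isinf(x)    __builtin_isinf(x)
#define isfinite(x) __builtin_isfinite(x)
#define isnormal(x) __builtin_isnormal(x)
#define signbit(x) \
    (sizeof(x) == sizeof(float)  ? __builtin_signbitf(x) : \
     sizeof(x) == sizeof(double) ? __builtin_signbit(x)  : \
                                   __builtin_signbitl(x))
#else
/* keep the current implementations built on __FLOAT_BITS()/__DOUBLE_BITS() */
#endif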
regards
Stefan

PS: The following is just a "Gedankenspiel" (thought experiment),
extending the idea to avoid transfers from/to SSE registers: on
x86-64, functions like isunordered(), copysign() etc. may be
implemented using the SSE intrinsics _mm_*() as follows:

#include <immintrin.h> /* presumably; _mm_cmp_sd() and the _CMP_* predicates need AVX */
#include <stdint.h>

int signbit(double argument)
{
    /* _mm_set_sd() zeroes the upper lane, so only the sign of the
       argument can reach the mask; hence the "1 &" is redundant */
    return /* 1 & */ _mm_movemask_pd(_mm_set_sd(argument));
}

int isunordered(double a, double b)
{
#if 0
    return _mm_comieq_sd(_mm_cmp_sd(_mm_set_sd(a), _mm_set_sd(b), _CMP_ORD_Q),
                         _mm_set_sd(0.0));
#elif 0
    return _mm_comineq_sd(_mm_set_sd(a), _mm_set_sd(a))
        || _mm_comineq_sd(_mm_set_sd(b), _mm_set_sd(b));
#else
    return /* 1 & */ _mm_movemask_pd(_mm_cmp_sd(_mm_set_sd(a),
                                                _mm_set_sd(b),
                                                _CMP_UNORD_Q));
#endif
}

int32_t lrint(double argument)  /* note: the standard prototype returns long */
{
    /* CVTSD2SI rounds according to the current MXCSR rounding mode,
       as lrint() requires */
    return _mm_cvtsd_si32(_mm_set_sd(argument));
}

int64_t llrint(double argument) /* note: the standard prototype returns long long */
{
    return _mm_cvtsd_si64(_mm_set_sd(argument));
}

double copysign(double magnitude, double sign)
{
    /* (-0.0 & sign) | (~-0.0 & magnitude): splice the sign bit of
       "sign" onto the absolute value of "magnitude" */
    return _mm_cvtsd_f64(_mm_or_pd(_mm_and_pd(_mm_set_sd(-0.0), _mm_set_sd(sign)),
                                   _mm_andnot_pd(_mm_set_sd(-0.0), _mm_set_sd(magnitude))));
}

double fdim(double x, double y)
{
    /* (x > y) ? x - y : +0.0; _CMP_NLE_US is also true for unordered
       operands, so a NaN argument propagates */
    return _mm_cvtsd_f64(_mm_and_pd(_mm_cmp_sd(_mm_set_sd(x), _mm_set_sd(y), _CMP_NLE_US),
                                    _mm_sub_sd(_mm_set_sd(x), _mm_set_sd(y))));
}

double fmax(double x, double y)
{
    /* mask = all-ones if x is not NaN */
    __m128d mask = _mm_cmp_sd(_mm_set_sd(x), _mm_set_sd(x), _CMP_ORD_Q);
    /* MAXSD returns its second operand if either operand is NaN, so a
       NaN y falls through to x, while a NaN x is handled by the mask */
    return _mm_cvtsd_f64(_mm_or_pd(_mm_and_pd(mask, _mm_max_sd(_mm_set_sd(y), _mm_set_sd(x))),
                                   _mm_andnot_pd(mask, _mm_set_sd(y))));
}

double fmin(double x, double y)
{
    __m128d mask = _mm_cmp_sd(_mm_set_sd(x), _mm_set_sd(x), _CMP_ORD_Q);
    return _mm_cvtsd_f64(_mm_or_pd(_mm_and_pd(mask, _mm_min_sd(_mm_set_sd(y), _mm_set_sd(x))),
                                   _mm_andnot_pd(mask, _mm_set_sd(y))));
}

Although the arguments and results are all held in SSE registers,
there is no way to use them directly: they have to be transferred
via _mm_set_sd() and _mm_cvtsd_f64(), which may result in superfluous
instructions emitted by the compiler.

If you cheat, however, and "hide" these functions from the compiler by
placing them in a library, you can implement them as follows:

__m128d fmin(__m128d x, __m128d y)
{
    __m128d mask = _mm_cmp_sd(x, x, _CMP_ORD_Q);
    return _mm_or_pd(_mm_and_pd(mask, _mm_min_sd(y, x)),
                     _mm_andnot_pd(mask, y));
}

.code ; Intel syntax
fmin    proc    public
        movsd   xmm2, xmm0      ; xmm2 = x
        cmpsd   xmm2, xmm0, 7   ; xmm2 = (x == x) ? -1 : 0, predicate 7 = ORD
        movsd   xmm3, xmm2
        andnpd  xmm3, xmm1      ; xmm3 = (x != NaN) ? 0.0 : y
        minsd   xmm1, xmm0      ; xmm1 = (y < x) ? y : x
                                ;      = min(x, y), x if unordered
        andpd   xmm2, xmm1      ; xmm2 = (x != NaN) ? min(x, y) : 0.0
        orpd    xmm2, xmm3      ; xmm2 = (x != NaN) ? min(x, y) : y
        movsd   xmm0, xmm2      ; xmm0 = fmin(x, y)
        ret
fmin    endp
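For reference, the NaN handling which the ORD mask implements is the
one C99 requires for fmin()/fmax(): a single NaN argument is treated
as missing data, so NaN comes back only when both arguments are NaN.
A quick sanity check (just a sketch; it exercises any conforming
fmin()/fmax(), including the variants above):

#include <math.h>
#include <stdio.h>

int main(void)
{
    printf("%g\n", fmin(NAN, 1.0));  /* 1: the NaN argument is ignored */
    printf("%g\n", fmin(1.0, NAN));  /* 1 */
    printf("%g\n", fmin(NAN, NAN));  /* nan: both arguments are NaN */
    printf("%g\n", fmax(-1.0, NAN)); /* -1 */
    return 0;
}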