From mboxrd@z Thu Jan 1 00:00:00 1970 X-Msuck: nntp://news.gmane.org/gmane.linux.lib.musl.general/15010 Path: news.gmane.org!.POSTED.blaine.gmane.org!not-for-mail From: Rich Felker Newsgroups: gmane.linux.lib.musl.general Subject: Re: [PATCH] fmax(), fmaxf(), fmaxl(), fmin(), fminf(), fminl() simplified Date: Wed, 11 Dec 2019 16:30:39 -0500 Message-ID: <20191211213039.GP1666@brightrain.aerifal.cx> References: <557979287957451E9255CCBC4CD7CBE5@H270> <20191211104955.GN23985@port70.net> <5BF8FB2FE1AA418393E6091F7F8AFC14@H270> <20191211131659.GQ23985@port70.net> <438FA5B9D0E8497FA998E44E99C7CC8B@H270> Reply-To: musl@lists.openwall.com Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Injection-Info: blaine.gmane.org; posting-host="blaine.gmane.org:195.159.176.226"; logging-data="202792"; mail-complaints-to="usenet@blaine.gmane.org" User-Agent: Mutt/1.5.21 (2010-09-15) Cc: Szabolcs Nagy , musl@lists.openwall.com To: Stefan Kanthak Original-X-From: musl-return-15026-gllmg-musl=m.gmane.org@lists.openwall.com Wed Dec 11 22:31:10 2019 Return-path: Envelope-to: gllmg-musl@m.gmane.org Original-Received: from mother.openwall.net ([195.42.179.200]) by blaine.gmane.org with smtp (Exim 4.89) (envelope-from ) id 1if9Zc-000qbV-JW for gllmg-musl@m.gmane.org; Wed, 11 Dec 2019 22:31:08 +0100 Original-Received: (qmail 18307 invoked by uid 550); 11 Dec 2019 21:31:06 -0000 Mailing-List: contact musl-help@lists.openwall.com; run by ezmlm Precedence: bulk List-Post: List-Help: List-Unsubscribe: List-Subscribe: List-ID: Original-Received: (qmail 18287 invoked from network); 11 Dec 2019 21:31:05 -0000 Content-Disposition: inline In-Reply-To: <438FA5B9D0E8497FA998E44E99C7CC8B@H270> Original-Sender: Rich Felker Xref: news.gmane.org gmane.linux.lib.musl.general:15010 Archived-At: On Wed, Dec 11, 2019 at 10:17:09PM +0100, Stefan Kanthak wrote: > "Szabolcs Nagy" wrote: > > > >* Stefan Kanthak [2019-12-11 13:33:44 +0100]: > >> "Szabolcs Nagy" wrote: > >> >* Stefan Kanthak [2019-12-11 10:55:29 +0100]: > >> > these two are not equivalent for snan input, but we dont care > >> > about snan, nor the compiler by default, so the compiler can > >> > optimize one to the other (although musl uses explicit int > >> > arithmetics instead of __builtin_isnan so it's a bit harder). > >> > >> The latter behaviour was my reason to use (x != x) here: I attempt > >> to replace as many function calls as possible with "normal" code, > >> and also try to avoid transfers to/from FPU/SSE registers to/from > >> integer registers if that does not result in faster/shorter code. > > > > why not just change the definition of isnan then? > > Because I did not want to introduce such a global change; until now my > patches are just local (peephole) optimisations. > > > #if __GNUC__ > xxx > > #define isnan(x) sizeof(x)==sizeof(float) ? __builtin_isnanf(x) : ... > > This is better than my proposed change, as it also avoids the side- > effect of (x != x) which can raise exceptions, and gets rid of the > explicit transfer to integer registers, which can hurt performance. > > The macros isinf(), isnormal(), isfinite(), signbit() should of > course be implemented in a similar way too, and the (internal only?) > functions __FLOAT_BITS() and __DOUBLE_BITS() removed completely! Not removed because the public headers support non-GNUC (or older GCC? I forget when these were introduced) compilers that may not provide these. Having the portable definitions present as the fallback case is still desirable. > PS: the following is just a "Gedankenspiel", extending the idea to > avoid transfers from/to SSE registers. > On x86-64, functions like isunordered(), copysign() etc. may be > implemented using SSE intrinsics _mm_*() as follows: > > #include > > int signbit(double argument) > { > return /* 1 & */ _mm_movemask_pd(_mm_set_sd(argument)); > } This is just a missed optimization the compiler should be able to do without intrinsics, on any arch where floating point types are kept in vector registers that can also do integer/bitmask operations. > uint32_t lrint(double argument) > { > return _mm_cvtsd_si32(_mm_set_sd(argument)); > } This is already done (on x86_64 where it's valid). It's in an asm source file but should be converted to a C source file with __asm__ and proper constraint, not intrinsics, because __asm__ is a compiler feature we require support for and intrinsics aren't (and also they have some really weird semantics with respect to how they interface with C aliasing rules). > double copysign(double magnitude, double sign) > { > return _mm_cvtsd_f64(_mm_or_pd(_mm_and_pd(_mm_set_sd(-0.0), _mm_set_sd(sign)), > _mm_andnot_pd(_mm_set_sd(-0.0), _mm_set_sd(magnitude)))); > } I don't think we have one like this for x86_64, but ideally the C would compile to something like it. (See above about missed optimization.) > Although the arguments and results are all held in SSE registers, > there's no way to use them directly; it's but necessary to > transfer them using _mm_set_sd() and _mm_cvtsd_f64(), which may > result in superfluous instructions emitted by the compiler. I don't see why you say that. They should be used in-place if possible just by virtue of how the compiler's IR works. Certainly for the __asm__ form they will be used in-place. > If you but cheat and "hide" these functions from the compiler > by placing them in a library, you can implement them as follows: > > __m128d fmin(__m128d x, __m128d y) > { > __m128d mask = _mm_cmp_sd(x, x, _CMP_ORD_Q); > > return _mm_or_pd(_mm_and_pd(mask, _mm_min_sd(y, x)), > _mm_andnot_pd(mask, y)); > } Yes, this kind of thing (hacks with declaring functions with wrong type to achieve an ABI result) is not something we really do in musl. But it shouldn't be needed here. Rich