From: "Stefan Kanthak"
To: "Rich Felker"
Cc: "Szabolcs Nagy"
Reply-To: musl@lists.openwall.com
Newsgroups: gmane.linux.lib.musl.general
Subject: Re: [PATCH] fmax(), fmaxf(), fmaxl(), fmin(), fminf(), fminl() simplified
Date: Wed, 11 Dec 2019 23:25:37 +0100
Message-ID: <19C2F5D4C9574B8D9ADFCBE84CA0BC97@H270>
In-Reply-To: <20191211213039.GP1666@brightrain.aerifal.cx>

"Rich Felker" wrote:

> On Wed, Dec 11,
> 2019 at 10:17:09PM +0100, Stefan Kanthak wrote:
[...]
>> PS: the following is just a "Gedankenspiel", extending the idea to
>> avoid transfers from/to SSE registers.
>> On x86-64, functions like isunordered(), copysign() etc. may be
>> implemented using SSE intrinsics _mm_*() as follows:
>>
>> #include <immintrin.h>
>>
>> int signbit(double argument)
>> {
>>     return /* 1 & */ _mm_movemask_pd(_mm_set_sd(argument));
>> }
>
> This is just a missed optimization the compiler should be able to do
> without intrinsics, on any arch where floating point types are kept in
> vector registers that can also do integer/bitmask operations.

The catch here, however, is that the MOVMSKPD instruction generated from
the _mm_movemask_pd() intrinsic yields its result in an integer register,
so there's no need to do integer/bitmask operations on vector registers
(and transfer the result to an integer register afterwards).

>> uint32_t lrint(double argument)
>> {
>>     return _mm_cvtsd_si32(_mm_set_sd(argument));
>> }
>
> This is already done (on x86_64 where it's valid). It's in an asm
> source file

This is exactly the cheating I address below: the prototype of the
assembler function matches the ABI, but not the C declaration.

> but should be converted to a C source file with __asm__
> and proper constraint, not intrinsics, because __asm__ is a compiler
> feature we require support for and intrinsics aren't (and also they
> have some really weird semantics with respect to how they interface
> with C aliasing rules).

That's why I introduced this only as a "Gedankenspiel"!

>> double copysign(double magnitude, double sign)
>> {
>>     return _mm_cvtsd_f64(_mm_or_pd(_mm_and_pd(_mm_set_sd(-0.0), _mm_set_sd(sign)),
>>                                    _mm_andnot_pd(_mm_set_sd(-0.0), _mm_set_sd(magnitude))));
>> }
>
> I don't think we have one like this for x86_64, but ideally the C
> would compile to something like it. (See above about missed
> optimization.)

Compilers typically emit superfluous PXOR/XORPD instructions here to
clear the upper lane(s) of the vector registers, although _mm_*_sd()
and _mm_*_ss() don't touch the upper lanes (so invalid values can't
raise exceptions), and the bitmask operations _mm_*_pd() don't raise
exceptions on SNaNs, subnormals etc.

>> Although the arguments and results are all held in SSE registers,
>> there's no way to use them directly; it's but necessary to
>> transfer them using _mm_set_sd() and _mm_cvtsd_f64(), which may
>> result in superfluous instructions emitted by the compiler.
>
> I don't see why you say that.

Just insert "in plain C" after "there's no way to use them directly".

> They should be used in-place if possible just by virtue of how the
> compiler's IR works.

See above: most often XORPD or another instruction to clear/set the
upper lane(s) is emitted.

> Certainly for the __asm__ form they will be used in-place.

Right. But that's the inline form of cheating.-)

>> If you but cheat and "hide" these functions from the compiler
>> by placing them in a library, you can implement them as follows:
>>
>> __m128d fmin(__m128d x, __m128d y)
>> {
>>     __m128d mask = _mm_cmp_sd(x, x, _CMP_ORD_Q);
>>
>>     return _mm_or_pd(_mm_and_pd(mask, _mm_min_sd(y, x)),
>>                      _mm_andnot_pd(mask, y));
>> }
>
> Yes, this kind of thing (hacks with declaring functions with wrong
> type to achieve an ABI result) is not something we really do in musl.
> But it shouldn't be needed here.

Remember that this is just a "Gedankenspiel".

Stefan
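
For comparison with the intrinsics variants quoted above, here is a
minimal sketch of the "plain C" form of signbit() and copysign() that
the thread expects a compiler to turn into a few bitmask instructions.
It is not taken from musl or from either author. The names bits_of(),
double_of(), sketch_signbit() and sketch_copysign() are hypothetical,
and the code assumes that double is IEEE 754 binary64.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy the object representation of a double
   into a 64-bit integer (assumes IEEE 754 binary64). */
static uint64_t bits_of(double x)
{
    uint64_t u;
    memcpy(&u, &x, sizeof u);
    return u;
}

/* Hypothetical helper: the inverse of bits_of(). */
static double double_of(uint64_t u)
{
    double x;
    memcpy(&x, &u, sizeof x);
    return x;
}

/* Returns 1 if the sign bit of x is set (including -0.0 and NaNs
   with the sign bit set), else 0. */
int sketch_signbit(double x)
{
    return (int)(bits_of(x) >> 63);
}

/* Combines the magnitude bits of one value with the sign bit of the
   other, mirroring the AND/ANDNOT/OR structure of the quoted
   intrinsics version. */
double sketch_copysign(double magnitude, double sign)
{
    const uint64_t sign_mask = UINT64_C(1) << 63;

    return double_of((bits_of(magnitude) & ~sign_mask)
                     | (bits_of(sign) & sign_mask));
}
```

Whether a given compiler keeps these operations in the SSE registers or
moves the values through integer registers depends on the compiler and
its options; that is the "missed optimization" discussed above.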