From: "Stefan Kanthak"
Newsgroups: gmane.linux.lib.musl.general
Subject: Re: More patches for math subtree
Date: Wed, 11 Dec 2019 10:53:41 +0100
Organization: Me, myself & IT
To: "Rich Felker"
Reply-To: musl@lists.openwall.com
In-Reply-To: <20191210221738.GL1666@brightrain.aerifal.cx>

"Rich Felker" wrote:

> On Tue, Dec 10, 2019 at 10:32:26PM +0100, Stefan Kanthak wrote:

[ asm vs. C ]

>> Does any compiler emit branch-free instruction sequences like the
>> following for Intel CPUs without SSE4.1, i.e. without ROUNDSS/ROUNDSD?
>>
>>         .code                   ; Intel syntax
>> ceil    proc    public
>>         extern  __real@8000000000000000:real8
>>         movsd   xmm1, __real@8000000000000000
>>         extern  __real@3ff0000000000000:real8
>>         movsd   xmm2, __real@3ff0000000000000
>>         extern  __real@4330000000000000:real8
>>         movsd   xmm3, __real@4330000000000000
>>         movsd   xmm4, xmm1
>>         andnpd  xmm1, xmm0
>>         andpd   xmm4, xmm0
>>         cmpltsd xmm1, xmm3
>>         andpd   xmm1, xmm3
>>         orpd    xmm1, xmm4
>>         movsd   xmm3, xmm0
>>         addsd   xmm0, xmm1
>>         subsd   xmm0, xmm1
>>         movsd   xmm1, xmm0
>>         cmpltsd xmm0, xmm3
>>         andpd   xmm0, xmm2
>>         addsd   xmm0, xmm1
>>         orpd    xmm0, xmm4
>>         ret
>> ceil    endp
>>
>> Or instruction sequences like
>>
>>         .code                   ; Intel syntax
>> copysign proc   public
>>         movd    rcx, xmm0
>>         movd    rdx, xmm1
>>         shld    rcx, rdx, 1
>>         ror     rcx, 1
>>         movd    xmm0, rcx
>>         ret
>> copysign endp
>
> Not quite (but it might be possible to write the C in terms of shifts
> instead of masks such that it does), but I also don't think it's clear
> which version is better. Yours here is mildly smaller and might
> perform better, but when making changes that aren't clearly better
> there should be some evidence that it's actually an improvement --
> especially if it's not just improving existing arch optimizations but
> adding new ones where the C was formerly used.

Correct. I expect the compiler to emit such properly optimised code
instead of calls to the library for standard functions like copysign(),
fdim(), etc., which can be written with just a few instructions ...
which the compiler does not (always) do.

JFTR: I don't know whether GCC or clang provide either intrinsics or
__builtin_* for such (or all those) small standard functions.

> Generally musl avoids asm and arch-specific files as much as possible,
> using them only for things that aren't representable in C or where
> the C is a lot larger or slower or both.
>
>>         .code                   ; Intel syntax
>> fdim    proc    public
>>         movsd   xmm2, xmm0
>>         cmpsd   xmm0, xmm1, 6
>>         subsd   xmm2, xmm1
>>         andpd   xmm0, xmm2
>>         ret
>> fdim    endp
>
> Does this handle nans correctly?

Of course! It's equivalent to

double fdim(double a, double b)
{
    uint64_t mask = (a <= b) ? 0ull : ~0ull;
    union {double dbl; uint64_t ull;} u = {a - b};
    u.ull &= mask;
    return u.dbl;
}

[...]

> OK. I don't mind looking at these patches further as-is, and I'll try
> to continue offering constructive comments now, but it'll be after
> this release cycle (hopefully wrapping that up in the next week or so)
> before consideration for merging. musl 1.2.0 is already going to be a
> release with big changes (time64) and I don't want to risk subtle
> breakage with new changes that haven't been reviewed in detail yet or
> had time for users to test.

That's OK.

Stefan
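
For reference, the three sequences discussed in this message can be
sketched in portable C roughly as follows. This is only an illustration,
not code from the musl tree or from the posted patches: the names
copysign_shift, fdim_mask, ceil_magic, to_bits and from_bits are made up
here, and the sketch assumes IEEE 754 binary64 doubles, the default
round-to-nearest mode, and arithmetic evaluated in double precision (as
with SSE2). The NaN case of fdim works because compare predicate 6 is
"not less-or-equal", which is also true for unordered operands, so the
mask stays all-ones and the NaN produced by a - b passes through.

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helpers: move a double to/from its 64-bit pattern. */
static uint64_t to_bits(double d) { uint64_t u; memcpy(&u, &d, sizeof u); return u; }
static double from_bits(uint64_t u) { double d; memcpy(&d, &u, sizeof d); return d; }

/* copysign written with shifts instead of masks (cf. the shld/ror pair). */
double copysign_shift(double x, double y)
{
    uint64_t mag  = (to_bits(x) << 1) >> 1;   /* x without its sign bit */
    uint64_t sign = (to_bits(y) >> 63) << 63; /* sign bit of y only     */
    return from_bits(mag | sign);
}

/* fdim via a comparison mask, following the C version in the message. */
double fdim_mask(double a, double b)
{
    /* (a <= b) is false for NaN operands, so the mask is all-ones then. */
    uint64_t mask = (a <= b) ? 0 : ~(uint64_t)0;
    return from_bits(to_bits(a - b) & mask);
}

/* ceil without a rounding instruction, following the asm step by step. */
double ceil_magic(double x)
{
    const uint64_t SIGN = UINT64_C(1) << 63;
    uint64_t sign = to_bits(x) & SIGN;
    double ax = from_bits(to_bits(x) & ~SIGN);            /* |x| */
    /* 2^52 carrying x's sign if |x| < 2^52, otherwise a signed zero
       (the comparison is also false for NaN). */
    double magic = from_bits(((ax < 0x1p52) ? to_bits(0x1p52) : 0) | sign);
    /* volatile keeps the compiler from folding (x + magic) - magic. */
    volatile double t = x + magic;  /* rounds x to the nearest integer */
    t -= magic;
    double r = t;
    r += (r < x) ? 1.0 : 0.0;       /* rounded down: step up by one    */
    return from_bits(to_bits(r) | sign); /* keep -0.0 for x in (-1, 0] */
}
```

Whether a given compiler turns these into the branch-free sequences
quoted above depends on the compiler, its version and its options; as
noted in the thread, that would have to be checked and measured.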