From mboxrd@z Thu Jan 1 00:00:00 1970
X-Msuck: nntp://news.gmane.org/gmane.linux.lib.musl.general/15091
Path: news.gmane.org!.POSTED.blaine.gmane.org!not-for-mail
From: Rich Felker
Newsgroups: gmane.linux.lib.musl.general
Subject: Re: [PATCH] math: move x86_64 fabs, fabsf to C with inline asm
Date: Sun, 5 Jan 2020 17:43:54 -0500
Message-ID: <20200105224354.GN30412@brightrain.aerifal.cx>
References: <20200105163639.25963-1-amonakov@ispras.ru>
 <20200105200541.GM30412@brightrain.aerifal.cx>
Reply-To: musl@lists.openwall.com
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Injection-Info: blaine.gmane.org; posting-host="blaine.gmane.org:195.159.176.226";
 logging-data="5480"; mail-complaints-to="usenet@blaine.gmane.org"
User-Agent: Mutt/1.5.21 (2010-09-15)
To: musl@lists.openwall.com
Original-X-From: musl-return-15107-gllmg-musl=m.gmane.org@lists.openwall.com Sun Jan 05 23:44:10 2020
Return-path:
Envelope-to: gllmg-musl@m.gmane.org
Original-Received: from mother.openwall.net ([195.42.179.200])
 by blaine.gmane.org with smtp (Exim 4.89)
 (envelope-from )
 id 1ioEcz-0001Bv-Fv
 for gllmg-musl@m.gmane.org; Sun, 05 Jan 2020 23:44:09 +0100
Original-Received: (qmail 25976 invoked by uid 550); 5 Jan 2020 22:44:06 -0000
Mailing-List: contact musl-help@lists.openwall.com; run by ezmlm
Precedence: bulk
List-Post:
List-Help:
List-Unsubscribe:
List-Subscribe:
List-ID:
Original-Received: (qmail 25958 invoked from network); 5 Jan 2020 22:44:06 -0000
Content-Disposition: inline
In-Reply-To:
Original-Sender: Rich Felker
Xref: news.gmane.org gmane.linux.lib.musl.general:15091
Archived-At:

On Mon, Jan 06, 2020 at 12:32:38AM +0300, Alexander Monakov wrote:
>
>
> On Sun, 5 Jan 2020, Rich Felker wrote:
>
> > On Sun, Jan 05, 2020 at 07:36:39PM +0300, Alexander Monakov wrote:
> > > ---
> > >
> > > Questions:
> > >
> > > Why are there amd64-specific fabs implementations in the first place?
> > > (Only) because GCC generated poor code for the generic C version?
> >
> > I think so.
> > It generates:
> [snip]
>
> *nod* In my eyes that's a missed optimization, but one that is probably not
> going to be fully fixed anytime soon, although for the particular case of
> generic fabs gcc-9 has improved:
>
>         movq    %xmm0, %rax
>         btrq    $63, %rax
>         movq    %rax, %xmm0
>
> On Aarch64 GCC seems to do better with float bit manipulations (can emit code
> that does them on vector registers directly without copying to/from general
> registers). On x86 LLVM compiles fabs well, but not copysign.
>
> (ideally the language would allow to express bit manipulations of floats
> directly, then compilers probably would have better support as well)
>
> FWIW GCC bugreport is https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93039
> but I'm not holding my breath.
>
> By this logic, specialized implementations of copysign are also desirable,
> right? (2 instructions longer than fabs, except for long double)

I'm not sure if "this logic" carries over. fabs is a common operation
(ideally compiler would inline it anyway in the caller, though).
copysign not so much. Really I'm not even sure it makes sense to have
the asm here at all for fabs either, but perhaps with the gratuitous
stack access in the older-GCC version it does...?

> > > Do annotations for mask manipulation in the patch help? Any way to make
> > > them less ambiguous?
> >
> > I think so. I like how you did individual asm statements with
> > dependency relationship between them so compiler could even schedule
> > them if it likes. I wonder if you could just write 0x7fffffffffffffff
> > as an operand and have the compiler load it, though.
>
> In this case the mask is so simple that building it with pcmpeq-psrl is cheaper
> than loading from memory or moving from a general register. So not using an
> immediate is intentional.

OK, I was figuring the compiler might be able to generate it easily
with vector insns if there were no "non-vector" arithmetic/bitwise ops
involved in the use of the result, but that's probably expecting too
much...

Rich
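
For readers following the thread, here is a minimal sketch of the two
approaches discussed above. It is an illustration under assumptions, not
necessarily the code from the patch: the names fabs_generic and fabs_sse2
are hypothetical, and the asm path assumes GCC-style inline asm in AT&T
syntax on x86_64, falling back to the generic version elsewhere.

```c
#include <stdint.h>
#include <string.h>

/* Generic C version: clear the sign bit through an integer copy.
 * This is the form for which older GCC reportedly produced poor code. */
double fabs_generic(double x)
{
	uint64_t bits;
	memcpy(&bits, &x, sizeof bits);
	bits &= UINT64_C(0x7fffffffffffffff);
	memcpy(&x, &bits, sizeof bits);
	return x;
}

/* Inline-asm version (hypothetical name): the mask is built in a vector
 * register with pcmpeqd + psrlq instead of being loaded as an immediate.
 * Each step is its own asm statement, so the compiler sees the
 * dependencies between them and may schedule them. */
double fabs_sse2(double x)
{
#if defined(__x86_64__) && defined(__GNUC__)
	double mask;
	/* mask = all ones; the result does not depend on the old value */
	__asm__ ("pcmpeqd %0, %0" : "=x"(mask));
	/* shift each 64-bit lane right by one: 0x7fffffffffffffff */
	__asm__ ("psrlq $1, %0" : "+x"(mask));
	/* x &= mask, which clears the sign bit */
	__asm__ ("andps %1, %0" : "+x"(x) : "x"(mask));
	return x;
#else
	/* Assumption: on other targets just use the generic version. */
	return fabs_generic(x);
#endif
}
```

Both functions should agree on every input, including -0.0 and NaNs with
the sign bit set, since each one only clears bit 63.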