From: Rich Felker
To: Stefan Kanthak
Cc: musl@lists.openwall.com
Newsgroups: gmane.linux.lib.musl.general
Subject: Re: More patches for math subtree
Date: Tue, 10 Dec 2019 14:35:58 -0500
Message-ID: <20191210193558.GK1666@brightrain.aerifal.cx>
In-Reply-To: <2C3325A208DA4260A1A0F7B4517D6DFA@H270>

On Tue, Dec 10, 2019 at 05:57:55PM +0100, Stefan Kanthak wrote:
> Some more optimisations: the current implementations of ceil(), floor()
> and trunc() for i386 change the rounding control using fldcw instructions,
> which are SLOW; these patches provide faster and smaller branch-free (!)
> implementations.
>
> JFTR: I'm NOT subscribed to your mailing list, so CC: me in replies!
>
> --- -/src/math/i386/floor.s
> +++ +/src/math/i386/floor.s
> @@ -1,67 +1,26 @@
>  .global floorf
>  .type floorf,@function
>  floorf:
>          flds 4(%esp)
>          jmp 1f
>
>  .global floorl
>  .type floorl,@function
>  floorl:
>          fldt 4(%esp)
>          jmp 1f
>
>  .global floor
>  .type floor,@function
>  floor:
>          fldl 4(%esp)
> +1:      fld %st(0)
> +        frndint
> +        fxch %st(1)
> +        fucomip %st(1),%st(0)
> +        fld1
> +        fldz
> +        fcmovb %st(1),%st(0)
           ^^^^^^

fcmovb is not in the baseline ISA. Otherwise, I *think* the idea of
this patch looks good, provided I'm not missing anything with respect
to how status flags are affected.

As noted in the other email (sorry about not CC'ing you before; I've
got you on CC now), I really want to get rid of all these .s files in
favor of __asm__ statements with proper constraints in C source files.
That makes them inlineable with LTO, and lets the compiler choose
whether to use an instruction like fcmovb based on the targeted ISA
level, rather than having to do a .S file with hard-coded preprocessor
conditionals. It also precludes x87 stack imbalance bugs like
CVE-2019-14697, which make me really wary of manual changes to these
files.

Would you be interested in working on converting over the files you
want to optimize (or even others too) to that form at the same time as
doing the optimizations? It would really help with the review process
and with improving the overall code state.

Rich
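
For illustration only, here is a minimal sketch of what "__asm__
statements with proper constraints in C source files" could look like
for the round-then-adjust idea in the quoted patch. It is not code from
musl or from the patch: the names round_current_mode and floor_sketch
are hypothetical, the sketch assumes GCC or Clang extended asm, and the
non-x86 branch is a limited-range fallback added only so the example
compiles elsewhere.

```c
#include <assert.h>

/* Hypothetical helper: round x to an integral value.
 * On x86 this is a single frndint, which honors the current x87
 * rounding mode. The "+t" constraint tells the compiler that x lives
 * on top of the x87 stack as both input and output, so the compiler
 * (not hand-written code) keeps the stack balanced. */
static inline long double round_current_mode(long double x)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
	__asm__ ("frndint" : "+t"(x));
	return x;
#else
	/* Assumption: fallback for other targets, valid only for
	 * magnitudes that fit in long long; it truncates toward zero
	 * instead of using the current rounding mode. */
	assert(x > -0x1p62L && x < 0x1p62L);
	return (long double)(long long)x;
#endif
}

/* Hypothetical floor built on the helper. The rounded value r is
 * either x itself or an integer adjacent to x, so subtracting 1
 * exactly when r > x yields the floor. The comparison is plain C,
 * which leaves the choice between fcmov, setcc, or a branch to the
 * compiler and the -march level it was given. */
static inline long double floor_sketch(long double x)
{
	long double r = round_current_mode(x);
	return r - (r > x);
}
```

With this shape, floor_sketch(2.5L) gives 2 and floor_sketch(-2.5L)
gives -3. NaN and infinities pass through the x86 path unchanged,
because the comparison r > x is false for them.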