Date: Tue, 3 Aug 2021 22:27:35 +0200
From: Szabolcs Nagy
To: Stefan Kanthak
Cc: musl@lists.openwall.com
Message-ID: <20210803202735.GA37904@port70.net>
References: <04BD4026EE364FF7AFBAF8C593E9A2E7@H270>
In-Reply-To: <04BD4026EE364FF7AFBAF8C593E9A2E7@H270>
Subject: Re: [musl] [Patch] src/math/i386/remquo.s: remove conditional branch, shorter bit twiddling

* Stefan Kanthak [2021-08-01 17:59:52 +0200]:
> Halve the number of instructions (from 12 to 6) to fetch the
> (3-bit partial) quotient from the FPU flags C0:C3:C1, and
> perform its negation without conditional branch.

i haven't tested it but it looks good.

i think we should not tweak x87 asm code too much though.
it can introduce bugs and there are not many users of it.
i think only the size saving can justify keeping any
i386 math code at all. but i'm not against committing this.

thanks for the patch.

> --- -/math/i386/remquo.s
> +++ +/math/i386/remquo.s
> @@ -2,49 +2,44 @@
>  .type remquof,@function
>  remquof:
>  	mov 12(%esp),%ecx
> +	mov 8(%esp),%eax
> +	xor 4(%esp),%eax
>  	flds 8(%esp)
>  	flds 4(%esp)
> -	mov 11(%esp),%dh
> -	xor 7(%esp),%dh
> -	jmp 1f
> +	jmp 0f
>
>  .global remquol
>  .type remquol,@function
>  remquol:
>  	mov 28(%esp),%ecx
> +	mov 24(%esp),%eax
> +	xor 12(%esp),%eax
> +	cwtl
>  	fldt 16(%esp)
>  	fldt 4(%esp)
> -	mov 25(%esp),%dh
> -	xor 13(%esp),%dh
> -	jmp 1f
> +	jmp 0f
>
>  .global remquo
>  .type remquo,@function
>  remquo:
>  	mov 20(%esp),%ecx
> +	mov 16(%esp),%eax
> +	xor 8(%esp),%eax
>  	fldl 12(%esp)
>  	fldl 4(%esp)
> -	mov 19(%esp),%dh
> -	xor 11(%esp),%dh
> +0:	cltd
>  1:	fprem1
>  	fnstsw %ax
>  	sahf
>  	jp 1b
>  	fstp %st(1)
> -	mov %ah,%dl
> -	shr %dl
> -	and $1,%dl
> -	mov %ah,%al
> -	shr $5,%al
> -	and $2,%al
> -	or %al,%dl
> -	mov %ah,%al
> -	shl $2,%al
> -	and $4,%al
> -	or %al,%dl
> -	test %dh,%dh
> -	jns 1f
> -	neg %dl
> -1:	movsbl %dl,%edx
> -	mov %edx,(%ecx)
> +	adc %al,%al
> +	shl $2,%ah
> +	adc %al,%al
> +	shl $5,%ah
> +	adc %al,%al
> +	and $7,%eax
> +	xor %edx,%eax
> +	sub %edx,%eax
> +	mov %eax,(%ecx)
>  	ret
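
for readers following the bit twiddling, here is a rough C model of what
the new instruction sequence appears to compute. it is only a sketch and
not part of the patch: the helper names (quo_bits, sign_mask, apply_sign)
are made up for illustration, and the bit positions assume the usual x87
status word layout (C0 = bit 8, C1 = bit 9, C3 = bit 14).

```c
#include <stdint.h>

/* Hypothetical helper: assemble the 3-bit partial quotient C0:C3:C1
 * from an x87 status word as stored by fnstsw.
 * Assumed layout: C0 = bit 8, C1 = bit 9, C3 = bit 14. */
static int quo_bits(uint16_t sw)
{
    int c0 = (sw >> 8) & 1;   /* quotient bit 2 */
    int c1 = (sw >> 9) & 1;   /* quotient bit 0 */
    int c3 = (sw >> 14) & 1;  /* quotient bit 1 */
    return c0 << 2 | c3 << 1 | c1;
}

/* Hypothetical helper: 0 if the high words of x and y have the same
 * sign bit, -1 (all ones) otherwise. This models what cltd leaves in
 * %edx after the xor of the two high words. */
static int32_t sign_mask(uint32_t hx, uint32_t hy)
{
    return -(int32_t)((hx ^ hy) >> 31);
}

/* Hypothetical helper: branchless conditional negation, modelling the
 * xor %edx,%eax / sub %edx,%eax pair. mask must be 0 or -1. */
static int32_t apply_sign(int32_t q, int32_t mask)
{
    return (q ^ mask) - mask;
}
```

with mask = 0 the xor and sub leave q unchanged; with mask = -1 the xor
gives ~q and subtracting -1 adds one, which is two's complement negation.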