Date: Sun, 28 Feb 2021 20:37:33 +0100
From: Szabolcs Nagy
To: Mattias Andrée
Cc: musl@lists.openwall.com
Subject: Re: [musl] [PATCH v2 2/2] Use modulo instead of mul+sub in __secs_to_tm
Message-ID: <20210228193733.GF354034@port70.net>
In-Reply-To: <20210228192210.1665554-2-maandree@kth.se>

* Mattias Andrée [2021-02-28 20:22:10 +0100]:
> On x86 modulo is free when doing division, so this removes

there should be no division: div by const is transformed to mul and
shift at -O1, and that's what we should be using instead of manual
hacks.

https://godbolt.org/z/Wsxq5h

> a multiplication at the cost of replacing a conditional
> move with a conditional jump, but it still appears to be
> faster.
> (Similar architectures: nds32le)
> 
> ARM doesn't have modulo; instead, a multiply-and-subtract
> operation is done after the division, so the difference
> here is either none at all, or a move and a multiply-and-add
> being replaced with a multiply-and-subtract.
> (Similar architectures: or1k)
> 
> RISC-V on the other hand has a separate modulo
> instruction and will perform a separate modulo instead of
> an assignment, a multiplication, and an addition with
> this change. GCC does change how the modulo operation is
> realised depending on the optimisation level. I don't know
> how this affects the performance, however a simple test on
> x86 suggests that doing a modulo operation is actually
> faster than assign–multiply–add.

did you benchmark with CFLAGS=-O2 or -Os?

> ---
>  src/time/__secs_to_tm.c | 18 +++++++++++++++---
>  1 file changed, 15 insertions(+), 3 deletions(-)
> 
> diff --git a/src/time/__secs_to_tm.c b/src/time/__secs_to_tm.c
> index 62219df5..348e51ec 100644
> --- a/src/time/__secs_to_tm.c
> +++ b/src/time/__secs_to_tm.c
> @@ -39,16 +39,28 @@ int __secs_to_tm(long long t, struct tm *tm)
>  		qc_cycles--;
>  	}
>  
> +#if 1
> +	c_cycles = remdays / DAYS_PER_100Y;
> +	remdays %= DAYS_PER_100Y;
> +	if (c_cycles == 4) {
> +		remdays += DAYS_PER_100Y;
> +		c_cycles--;
> +	}
> +#else
>  	c_cycles = remdays / DAYS_PER_100Y;
>  	if (c_cycles == 4) c_cycles--;
>  	remdays -= c_cycles * DAYS_PER_100Y;
> +#endif
>  
>  	q_cycles = remdays / DAYS_PER_4Y;
> -	remdays -= q_cycles * DAYS_PER_4Y;
> +	remdays %= DAYS_PER_4Y;
>  
>  	remyears = remdays / 365;
> -	if (remyears == 4) remyears--;
> -	remdays -= remyears * 365;
> +	remdays %= 365;
> +	if (remyears == 4) {
> +		remdays += 365;
> +		remyears--;
> +	}
>  
>  	leap = !remyears && (q_cycles || !c_cycles);
>  	yday = remdays + 31 + 28 + leap;
> -- 
> 2.30.1