From mboxrd@z Thu Jan  1 00:00:00 1970
Received: (qmail 16785 invoked from network); 28 Feb 2021 19:37:48 -0000
Received: from mother.openwall.net (195.42.179.200)
  by inbox.vuxu.org with ESMTPUTF8; 28 Feb 2021 19:37:48 -0000
Received: (qmail 9560 invoked by uid 550); 28 Feb 2021 19:37:45 -0000
Mailing-List: contact musl-help@lists.openwall.com; run by ezmlm
Precedence: bulk
List-Post: <mailto:musl@lists.openwall.com>
List-Help: <mailto:musl-help@lists.openwall.com>
List-Unsubscribe: <mailto:musl-unsubscribe@lists.openwall.com>
List-Subscribe: <mailto:musl-subscribe@lists.openwall.com>
List-ID: <musl.lists.openwall.com>
Reply-To: musl@lists.openwall.com
Received: (qmail 9542 invoked from network); 28 Feb 2021 19:37:45 -0000
Date: Sun, 28 Feb 2021 20:37:33 +0100
From: Szabolcs Nagy <nsz@port70.net>
To: Mattias Andrée <maandree@kth.se>
Cc: musl@lists.openwall.com
Message-ID: <20210228193733.GF354034@port70.net>
Mail-Followup-To: Mattias Andrée <maandree@kth.se>,
	musl@lists.openwall.com
References: <20210228150912.1532943-1-maandree@kth.se>
 <20210228192210.1665554-1-maandree@kth.se>
 <20210228192210.1665554-2-maandree@kth.se>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable
In-Reply-To: <20210228192210.1665554-2-maandree@kth.se>
Subject: Re: [musl] [PATCH v2 2/2] Use modulo instead of mul+sub in
 __secs_to_tm

* Mattias Andrée <maandree@kth.se> [2021-02-28 20:22:10 +0100]:
> On x86 modulo is free when doing division, so this removes

there should be no division instruction: the divisors are
compile-time constants.

division by a constant is turned into a multiply and a shift
at -O1 and above, and that's what we should rely on instead
of manual hacks.

https://godbolt.org/z/Wsxq5h
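The multiply-and-shift transformation mentioned above can be sketched in plain C. This is a minimal illustration under stated assumptions, not the exact sequence GCC emits: it assumes unsigned operands below 2^18 (remdays is assumed to lie in [0, DAYS_PER_400Y) here, and DAYS_PER_400Y = 146097), and SHIFT, MAGIC and DIV_BY_MUL are hypothetical names whose constants come from a simple error bound rather than from the compiler.

```c
#include <assert.h>
#include <stdint.h>

/* Gregorian cycle lengths in days; assumed to match the macros
 * used in __secs_to_tm.c. */
#define DAYS_PER_4Y   (365*4 + 1)
#define DAYS_PER_100Y (365*100 + 24)
#define DAYS_PER_400Y (365*400 + 97)

/* The sketch is only valid for inputs below 2^18. */
static_assert(DAYS_PER_400Y < (1 << 18), "sketch assumes x < 2^18");

/* Hypothetical parameters (not GCC's). For 0 <= x < 2^18 and d < 2^16,
 * MAGIC(d) = ceil(2^SHIFT / d) overshoots 2^SHIFT / d by e / d with
 * e < d, and x * e < 2^SHIFT, so the error never reaches the next
 * multiple of d and the truncated quotient is exact. */
#define SHIFT 34
#define MAGIC(d) (((UINT64_C(1) << SHIFT) + (d) - 1) / (d))

/* x / d as a multiply by a precomputed reciprocal plus a right shift.
 * With a constant d, MAGIC(d) is folded at compile time, so no divide
 * instruction is needed at run time. */
#define DIV_BY_MUL(x, d) ((uint32_t)(((uint64_t)(x) * MAGIC(d)) >> SHIFT))

/* Counts inputs in [0, DAYS_PER_400Y) where the sketch disagrees with
 * ordinary division for the three divisors used by __secs_to_tm. */
static unsigned mismatches(void)
{
	unsigned bad = 0;
	for (uint32_t x = 0; x < DAYS_PER_400Y; x++) {
		bad += DIV_BY_MUL(x, DAYS_PER_100Y) != x / DAYS_PER_100Y;
		bad += DIV_BY_MUL(x, DAYS_PER_4Y) != x / DAYS_PER_4Y;
		bad += DIV_BY_MUL(x, 365) != x / 365;
	}
	return bad;
}
```

mismatches() should return 0. Note that DIV_BY_MUL(146096u, DAYS_PER_100Y) is 4, which is exactly the case the c_cycles == 4 clamp in the patch exists for.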

> a multiplication, at the cost of replacing a conditional
> move with a conditional jump, but it still appears to be
> faster.
> (Similar architectures: nds32le)
>
> ARM doesn't have modulo; instead, a multiply-and-subtract
> operation is done after the division, so the difference
> here is either none at all, or a move and a multiply-and-add
> being replaced with a multiply-and-subtract.
> (Similar architectures: or1k)
>
> RISC-V, on the other hand, has a separate modulo
> instruction and will perform a separate modulo instead of
> an assignment, a multiplication, and an addition with
> this change. GCC does change how the modulo operation is
> realised depending on the optimisation level. I don't know
> how this affects the performance; however, a simple test on
> x86 suggests that doing a modulo operation is actually
> faster than assign–multiply–add.

did you benchmark with CFLAGS=-O2 or -Os?
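Because the generated code depends on the flags, a comparison is only meaningful per flag set. A rough timing harness, offered as a sketch only: split_mulsub, split_mod and time_variant are hypothetical helpers that reproduce just the 100-year step of the old and patched code, and timing uses standard C clock(), which is coarse. The same file would be built once with -O2 and once with -Os.

```c
#include <assert.h>
#include <time.h>

#define DAYS_PER_100Y (365*100 + 24)
#define DAYS_PER_400Y (365*400 + 97)

/* Pre-patch 100-year step: quotient, clamp 4 -> 3, then the remainder
 * recovered by multiply-and-subtract. Returns the remaining days. */
static int split_mulsub(int remdays, int *c_cycles)
{
	int c = remdays / DAYS_PER_100Y;
	if (c == 4) c--;
	*c_cycles = c;
	return remdays - c * DAYS_PER_100Y;
}

/* Patched 100-year step: quotient and %, then undo the wrap for c == 4. */
static int split_mod(int remdays, int *c_cycles)
{
	int c = remdays / DAYS_PER_100Y;
	remdays %= DAYS_PER_100Y;
	if (c == 4) {
		remdays += DAYS_PER_100Y;
		c--;
	}
	*c_cycles = c;
	return remdays;
}

/* Runs `split` over every remdays value in [0, DAYS_PER_400Y), `rounds`
 * times, and returns elapsed CPU time in seconds. Results are summed into
 * *sink so the work is not trivially optimised away. Calling through a
 * function pointer may prevent inlining, so check the assembly as well. */
static double time_variant(int (*split)(int, int *), int rounds,
                           long long *sink)
{
	assert(rounds > 0);
	clock_t start = clock();
	for (int r = 0; r < rounds; r++) {
		for (int x = 0; x < DAYS_PER_400Y; x++) {
			int c;
			int rest = split(x, &c);
			*sink += rest + c;
		}
	}
	return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

From a small main, one would call time_variant(split_mulsub, 1000, &s1) and time_variant(split_mod, 1000, &s2) for each build; equal sinks confirm both variants computed the same values.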

> ---
>  src/time/__secs_to_tm.c | 18 +++++++++++++++---
>  1 file changed, 15 insertions(+), 3 deletions(-)
>
> diff --git a/src/time/__secs_to_tm.c b/src/time/__secs_to_tm.c
> index 62219df5..348e51ec 100644
> --- a/src/time/__secs_to_tm.c
> +++ b/src/time/__secs_to_tm.c
> @@ -39,16 +39,28 @@ int __secs_to_tm(long long t, struct tm *tm)
>  		qc_cycles--;
>  	}
>  
> +#if 1
> +	c_cycles = remdays / DAYS_PER_100Y;
> +	remdays %= DAYS_PER_100Y;
> +	if (c_cycles == 4) {
> +		remdays += DAYS_PER_100Y;
> +		c_cycles--;
> +	}
> +#else
>  	c_cycles = remdays / DAYS_PER_100Y;
>  	if (c_cycles == 4) c_cycles--;
>  	remdays -= c_cycles * DAYS_PER_100Y;
> +#endif
>  
>  	q_cycles = remdays / DAYS_PER_4Y;
> -	remdays -= q_cycles * DAYS_PER_4Y;
> +	remdays %= DAYS_PER_4Y;
>  
>  	remyears = remdays / 365;
> -	if (remyears == 4) remyears--;
> -	remdays -= remyears * 365;
> +	remdays %= 365;
> +	if (remyears == 4) {
> +		remdays += 365;
> +		remyears--;
> +	}
>  
>  	leap = !remyears && (q_cycles || !c_cycles);
>  	yday = remdays + 31 + 28 + leap;
> -- 
> 2.30.1
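Separately from speed, the rewrite should not change results. Assuming the preceding 400-year step has already normalised remdays into [0, DAYS_PER_400Y), there are only 146097 possible inputs, so the old and patched decompositions can be compared exhaustively. struct split, split_old, split_new and count_differences are hypothetical names for this sketch.

```c
#include <assert.h>

#define DAYS_PER_4Y   (365*4 + 1)
#define DAYS_PER_100Y (365*100 + 24)
#define DAYS_PER_400Y (365*400 + 97)

/* Result of splitting the days within one 400-year cycle. */
struct split { int c_cycles, q_cycles, remyears, remdays; };

/* Pre-patch form: clamp the quotient, then multiply-and-subtract. */
static struct split split_old(int remdays)
{
	struct split s;
	assert(remdays >= 0 && remdays < DAYS_PER_400Y);
	s.c_cycles = remdays / DAYS_PER_100Y;
	if (s.c_cycles == 4) s.c_cycles--;
	remdays -= s.c_cycles * DAYS_PER_100Y;

	s.q_cycles = remdays / DAYS_PER_4Y;
	remdays -= s.q_cycles * DAYS_PER_4Y;

	s.remyears = remdays / 365;
	if (s.remyears == 4) s.remyears--;
	remdays -= s.remyears * 365;

	s.remdays = remdays;
	return s;
}

/* Patched form: take %, then add one period back when the quotient is 4. */
static struct split split_new(int remdays)
{
	struct split s;
	assert(remdays >= 0 && remdays < DAYS_PER_400Y);
	s.c_cycles = remdays / DAYS_PER_100Y;
	remdays %= DAYS_PER_100Y;
	if (s.c_cycles == 4) {
		remdays += DAYS_PER_100Y;
		s.c_cycles--;
	}

	s.q_cycles = remdays / DAYS_PER_4Y;
	remdays %= DAYS_PER_4Y;

	s.remyears = remdays / 365;
	remdays %= 365;
	if (s.remyears == 4) {
		remdays += 365;
		s.remyears--;
	}

	s.remdays = remdays;
	return s;
}

/* Counts inputs in [0, DAYS_PER_400Y) where the two forms disagree. */
static int count_differences(void)
{
	int bad = 0;
	for (int x = 0; x < DAYS_PER_400Y; x++) {
		struct split a = split_old(x), b = split_new(x);
		bad += a.c_cycles != b.c_cycles || a.q_cycles != b.q_cycles
		    || a.remyears != b.remyears || a.remdays != b.remdays;
	}
	return bad;
}
```

count_differences() should return 0. For the last day of the cycle (remdays = 146096) both forms give c_cycles = 3, q_cycles = 24, remyears = 3 and remdays = 365, which exercises both == 4 fix-ups.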