Re: [PATCH] math: optimize lrint on 32bit targets

mailing list of musl libc

From: Rich Felker <dalias@libc.org>
To: musl@lists.openwall.com
Subject: Re: [PATCH] math: optimize lrint on 32bit targets
Date: Mon, 23 Sep 2019 13:40:29 -0400
Message-ID: <20190923174029.GN9017@brightrain.aerifal.cx>
In-Reply-To: <20190922204335.GC22009@port70.net>

On Sun, Sep 22, 2019 at 10:43:35PM +0200, Szabolcs Nagy wrote:
> * Szabolcs Nagy <nsz@port70.net> [2019-09-21 17:52:35 +0200]:
> > this was discussed on irc.
> 
> did more benchmarks, on i486 branches seem better
> than setting the sign bit but on arm branch is
> worse so i keep the original code, just changed
> the code style (asuint macro instead of union).
> 

> From 67990a5c85fc5db55831f9ddddc58317e5b344b6 Mon Sep 17 00:00:00 2001
> From: Szabolcs Nagy <nsz@port70.net>
> Date: Mon, 16 Sep 2019 20:33:11 +0000
> Subject: [PATCH] math: optimize lrint on 32bit targets
> 
> lrint in (LONG_MAX, 1/DBL_EPSILON) and in (-1/DBL_EPSILON, LONG_MIN)
> is not trivial: rounding to int may be inexact, but the conversion to
> int may overflow and then the inexact flag must not be raised. (the
> overflow threshold is rounding mode dependent).
> 
> this matters on 32bit targets (without single instruction lrint or
> rint), so the common case (when there is no overflow) is optimized by
> inlining the lrint logic, otherwise the old code is kept as a fallback.
> 
> on my laptop an i486 lrint call is asm:10ns, old c:30ns, new c:21ns
> on a smaller arm core: old c:71ns, new c:34ns
> on a bigger arm core: old c:27ns, new c:19ns
> ---
>  src/math/lrint.c | 28 +++++++++++++++++++++++++++-
>  1 file changed, 27 insertions(+), 1 deletion(-)
> 
> diff --git a/src/math/lrint.c b/src/math/lrint.c
> index bdca8b7c..ddee7a0d 100644
> --- a/src/math/lrint.c
> +++ b/src/math/lrint.c
> @@ -1,5 +1,6 @@
>  #include <limits.h>
>  #include <fenv.h>
> +#include <math.h>
>  #include "libm.h"
>  
>  /*
> @@ -26,7 +27,18 @@ as a double.
>  */
>  
>  #if LONG_MAX < 1U<<53 && defined(FE_INEXACT)
> -long lrint(double x)
> +#include <float.h>
> +#include <stdint.h>
> +#if FLT_EVAL_METHOD==0 || FLT_EVAL_METHOD==1
> +#define EPS DBL_EPSILON
> +#elif FLT_EVAL_METHOD==2
> +#define EPS LDBL_EPSILON
> +#endif
> +#ifdef __GNUC__
> +/* avoid stack frame in lrint */
> +__attribute__((noinline))
> +#endif
> +static long lrint_slow(double x)
>  {
>  	#pragma STDC FENV_ACCESS ON
>  	int e;
> @@ -38,6 +50,20 @@ long lrint(double x)
>  	/* conversion */
>  	return x;
>  }
> +
> +long lrint(double x)
> +{
> +	uint32_t abstop = asuint64(x)>>32 & 0x7fffffff;
> +	uint64_t sign = asuint64(x) & (1ULL << 63);
> +
> +	if (abstop < 0x41dfffff) {
> +		/* |x| < 0x7ffffc00, no overflow */
> +		double_t toint = asdouble(asuint64(1/EPS) | sign);
> +		double_t y = x + toint - toint;
> +		return (long)y;
> +	}
> +	return lrint_slow(x);
> +}
>  #else
>  long lrint(double x)
>  {

This code should be considerably faster than calling rint on 64-bit
archs too, no? I wonder if it should be something like (untested,
written inline here):

long lrint(double x)
{
	uint32_t abstop = asuint64(x)>>32 & 0x7fffffff;
	uint64_t sign = asuint64(x) & (1ULL << 63);

#if LONG_MAX < 1U<<53 && defined(FE_INEXACT)
	if (abstop >= 0x41dfffff) return lrint_slow(x);
#endif
	/* |x| < 0x7ffffc00, no overflow */
	double_t toint = asdouble(asuint64(1/EPS) | sign);
	double_t y = x + toint - toint;
	return (long)y;
}
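
For anyone following along, here is a minimal standalone sketch of the
two pieces the fast path relies on: the "add and subtract 1/EPS with the
sign of x" rounding trick, and the value the 0x41dfffff threshold
corresponds to. This is not the musl code. asuint64 and asdouble are
redefined here with memcpy (an assumption; libm.h has its own versions),
and round_via_toint and fast_path_limit are hypothetical names used only
for illustration. It assumes IEEE binary64 doubles and that the compiler
does not reassociate floating-point arithmetic (no -ffast-math).

```c
#include <float.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Same EPS selection as in the patch: match the evaluation precision. */
#if FLT_EVAL_METHOD == 0 || FLT_EVAL_METHOD == 1
#define EPS DBL_EPSILON
#elif FLT_EVAL_METHOD == 2
#define EPS LDBL_EPSILON
#endif

/* Stand-ins for the libm.h helpers (assumption: memcpy-based versions). */
static uint64_t asuint64(double x)
{
	uint64_t u;
	memcpy(&u, &x, sizeof u);
	return u;
}

static double asdouble(uint64_t u)
{
	double x;
	memcpy(&x, &u, sizeof x);
	return x;
}

/* Hypothetical helper: round x to an integer in the current rounding
   mode. Adding 1/EPS with the sign of x pushes the sum into a range
   where the spacing between representable values is 1, so the addition
   itself does the rounding; the subtraction is then exact.
   Only meant for |x| < 1/EPS. */
static double round_via_toint(double x)
{
	uint64_t sign = asuint64(x) & (1ULL << 63);
	double_t toint = asdouble(asuint64(1/EPS) | sign);
	double_t y = x + toint - toint;
	return y;
}

/* Hypothetical helper: the double whose top 32 bits are 0x41dfffff and
   whose low 32 bits are zero. Exponent field 0x41d is 2^30 and the top
   20 mantissa bits are all ones, giving 2^31 - 2^10 = 0x7ffffc00. */
static double fast_path_limit(void)
{
	return asdouble((uint64_t)0x41dfffff << 32);
}
```

In the default round-to-nearest mode, round_via_toint(2.5) gives 2.0 and
round_via_toint(-2.5) gives -2.0 (ties to even), which is what lrint
returns for those inputs. Note that the trick only holds for
|x| < 1/EPS, so a variant for 64-bit long would presumably still need a
guard for larger magnitudes.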

Rich
