From mboxrd@z Thu Jan  1 00:00:00 1970
Mailing-List: contact musl-help@lists.openwall.com; run by ezmlm
Precedence: bulk
List-Post: <mailto:musl@lists.openwall.com>
List-Help: <mailto:musl-help@lists.openwall.com>
List-Unsubscribe: <mailto:musl-unsubscribe@lists.openwall.com>
List-Subscribe: <mailto:musl-subscribe@lists.openwall.com>
List-ID: <musl.lists.openwall.com>
Reply-To: musl@lists.openwall.com
Date: Sun, 15 Aug 2021 01:46:12 +0200
From: Szabolcs Nagy <nsz@port70.net>
To: Stefan Kanthak <stefan.kanthak@nexgo.de>
Cc: musl@lists.openwall.com
Message-ID: <20210814234612.GH37904@port70.net>
Mail-Followup-To: Stefan Kanthak <stefan.kanthak@nexgo.de>,
	musl@lists.openwall.com
References: <0C6AAAD55DA44C6189B2FF4F5FB2C3E7@H270>
 <20210810213455.GB37904@port70.net>
 <E2423D1F1F3848848AEA933048174858@H270>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <E2423D1F1F3848848AEA933048174858@H270>
Subject: Re: [musl] [PATCH #2] Properly simplified nextafter()

* Stefan Kanthak <stefan.kanthak@nexgo.de> [2021-08-13 14:04:51 +0200]:
> Szabolcs Nagy <nsz@port70.net> wrote on 2021-08-10 at 23:34:
> > it's target dependent if float compares are fast or not.
> 
> It's also target dependent whether the FP additions and multiplies
> used to raise overflow/underflow are SLOOOWWW: how can you justify
> them, especially for targets using soft-float?

for fenv side-effects, fp arithmetic or conversion
operations are ideal.

i pointed out the new subnormal handling cases your patch
introduced because it was not clear that you had considered
them. i'm mainly concerned about the performance of the
common cases, not of rare special cases.

> > (the i386 machine where i originally tested this preferred int
> > cmp and float cmp was very slow in the subnormal range
> 
> This also and still holds for i386 FPU fadd/fmul as well as SSE
> addsd/addss/mulss/mulsd additions/multiplies!

they are avoided in the common case, and only used to create
fenv side-effects.
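
a rough sketch of that structure, written from memory of the tail
of musl's nextafter.c, so treat the details as an assumption rather
than the actual source. exponent_field and finish are hypothetical
names. the integer test on the result's exponent is the only work
done in the common case; the fp operations sit behind it:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* hypothetical helper: biased 11-bit exponent field of a double */
static unsigned exponent_field(double d)
{
	uint64_t bits;

	memcpy(&bits, &d, sizeof bits);
	return bits >> 52 & 0x7ff;
}

/* hypothetical tail of a nextafter-like function: x is the (finite)
   input, result is the neighbouring value already computed with
   integer operations */
static double finish(double x, double result)
{
	volatile double sink;
	unsigned e = exponent_field(result);

	if (e == 0x7ff) {       /* result is inf: raise overflow */
		sink = x + x;
		(void)sink;
	}
	if (e == 0) {           /* result is zero/subnormal: raise underflow */
		sink = x*x + result*result;
		(void)sink;
	}
	return result;          /* normal results skip both branches */
}
```

for a normal result such as finish(1.0, 1.5) neither branch is
taken, so no fp add or multiply is executed.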

> --- -/src/math/nextafter.c
> +++ +/src/math/nextafter.c
> @@ -10,13 +10,13 @@
>                 return x + y;
>         if (ux.i == uy.i)
>                 return y;
> -       ax = ux.i & -1ULL/2;
> -       ay = uy.i & -1ULL/2;
> +       ax = ux.i << 2;
> +       ay = uy.i << 2;

the << 2 looks wrong: it shifts out the sign bit and the top
bit of the exponent, so the top exponent bit is lost.
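
a small sketch showing the effect on 1.0 and 2.0, assuming the
usual ieee binary64 layout (sign in bit 63, exponent in bits
62..52). the helper names are made up. key_shl1 is only there for
comparison; that a shift by one was intended is my guess, not
something the patch says:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static uint64_t bits_of(double d)
{
	uint64_t u;

	memcpy(&u, &d, sizeof u);
	return u;
}

/* magnitude key as in the existing code: clears only the sign bit */
static uint64_t key_mask(double d) { return bits_of(d) & -1ULL/2; }

/* key as in the quoted patch: drops the sign bit and exponent bit 62 */
static uint64_t key_shl2(double d) { return bits_of(d) << 2; }

/* for comparison: a shift by one drops only the sign bit */
static uint64_t key_shl1(double d) { return bits_of(d) << 1; }
```

2.0 is 0x4000000000000000, so key_shl2(2.0) is 0 and the ax == 0
branch would treat it like zero. 1.0 is 0x3ff0000000000000, so
key_shl2(1.0) is 0xffc0000000000000 and compares greater than
key_shl2(2.0), while key_mask and key_shl1 keep the order.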

>         if (ax == 0) {
>                 if (ay == 0)
>                         return y;
>                 ux.i = (uy.i & 1ULL<<63) | 1;
> -       } else if (ax > ay || ((ux.i ^ uy.i) & 1ULL<<63))
> +       } else if ((ax < ay) == ((int64_t) ux.i < 0))
>                 ux.i--;
>         else
>                 ux.i++;
...
> How do you compare these 60 instructions/252 bytes to the code I posted
> (23 instructions/72 bytes)?

you should benchmark; the second-best option is to look
at the longest dependency chain in the hot path and
add up the instruction latencies.
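
a minimal latency benchmark sketch, assuming only standard c
(clock() from <time.h>). all names are hypothetical and toy_step is
just a stand-in so the sketch links without libm; pass the
implementation under test instead. each call consumes the previous
result, so the loop time is dominated by the dependency chain
rather than by throughput:

```c
#include <assert.h>
#include <time.h>

typedef double (*step_fn)(double, double);

/* stand-in candidate: replace with the function being measured */
static double toy_step(double x, double y)
{
	return x < y ? x + 1.0 : x;
}

/* feed each result into the next call: a loop-carried dependency */
static double run_chain(step_fn f, double x, double y, long n)
{
	while (n-- > 0)
		x = f(x, y);
	return x;
}

/* rough seconds per call; the volatile sink keeps the chain from
   being discarded as dead code */
static double secs_per_call(step_fn f, double x, double y, long n)
{
	volatile double sink;
	clock_t t0, t1;

	t0 = clock();
	sink = run_chain(f, x, y, n);
	t1 = clock();
	(void)sink;
	return (double)(t1 - t0) / CLOCKS_PER_SEC / n;
}
```

for a real measurement the candidates should live in a separate
translation unit (or otherwise be kept from inlining), and the
inputs should cover the common case, e.g. stepping from 1.0
towards 2.0, not only the subnormal range.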