From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <musl-return-20871-ml=inbox.vuxu.org@lists.openwall.com>
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on inbox.vuxu.org
X-Spam-Level: 
X-Spam-Status: No, score=-3.0 required=5.0 tests=HEADER_FROM_DIFFERENT_DOMAINS,
	MAILING_LIST_MULTI,RCVD_IN_DNSWL_MED,RCVD_IN_MSPIKE_H4,
	RCVD_IN_MSPIKE_WL autolearn=ham autolearn_force=no version=3.4.4
Received: from second.openwall.net (second.openwall.net [193.110.157.125])
	by inbox.vuxu.org (Postfix) with SMTP id 9C786254B6
	for <ml@inbox.vuxu.org>; Tue, 23 Apr 2024 17:56:35 +0200 (CEST)
Received: (qmail 18180 invoked by uid 550); 23 Apr 2024 15:56:31 -0000
Mailing-List: contact musl-help@lists.openwall.com; run by ezmlm
Precedence: bulk
List-Post: <mailto:musl@lists.openwall.com>
List-Help: <mailto:musl-help@lists.openwall.com>
List-Unsubscribe: <mailto:musl-unsubscribe@lists.openwall.com>
List-Subscribe: <mailto:musl-subscribe@lists.openwall.com>
List-ID: <musl.lists.openwall.com>
Reply-To: musl@lists.openwall.com
Received: (qmail 18127 invoked from network); 23 Apr 2024 15:56:31 -0000
Date: Tue, 23 Apr 2024 11:56:43 -0400
From: Rich Felker <dalias@libc.org>
To: ticat_fp <fanpeng@loongson.cn>
Cc: musl@lists.openwall.com, lixing@loongson.cn, huajingyun@loongson.cn,
	wanghongliang@loongson.cn
Message-ID: <20240423155643.GO4163@brightrain.aerifal.cx>
References: <20240423022619.1253464-1-fanpeng@loongson.cn>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20240423022619.1253464-1-fanpeng@loongson.cn>
User-Agent: Mutt/1.5.21 (2010-09-15)
Subject: Re: [musl] [PATCH] math: add LoongArch support for common APIs with
 inline assembly.

On Tue, Apr 23, 2024 at 10:26:19AM +0800, ticat_fp wrote:
> Including: ceil, copysign, fabs, floor, fma, fmax, fmin, llrint,
> lrint, rint, sqrt and their f versions.
> 
> ---
>  src/math/loongarch64/ceil.c      | 25 +++++++++++++++++++++++++
>  src/math/loongarch64/ceilf.c     | 25 +++++++++++++++++++++++++
>  src/math/loongarch64/copysign.c  |  7 +++++++
>  src/math/loongarch64/copysignf.c |  7 +++++++
>  src/math/loongarch64/fabs.c      |  7 +++++++
>  src/math/loongarch64/fabsf.c     |  7 +++++++
>  src/math/loongarch64/floor.c     | 22 ++++++++++++++++++++++
>  src/math/loongarch64/floorf.c    | 22 ++++++++++++++++++++++
>  src/math/loongarch64/fma.c       |  7 +++++++
>  src/math/loongarch64/fmaf.c      |  7 +++++++
>  src/math/loongarch64/fmax.c      |  7 +++++++
>  src/math/loongarch64/fmaxf.c     |  7 +++++++
>  src/math/loongarch64/fmin.c      |  7 +++++++
>  src/math/loongarch64/fminf.c     |  7 +++++++
>  src/math/loongarch64/llrint.c    | 17 +++++++++++++++++
>  src/math/loongarch64/llrintf.c   | 17 +++++++++++++++++
>  src/math/loongarch64/lrint.c     | 17 +++++++++++++++++
>  src/math/loongarch64/lrintf.c    | 17 +++++++++++++++++
>  src/math/loongarch64/rint.c      |  7 +++++++
>  src/math/loongarch64/rintf.c     |  7 +++++++
>  src/math/loongarch64/sqrt.c      |  7 +++++++
>  src/math/loongarch64/sqrtf.c     |  7 +++++++
>  22 files changed, 260 insertions(+)
>  create mode 100644 src/math/loongarch64/ceil.c
>  create mode 100644 src/math/loongarch64/ceilf.c
>  create mode 100644 src/math/loongarch64/copysign.c
>  create mode 100644 src/math/loongarch64/copysignf.c
>  create mode 100644 src/math/loongarch64/fabs.c
>  create mode 100644 src/math/loongarch64/fabsf.c
>  create mode 100644 src/math/loongarch64/floor.c
>  create mode 100644 src/math/loongarch64/floorf.c
>  create mode 100644 src/math/loongarch64/fma.c
>  create mode 100644 src/math/loongarch64/fmaf.c
>  create mode 100644 src/math/loongarch64/fmax.c
>  create mode 100644 src/math/loongarch64/fmaxf.c
>  create mode 100644 src/math/loongarch64/fmin.c
>  create mode 100644 src/math/loongarch64/fminf.c
>  create mode 100644 src/math/loongarch64/llrint.c
>  create mode 100644 src/math/loongarch64/llrintf.c
>  create mode 100644 src/math/loongarch64/lrint.c
>  create mode 100644 src/math/loongarch64/lrintf.c
>  create mode 100644 src/math/loongarch64/rint.c
>  create mode 100644 src/math/loongarch64/rintf.c
>  create mode 100644 src/math/loongarch64/sqrt.c
>  create mode 100644 src/math/loongarch64/sqrtf.c
> 
> diff --git a/src/math/loongarch64/ceil.c b/src/math/loongarch64/ceil.c
> new file mode 100644
> index 00000000..95781f4b
> --- /dev/null
> +++ b/src/math/loongarch64/ceil.c
> @@ -0,0 +1,25 @@
> +#include <math.h>
> +#include <stdint.h>
> +
> +double ceil(double x)
> +{
> +    int32_t old;
> +    int32_t new;
> +    int32_t tmp1;
> +    int32_t tmp2;
> +
> +    __asm__ __volatile__(
> +    "movfcsr2gr %[orig_old],  $r0               \n\t"
> +    "li.d       %[tmp1], 0x200                  \n\t"
> +    "or         %[new],  %[orig_old], %[tmp1]   \n\t"
> +    "li.d       %[tmp2], 0xfffffeff             \n\t"
> +    "and        %[new],  %[new], %[tmp2]        \n\t"
> +    "movgr2fcsr $r0,     %[new]                 \n\t"
> +    "frint.d    %[result],       %[orig_x]      \n\t"
> +    "movgr2fcsr $r0,     %[orig_old]            \n\t"
> +    : [result] "+f"(x), [old]"+r"(old), [new]"+r"(new), [tmp1] "+r"(tmp1), [tmp2] "+r"(tmp2)
> +    : [orig_x] "f"(x), [orig_old]"r"(old), [orig_new]"r"(new), [orig_tmp1] "r"(tmp1), [orig_tmp2] "r"(tmp2)
> +    :);
> +
> +    return x;
> +}

Is it possible to write these with the control register logic in C
rather than a big block of asm?

Also, while probably all versions of gcc and clang with loongarch64
support named-operand inline asm, we generally don't depend on
this extension in musl. I see how it makes the code more readable with
the big asm block, but if we could get rid of the big asm block so
that it's just a single asm statement to read the old control register
value, C to modify it, and a pair of instructions (round and restore
control register) taking the argument value and old control register
value to restore as inputs, there wouldn't be any need for named
operands to make it readable.
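A minimal sketch of that split, assuming the rounding-mode (RM) field of
fcsr0 is bits 9:8, as the constants in the quoted ceil imply (setting
0x200 and clearing 0x100 selects round toward +inf). The names
fcsr_set_rm, frint_with_rm, ceil_sketch, floor_sketch and the FCSR_RM_*
macros are hypothetical, not existing musl identifiers. One judgment
call that goes beyond the description above: the write of the new mode
is folded into the same asm statement as the round and restore, so no
other FP code can be scheduled while the modified mode is in effect.
Only the field computation is portable C; the asm part is compiled only
for LoongArch.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: fcsr0 RM field in bits 9:8, where
 * 0 = nearest-even, 1 = toward zero, 2 = toward +inf, 3 = toward -inf.
 * All names below are hypothetical. */
#define FCSR_RM_MASK 0x300u
#define FCSR_RM_UP   0x200u /* toward +inf, for ceil */
#define FCSR_RM_DOWN 0x300u /* toward -inf, for floor */

/* Plain C: replace the RM field of an fcsr0 value, keeping all other bits. */
static inline uint32_t fcsr_set_rm(uint32_t old, uint32_t rm)
{
	assert((rm & ~FCSR_RM_MASK) == 0); /* rm must only touch the RM field */
	return (old & ~FCSR_RM_MASK) | rm;
}

#ifdef __loongarch__
/* Round x to an integer under rounding mode rm, then restore fcsr0. */
static double frint_with_rm(double x, uint32_t rm)
{
	uint32_t old_fcsr, new_fcsr;

	/* Single asm statement to read the current control register. */
	__asm__ __volatile__ ("movfcsr2gr %0, $r0" : "=r"(old_fcsr));

	/* Control register logic done in C. */
	new_fcsr = fcsr_set_rm(old_fcsr, rm);

	/* Install the new mode, round, and restore the old mode. Positional
	 * operands only: %0 = result, %1 = new mode, %2 = x, %3 = old mode. */
	__asm__ __volatile__ (
		"movgr2fcsr $r0, %1\n\t"
		"frint.d %0, %2\n\t"
		"movgr2fcsr $r0, %3"
		: "=f"(x)
		: "r"(new_fcsr), "f"(x), "r"(old_fcsr));
	return x;
}

double ceil_sketch(double x)  { return frint_with_rm(x, FCSR_RM_UP); }
double floor_sketch(double x) { return frint_with_rm(x, FCSR_RM_DOWN); }
#endif
```

With the mode computation in C, the asm itself shrinks to two short
statements with four positional operands, which is readable without
operand names.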

Rich