Date: Tue, 23 Apr 2024 11:56:43 -0400
From: Rich Felker
To: ticat_fp
Cc: musl@lists.openwall.com, lixing@loongson.cn, huajingyun@loongson.cn, wanghongliang@loongson.cn
Message-ID: <20240423155643.GO4163@brightrain.aerifal.cx>
References: <20240423022619.1253464-1-fanpeng@loongson.cn>
In-Reply-To: <20240423022619.1253464-1-fanpeng@loongson.cn>
Subject: Re: [musl] [PATCH] math: add LoongArch support for common APIs with inline assembly.

On Tue, Apr 23, 2024 at 10:26:19AM +0800, ticat_fp wrote:
> Including: ceil, copysign, fabs, floor, fma, fmax, fmin, llrint,
> lrint, rint, sqrt and their f versions.
>
> ---
>  src/math/loongarch64/ceil.c      | 25 +++++++++++++++++++++++++
>  src/math/loongarch64/ceilf.c     | 25 +++++++++++++++++++++++++
>  src/math/loongarch64/copysign.c  |  7 +++++++
>  src/math/loongarch64/copysignf.c |  7 +++++++
>  src/math/loongarch64/fabs.c      |  7 +++++++
>  src/math/loongarch64/fabsf.c     |  7 +++++++
>  src/math/loongarch64/floor.c     | 22 ++++++++++++++++++++++
>  src/math/loongarch64/floorf.c    | 22 ++++++++++++++++++++++
>  src/math/loongarch64/fma.c       |  7 +++++++
>  src/math/loongarch64/fmaf.c      |  7 +++++++
>  src/math/loongarch64/fmax.c      |  7 +++++++
>  src/math/loongarch64/fmaxf.c     |  7 +++++++
>  src/math/loongarch64/fmin.c      |  7 +++++++
>  src/math/loongarch64/fminf.c     |  7 +++++++
>  src/math/loongarch64/llrint.c    | 17 +++++++++++++++++
>  src/math/loongarch64/llrintf.c   | 17 +++++++++++++++++
>  src/math/loongarch64/lrint.c     | 17 +++++++++++++++++
>  src/math/loongarch64/lrintf.c    | 17 +++++++++++++++++
>  src/math/loongarch64/rint.c      |  7 +++++++
>  src/math/loongarch64/rintf.c     |  7 +++++++
>  src/math/loongarch64/sqrt.c      |  7 +++++++
>  src/math/loongarch64/sqrtf.c     |  7 +++++++
>  22 files changed, 260 insertions(+)
>  create mode 100644 src/math/loongarch64/ceil.c
>  create mode 100644 src/math/loongarch64/ceilf.c
>  create mode 100644 src/math/loongarch64/copysign.c
>  create mode 100644 src/math/loongarch64/copysignf.c
>  create mode 100644 src/math/loongarch64/fabs.c
>  create mode 100644 src/math/loongarch64/fabsf.c
>  create mode 100644 src/math/loongarch64/floor.c
>  create mode 100644 src/math/loongarch64/floorf.c
>  create mode 100644 src/math/loongarch64/fma.c
>  create mode 100644 src/math/loongarch64/fmaf.c
>  create mode 100644 src/math/loongarch64/fmax.c
>  create mode 100644 src/math/loongarch64/fmaxf.c
>  create mode 100644 src/math/loongarch64/fmin.c
>  create mode 100644 src/math/loongarch64/fminf.c
>  create mode 100644 src/math/loongarch64/llrint.c
>  create mode 100644 src/math/loongarch64/llrintf.c
>  create mode 100644 src/math/loongarch64/lrint.c
>  create mode 100644 src/math/loongarch64/lrintf.c
>  create mode 100644 src/math/loongarch64/rint.c
>  create mode 100644 src/math/loongarch64/rintf.c
>  create mode 100644 src/math/loongarch64/sqrt.c
>  create mode 100644 src/math/loongarch64/sqrtf.c
>
> diff --git a/src/math/loongarch64/ceil.c b/src/math/loongarch64/ceil.c
> new file mode 100644
> index 00000000..95781f4b
> --- /dev/null
> +++ b/src/math/loongarch64/ceil.c
> @@ -0,0 +1,25 @@
> +#include <math.h>
> +#include <stdint.h>
> +
> +double ceil(double x)
> +{
> +	int32_t old;
> +	int32_t new;
> +	int32_t tmp1;
> +	int32_t tmp2;
> +
> +	__asm__ __volatile__(
> +		"movfcsr2gr %[orig_old], $r0 \n\t"
> +		"li.d %[tmp1], 0x200 \n\t"
> +		"or %[new], %[orig_old], %[tmp1] \n\t"
> +		"li.d %[tmp2], 0xfffffeff \n\t"
> +		"and %[new], %[new], %[tmp2] \n\t"
> +		"movgr2fcsr $r0, %[new] \n\t"
> +		"frint.d %[result], %[orig_x] \n\t"
> +		"movgr2fcsr $r0, %[orig_old] \n\t"
> +		: [result] "+f"(x), [old]"+r"(old), [new]"+r"(new), [tmp1] "+r"(tmp1), [tmp2] "+r"(tmp2)
> +		: [orig_x] "f"(x), [orig_old]"r"(old), [orig_new]"r"(new), [orig_tmp1] "r"(tmp1), [orig_tmp2] "r"(tmp2)
> +		:);
> +
> +	return x;
> +}

Is it possible to write these with the control register logic in C
rather than a big block of asm?

Also, while probably all versions of gcc and clang with loongarch64
support the named-argument inline asm, we generally don't depend on
this extension in musl. I see how it makes the code more readable with
the big asm block, but if we could get rid of the big asm block so
that it's just a single asm statement to read the old control register
value, C to modify it, and a pair of instructions (round and restore
control register) taking the argument value and old control register
value to restore as inputs, there wouldn't be any need for named
arguments to make it readable.

Rich
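
For illustration only, the restructuring suggested above might look
roughly like the sketch below. It is not the submitted patch and not
code from musl. The names fcsr_set_rm, ceil_sketch and the FCSR_RM_*
macros are hypothetical, and the $fcsr0 register spelling is an
assumption (the quoted patch writes $r0). The 0x200 value and the
cleared 0x100 bit come from the quoted patch; FCSR_RM_DOWN is the
architecture's toward-negative-infinity encoding and is included only
to show that one helper could serve both ceil and floor. The sketch
also uses one more instruction than the reply spells out, to install
the modified value. The asm path is compiled only when __loongarch__
is defined and has not been tested here; the bit manipulation is plain
C and builds anywhere.

```c
#include <assert.h>
#include <stdint.h>

/* FCSR0 rounding-mode field, bits 9:8 (the bits the quoted patch touches). */
#define FCSR_RM_MASK 0x300u
#define FCSR_RM_UP   0x200u /* toward +infinity, as needed for ceil  */
#define FCSR_RM_DOWN 0x300u /* toward -infinity, as needed for floor */

/* Hypothetical helper: replace the rounding-mode field in plain C,
 * leaving every other control/status bit untouched. */
static inline uint32_t fcsr_set_rm(uint32_t fcsr, uint32_t rm)
{
	assert((rm & ~FCSR_RM_MASK) == 0); /* rm must fit in the field */
	return (fcsr & ~FCSR_RM_MASK) | rm;
}

#ifdef __loongarch__
/* Hypothetical name, chosen to avoid clashing with the real ceil(). */
double ceil_sketch(double x)
{
	uint32_t old;

	/* 1. A single instruction to read the old control register value. */
	__asm__ __volatile__ ("movfcsr2gr %0, $fcsr0" : "=r"(old));

	/* 2. Modify it in C, then install the new rounding mode. */
	__asm__ __volatile__ ("movgr2fcsr $fcsr0, %0"
		: : "r"(fcsr_set_rm(old, FCSR_RM_UP)));

	/* 3. Round and restore, with positional operands only:
	 *    %0 = result, %1 = argument, %2 = old control register value. */
	__asm__ __volatile__ (
		"frint.d %0, %1\n\t"
		"movgr2fcsr $fcsr0, %2"
		: "=f"(x) : "f"(x), "r"(old));

	return x;
}
#endif
```

With the statements this short, the positional operands %0 to %2 stay
easy to follow, which is the point made in the reply about not needing
the named-argument extension.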