From: <sidneym@codeaurora.org>
To: "'Rich Felker'" <dalias@libc.org>
Cc: <musl@lists.openwall.com>
Subject: RE: [musl] Hexagon DSP support
Date: Sun, 20 Sep 2020 08:12:47 -0500 [thread overview]
Message-ID: <025a01d68f4f$bd211aa0$37634fe0$@codeaurora.org> (raw)
In-Reply-To: <20200918010806.GX3265@brightrain.aerifal.cx>
> -----Original Message-----
> From: Rich Felker <dalias@libc.org>
> Sent: Thursday, September 17, 2020 8:08 PM
> To: sidneym@codeaurora.org
> Cc: musl@lists.openwall.com
> Subject: Re: [musl] Hexagon DSP support
>
> On Thu, Sep 17, 2020 at 05:31:29PM -0500, sidneym@codeaurora.org wrote:
> >
> >
> > > -----Original Message-----
> > > From: 'Rich Felker' <dalias@libc.org>
> > > Sent: Wednesday, September 16, 2020 8:33 PM
> > > To: musl@lists.openwall.com
> > > Subject: Re: [musl] Hexagon DSP support
> > >
> > > On Wed, Sep 16, 2020 at 03:49:28PM -0500, sidneym@codeaurora.org
> wrote:
> > > >
> > > >
> > > > > -----Original Message-----
> > > > > From: sidneym@codeaurora.org <sidneym@codeaurora.org>
> > > > > Sent: Friday, July 24, 2020 12:50 PM
> > > > > To: 'Szabolcs Nagy' <nsz@port70.net>
> > > > > Cc: 'Rich Felker' <dalias@libc.org>; 'musl@lists.openwall.com'
> > > > > <musl@lists.openwall.com>
> > > > > Subject: RE: [musl] Hexagon DSP support
> > > > >
> > > > >
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: Szabolcs Nagy <nsz@port70.net>
> > > > > > Sent: Thursday, July 23, 2020 4:56 PM
> > > > > > To: sidneym@codeaurora.org
> > > > > > Cc: 'Rich Felker' <dalias@libc.org>; musl@lists.openwall.com
> > > > > > Subject: Re: [musl] Hexagon DSP support
> > > > > >
> > > > > > * sidneym@codeaurora.org <sidneym@codeaurora.org> [2020-07-
> 20
> > > > > > 16:26:58 -0500]:
> > > > > > > I removed fma/fmal/fmax/fmin/fabs from compiler-rt-builtins,
> > > > > > > https://reviews.llvm.org/D82263 The comparison with musl can
> > > > > > > be found here:
> > > > > > > https://github.com/quic/musl/compare/hexagon but I've also
> > > > > > > attached the patch.
> > > > > > >
> > > > > > > An assert in clang when building both musl and libc-test for
> > > > > > > hexagon was fixed by, https://reviews.llvm.org/D80952 prior
> > > > > > > to this change -frounding-math had to be used.
> > > > > > >
> > > > > > > The test-results are also attached. Everything is built
> > > > > > > with the tip-of-tree llvm so sometimes results vary but
> > > > > > > these are the results I got from this morning's clone. The
> > > > > > > only notable difference in the results would be that both
> > > > > > > fma and fmal fail and this is because of the compiler-rt
> > > > > > > change. I didn't add fma to musl because it require more
> > > > > > > complex assembly, along the lines you saw in an earlier
> > > > > > > version with
> > > > > > sqrt.
> > > > > >
> > > > > >
> > > > > > the fma and sqrt failures are still not fully explained, e.g.
> > > > > > this looks
> > > > wrong:
> > > > > >
> > > > > > src/math/special/fma.h:42: RN fma(0x1p+0,0x1p+0,-0x1p-1074)
> > > > > > want
> > > > > > 0x1p+0 got -0x1.fffffp-43 ulperr -4503599627370496.000 =
> > > > > > 0x1p+-0x1p+52
> > > > > > 0x1p++
> > > > > > 0x0p+0
> > > > > >
> > > > > > the only target specific bit in fma is a_clz_64 so i would
> > > > > > check
> > that.
> > > > > >
> > > > > > e.g. a_clz_64(1ULL << 42) should give 21 (this computation
> > > > > > happens during the fma test case above).
> > > > >
> > > > > Hexagon didn't have a_clz_64 implemented however I added this
> > > > > morning it and noticed no differences. I will update the patch
> > > > > with that routine included.
> > > > >
> > > > > I did notice a compiler regression in how it compiled fma and
> > > > > have asked a compiler person to take a look. An older version
> > > > > of our internally
> > > > maintained
> > > > > compiler does produce the expected results for the values I used
> > > > > but later versions do not. Unfortunately changing optimization
> > > > > levels will produce different results as well.
> > > >
> > > > I've attached updated test results and patch. The patch doesn't
> > > > change much other than adding the above mentioned a_clz_64. The
> > > > only other change was an update to pthread_arch.h for an api
> > > > update so hexagon conforms with the rest of musl.
> > > >
> > > > Between updates to llvm and musl both fma and sqrt issues are
> > > > resolved provided I compile the library without optimization
> > > > enabled. No new tests fail.
> > > >
> > > > I guess I also need to know what the thoughts are about adding
> > > > hexagon to the mainline base. There are no issues adding from this
> end.
> > >
> > > I'll post some review with the hope that this can move forward
> > > upstream in musl soon. I might need some help figuring out how to
> > > get a cross build environment to check things, but I'll follow up when
I do.
> > >
> > > The review that follows is not 100% thorough but I think it's more
> > detailed
> > > than I've done for hexagon so far. Most of it's open to discussion
> > > if you
> > think
> > > anything I say is wrong.
> > >
> > > > diff --git a/arch/hexagon/atomic_arch.h
> > > > b/arch/hexagon/atomic_arch.h new file mode 100644 index
> > > > 00000000..ede55956
> > > > --- /dev/null
> > > > +++ b/arch/hexagon/atomic_arch.h
> > > > @@ -0,0 +1,194 @@
> > > > +#define a_ctz_32 a_ctz_32
> > > > +static inline int a_ctz_32(unsigned long x) {
> > > > + __asm__(
> > > > + "%0 = ct0(%0)\n\t"
> > > > + : "+r"(x));
> > > > + return x;
> > > > +}
> > > > +
> > > > +#define a_ctz_64 a_ctz_64
> > > > +static inline int a_ctz_64(uint64_t x) {
> > > > + int count;
> > > > + __asm__(
> > > > + "%0 = ct0(%1)\n\t"
> > > > + : "=r"(count) : "r"(x));
> > > > + return count;
> > > > +}
> > > > +#define a_clz_64 a_clz_64
> > > > +static inline int a_clz_64(uint64_t x) {
> > > > + int count;
> > > > + __asm__(
> > > > + "%1 = brev(%1)\n\t"
> > > > + "%0 = ct0(%1)\n\t"
> > > > + : "=r"(count) : "r"(x));
> > > > + return count;
> > > > +}
> > >
> > > This should probably do just the brev in asm then return
> > > a_ctz_64(result) so that the compiler has the freedom to schedule
> > > the
> > insns
> > > independently, unless there's a reason not to want it to do that.
> >
> > I used AARCH64's a_ctz_64 as a reference for this, but there are
> > builtins for these operations and I will use those instead,
> > __builtin_clzll(x);
>
> No, builtins of that sort aren't used in musl for various reasons.
> This one might be technically okay (as opposed to others that can produce
> circular definitions and such), but the style preference is still to write
it as an
> asm statement.
OK, will restore the inlines.
>
> > > > +#define a_cas a_cas
> > > > +static inline int a_cas(volatile int *p, int t, int s) {
> > > > + int dummy;
> > > > + __asm__ __volatile__(
> > > > + "1: %0 = memw_locked(%1)\n\t"
> > > > + " { p0 = cmp.eq(%0, %2)\n\t"
> > > > + " if (!p0.new) jump:nt 2f }\n\t"
> > > > + " memw_locked(%1, p0) = %3\n\t"
> > > > + " if (!p0) jump 1b\n\t"
> > > > + "2: \n\t"
> > > > + : "=&r"(dummy)
> > > > + : "r"(p), "r"(t), "r"(s)
> > > > + : "p0", "memory" );
> > > > + return dummy;
> > > > +}
> > >
> > > I don't know the hexagon atomic model, but as far as I can tell
> > > these at
> > least
> > > "look right" in the sense of having right asm constraints.
> > >
> > > > [...]
> > > > +#define a_barrier a_barrier
> > > > +static inline void a_barrier()
> > > > +{
> > > > + __asm__ __volatile__ ("barrier" ::: "memory"); }
> > >
> > > Is the barrier implied in memw_locked? If not, there need to be
> > > explicit barriers in all the atomic functions.
> >
> > Yes, if there is any memory access on the reserved address the
> > reservation is lost and the predicate is false.
>
> That's not what a barrier means. The question is whether it orders all
access
> to *other* memory, not the address with the reservation on it.
> In other words, musl's a_*() atomics need to be full seq_cst model
> operations, not relaxed atomics.
Per our spec:
"Threads in the Hexagon processor follow a sequentially consistent memory
model at a packet
granularity. Threads interleave their memory operations with one another in
an arbitrary but
fair manner. This results in a consistent program order that is globally
observable by all
threads in the same order."
>
> > > > diff --git a/arch/hexagon/crt_arch.h b/arch/hexagon/crt_arch.h new
> > > > file mode 100644 index 00000000..331a797e
> > > > --- /dev/null
> > > > +++ b/arch/hexagon/crt_arch.h
> > > > @@ -0,0 +1,35 @@
> > > > +__asm__(
> > > > +".weak _DYNAMIC \n"
> > > > +".hidden _DYNAMIC \n"
> > > > +".text \n"
> > > > +".global " START " \n"
> > > > +".type " START ", %function \n"
> > > > +START ": \n"
> > > > +" // Find _DYNAMIC\n"
> > > > +" jump 1f\n"
> > > > +".word _DYNAMIC - .\n"
> > > > +"1: r2 = pc\n"
> > > > +" r2 = add(r2, #-4)\n"
> > > > +" r1 = memw(r2)\n"
> > > > +" r1 = add(r2, r1)\n"
> > > > +" r30 = #0 // Signals the end of
> backtrace\n"
> > > > +" r0 = r29 // Pointer to argc/argv\n"
> > > > +" r29 = and(r29, #-16) // Align\n"
> > > > +" memw(r29+#-8) = r29\n"
> > > > +" r29 = add(r29, #-8)\n"
> > > > +" call " START "_c \n"
> > > > +".size " START ", .-" START "\n"
> > > > +);
> > > > +
> > > > +__asm__(
> > > > +".section \".note.ABI-tag\", \"a\" \n"
> > > > +".align 4 \n"
> > > > +".long 1f - 0f /* name length */ \n"
> > > > +".long 3f - 2f /* data length */ \n"
> > > > +".long 1 /* note type */ \n"
> > > > +"0: .asciz \"GNU\" /* vendor name seems like this should be
> > MUSL but
> > > lldb doesn't agree.*/ \n"
> > > > +"1: .align 4 \n"
> > > > +"2: .long 0 /* linux */ \n"
> > > > +" .long 3,0,0 \n"
> > > > +"3: .align 4 \n"
> > > > +);
> > >
> > > Is there a reason this needs to be here at all? Shouldn't the
> > > tooling
> > generate
> > > it if it's actually wanted/needed?
> >
> > OK, this is here so lldb can select the right target allowing the same
> > version of lldb to work in multiple runtime environments. I need to
> > take a look at what the options are if this isn't a good place for this.
>
> OK, please follow up with what you find. I'm pretty sure this belongs in
the
> tooling though (as something the compiler emits or the assembler emits
> based on how it's invoked and what its target is) not in a file in libc.
>
Szabolcs noted that gnu crt adds a similar note. I know that lldb's
RefineModuleDetailsFromNote uses this to select the OS but will remove it
for now while other options are looked at.
> > > > diff --git a/arch/hexagon/reloc.h b/arch/hexagon/reloc.h new file
> > > > mode
> > > > 100644 index 00000000..14085872
> > > > --- /dev/null
> > > > +++ b/arch/hexagon/reloc.h
> > > > @@ -0,0 +1,25 @@
> > > > +#include <endian.h>
> > > > +
> > > > +#if __BYTE_ORDER == __LITTLE_ENDIAN #define ENDIAN_SUFFIX "el"
> > > > +#else
> > > > +#define ENDIAN_SUFFIX ""
> > > > +#endif
> > >
> > > bits/alltypes.h.in defined __BYTE_ORDER as 1234 (unconditionally
> > > little endian). Does Hexagon (hw and abi) actually support both byte
> > > orders? If not, I don't think there should be an endian suffix here at
all.
> >
> > No it does not I will remove it.
>
> OK. Make sure the compiler target support has the right ldso name too (I
> think that's /lib/ld-musl-hexagon.so.1, right?).
Yes, that is the name.
>
> > > > diff --git a/src/fenv/hexagon/fenv.S b/src/fenv/hexagon/fenv.S new
> > > > file mode 100644 index 00000000..07b89764
> > > > --- /dev/null
> > > > +++ b/src/fenv/hexagon/fenv.S
> > > > @@ -0,0 +1,143 @@
> > > > +/*
> > > > +The Hexagon user status register includes five status fields
> > > > +which work as sticky flags for the five IEEE-defined exception
> conditions:
> > > > +inexact, overflow, underflow, divide by zero, and invalid. A
> > > > +sticky flag is set when the corresponding exception occurs, and
> > > > +remains set
> > until
> > > explicitly cleared.
> > >
> > > Please format for at most 80 columns. (Some source files have a few
> > > longer lines, but if you're flowing text it should be flowed to 80
> > > or
> > > fewer.)
> >
> > Will fix those lines and a couple of others I noticed.
> >
> > The changes I made can be seen here:
> >
> https://github.com/quic/musl/commit/4d714f2defcd926b4f8c0425af363b382
> d
> > 3084cb
> >
> > I've also attached an updated patch.
>
> OK. I haven't looked at it yet but I wanted to go ahead and get you
replies to
> the content in the email body. If there's anything else in the patch that
needs
> commenting on I'll follow up soon, probably tomorrow.
>
> Rich
next prev parent reply other threads:[~2020-09-20 13:13 UTC|newest]
Thread overview: 43+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-04-15 13:19 sidneym
2020-04-15 16:30 ` Rich Felker
2020-04-15 17:50 ` sidneym
2020-04-15 18:06 ` Szabolcs Nagy
2020-04-15 18:22 ` sidneym
2020-04-16 9:36 ` Szabolcs Nagy
2020-04-16 15:34 ` Rich Felker
2020-04-16 16:26 ` sidneym
2020-04-16 16:34 ` 'Rich Felker'
2020-04-15 18:26 ` Rich Felker
2020-04-15 19:12 ` sidneym
2020-04-15 19:29 ` 'Rich Felker'
2020-04-30 22:44 ` sidneym
2020-04-30 23:51 ` Rich Felker
2020-05-05 23:37 ` sidneym
2020-05-06 0:59 ` Rich Felker
2020-06-18 16:37 ` sidneym
2020-06-18 21:42 ` Szabolcs Nagy
2020-06-19 21:58 ` sidneym
2020-06-19 22:46 ` Rich Felker
2020-06-20 0:03 ` [musl] strtok Robert Skopalík
2020-06-20 0:15 ` Rich Felker
2020-06-20 0:36 ` Robert Skopalík
2020-06-20 0:46 ` Robert Skopalík
2020-06-20 1:44 ` Rich Felker
2020-06-20 7:07 ` Patrick Oppenlander
2020-06-20 13:00 ` Robert Skopalík
2020-06-22 0:57 ` Bery Saidi
2020-06-20 2:29 ` [musl] Hexagon DSP support sidneym
2020-06-20 3:20 ` Rich Felker
2020-07-20 21:26 ` sidneym
2020-07-23 21:56 ` Szabolcs Nagy
2020-07-24 17:49 ` sidneym
2020-09-16 20:49 ` sidneym
2020-09-17 1:32 ` 'Rich Felker'
2020-09-17 22:31 ` sidneym
2020-09-18 1:08 ` Rich Felker
2020-09-18 8:10 ` Szabolcs Nagy
2020-09-20 13:12 ` sidneym [this message]
2020-09-20 17:17 ` 'Rich Felker'
2020-09-21 14:09 ` sidneym
2020-04-15 18:55 ` Fangrui Song
2021-03-09 20:25 sidneym
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='025a01d68f4f$bd211aa0$37634fe0$@codeaurora.org' \
--to=sidneym@codeaurora.org \
--cc=dalias@libc.org \
--cc=musl@lists.openwall.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://git.vuxu.org/mirror/musl/
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).