> -----Original Message----- > From: 'Rich Felker' > Sent: Wednesday, September 16, 2020 8:33 PM > To: musl@lists.openwall.com > Subject: Re: [musl] Hexagon DSP support > > On Wed, Sep 16, 2020 at 03:49:28PM -0500, sidneym@codeaurora.org wrote: > > > > > > > -----Original Message----- > > > From: sidneym@codeaurora.org > > > Sent: Friday, July 24, 2020 12:50 PM > > > To: 'Szabolcs Nagy' > > > Cc: 'Rich Felker' ; 'musl@lists.openwall.com' > > > > > > Subject: RE: [musl] Hexagon DSP support > > > > > > > > > > > > > -----Original Message----- > > > > From: Szabolcs Nagy > > > > Sent: Thursday, July 23, 2020 4:56 PM > > > > To: sidneym@codeaurora.org > > > > Cc: 'Rich Felker' ; musl@lists.openwall.com > > > > Subject: Re: [musl] Hexagon DSP support > > > > > > > > * sidneym@codeaurora.org [2020-07-20 > > > > 16:26:58 -0500]: > > > > > I removed fma/fmal/fmax/fmin/fabs from compiler-rt-builtins, > > > > > https://reviews.llvm.org/D82263 > > > > > The comparison with musl can be found here: > > > > > https://github.com/quic/musl/compare/hexagon but I've also > > > > > attached the patch. > > > > > > > > > > An assert in clang when building both musl and libc-test for > > > > > hexagon was fixed by, https://reviews.llvm.org/D80952 prior to > > > > > this change -frounding-math had to be used. > > > > > > > > > > The test-results are also attached. Everything is built with > > > > > the tip-of-tree llvm so sometimes results vary but these are the > > > > > results I got from this morning's clone. The only notable > > > > > difference in the results would be that both fma and fmal fail > > > > > and this is because of the compiler-rt change. I didn't add fma > > > > > to musl because it require more complex assembly, along the > > > > > lines you saw in an earlier version with > > > > sqrt. > > > > > > > > > > > > the fma and sqrt failures are still not fully explained, e.g. this > > > > looks > > wrong: > > > > > > > > src/math/special/fma.h:42: RN fma(0x1p+0,0x1p+0,-0x1p-1074) want > > > > 0x1p+0 got -0x1.fffffp-43 ulperr -4503599627370496.000 = -0x1p+52 > > > > 0x1p++ > > > > 0x0p+0 > > > > > > > > the only target specific bit in fma is a_clz_64 so i would check that. > > > > > > > > e.g. a_clz_64(1ULL << 42) should give 21 (this computation happens > > > > during the fma test case above). > > > > > > Hexagon didn't have a_clz_64 implemented however I added this > > > morning it and noticed no differences. I will update the patch with > > > that routine included. > > > > > > I did notice a compiler regression in how it compiled fma and have > > > asked a compiler person to take a look. An older version of our > > > internally > > maintained > > > compiler does produce the expected results for the values I used but > > > later versions do not. Unfortunately changing optimization levels > > > will produce different results as well. > > > > I've attached updated test results and patch. The patch doesn't > > change much other than adding the above mentioned a_clz_64. The only > > other change was an update to pthread_arch.h for an api update so > > hexagon conforms with the rest of musl. > > > > Between updates to llvm and musl both fma and sqrt issues are resolved > > provided I compile the library without optimization enabled. No new > > tests fail. > > > > I guess I also need to know what the thoughts are about adding hexagon > > to the mainline base. There are no issues adding from this end. > > I'll post some review with the hope that this can move forward upstream in > musl soon. I might need some help figuring out how to get a cross build > environment to check things, but I'll follow up when I do. > > The review that follows is not 100% thorough but I think it's more detailed > than I've done for hexagon so far. Most of it's open to discussion if you think > anything I say is wrong. > > > diff --git a/arch/hexagon/atomic_arch.h b/arch/hexagon/atomic_arch.h > > new file mode 100644 index 00000000..ede55956 > > --- /dev/null > > +++ b/arch/hexagon/atomic_arch.h > > @@ -0,0 +1,194 @@ > > +#define a_ctz_32 a_ctz_32 > > +static inline int a_ctz_32(unsigned long x) { > > + __asm__( > > + "%0 = ct0(%0)\n\t" > > + : "+r"(x)); > > + return x; > > +} > > + > > +#define a_ctz_64 a_ctz_64 > > +static inline int a_ctz_64(uint64_t x) { > > + int count; > > + __asm__( > > + "%0 = ct0(%1)\n\t" > > + : "=r"(count) : "r"(x)); > > + return count; > > +} > > +#define a_clz_64 a_clz_64 > > +static inline int a_clz_64(uint64_t x) { > > + int count; > > + __asm__( > > + "%1 = brev(%1)\n\t" > > + "%0 = ct0(%1)\n\t" > > + : "=r"(count) : "r"(x)); > > + return count; > > +} > > This should probably do just the brev in asm then return > a_ctz_64(result) so that the compiler has the freedom to schedule the insns > independently, unless there's a reason not to want it to do that. I used AARCH64's a_ctz_64 as a reference for this, but there are builtins for these operations and I will use those instead, __builtin_clzll(x); > > > +#define a_cas a_cas > > +static inline int a_cas(volatile int *p, int t, int s) { > > + int dummy; > > + __asm__ __volatile__( > > + "1: %0 = memw_locked(%1)\n\t" > > + " { p0 = cmp.eq(%0, %2)\n\t" > > + " if (!p0.new) jump:nt 2f }\n\t" > > + " memw_locked(%1, p0) = %3\n\t" > > + " if (!p0) jump 1b\n\t" > > + "2: \n\t" > > + : "=&r"(dummy) > > + : "r"(p), "r"(t), "r"(s) > > + : "p0", "memory" ); > > + return dummy; > > +} > > I don't know the hexagon atomic model, but as far as I can tell these at least > "look right" in the sense of having right asm constraints. > > > [...] > > +#define a_barrier a_barrier > > +static inline void a_barrier() > > +{ > > + __asm__ __volatile__ ("barrier" ::: "memory"); } > > Is the barrier implied in memw_locked? If not, there need to be explicit > barriers in all the atomic functions. Yes, if there is any memory access on the reserved address the reservation is lost and the predicate is false. > > > diff --git a/arch/hexagon/bits/alltypes.h.in > > b/arch/hexagon/bits/alltypes.h.in new file mode 100644 index > > 00000000..9d770c7e > > --- /dev/null > > +++ b/arch/hexagon/bits/alltypes.h.in > > @@ -0,0 +1,26 @@ > > +#define _Addr int > > +#define _Int64 long long > > +#define _Reg int > > + > > +#define __BYTE_ORDER 1234 > > +#define __LONG_MAX 0x7fffffffL > > + > > +#ifndef __cplusplus > > +#ifdef __WCHAR_TYPE__ > > +TYPEDEF __WCHAR_TYPE__ wchar_t; > > +#else > > +TYPEDEF long wchar_t; > > +#endif > > +#endif > > + > > +TYPEDEF float float_t; > > +TYPEDEF double double_t; > > + > > +#if !defined(__cplusplus) > > +TYPEDEF struct { _Alignas(8) long long __ll; long double __ld; } > > +max_align_t; #elif defined(__GNUC__) TYPEDEF struct { > > +__attribute__((__aligned__(8))) long long __ll; long double __ld; } > > +max_align_t; #else TYPEDEF struct { alignas(8) long long __ll; long > > +double __ld; } max_align_t; #endif > > As I understand, the Hexagon ABI has all scalar types aligned to their size, so I > don't think the alignas mess is needed here. It should just work as > unconditional: > > TYPEDEF struct { long long __ll; long double __ld; } max_align_t; True and I will remove the alignas stuff. > > > diff --git a/arch/hexagon/bits/setjmp.h b/arch/hexagon/bits/setjmp.h > > new file mode 100644 index 00000000..5ee5e49f > > --- /dev/null > > +++ b/arch/hexagon/bits/setjmp.h > > @@ -0,0 +1,4 @@ > > + > > +typedef struct { > > + long regs[16]; > > +} __jmp_buf[1] __attribute__((aligned (8))); > > Use long long so no extension is needed here for alignment, and no wrapper > struct (which is not needed, and the member name regs is a namespace > violation): > > typedef long long __jmp_buf[8]; OK > > > diff --git a/arch/hexagon/bits/signal.h b/arch/hexagon/bits/signal.h > > new file mode 100644 index 00000000..7a3b36d0 > > --- /dev/null > > +++ b/arch/hexagon/bits/signal.h > > @@ -0,0 +1,105 @@ > > +#if defined(_POSIX_SOURCE) || defined(_POSIX_C_SOURCE) \ > > + || defined(_XOPEN_SOURCE) || defined(_GNU_SOURCE) || > > + || defined(_BSD_SOURCE) > > + > > +#if defined(_XOPEN_SOURCE) || defined(_GNU_SOURCE) || > > +defined(_BSD_SOURCE) #define MINSIGSTKSZ 2048 #define SIGSTKSZ > 8192 > > +#endif > > + > > +#if defined(_GNU_SOURCE) || defined(_BSD_SOURCE) typedef int > greg_t, > > +gregset_t[18]; typedef struct sigcontext { > > + unsigned long r0, r1, r2, r3; > > + unsigned long r4, r5, r6, r7; > > + unsigned long r8, r9, r10, r11; > > + unsigned long r12, r13, r14, r15; > > + unsigned long r16, r17, r18, r19; > > + unsigned long r20, r21, r22, r23; > > + unsigned long r24, r25, r26, r27; > > + unsigned long r28, r29, r30, r31; > > + unsigned long sa0; > > + unsigned long lc0; > > + unsigned long sa1; > > + unsigned long lc1; > > + unsigned long m0; > > + unsigned long m1; > > + unsigned long usr; > > + unsigned long p3_0; > > + unsigned long gp; > > + unsigned long ugp; > > + unsigned long pc; > > + unsigned long cause; > > + unsigned long badva; > > + unsigned long pad1; > > + unsigned long pad2; > > + unsigned long pad3; > > +} __attribute__((__aligned__(8))) mcontext_t; #else typedef struct { > > + unsigned long __regs[48]; > > +} __attribute__((__aligned__(8))) mcontext_t; #endif > > If the pad members are really padding, can the last two just be replaced by a > long long so there's natural alignment with no extension? Yes, this would work. > > > [...] > > +typedef struct __ucontext { > > + unsigned long uc_flags; > > + struct __ucontext *uc_link; > > + stack_t uc_stack; > > + mcontext_t uc_mcontext; > > + sigset_t uc_sigmask; > > +// unsigned long long uc_regspace[64]; > > +} ucontext_t; > > Should the commented uc_regspace be removed? I missed that. > > > diff --git a/arch/hexagon/bits/stat.h b/arch/hexagon/bits/stat.h new > > file mode 100644 index 00000000..55e81fd9 > > --- /dev/null > > +++ b/arch/hexagon/bits/stat.h > > @@ -0,0 +1,20 @@ > > +/* copied from kernel definition, but with padding replaced > > + * by the corresponding correctly-sized userspace types. */ struct > > +stat { > > + dev_t st_dev; > > + ino_t st_ino; > > + mode_t st_mode; > > + nlink_t st_nlink; > > + uid_t st_uid; > > + gid_t st_gid; > > + dev_t st_rdev; > > + unsigned long __pad; > > + off_t st_size; > > + blksize_t st_blksize; > > + int __pad2; > > + blkcnt_t st_blocks; > > + struct timespec st_atim; > > + struct timespec st_mtim; > > + struct timespec st_ctim; > > + unsigned __unused[2]; > > +}; > > No objection to doing this as-is, but since this looks like the modern > "standard" layout and we could probably deduplicate it by making > arch/generic provide it. That'd be a later patch to remove the other existing > duplicates (aarch64, riscv64, maybe others). > > > diff --git a/arch/hexagon/bits/syscall.h.in > > b/arch/hexagon/bits/syscall.h.in new file mode 100644 index > > 00000000..77ca3fa0 > > --- /dev/null > > +++ b/arch/hexagon/bits/syscall.h.in > > @@ -0,0 +1,317 @@ > > [...] > > +#define __NR_timer_create 107 > > +#define __NR_timer_gettime 108 > > +#define __NR_timer_getoverrun 109 > > +#define __NR_timer_settime 110 > > +#define __NR_timer_delete 111 > > +#define __NR_clock_settime 112 > > +#define __NR_clock_gettime 113 > > Some of these, e.g. clock_gettime, need to be renamed as in commit > 5a105f19b5aae79dd302899e634b6b18b3dcd0d6. > > > [...] > > +#define __NR_sched_rr_get_interval_time64 423 #define __NR_syscalls > > +(__NR_sched_rr_get_interval_time64+1) > > +#define __NR_fcntl __NR3264_fcntl > > +#define __NR_fstatfs __NR3264_fstatfs #define __NR_truncate > > +__NR3264_truncate #define __NR_ftruncate __NR3264_ftruncate > #define > > +__NR_lseek __NR3264_lseek #define __NR_sendfile > __NR3264_sendfile > > +#define __NR_newfstatat __NR3264_fstatat #define __NR_fcntl64 > > +__NR3264_fcntl #define __NR_statfs64 __NR3264_statfs #define > > +__NR_fstatfs64 __NR3264_fstatfs #define __NR_truncate64 > > +__NR3264_truncate #define __NR_ftruncate64 __NR3264_ftruncate > > +#define __NR__llseek __NR3264_lseek #define __NR_sendfile64 > > +__NR3264_sendfile #define __NR_fstatat64 __NR3264_fstatat #define > > +__NR_fstat64 __NR3264_fstat #define __NR_mmap2 __NR3264_mmap > > +#define __NR_fadvise64_64 __NR3264_fadvise64 > > Is there a reason for these NR3264 redirections rather than just defining > directly with the public names? This is also some carryover from an older port. I think they are not needed. > > > diff --git a/arch/hexagon/crt_arch.h b/arch/hexagon/crt_arch.h new > > file mode 100644 index 00000000..331a797e > > --- /dev/null > > +++ b/arch/hexagon/crt_arch.h > > @@ -0,0 +1,35 @@ > > +__asm__( > > +".weak _DYNAMIC \n" > > +".hidden _DYNAMIC \n" > > +".text \n" > > +".global " START " \n" > > +".type " START ", %function \n" > > +START ": \n" > > +" // Find _DYNAMIC\n" > > +" jump 1f\n" > > +".word _DYNAMIC - .\n" > > +"1: r2 = pc\n" > > +" r2 = add(r2, #-4)\n" > > +" r1 = memw(r2)\n" > > +" r1 = add(r2, r1)\n" > > +" r30 = #0 // Signals the end of backtrace\n" > > +" r0 = r29 // Pointer to argc/argv\n" > > +" r29 = and(r29, #-16) // Align\n" > > +" memw(r29+#-8) = r29\n" > > +" r29 = add(r29, #-8)\n" > > +" call " START "_c \n" > > +".size " START ", .-" START "\n" > > +); > > + > > +__asm__( > > +".section \".note.ABI-tag\", \"a\" \n" > > +".align 4 \n" > > +".long 1f - 0f /* name length */ \n" > > +".long 3f - 2f /* data length */ \n" > > +".long 1 /* note type */ \n" > > +"0: .asciz \"GNU\" /* vendor name seems like this should be MUSL but > lldb doesn't agree.*/ \n" > > +"1: .align 4 \n" > > +"2: .long 0 /* linux */ \n" > > +" .long 3,0,0 \n" > > +"3: .align 4 \n" > > +); > > Is there a reason this needs to be here at all? Shouldn't the tooling generate > it if it's actually wanted/needed? OK, this is here so lldb can select the right target allowing the same version of lldb to work in multiple runtime environments. I need to take a look at what the options are if this isn't a good place for this. > > > diff --git a/arch/hexagon/reloc.h b/arch/hexagon/reloc.h new file mode > > 100644 index 00000000..14085872 > > --- /dev/null > > +++ b/arch/hexagon/reloc.h > > @@ -0,0 +1,25 @@ > > +#include > > + > > +#if __BYTE_ORDER == __LITTLE_ENDIAN > > +#define ENDIAN_SUFFIX "el" > > +#else > > +#define ENDIAN_SUFFIX "" > > +#endif > > bits/alltypes.h.in defined __BYTE_ORDER as 1234 (unconditionally little > endian). Does Hexagon (hw and abi) actually support both byte orders? If > not, I don't think there should be an endian suffix here at all. No it does not I will remove it. > > > diff --git a/include/elf.h b/include/elf.h index 549f92c1..54251c24 > > 100644 > > --- a/include/elf.h > > +++ b/include/elf.h > > @@ -3284,6 +3284,107 @@ enum > > #define R_RISCV_SET32 56 > > #define R_RISCV_32_PCREL 57 > > > > +#define R_HEX_NONE 0 > > [...] > > I'd like to merge this separately first since it's independent of whether > hexagon is a supported host. > > > diff --git a/src/fenv/hexagon/fenv.S b/src/fenv/hexagon/fenv.S new > > file mode 100644 index 00000000..07b89764 > > --- /dev/null > > +++ b/src/fenv/hexagon/fenv.S > > @@ -0,0 +1,143 @@ > > +/* > > +The Hexagon user status register includes five status fields which > > +work as sticky flags for the five IEEE-defined exception conditions: > > +inexact, overflow, underflow, divide by zero, and invalid. A sticky > > +flag is set when the corresponding exception occurs, and remains set until > explicitly cleared. > > Please format for at most 80 columns. (Some source files have a few longer > lines, but if you're flowing text it should be flowed to 80 or > fewer.) Will fix those lines and a couple of others I noticed. The changes I made can be seen here: https://github.com/quic/musl/commit/4d714f2defcd926b4f8c0425af363b382d3084cb I've also attached an updated patch. Thanks