> -----Original Message-----
> From: 'Rich Felker' <dalias@libc.org>
> Sent: Wednesday, September 16, 2020 8:33 PM
> To: musl@lists.openwall.com
> Subject: Re: [musl] Hexagon DSP support
> 
> On Wed, Sep 16, 2020 at 03:49:28PM -0500, sidneym@codeaurora.org wrote:
> >
> >
> > > -----Original Message-----
> > > From: sidneym@codeaurora.org <sidneym@codeaurora.org>
> > > Sent: Friday, July 24, 2020 12:50 PM
> > > To: 'Szabolcs Nagy' <nsz@port70.net>
> > > Cc: 'Rich Felker' <dalias@libc.org>; 'musl@lists.openwall.com'
> > > <musl@lists.openwall.com>
> > > Subject: RE: [musl] Hexagon DSP support
> > >
> > >
> > >
> > > > -----Original Message-----
> > > > From: Szabolcs Nagy <nsz@port70.net>
> > > > Sent: Thursday, July 23, 2020 4:56 PM
> > > > To: sidneym@codeaurora.org
> > > > Cc: 'Rich Felker' <dalias@libc.org>; musl@lists.openwall.com
> > > > Subject: Re: [musl] Hexagon DSP support
> > > >
> > > > * sidneym@codeaurora.org <sidneym@codeaurora.org> [2020-07-20
> > > > 16:26:58 -0500]:
> > > > > I removed fma/fmal/fmax/fmin/fabs from compiler-rt-builtins,
> > > > > https://reviews.llvm.org/D82263
> > > > > The comparison with musl can be found here:
> > > > > https://github.com/quic/musl/compare/hexagon but I've also
> > > > > attached the patch.
> > > > >
> > > > > An assert in clang when building both musl and libc-test for
> > > > > hexagon was fixed by, https://reviews.llvm.org/D80952 prior to
> > > > > this change -frounding-math had to be used.
> > > > >
> > > > > The test-results are also attached.  Everything is built with
> > > > > the tip-of-tree llvm so sometimes results vary but these are the
> > > > > results I got from this morning's clone.  The only notable
> > > > > difference in the results would be that both fma and fmal fail
> > > > > and this is because of the compiler-rt change.  I didn't add fma
> > > > > to musl because it require more complex assembly, along the
> > > > > lines you saw in an earlier version with
> > > > sqrt.
> > > >
> > > >
> > > > the fma and sqrt failures are still not fully explained, e.g. this
> > > > looks
> > wrong:
> > > >
> > > > src/math/special/fma.h:42: RN fma(0x1p+0,0x1p+0,-0x1p-1074) want
> > > > 0x1p+0 got -0x1.fffffp-43 ulperr -4503599627370496.000 = -0x1p+52
> > > > 0x1p++
> > > > 0x0p+0
> > > >
> > > > the only target specific bit in fma is a_clz_64 so i would check
that.
> > > >
> > > > e.g. a_clz_64(1ULL << 42) should give 21 (this computation happens
> > > > during the fma test case above).
> > >
> > > Hexagon didn't have a_clz_64 implemented however I added this
> > > morning it and noticed no differences.  I will update the patch with
> > > that routine included.
> > >
> > > I did notice a compiler regression in how it compiled fma and have
> > > asked a compiler person to take a look.  An older version of our
> > > internally
> > maintained
> > > compiler does produce the expected results for the values I used but
> > > later versions do not.  Unfortunately changing optimization levels
> > > will produce different results as well.
> >
> > I've attached updated test results and patch.  The patch doesn't
> > change much other than adding the above mentioned a_clz_64.  The only
> > other change was an update to pthread_arch.h for an api update so
> > hexagon conforms with the rest of musl.
> >
> > Between updates to llvm and musl both fma and sqrt issues are resolved
> > provided I compile the library without optimization enabled.  No new
> > tests fail.
> >
> > I guess I also need to know what the thoughts are about adding hexagon
> > to the mainline base.  There are no issues adding from this end.
> 
> I'll post some review with the hope that this can move forward upstream in
> musl soon. I might need some help figuring out how to get a cross build
> environment to check things, but I'll follow up when I do.
> 
> The review that follows is not 100% thorough but I think it's more
detailed
> than I've done for hexagon so far. Most of it's open to discussion if you
think
> anything I say is wrong.
> 
> > diff --git a/arch/hexagon/atomic_arch.h b/arch/hexagon/atomic_arch.h
> > new file mode 100644 index 00000000..ede55956
> > --- /dev/null
> > +++ b/arch/hexagon/atomic_arch.h
> > @@ -0,0 +1,194 @@
> > +#define a_ctz_32 a_ctz_32
> > +static inline int a_ctz_32(unsigned long x) {
> > +	__asm__(
> > +		"%0 = ct0(%0)\n\t"
> > +		: "+r"(x));
> > +	return x;
> > +}
> > +
> > +#define a_ctz_64 a_ctz_64
> > +static inline int a_ctz_64(uint64_t x) {
> > +	int count;
> > +	__asm__(
> > +		"%0 = ct0(%1)\n\t"
> > +		: "=r"(count) : "r"(x));
> > +	return count;
> > +}
> > +#define a_clz_64 a_clz_64
> > +static inline int a_clz_64(uint64_t x) {
> > +        int count;
> > +        __asm__(
> > +                "%1 = brev(%1)\n\t"
> > +                "%0 = ct0(%1)\n\t"
> > +                : "=r"(count) : "r"(x));
> > +        return count;
> > +}
> 
> This should probably do just the brev in asm then return
> a_ctz_64(result) so that the compiler has the freedom to schedule the
insns
> independently, unless there's a reason not to want it to do that.

I used AARCH64's a_ctz_64 as a reference for this, but there are builtins
for these operations and I will use those instead, __builtin_clzll(x);

> 
> > +#define a_cas a_cas
> > +static inline int a_cas(volatile int *p, int t, int s) {
> > +	int dummy;
> > +	__asm__ __volatile__(
> > +		"1:	%0 = memw_locked(%1)\n\t"
> > +		"	{ p0 = cmp.eq(%0, %2)\n\t"
> > +		"	  if (!p0.new) jump:nt 2f }\n\t"
> > +		"	memw_locked(%1, p0) = %3\n\t"
> > +		"	if (!p0) jump 1b\n\t"
> > +		"2:	\n\t"
> > +		: "=&r"(dummy)
> > +		: "r"(p), "r"(t), "r"(s)
> > +		: "p0", "memory" );
> > +        return dummy;
> > +}
> 
> I don't know the hexagon atomic model, but as far as I can tell these at
least
> "look right" in the sense of having right asm constraints.
> 
> > [...]
> > +#define a_barrier a_barrier
> > +static inline void a_barrier()
> > +{
> > +	__asm__ __volatile__ ("barrier" ::: "memory"); }
> 
> Is the barrier implied in memw_locked? If not, there need to be explicit
> barriers in all the atomic functions.

Yes, if there is any memory access on the reserved address the reservation
is lost and the predicate is false.


> 
> > diff --git a/arch/hexagon/bits/alltypes.h.in
> > b/arch/hexagon/bits/alltypes.h.in new file mode 100644 index
> > 00000000..9d770c7e
> > --- /dev/null
> > +++ b/arch/hexagon/bits/alltypes.h.in
> > @@ -0,0 +1,26 @@
> > +#define _Addr int
> > +#define _Int64 long long
> > +#define _Reg int
> > +
> > +#define __BYTE_ORDER 1234
> > +#define __LONG_MAX 0x7fffffffL
> > +
> > +#ifndef __cplusplus
> > +#ifdef __WCHAR_TYPE__
> > +TYPEDEF __WCHAR_TYPE__ wchar_t;
> > +#else
> > +TYPEDEF long wchar_t;
> > +#endif
> > +#endif
> > +
> > +TYPEDEF float float_t;
> > +TYPEDEF double double_t;
> > +
> > +#if !defined(__cplusplus)
> > +TYPEDEF struct { _Alignas(8) long long __ll; long double __ld; }
> > +max_align_t; #elif defined(__GNUC__) TYPEDEF struct {
> > +__attribute__((__aligned__(8))) long long __ll; long double __ld; }
> > +max_align_t; #else TYPEDEF struct { alignas(8) long long __ll; long
> > +double __ld; } max_align_t; #endif
> 
> As I understand, the Hexagon ABI has all scalar types aligned to their
size, so I
> don't think the alignas mess is needed here. It should just work as
> unconditional:
> 
> TYPEDEF struct { long long __ll; long double __ld; } max_align_t;

True and I will remove the alignas stuff.


> 
> > diff --git a/arch/hexagon/bits/setjmp.h b/arch/hexagon/bits/setjmp.h
> > new file mode 100644 index 00000000..5ee5e49f
> > --- /dev/null
> > +++ b/arch/hexagon/bits/setjmp.h
> > @@ -0,0 +1,4 @@
> > +
> > +typedef struct {
> > +	long regs[16];
> > +} __jmp_buf[1] __attribute__((aligned (8)));
> 
> Use long long so no extension is needed here for alignment, and no wrapper
> struct (which is not needed, and the member name regs is a namespace
> violation):
> 
> typedef long long __jmp_buf[8];

OK 

> 
> > diff --git a/arch/hexagon/bits/signal.h b/arch/hexagon/bits/signal.h
> > new file mode 100644 index 00000000..7a3b36d0
> > --- /dev/null
> > +++ b/arch/hexagon/bits/signal.h
> > @@ -0,0 +1,105 @@
> > +#if defined(_POSIX_SOURCE) || defined(_POSIX_C_SOURCE) \
> > + || defined(_XOPEN_SOURCE) || defined(_GNU_SOURCE) ||
> > + || defined(_BSD_SOURCE)
> > +
> > +#if defined(_XOPEN_SOURCE) || defined(_GNU_SOURCE) ||
> > +defined(_BSD_SOURCE) #define MINSIGSTKSZ 2048 #define SIGSTKSZ
> 8192
> > +#endif
> > +
> > +#if defined(_GNU_SOURCE) || defined(_BSD_SOURCE) typedef int
> greg_t,
> > +gregset_t[18]; typedef struct sigcontext {
> > +	unsigned long  r0,  r1,  r2,  r3;
> > +	unsigned long  r4,  r5,  r6,  r7;
> > +	unsigned long  r8,  r9, r10, r11;
> > +	unsigned long r12, r13, r14, r15;
> > +	unsigned long r16, r17, r18, r19;
> > +	unsigned long r20, r21, r22, r23;
> > +	unsigned long r24, r25, r26, r27;
> > +	unsigned long r28, r29, r30, r31;
> > +	unsigned long sa0;
> > +	unsigned long lc0;
> > +	unsigned long sa1;
> > +	unsigned long lc1;
> > +	unsigned long m0;
> > +	unsigned long m1;
> > +	unsigned long usr;
> > +	unsigned long p3_0;
> > +	unsigned long gp;
> > +	unsigned long ugp;
> > +	unsigned long pc;
> > +	unsigned long cause;
> > +	unsigned long badva;
> > +	unsigned long pad1;
> > +	unsigned long pad2;
> > +	unsigned long pad3;
> > +} __attribute__((__aligned__(8))) mcontext_t; #else typedef struct {
> > +	unsigned long __regs[48];
> > +} __attribute__((__aligned__(8))) mcontext_t; #endif
> 
> If the pad members are really padding, can the last two just be replaced
by a
> long long so there's natural alignment with no extension?

Yes, this would work.


> 
> > [...]
> > +typedef struct __ucontext {
> > +	unsigned long uc_flags;
> > +	struct __ucontext *uc_link;
> > +	stack_t uc_stack;
> > +	mcontext_t uc_mcontext;
> > +	sigset_t uc_sigmask;
> > +//	unsigned long long uc_regspace[64];
> > +} ucontext_t;
> 
> Should the commented uc_regspace be removed?

I missed that.


> 
> > diff --git a/arch/hexagon/bits/stat.h b/arch/hexagon/bits/stat.h new
> > file mode 100644 index 00000000..55e81fd9
> > --- /dev/null
> > +++ b/arch/hexagon/bits/stat.h
> > @@ -0,0 +1,20 @@
> > +/* copied from kernel definition, but with padding replaced
> > + * by the corresponding correctly-sized userspace types. */ struct
> > +stat {
> > +	dev_t st_dev;
> > +	ino_t st_ino;
> > +	mode_t st_mode;
> > +	nlink_t st_nlink;
> > +	uid_t st_uid;
> > +	gid_t st_gid;
> > +	dev_t st_rdev;
> > +	unsigned long __pad;
> > +	off_t st_size;
> > +	blksize_t st_blksize;
> > +	int __pad2;
> > +	blkcnt_t st_blocks;
> > +	struct timespec st_atim;
> > +	struct timespec st_mtim;
> > +	struct timespec st_ctim;
> > +	unsigned __unused[2];
> > +};
> 
> No objection to doing this as-is, but since this looks like the modern
> "standard" layout and we could probably deduplicate it by making
> arch/generic provide it. That'd be a later patch to remove the other
existing
> duplicates (aarch64, riscv64, maybe others).
> 
> > diff --git a/arch/hexagon/bits/syscall.h.in
> > b/arch/hexagon/bits/syscall.h.in new file mode 100644 index
> > 00000000..77ca3fa0
> > --- /dev/null
> > +++ b/arch/hexagon/bits/syscall.h.in
> > @@ -0,0 +1,317 @@
> > [...]
> > +#define __NR_timer_create 107
> > +#define __NR_timer_gettime 108
> > +#define __NR_timer_getoverrun 109
> > +#define __NR_timer_settime 110
> > +#define __NR_timer_delete 111
> > +#define __NR_clock_settime 112
> > +#define __NR_clock_gettime 113
> 
> Some of these, e.g. clock_gettime, need to be renamed as in commit
> 5a105f19b5aae79dd302899e634b6b18b3dcd0d6.
> 
> > [...]
> > +#define __NR_sched_rr_get_interval_time64 423 #define __NR_syscalls
> > +(__NR_sched_rr_get_interval_time64+1)
> > +#define __NR_fcntl  __NR3264_fcntl
> > +#define __NR_fstatfs  __NR3264_fstatfs #define __NR_truncate
> > +__NR3264_truncate #define __NR_ftruncate  __NR3264_ftruncate
> #define
> > +__NR_lseek  __NR3264_lseek #define __NR_sendfile
> __NR3264_sendfile
> > +#define __NR_newfstatat  __NR3264_fstatat #define __NR_fcntl64
> > +__NR3264_fcntl #define __NR_statfs64  __NR3264_statfs #define
> > +__NR_fstatfs64  __NR3264_fstatfs #define __NR_truncate64
> > +__NR3264_truncate #define __NR_ftruncate64  __NR3264_ftruncate
> > +#define __NR__llseek  __NR3264_lseek #define __NR_sendfile64
> > +__NR3264_sendfile #define __NR_fstatat64  __NR3264_fstatat #define
> > +__NR_fstat64  __NR3264_fstat #define __NR_mmap2  __NR3264_mmap
> > +#define __NR_fadvise64_64  __NR3264_fadvise64
> 
> Is there a reason for these NR3264 redirections rather than just defining
> directly with the public names?

This is also some carryover from an older port.  I think they are not
needed.


> 
> > diff --git a/arch/hexagon/crt_arch.h b/arch/hexagon/crt_arch.h new
> > file mode 100644 index 00000000..331a797e
> > --- /dev/null
> > +++ b/arch/hexagon/crt_arch.h
> > @@ -0,0 +1,35 @@
> > +__asm__(
> > +".weak _DYNAMIC \n"
> > +".hidden _DYNAMIC \n"
> > +".text \n"
> > +".global " START " \n"
> > +".type " START ", %function \n"
> > +START ": \n"
> > +"                                       // Find _DYNAMIC\n"
> > +"       jump 1f\n"
> > +".word  _DYNAMIC - .\n"
> > +"1:     r2 = pc\n"
> > +"       r2 = add(r2, #-4)\n"
> > +"       r1 = memw(r2)\n"
> > +"       r1 = add(r2, r1)\n"
> > +"	r30 = #0			// Signals the end of backtrace\n"
> > +"	r0 = r29			// Pointer to argc/argv\n"
> > +"	r29 = and(r29, #-16)		// Align\n"
> > +"	memw(r29+#-8) = r29\n"
> > +"	r29 = add(r29, #-8)\n"
> > +"	call " START "_c \n"
> > +".size " START ", .-" START "\n"
> > +);
> > +
> > +__asm__(
> > +".section \".note.ABI-tag\", \"a\" \n"
> > +".align 4 \n"
> > +".long 1f - 0f		/* name length */ \n"
> > +".long 3f - 2f		/* data length */ \n"
> > +".long  1		/* note type */ \n"
> > +"0:	.asciz \"GNU\"	/* vendor name seems like this should be
MUSL but
> lldb doesn't agree.*/ \n"
> > +"1:	.align 4		 \n"
> > +"2:	.long 0	/* linux */ \n"
> > +"	.long 3,0,0 \n"
> > +"3:	.align 4 \n"
> > +);
> 
> Is there a reason this needs to be here at all? Shouldn't the tooling
generate
> it if it's actually wanted/needed?

OK, this is here so lldb can select the right target allowing the same
version of lldb to work in multiple runtime environments. I need to take a
look at what the options are if this isn't a good place for this.

> 
> > diff --git a/arch/hexagon/reloc.h b/arch/hexagon/reloc.h new file mode
> > 100644 index 00000000..14085872
> > --- /dev/null
> > +++ b/arch/hexagon/reloc.h
> > @@ -0,0 +1,25 @@
> > +#include <endian.h>
> > +
> > +#if __BYTE_ORDER == __LITTLE_ENDIAN
> > +#define ENDIAN_SUFFIX "el"
> > +#else
> > +#define ENDIAN_SUFFIX ""
> > +#endif
> 
> bits/alltypes.h.in defined __BYTE_ORDER as 1234 (unconditionally little
> endian). Does Hexagon (hw and abi) actually support both byte orders? If
> not, I don't think there should be an endian suffix here at all.

No it does not I will remove it.

> 
> > diff --git a/include/elf.h b/include/elf.h index 549f92c1..54251c24
> > 100644
> > --- a/include/elf.h
> > +++ b/include/elf.h
> > @@ -3284,6 +3284,107 @@ enum
> >  #define R_RISCV_SET32           56
> >  #define R_RISCV_32_PCREL        57
> >
> > +#define R_HEX_NONE               0
> > [...]
> 
> I'd like to merge this separately first since it's independent of whether
> hexagon is a supported host.
> 
> > diff --git a/src/fenv/hexagon/fenv.S b/src/fenv/hexagon/fenv.S new
> > file mode 100644 index 00000000..07b89764
> > --- /dev/null
> > +++ b/src/fenv/hexagon/fenv.S
> > @@ -0,0 +1,143 @@
> > +/*
> > +The Hexagon user status register includes five status fields which
> > +work as sticky flags for the five IEEE-defined exception conditions:
> > +inexact, overflow, underflow, divide by zero, and invalid. A sticky
> > +flag is set when the corresponding exception occurs, and remains set
until
> explicitly cleared.
> 
> Please format for at most 80 columns. (Some source files have a few longer
> lines, but if you're flowing text it should be flowed to 80 or
> fewer.)

Will fix those lines and a couple of others I noticed.

The changes I made can be seen here:
https://github.com/quic/musl/commit/4d714f2defcd926b4f8c0425af363b382d3084cb

I've also attached an updated patch.

Thanks