From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on inbox.vuxu.org X-Spam-Level: X-Spam-Status: No, score=-1.1 required=5.0 tests=DKIM_ADSP_CUSTOM_MED, DKIM_INVALID,DKIM_SIGNED,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, FROM_LOCAL_NOVOWEL,HEADER_FROM_DIFFERENT_DOMAINS,HK_RANDOM_FROM, MAILING_LIST_MULTI,RCVD_IN_DNSWL_MED,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.4 Received: from second.openwall.net (second.openwall.net [193.110.157.125]) by inbox.vuxu.org (Postfix) with SMTP id 2285825529 for ; Wed, 28 Feb 2024 18:20:59 +0100 (CET) Received: (qmail 3823 invoked by uid 550); 28 Feb 2024 17:17:19 -0000 Mailing-List: contact musl-help@lists.openwall.com; run by ezmlm Precedence: bulk List-Post: List-Help: List-Unsubscribe: List-Subscribe: List-ID: Reply-To: musl@lists.openwall.com Received: (qmail 3784 invoked from network); 28 Feb 2024 17:17:18 -0000 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1709140845; x=1709745645; darn=lists.openwall.com; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=pZu+zJ0SRzkOOVf3cmXv+AU18BoBu10IqrPOMxeNzy0=; b=jW03c3F15iqELYkM5s7rtQ5NEz7JwhFHoiCB5zeCN6x3o8i5CBExBhVsvjoQ495a0g EmEvw8t/CVUtO4zpVATpwVmnPOQLLUa4v+n6vD+lIcOOZLuHYaZ9GVmLmQDk0SKtwOTj iuwq/pJVQnJvqheAEwIBOW6JMHWpaGcaX85o6WhMEBfyJlxwvDoKea1QzeUS6zJudeGA BRcK2cgw4aprq11/PpFFxFCm/FF5Tg+Z1BFl2EUWgJx/J4W8P1HD1NpYOiqSKFcpr6zX 5veesvAqtbfAcFIeM+d+Q/pwrGFMdfD7/u0M/niTvJYXbaQ44DQMEov68BklZpXk+o3r WXeQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1709140845; x=1709745645; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=pZu+zJ0SRzkOOVf3cmXv+AU18BoBu10IqrPOMxeNzy0=; b=apQRDiQIV/m9jMJOmUdjersfA3IP/+2aUTn6feZr89dSBXV+Nqxt3XTmM2wTbYLKqd +9pmXABIbBvDkAapldVDgnzALZFDIaL2Z2fCAVhEYP/f9bCwyRBU9B6xfYCr+wU+V2wo tDcxGkxmc384CJSiN+KZ/FrPb+3a9TVFQXGa1txpC2xqKiSkT/JyorlzOqfybRWdk0wE Mkr1u41WGQ7AXFvVfmXsKymbyYrSSZV2qPg2tpd6H5bv4lOOnE0cKkonf+un7P+4gtOv iEGLdLzzjEgl4OAz+b7znJ9Mmz+p2S0NA9w9OdgoTaYnCAUzG6wM/LvClHytK1hj9JzQ /1LQ== X-Gm-Message-State: AOJu0YwWvrVXUpS5pzvU1ird6QZ5nBQ3wg7/T3ZVvcpTWTat0RKJVXTF nCYewdxpTesx6iXXrBIiiwaE83CFYMH8iXes1Ypy93G7+5ihug+OnnTVRztXLAfk3IowrIoVrzK 8P4IkHFi2PMYw8xqPVjMnqWSqvX7hoMB2h9o= X-Google-Smtp-Source: AGHT+IFb8qhzO4Ww8ms2MFqFKEKVDyqljCzEKtBugDnbQp+V3bITWCStlGiuErfXqY8yUwNGYpRelTUtR93bdmw6Vz8= X-Received: by 2002:a05:6a20:f29:b0:19f:f059:c190 with SMTP id fl41-20020a056a200f2900b0019ff059c190mr5168935pzb.24.1709140844478; Wed, 28 Feb 2024 09:20:44 -0800 (PST) MIME-Version: 1.0 References: <20240227232430.GM4163@brightrain.aerifal.cx> <20240228001303.GN4163@brightrain.aerifal.cx> In-Reply-To: <20240228001303.GN4163@brightrain.aerifal.cx> From: Max Filippov Date: Wed, 28 Feb 2024 09:20:33 -0800 Message-ID: To: Rich Felker Cc: musl@lists.openwall.com Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Subject: Re: [musl] Initial xtensa/fdpic port review Hi Rich, On Tue, Feb 27, 2024 at 4:12=E2=80=AFPM Rich Felker wrote= : > > First of all, I'm excited to see new work on this and especially the > FDPIC ABI. Not only is xtensa a very relevant arch I've been hoping we > could support for some time now, but having FDPIC will help getting > the parts of musl's FDPIC support that are currently making > sh-specific assumptions worked out so that we can get other new FDPIC > ports added too (particularly, arm and riscv). > > I'm not up-to-date on what toolchain status/stability is or how close > this is to being intended as upstreamable, but since I was pointed to > it and took a look, I wanted to record my initial review thoughts here > so they don't get lost and to hopefully move this closer to ready for > upstream. Thanks for your review. I expect to get toolchain components into shape for upstreaming during this year. > Review follows inline: > > On Tue, Feb 27, 2024 at 06:24:31PM -0500, Rich Felker wrote: > > Attached patches were pulled from > > https://github.com/jcmvbkbc/musl-xtensa/commits/xtensa-1.2.4-fdpic/ > > > > I'll reply to comment inline. > > > >From 868b34272f414323f10528d5b9988886541cb1c0 Mon Sep 17 00:00:00 2001 > > From: Max Filippov > > Date: Tue, 22 Mar 2016 02:35:58 +0300 > > Subject: [PATCH 1/2] xtensa: add port > > > > Signed-off-by: Max Filippov > > --- > > arch/xtensa/arch.mak | 1 + > > arch/xtensa/atomic_arch.h | 27 ++ > > arch/xtensa/bits/alltypes.h.in | 27 ++ > > arch/xtensa/bits/float.h | 16 ++ > > arch/xtensa/bits/ioctl.h | 219 ++++++++++++++ > > arch/xtensa/bits/limits.h | 1 + > > arch/xtensa/bits/mman.h | 68 +++++ > > arch/xtensa/bits/posix.h | 2 + > > arch/xtensa/bits/reg.h | 2 + > > arch/xtensa/bits/setjmp.h | 5 + > > arch/xtensa/bits/signal.h | 88 ++++++ > > arch/xtensa/bits/stat.h | 24 ++ > > arch/xtensa/bits/stdint.h | 20 ++ > > arch/xtensa/bits/syscall.h.in | 396 ++++++++++++++++++++++++++ > > arch/xtensa/bits/termios.h | 167 +++++++++++ > > arch/xtensa/bits/user.h | 4 + > > arch/xtensa/bits/xtensa-config.h | 8 + > > arch/xtensa/crt_arch.h | 48 ++++ > > arch/xtensa/kstat.h | 18 ++ > > arch/xtensa/pthread_arch.h | 11 + > > arch/xtensa/reloc.h | 38 +++ > > arch/xtensa/syscall_arch.h | 104 +++++++ > > configure | 1 + > > crt/xtensa/crti.S | 29 ++ > > crt/xtensa/crtn.S | 23 ++ > > include/elf.h | 70 +++++ > > src/internal/xtensa/syscall.S | 25 ++ > > src/ldso/xtensa/tlsdesc.S | 36 +++ > > src/process/xtensa/vfork.S | 21 ++ > > src/setjmp/xtensa/longjmp.S | 22 ++ > > src/setjmp/xtensa/setjmp.S | 25 ++ > > src/signal/xtensa/restore.s | 10 + > > src/signal/xtensa/sigsetjmp.S | 24 ++ > > src/thread/xtensa/__set_thread_area.S | 16 ++ > > src/thread/xtensa/__unmapself.S | 13 + > > src/thread/xtensa/clone.S | 46 +++ > > src/thread/xtensa/syscall_cp.S | 38 +++ > > 37 files changed, 1693 insertions(+) > > create mode 100644 arch/xtensa/arch.mak > > create mode 100644 arch/xtensa/atomic_arch.h > > create mode 100644 arch/xtensa/bits/alltypes.h.in > > create mode 100644 arch/xtensa/bits/float.h > > create mode 100644 arch/xtensa/bits/ioctl.h > > create mode 100644 arch/xtensa/bits/limits.h > > create mode 100644 arch/xtensa/bits/mman.h > > create mode 100644 arch/xtensa/bits/posix.h > > create mode 100644 arch/xtensa/bits/reg.h > > create mode 100644 arch/xtensa/bits/setjmp.h > > create mode 100644 arch/xtensa/bits/signal.h > > create mode 100644 arch/xtensa/bits/stat.h > > create mode 100644 arch/xtensa/bits/stdint.h > > create mode 100644 arch/xtensa/bits/syscall.h.in > > create mode 100644 arch/xtensa/bits/termios.h > > create mode 100644 arch/xtensa/bits/user.h > > create mode 100644 arch/xtensa/bits/xtensa-config.h > > create mode 100644 arch/xtensa/crt_arch.h > > create mode 100644 arch/xtensa/kstat.h > > create mode 100644 arch/xtensa/pthread_arch.h > > create mode 100644 arch/xtensa/reloc.h > > create mode 100644 arch/xtensa/syscall_arch.h > > create mode 100644 crt/xtensa/crti.S > > create mode 100644 crt/xtensa/crtn.S > > create mode 100644 src/internal/xtensa/syscall.S > > create mode 100644 src/ldso/xtensa/tlsdesc.S > > create mode 100644 src/process/xtensa/vfork.S > > create mode 100644 src/setjmp/xtensa/longjmp.S > > create mode 100644 src/setjmp/xtensa/setjmp.S > > create mode 100644 src/signal/xtensa/restore.s > > create mode 100644 src/signal/xtensa/sigsetjmp.S > > create mode 100644 src/thread/xtensa/__set_thread_area.S > > create mode 100644 src/thread/xtensa/__unmapself.S > > create mode 100644 src/thread/xtensa/clone.S > > create mode 100644 src/thread/xtensa/syscall_cp.S > > > > diff --git a/arch/xtensa/arch.mak b/arch/xtensa/arch.mak > > new file mode 100644 > > index 00000000..aa4d05ce > > --- /dev/null > > +++ b/arch/xtensa/arch.mak > > @@ -0,0 +1 @@ > > +COMPAT_SRC_DIRS =3D compat/time32 > > diff --git a/arch/xtensa/atomic_arch.h b/arch/xtensa/atomic_arch.h > > new file mode 100644 > > index 00000000..34bf0704 > > --- /dev/null > > +++ b/arch/xtensa/atomic_arch.h > > @@ -0,0 +1,27 @@ > > +#include "bits/xtensa-config.h" > > This is not what bits headers are for; they are public-installed > headers that provide parts of the standard public headers that vary by > arch, and are never directly #include'able. > > If you really need arch-internal headers like this just put them in > the top-level dir of the arch. But in the case of these macros, either > just use the compiler-provided, __-prefixed ones directly, or if > they're not actually variable, don't inspect them at all. I carry this header from the time when configuration-specific header file was the only available configuration option. __XCHAL_* macros only appeared in gcc-13, but since FDPIC support will only be available on toolchains that have these macros I guess I'll drop the header and use __XCHAL_* macros directly. > > + > > +#if XCHAL_HAVE_S32C1I > > +#define a_cas a_cas > > +static inline int a_cas(volatile int *p, int t, int s) > > +{ > > + __asm__ __volatile__ ( > > + " wsr %2, scompare1\n" > > + " s32c1i %0, %1\n" > > + : "+a"(s), "+m"(*p) > > + : "a"(t) > > + : "memory" ); > > + return s; > > +} > > +#endif > > For example this is not a useful test, because there's nothing you can > do in the case where it's not defined; This test is in the spirit of xtensa: it covers one possible configuration option. There's at least one other hardware option (exclusive access instructions) available for this functionality and a fallback to syscalls. They're not li= sted here because I tried to keep this initial implementation to a bare minimum. > it's just not supportable > (unless there's some kernel fallback mechanism like ARM and SH have; > then it would make sense to hard-code the cas instruction as above if > the ISA level is known to have it, and otherwise do conditional > runtime selection (again, like ARM and SH). Xtensa is quite limited on the runtime selection, because opcodes belonging to the options not enabled for a particular core just would not assemble. > > diff --git a/arch/xtensa/bits/alltypes.h.in b/arch/xtensa/bits/alltypes= .h.in > > new file mode 100644 > > index 00000000..21221ce5 > > --- /dev/null > > +++ b/arch/xtensa/bits/alltypes.h.in > > @@ -0,0 +1,27 @@ > > +#define _REDIR_TIME64 1 > > +#define _Addr int > > +#define _Int64 long long > > +#define _Reg int > > + > > +#if __XTENSA_EB__ > > +#define __BYTE_ORDER 4321 > > +#elif __XTENSA_EL__ > > +#define __BYTE_ORDER 1234 > > +#else > > +#error Unknown endianness > > +#endif > > + > > +#define __LONG_MAX 0x7fffffffL > > + > > +#ifndef __cplusplus > > +#ifdef __WCHAR_TYPE__ > > +TYPEDEF __WCHAR_TYPE__ wchar_t; > > +#else > > +TYPEDEF unsigned wchar_t; > > +#endif > > +#endif > > + > > +TYPEDEF float float_t; > > +TYPEDEF double double_t; > > + > > +TYPEDEF struct { long __l __attribute__((aligned(__BIGGEST_ALIGNMENT__= ))); } max_align_t; > > It's best to avoid macros like __BIGGEST_ALIGNMENT__ here that could > change out from under us in the future, thereby quietly breaking the > ABI. Whatever the ABI is, that's what the alignment here should be, > and just using a type that naturally has the alignment if possible > rather than attributes. The thing is: we don't know the name of the type with the widest alignment, as indeed in the future someone may come up with a configuration with an arbitrarily named type with an alignment as big as allowed physically. But then its their responsibility to keep the __BIGGEST_ALIGNMENT__ consistent across the updates of that configuration that intend to stay compatible with each other. > > diff --git a/arch/xtensa/bits/mman.h b/arch/xtensa/bits/mman.h > > new file mode 100644 > > index 00000000..d2f58eb1 > > --- /dev/null > > +++ b/arch/xtensa/bits/mman.h > > @@ -0,0 +1,68 @@ > > +#define MAP_FAILED ((void *) -1) > > + > > +#define PROT_NONE 0 > > +#define PROT_READ 1 > > +#define PROT_WRITE 2 > > +#define PROT_EXEC 4 > > +#define PROT_GROWSDOWN 0x01000000 > > +#define PROT_GROWSUP 0x02000000 > > + > > +#define MAP_SHARED 0x01 > > +#define MAP_PRIVATE 0x02 > > +#define MAP_FIXED 0x10 > > + > > +#define MAP_TYPE 0x0f > > +#undef MAP_FILE > > +#define MAP_FILE 0x00 > > +#undef MAP_ANON > > +#define MAP_ANON 0x800 > > +#define MAP_ANONYMOUS MAP_ANON > > +#undef MAP_NORESERVE > > +#define MAP_NORESERVE 0x0400 > > +#undef MAP_GROWSDOWN > > +#define MAP_GROWSDOWN 0x1000 > > +#undef MAP_DENYWRITE > > +#define MAP_DENYWRITE 0x2000 > > +#undef MAP_EXECUTABLE > > +#define MAP_EXECUTABLE 0x4000 > > +#undef MAP_LOCKED > > +#define MAP_LOCKED 0x8000 > > +#undef MAP_POPULATE > > +#define MAP_POPULATE 0x10000 > > +#undef MAP_NONBLOCK > > +#define MAP_NONBLOCK 0x20000 > > +#undef MAP_STACK > > +#define MAP_STACK 0x40000 > > +#undef MAP_HUGETLB > > +#define MAP_HUGETLB 0x80000 > > + > > +#define POSIX_MADV_NORMAL 0 > > +#define POSIX_MADV_RANDOM 1 > > +#define POSIX_MADV_SEQUENTIAL 2 > > +#define POSIX_MADV_WILLNEED 3 > > +#define POSIX_MADV_DONTNEED 4 > > + > > +#define MS_ASYNC 1 > > +#define MS_INVALIDATE 2 > > +#define MS_SYNC 4 > > + > > +#define MCL_CURRENT 1 > > +#define MCL_FUTURE 2 > > + > > +#if defined(_GNU_SOURCE) || defined(_BSD_SOURCE) > > +#define MADV_NORMAL 0 > > +#define MADV_RANDOM 1 > > +#define MADV_SEQUENTIAL 2 > > +#define MADV_WILLNEED 3 > > +#define MADV_DONTNEED 4 > > +#define MADV_REMOVE 9 > > +#define MADV_DONTFORK 10 > > +#define MADV_DOFORK 11 > > +#define MADV_MERGEABLE 12 > > +#define MADV_UNMERGEABLE 13 > > +#define MADV_HUGEPAGE 14 > > +#define MADV_NOHUGEPAGE 15 > > +#define MADV_DONTDUMP 16 > > +#define MADV_DODUMP 17 > > +#define MADV_HWPOISON 100 > > +#endif > > A lot of these don't vary from the standard values, and including them > (especially since most will be duplicate/re- definitions) obscures > what actually is arch-specific here. Ok, I'll drop the duplicates. > > diff --git a/arch/xtensa/bits/setjmp.h b/arch/xtensa/bits/setjmp.h > > new file mode 100644 > > index 00000000..e1a6dd97 > > --- /dev/null > > +++ b/arch/xtensa/bits/setjmp.h > > @@ -0,0 +1,5 @@ > > +#if defined(__XTENSA_CALL0_ABI__) > > +typedef unsigned long __jmp_buf[6]; > > +#else > > +#error Unsupported Xtensa ABI > > +#endif > > See notes later on reloc.h regarding ABIs. > > > diff --git a/arch/xtensa/bits/signal.h b/arch/xtensa/bits/signal.h > > new file mode 100644 > > index 00000000..545ffd36 > > --- /dev/null > > +++ b/arch/xtensa/bits/signal.h > > @@ -0,0 +1,88 @@ > > +#if defined(_POSIX_SOURCE) || defined(_POSIX_C_SOURCE) \ > > + || defined(_XOPEN_SOURCE) || defined(_GNU_SOURCE) || defined(_BSD_SOU= RCE) > > + > > +#if defined(_XOPEN_SOURCE) || defined(_GNU_SOURCE) || defined(_BSD_SOU= RCE) > > +#define MINSIGSTKSZ 2048 > > +#define SIGSTKSZ 8192 > > +#endif > > + > > +typedef struct sigcontext > > +{ > > + unsigned long sc_pc; > > + unsigned long sc_ps; > > + unsigned long sc_lbeg; > > + unsigned long sc_lend; > > + unsigned long sc_lcount; > > + unsigned long sc_sar; > > + unsigned long sc_acclo; > > + unsigned long sc_acchi; > > + unsigned long sc_a[16]; > > + void *sc_xtregs; > > +} mcontext_t; > > This needs a namespace-safe alternate definition when in > standards-conforming profile. See how it's done on most other archs. > (It can just be a dummy struct of the right size if you like.) Ok. > > diff --git a/arch/xtensa/bits/xtensa-config.h b/arch/xtensa/bits/xtensa= -config.h > > new file mode 100644 > > index 00000000..a2e95904 > > --- /dev/null > > +++ b/arch/xtensa/bits/xtensa-config.h > > @@ -0,0 +1,8 @@ > > +#undef XCHAL_NUM_AREGS > > +#define XCHAL_NUM_AREGS __XCHAL_NUM_AREGS > > + > > +#undef XCHAL_HAVE_S32C1I > > +#define XCHAL_HAVE_S32C1I __XCHAL_HAVE_S32C1I > > + > > +#undef XCHAL_HAVE_EXCLUSIVE > > +#define XCHAL_HAVE_EXCLUSIVE __XCHAL_HAVE_EXCLUSIVE > > See notes above on this header. > > > diff --git a/arch/xtensa/reloc.h b/arch/xtensa/reloc.h > > new file mode 100644 > > index 00000000..03482f18 > > --- /dev/null > > +++ b/arch/xtensa/reloc.h > > @@ -0,0 +1,38 @@ > > +#if __BYTE_ORDER =3D=3D __BIG_ENDIAN > > +#define ENDIAN_SUFFIX "eb" > > +#else > > +#define ENDIAN_SUFFIX "" > > +#endif > > + > > +#if __FDPIC__ > > +#define ABI_SUFFIX "-fdpic" > > +#else > > +#define ABI_SUFFIX "" > > +#endif > > + > > +#define LDSO_ARCH "xtensa" ENDIAN_SUFFIX ABI_SUFFIX > > Any actual incompatible ABI (as opposed to ISA level) needs its own > LDSO name. There's no real ISA levels in the xtensa. Every CPU core configuration is its own unique ISA. > So if there are alternate (call0/windowed?) ABIs you want > to support, different names need to be defined for them here and in > configure. I understand that it means that I should drop the endianness suffix, as the endianness of each core is fixed, but add windowed/call0 discriminator. > Unless there's a good reason to support more than one ABI, > it probably just makes sense to pick the preferred one and have > configure error-out if the toolchain's ABI is incompatible. I still have hope to support windowed FDPIC and then non-FDPIC variants. > > diff --git a/crt/xtensa/crti.S b/crt/xtensa/crti.S > > new file mode 100644 > > index 00000000..82a837e7 > > --- /dev/null > > +++ b/crt/xtensa/crti.S > > @@ -0,0 +1,29 @@ > > +.section .init > > +.global _init > > +.type _init, @function > > +_init: > > +#if defined(__XTENSA_CALL0_ABI__) > > + addi sp, sp, -16 > > + s32i a0, sp, 0 > > +#ifdef __FDPIC__ > > + s32i a12, sp, 4 > > + mov a12, a11 > > +#endif > > +#else > > +#error Unsupported Xtensa ABI > > +#endif > > Same thing regarding ABIs. > > > diff --git a/src/internal/xtensa/syscall.S b/src/internal/xtensa/syscal= l.S > > new file mode 100644 > > index 00000000..f09a46e8 > > --- /dev/null > > +++ b/src/internal/xtensa/syscall.S > > @@ -0,0 +1,25 @@ > > +.global __syscall > > +.hidden __syscall > > +.type __syscall,@function > > +.align 4 > > +__syscall: > > +#ifdef __XTENSA_WINDOWED_ABI__ > > + entry a1, 16 > > +#endif > > + mov a8, a3 > > + mov a3, a4 > > + mov a4, a5 > > + mov a5, a6 > > + mov a6, a8 > > + mov a8, a7 > > +#if defined(__XTENSA_WINDOWED_ABI__) > > + l32i a9, a1, 16 > > + syscall > > + retw > > +#elif defined(__XTENSA_CALL0_ABI__) > > + l32i a9, a1, 0 > > + syscall > > + ret > > +#else > > +#error Unsupported Xtensa ABI > > +#endif > > And here too. > > > diff --git a/src/ldso/xtensa/tlsdesc.S b/src/ldso/xtensa/tlsdesc.S > > new file mode 100644 > > index 00000000..084d5f3e > > --- /dev/null > > +++ b/src/ldso/xtensa/tlsdesc.S > > @@ -0,0 +1,36 @@ > > +.global __tlsdesc_static > > +.hidden __tlsdesc_static > > +.type __tlsdesc_static,@function > > +.align 4 > > +__tlsdesc_static: > > +#ifdef __XTENSA_WINDOWED_ABI__ > > + entry a1, 16 > > +#endif > > + rur a3, threadptr > > + add a2, a2, a3 > > +#if defined(__XTENSA_WINDOWED_ABI__) > > + retw > > +#elif defined(__XTENSA_CALL0_ABI__) > > + ret > > +#else > > +#error Unsupported Xtensa ABI > > +#endif > > Are you sure the entry step makes sense with tlsdesc? Is this honoring > the ABI for what the tlsdesc function is allowed to clobber? But the tlsdesc function is just a normal function, it can do whatever a regular function can do AFAIU. In addition there are no relaxations that can replace non-call code with a variant with a call. 'entry' is the right way to begin a regular function in windowed xtensa ABI= . > > +.hidden __tls_get_new > > + > > +.global __tlsdesc_dynamic > > +.hidden __tlsdesc_dynamic > > +.type __tlsdesc_dynamic,@function > > +.align 4 > > +__tlsdesc_dynamic: > > +#if defined(__XTENSA_WINDOWED_ABI__) > > + entry a1, 16 > > + mov a6, a2 > > + call4 __tls_get_addr > > + mov a2, a6 > > + retw > > +#elif defined(__XTENSA_CALL0_ABI__) > > + j __tls_get_addr > > +#else > > +#error Unsupported Xtensa ABI > > +#endif > > This approach is no longer a valid implementation for > __tlsdesc_dynamic, because it calls C code which could clobber any > call-clobbered registers, not just the ones the tlsdesc function is > allowed to clobber. > > Instead, it needs to actually do the dtv lookup in asm. musl does not > use any dynamic allocation here, so the lookup is guaranteed to just > work using the indices found with no conditional branches. See the > recently added riscv code for a clean example, in commit > 407aea628af8c81d9e3f5a068568f2217db71bba. Yeah, this whole xtensa TLS code should be in the WIP pile. I'll rewrite it in assembly. > > diff --git a/src/process/xtensa/vfork.S b/src/process/xtensa/vfork.S > > new file mode 100644 > > index 00000000..25478b87 > > --- /dev/null > > +++ b/src/process/xtensa/vfork.S > > @@ -0,0 +1,21 @@ > > +.global vfork > > +.type vfork,@function > > +vfork: > > +#if defined(__XTENSA_CALL0_ABI__) > > + movi a2, 116 # __NR_clone > > + movi a3, 0 > > + movi a6, 0x4111 # CLONE_VM | CLONE_VFORK | SIGCHLD > > + syscall > > + > > +#ifdef __FDPIC__ > > +.hidden __syscall_ret > > + movi a9, __syscall_ret@GOT > > + add a9, a9, a11 > > + l32i a9, a9, 0 > > + jx a9 > > +#else > > +#error Unsupported Xtensa ABI > > +#endif > > +#else > > +#error Unsupported Xtensa ABI > > +#endif > > Same re: ABIs. But maybe there's intent to support a non-FDPIC ABI > too? Right, since there's no intent to support non-FDPIC ABI at this time the FDPIC condition can be dropped here. > > diff --git a/src/setjmp/xtensa/longjmp.S b/src/setjmp/xtensa/longjmp.S > > new file mode 100644 > > index 00000000..602093c9 > > --- /dev/null > > +++ b/src/setjmp/xtensa/longjmp.S > > @@ -0,0 +1,22 @@ > > +.global _longjmp > > +.global longjmp > > +.type _longjmp,@function > > +.type longjmp,@function > > +.align 4 > > +_longjmp: > > +longjmp: > > +#if defined(__XTENSA_CALL0_ABI__) > > + l32i a0, a2, 0 > > + l32i a1, a2, 4 > > + l32i a12, a2, 8 > > + l32i a13, a2, 12 > > + l32i a14, a2, 16 > > + l32i a15, a2, 20 > > + > > + movi a2, 1 > > + movnez a2, a3, a3 > > + > > + ret > > +#else > > +#error Unsupported Xtensa ABI > > +#endif > > diff --git a/src/setjmp/xtensa/setjmp.S b/src/setjmp/xtensa/setjmp.S > > new file mode 100644 > > index 00000000..3a735a6a > > --- /dev/null > > +++ b/src/setjmp/xtensa/setjmp.S > > @@ -0,0 +1,25 @@ > > +.global ___setjmp > > +.hidden ___setjmp > > +.global __setjmp > > +.global _setjmp > > +.global setjmp > > +.type __setjmp,@function > > +.type _setjmp,@function > > +.type setjmp,@function > > +.align 4 > > +___setjmp: > > +__setjmp: > > +_setjmp: > > +setjmp: > > +#if defined(__XTENSA_CALL0_ABI__) > > + s32i a0, a2, 0 > > + s32i a1, a2, 4 > > + s32i a12, a2, 8 > > + s32i a13, a2, 12 > > + s32i a14, a2, 16 > > + s32i a15, a2, 20 > > + movi a2, 0 > > + ret > > +#else > > +#error Unsupported Xtensa ABI > > +#endif > > Ditto. > > > diff --git a/src/signal/xtensa/sigsetjmp.S b/src/signal/xtensa/sigsetjm= p.S > > new file mode 100644 > > index 00000000..e44bba88 > > --- /dev/null > > +++ b/src/signal/xtensa/sigsetjmp.S > > @@ -0,0 +1,24 @@ > > +.global sigsetjmp > > +.global __sigsetjmp > > +.type sigsetjmp,@function > > +.type __sigsetjmp,@function > > +.align 4 > > +sigsetjmp: > > +__sigsetjmp: > > +#if defined(__XTENSA_CALL0_ABI__) > > + bnez a3, 1f > > +.hidden ___setjmp > > + j ___setjmp > > +1: > > + mov a3, a0 > > + mov a4, a2 > > + call0 ___setjmp > > + mov a0, a3 > > + mov a3, a2 > > + mov a2, a4 > > + > > +.hidden __sigsetjmp_tail > > + j __sigsetjmp_tail > > +#else > > +#error Unsupported Xtensa ABI > > +#endif > > I don't think this code works. It does not seem to save the return > address and argument address before calling ___setjmp, so they will be > clobbered after it returns. It does: + mov a3, a0 + mov a4, a2 do exactly this and ___setjmp is hidden, so it's always going to be the implementation above that doesn't clobber a3 and a4. > > diff --git a/src/thread/xtensa/__set_thread_area.S b/src/thread/xtensa/= __set_thread_area.S > > new file mode 100644 > > index 00000000..1f402125 > > --- /dev/null > > +++ b/src/thread/xtensa/__set_thread_area.S > > @@ -0,0 +1,16 @@ > > +.global __set_thread_area > > +.type __set_thread_area,@function > > +.align 4 > > +__set_thread_area: > > +#ifdef __XTENSA_WINDOWED_ABI__ > > + entry a1, 16 > > +#endif > > + wur a2, threadptr > > + movi a2, 0 > > +#if defined(__XTENSA_WINDOWED_ABI__) > > + retw > > +#elif defined(__XTENSA_CALL0_ABI__) > > + ret > > +#else > > +#error Unsupported Xtensa ABI > > +#endif > > FWIW, few archs do this yet, but this function can be written in C > with inline asm to set the register, and that would avoid any ABI > conditionals if they're ever needed. Ok. > Eventually I'd like to move most functions like this, that are > currently external asm but have no good reason to be (unlike setjmp, > vfork, etc. which *do* have a good reason to be), to minimal inline > asm in C. Some archs' versions of this file would even just become > __syscall, no further asm. > > > diff --git a/src/thread/xtensa/__unmapself.S b/src/thread/xtensa/__unma= pself.S > > new file mode 100644 > > index 00000000..15bf2bdf > > --- /dev/null > > +++ b/src/thread/xtensa/__unmapself.S > > @@ -0,0 +1,13 @@ > > +.global __unmapself > > +.type __unmapself,@function > > +.align 4 > > +__unmapself: > > +#if defined(__XTENSA_CALL0_ABI__) > > + mov a6, a2 > > + movi a2, 81 # SYS_munmap > > + syscall > > + movi a2, 118 # SYS_exit > > + syscall > > +#else > > +#error Unsupported Xtensa ABI > > +#endif > > This is unsafe on fdpic/nommu if the kernel requires the task to > always have a valid stack. How do I know if the kernel requires that? > You might want to check how it's done on SH > and whether something similar is needed here. The idea is that it > temporarily switches to an alternate stack protected by the global > thread list lock (so only one thread can be using it at a time) in > order to make it safe to unmap the thread's own stack. The thread list > is then unlocked by the kernel when the thread dies (via the exit > futex). > > > diff --git a/src/thread/xtensa/clone.S b/src/thread/xtensa/clone.S > > new file mode 100644 > > index 00000000..8e8514d6 > > --- /dev/null > > +++ b/src/thread/xtensa/clone.S > > @@ -0,0 +1,46 @@ > > +# __clone(func, stack, flags, arg, ptid, tls, ctid) > > +# a2, a3, a4, a5, a6, a7, [sp] > > + > > +# syscall(SYS_clone, flags, stack, ptid, tls, ctid) > > +# a2, a6, a3, a4, a5, a8 > > + > > +.global __clone > > +.type __clone,@function > > +.align 4 > > +__clone: > > +#if defined(__XTENSA_CALL0_ABI__) > > + # align stack and save func,arg > > + srli a3, a3, 4 > > + slli a3, a3, 4 > > + addi a3, a3, -16 > > + s32i a2, a3, 0 > > + s32i a5, a3, 4 > > + > > + # syscall > > + mov a2, a4 > > + mov a4, a6 > > + mov a6, a2 > > + mov a5, a7 > > + l32i a8, a1, 16 > > + movi a2, 116 # SYS_clone > > + syscall > > + > > + beqz a2, 1f > > + # parent > > + ret > > + > > + # child > > +1: > > + l32i a0, a1, 0 > > + l32i a2, a1, 4 > > +#ifdef __FDPIC__ > > + l32i a11, a0, 4 > > + l32i a0, a0, 0 > > +#endif > > + callx0 a0 > > + mov a6, a2 > > + movi a2, 118 # SYS_exit > > + syscall > > +#else > > +#error Unsupported Xtensa ABI > > +#endif > > Same re: ABIs. > > > diff --git a/src/thread/xtensa/syscall_cp.S b/src/thread/xtensa/syscall= _cp.S > > new file mode 100644 > > index 00000000..de7c3e66 > > --- /dev/null > > +++ b/src/thread/xtensa/syscall_cp.S > > @@ -0,0 +1,38 @@ > > +# __syscall_cp_asm(&self->cancel, nr, u, v, w, x, y, z) > > +# a2 a3 a4 a5 a6 a7 [sp] [sp+4] > > + > > +# syscall(nr, u, v, w, x, y, z) > > +# a2 a6 a3 a4 a5 a8 a9 > > + > > +.global __cp_begin > > +.hidden __cp_begin > > +.global __cp_end > > +.hidden __cp_end > > +.global __cp_cancel > > +.hidden __cp_cancel > > +.hidden __cancel > > +.global __syscall_cp_asm > > +.hidden __syscall_cp_asm > > +.type __syscall_cp_asm,@function > > +.align 4 > > +#if defined(__XTENSA_CALL0_ABI__) > > +__syscall_cp_asm: > > +__cp_begin: > > + l32i a2, a2, 0 > > + bnez a2, __cp_cancel > > + mov a2, a4 > > + mov a4, a6 > > + mov a6, a2 > > + mov a2, a3 > > + mov a3, a5 > > + mov a5, a7 > > + l32i a8, a1, 0 > > + l32i a9, a1, 4 > > + syscall > > +__cp_end: > > + ret > > +__cp_cancel: > > + j __cancel > > +#else > > +#error Unsupported Xtensa ABI > > +#endif > > -- > > 2.21.0 > > And this one. > > > > >From f8c5953ecdb948282cc8e573b729c25db60a95a8 Mon Sep 17 00:00:00 2001 > > From: Max Filippov > > Date: Wed, 21 Feb 2024 08:27:37 -0800 > > Subject: [PATCH 2/2] WIP > > > > Signed-off-by: Max Filippov > > --- > > ldso/dlstart.c | 7 +++++++ > > ldso/dynlink.c | 6 ++++-- > > src/internal/dynlink.h | 5 +++-- > > 3 files changed, 14 insertions(+), 4 deletions(-) > > > > diff --git a/ldso/dlstart.c b/ldso/dlstart.c > > index 259f5e18..beca953f 100644 > > --- a/ldso/dlstart.c > > +++ b/ldso/dlstart.c > > @@ -90,12 +90,19 @@ hidden void _dlstart_c(size_t *sp, size_t *dynv) > > - segs[rel_addr[1]].p_vaddr > > + syms[R_SYM(rel[1])].st_value; > > rel_addr[1] =3D dyn[DT_PLTGOT]; > > + } else if (R_TYPE(rel[1]) =3D=3D REL_RELATIVE) { > > + size_t val =3D *rel_addr; > > + for (j=3D0; val-segs[j].p_vaddr >=3D segs[j].p_me= msz; j++); > > + *rel_addr +=3D segs[j].addr - segs[j].p_vaddr; > > So xtensa has a "relative" reloc type that's just adjusted by the load > offset of the segment the relocation lives in, rather than needing to > use a symbolic relocation referencing a section symbol like other > fdpic archs do? I was looking at the ARM BFD code while doing that and my impression was that they do the same. Regardless, I wonder why either relocation form might be preferrable? > If so, this looks right and looks ok. > > > } else { > > size_t val =3D syms[R_SYM(rel[1])].st_value; > > for (j=3D0; val-segs[j].p_vaddr >=3D segs[j].p_me= msz; j++); > > *rel_addr =3D rel[2] + segs[j].addr - segs[j].p_v= addr + val; > > } > > } > > +#ifdef __xtensa__ > > + ((unsigned long *)dyn[DT_PLTGOT])[3] =3D segs[0].addr - segs[0].p= _vaddr; > > +#endif > > Is this actually needed for anything? Generally musl doesn't use the > reserved GOT slots itself, and on all the other archs I'm aware of, > they're essentially reserved to the dynamic linker implementation so > the dynamic linker is just free not to use them and not to set them > up. xtensa doesn't have relative register jumps and calls, so local jumps and calls to a far off locations need to use absolute target addresses. One possible solution is to have the address in the GOT entry, the other is to calculate the target address using the text segment load offset at runtime. Both have the same instruction count, see http://wiki.osll.ru/doku.php/etc:users:jcmvbkbc:binutils-xtensa#local_cal= l for the details, but the latter doesn't waste GOT space and that saves a noticeable amount of RAM. > > #else > > /* If the dynamic linker is invoked as a command, its load > > * address is not available in the aux vector. Instead, compute > > diff --git a/ldso/dynlink.c b/ldso/dynlink.c > > index ceca3c98..25563af3 100644 > > --- a/ldso/dynlink.c > > +++ b/ldso/dynlink.c > > @@ -1420,6 +1420,7 @@ static void reloc_all(struct dso *p) > > if (!DL_FDPIC) > > do_relr_relocs(p, laddr(p, dyn[DT_RELR]), dyn[DT_= RELRSZ]); > > > > +#if 0 > > if (head !=3D &ldso && p->relro_start !=3D p->relro_end) = { > > long ret =3D __syscall(SYS_mprotect, laddr(p, p->= relro_start), > > p->relro_end-p->relro_start, PROT_READ); > > @@ -1429,6 +1430,7 @@ static void reloc_all(struct dso *p) > > if (runtime) longjmp(*rtld_fail, 1); > > } > > } > > +#endif > > Was this breaking something? Relro linking probably should have been > disabled on fdpic, and we ignore ENOSYS anyway for nommu. Yeah, I do some of the testing in linux-user QEMU, it has MMU and mprotect calls actually succeed. Failures were coming from the relocation code, but my impression was that relro should be applied after the relocation completion and it should just work, hence the WIP pile. > > p->relocated =3D 1; > > } > > @@ -1485,7 +1487,7 @@ void __libc_exit_fini() > > if (dyn[0] & (1< > size_t n =3D dyn[DT_FINI_ARRAYSZ]/sizeof(size_t); > > size_t *fn =3D (size_t *)laddr(p, dyn[DT_FINI_ARR= AY])+n; > > - while (n--) ((void (*)(void))*--fn)(); > > + while (n--) fpaddr(p, *--fn)(); > > If this is fixable on the tooling side it really should be fixed > there. init/fini arrays should have actual language-level function > addresses (descriptor addresses on fdpic), not instruction addresses. I read libgcc code at https://github.com/jcmvbkbc/gcc-xtensa/blob/xtensa-14-8789-fdpic/libgcc/c= rtstuff.c#L498-L503 and the way it's written suggests that this was done on purpose. I put it into the WIP pile to figure out later what the purpose was. I thought that SH might not have this issue because it just didn't use the .array_init/.array_fini. > If it's not fixable at this point, I guess we need the arch's reloc.h > to define a macro signaling this difference in behavior. > > I have no idea how this would be expected to work on static linked > programs... That's true it's not working in this code version. > > diff --git a/src/internal/dynlink.h b/src/internal/dynlink.h > > index 06f41d09..6485e883 100644 > > --- a/src/internal/dynlink.h > > +++ b/src/internal/dynlink.h > > @@ -78,10 +78,11 @@ struct fdpic_dummy_loadmap { > > (R_TYPE(x) =3D=3D REL_RELATIVE) || \ > > (R_TYPE(x) =3D=3D REL_SYM_OR_REL && !R_SYM(x)) ) > > #else > > -#define IS_RELATIVE(x,s) ( ( \ > > +#define IS_RELATIVE(x,s) ( \ > > + (R_TYPE(x) =3D=3D REL_RELATIVE) || ( ( \ > > (R_TYPE(x) =3D=3D REL_FUNCDESC_VAL) || \ > > (R_TYPE(x) =3D=3D REL_SYMBOLIC) ) \ > > - && (((s)[R_SYM(x)].st_info & 0xf) =3D=3D STT_SECTION) ) > > + && (((s)[R_SYM(x)].st_info & 0xf) =3D=3D STT_SECTION) ) ) > > #endif > > > > #ifndef NEED_MIPS_GOT_RELOCS > > -- > > 2.21.0 > > > > This looks ok assuming the above about relative relocs. --=20 Thanks. -- Max