mailing list of musl libc
From: Jesse DeGuire <jesse.a.deguire@gmail.com>
To: Patrick Oppenlander <patrick.oppenlander@gmail.com>
Cc: musl@lists.openwall.com
Subject: Re: [musl] Musl on Cortex-M Devices
Date: Tue, 5 Jan 2021 22:24:48 -0500
Message-ID: <CALqyXLh0zwXPzvoBLaKx=jQx1eiQ22S2CSjsN47FntJAryst4w@mail.gmail.com>
In-Reply-To: <CAEg67GmUgS3_aUtTHn3NJRS31GeppWvkBWh-8n9VG2Bmtv-HeQ@mail.gmail.com>

Hi everyone,

Here's attempt three at Musl on Arm M-profile devices. I think I've
incorporated all of the suggestions in this email thread. Let me know
your thoughts.

More specific responses are below.

On Mon, Dec 21, 2020 at 8:39 PM Rich Felker <dalias@libc.org> wrote:
>
> On Mon, Dec 21, 2020 at 06:58:47PM -0500, Jesse DeGuire wrote:
> > On Fri, Dec 18, 2020 at 12:30 PM Rich Felker <dalias@libc.org> wrote:
> > > If it lacks LDREX and STREX how do you implement atomic? I don't see
> > > where you're adding any alternative, so is v6-m support
> > > non-functional? That rather defeats the purpose of doing anything to
> > > support it...
> >
> > Correct, I haven't yet added an alternative. Arm's answer--and what we
> > generally do in the embedded world--is to disable interrupts using
> > "cpsid", do your thing, then re-enable interrupts with "cpsie". This
> > could be done with a new "__a_cas_v6m" variant that I'd add to
> > atomics.s. This still won't work for Linux because the "cps(ie|id)"
> > instruction is effectively a no-op if it is executed in an
> > unprivileged context (meaning you can't trap and emulate it). You'd be
> > looking at another system call if you really wanted v6-m Linux. That
> > said, this could let Musl work on v6-m in a bare metal or RTOS
> > environment, which I think Musl would be great for, and so I'd still
> > work on adding support for it. Also, not all v6-m devices support a
> > privilege model and run as though everything is privileged.
> > ARMv8-M.base is similar to v6-m with LDREX and STREX and so that could
> > have full support.
>
> I'm not sure what the right answer for this is and whether it makes
> support suitable for upstream or not at this point. We should probably
> investigate further. If LDREX/STREX are trappable we could just use
> them and require the kernel to trap and emulate but that's kinda ugly
> (llsc-type atomics are much harder to emulate correctly than a cas
> primitive).

LDREX/STREX should cause a fault on v6-M that can be trapped and
handled. It's unfortunate that CPSID/CPSIE are silently ignored in
unprivileged code, because trapping those seems like it would be
easier for whoever has to handle them.
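
To make the interrupt-masking fallback concrete, here is a minimal
sketch in portable C rather than assembly. The irq_disable() and
irq_enable() hooks are hypothetical stand-ins for "cpsid i" and
"cpsie i" (here they only track nesting so the sketch runs on a host),
and cas_intmask() is an illustrative name, not something in the patch.
It follows the helper convention of returning 0 on success and nonzero
when the value did not match, and it assumes a single core running
privileged code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical hooks: on a real v6-M target these would execute
 * "cpsid i" and "cpsie i". Here they only track nesting depth. */
static int irq_mask_depth;
static void irq_disable(void) { irq_mask_depth++; }
static void irq_enable(void)  { irq_mask_depth--; }

/* Compare-and-swap by masking interrupts around a plain load/store.
 * Returns 0 if *p equalled expected and was replaced by desired,
 * nonzero if *p held a different value (no store is performed). */
static int cas_intmask(volatile int *p, int expected, int desired)
{
	int mismatch;

	assert(p != NULL);
	irq_disable();
	mismatch = (*p != expected);
	if (!mismatch)
		*p = desired;
	irq_enable();   /* always unmask, on both the success and failure paths */
	return mismatch;
}
```

A nonzero return means no store happened, so the caller re-reads *p
and decides whether to retry. The important property is that the
failure path still re-enables interrupts before returning.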

My current solution is to use the HWCAP_TLS flag on M-profile devices
to indicate how to handle this. If it's set, then the code will use
LDREX, STREX, and MRC with the assumption that any of them the core
lacks will be trapped and emulated. If the flag is clear, then the
code will use the ARM get_tls syscall instead of MRC and will use
interrupt masking only if the platform (aux[AT_PLATFORM]) indicates a
"v6" device.

This does mean that I'm overloading an already overloaded flag, but
I'm not yet sure how else to handle this. I can easily not set the
flag in my bare metal environment and get the behavior I want, but I'm
not sure if this is sufficient for nommu Linux users.
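
As a rough illustration of that selection, the decision can be written
as a small pure function. The enum and struct names below are made up
for this sketch; only HWCAP_TLS and the "v6" prefix test come from the
patch. The platform argument stands for the AT_PLATFORM string and may
be NULL if the aux vector has no such entry.

```c
#include <assert.h>
#include <stddef.h>

#define HWCAP_TLS (1UL << 15)   /* same bit the patch tests in __hwcap */

/* Illustrative names, not identifiers from musl. */
enum tp_backend  { TP_MRC, TP_GET_TLS_SYSCALL };
enum cas_backend { CAS_LDREX_STREX, CAS_INTERRUPT_MASK };

struct m_backends {
	enum tp_backend tp;
	enum cas_backend cas;
};

/* Choose thread-pointer and CAS backends for an M-profile target. */
static void select_m_backends(unsigned long hwcap, const char *platform,
                              struct m_backends *out)
{
	assert(out != NULL);

	/* Default: assume LDREX/STREX work, natively or by emulation. */
	out->cas = CAS_LDREX_STREX;

	if (hwcap & HWCAP_TLS) {
		out->tp = TP_MRC;   /* MRC is expected to be trapped and emulated */
		return;
	}

	out->tp = TP_GET_TLS_SYSCALL;
	/* Only a "v6..." platform falls back to interrupt masking. */
	if (platform && platform[0] == 'v' && platform[1] == '6')
		out->cas = CAS_INTERRUPT_MASK;
}
```

With the flag clear and no platform string at all, this sketch keeps
LDREX/STREX, which matches the patch leaving __a_cas_ptr at its
default when the loop finds no "v6" entry.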

> > > With M profile support, though, AIUI it's possible that you have the
> > > atomics but not the thread pointer. You should not assume that lack of
> > > HWCAP_TLS implies lack of the atomics; rather you just have to assume
> > > the atomics are present, I think, or use some other means of detection
> > > with fallback to interrupt masking (assuming these devices have no
> > > kernel/user distinction that prevents you from masking interrupts).
> > > HWCAP_TLS should probably be probed just so you don't assume the
> > > syscall exists in case a system omits it and does trap-and-emulate for
> > > the instruction instead.
> >
> > I think I'm starting to understand this, which is good because it's
> > looking like my startup code for the micros will need to properly set
> > HWCAP before Musl can be used. I assume I'll need to set that
> > 'aux[AT_PLATFORM]' to "v6" or "v7" as well to make this runtime
> > detection work properly. I'll have to figure out if "v6m" and "v7m"
> > are supported values for the platform. I may have more questions in
> > the future as I try actually implementing something.
>
> Yes that sounds right. There are other aux vector entries that have to
> be set correctly too for startup code, particularly AT_PHDR for
> __init_tls to find the program headers (and for dl_iterate_phdr to
> work). On some archs AT_HWCAP and AT_PLATFORM are also needed for
> detection of features. AT_MINSIGSTKSZ is needed if the signal frame
> size is variable and may exceed the default one defined in the macro.
> AT_RANDOM is desirable for hardening but not mandatory. AT_EXECFN is
> used as a fallback for program_invocation_name if argv[0] is missing.
> AT_SYSINFO_EHDR is used to offer vdso but is optional. And AT_*ID and
> AT_SECURE are used to control behavior under suid (not trust
> environment, etc.).

Good to know, thanks.
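
For a bare-metal target, the startup code has to hand-build that aux
vector. A minimal sketch of the layout and of a lookup that walks it
the same way the patch does (pairs of type and value, ended by a zero
type) might look like this. The DEMO_ constants carry the usual Linux
tag values; auxv_lookup() is an illustrative helper, not a musl
function, and it uses uintptr_t where musl itself uses size_t.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Linux ELF auxiliary vector tag values, renamed to avoid clashing
 * with any system header. */
enum {
	DEMO_AT_NULL     = 0,
	DEMO_AT_PHDR     = 3,
	DEMO_AT_PLATFORM = 15,
	DEMO_AT_HWCAP    = 16
};

/* Scan a list of {type, value} pairs terminated by a DEMO_AT_NULL type.
 * Returns the value of the first matching entry, or fallback if the
 * tag is not present. */
static uintptr_t auxv_lookup(const uintptr_t *auxv, uintptr_t tag,
                             uintptr_t fallback)
{
	assert(auxv != NULL);
	for (const uintptr_t *aux = auxv; aux[0] != DEMO_AT_NULL; aux += 2)
		if (aux[0] == tag)
			return aux[1];
	return fallback;
}
```

Startup code could then pass an array such as
{ DEMO_AT_HWCAP, 0, DEMO_AT_PLATFORM, (uintptr_t)"v6", DEMO_AT_NULL, 0 }
to describe a v6-M part with HWCAP_TLS clear.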

On Tue, Dec 22, 2020 at 4:44 PM Patrick Oppenlander
<patrick.oppenlander@gmail.com> wrote:
>
> On Fri, Dec 18, 2020 at 6:17 PM Jesse DeGuire <jesse.a.deguire@gmail.com> wrote:
> >
> > On Thu, Dec 17, 2020 at 12:10 AM Patrick Oppenlander
> > <patrick.oppenlander@gmail.com> wrote:
> > >
> > > On Thu, Dec 17, 2020 at 3:55 PM Patrick Oppenlander
> > > <patrick.oppenlander@gmail.com> wrote:
> > > >
> > > > On Thu, Dec 17, 2020 at 11:24 AM Rich Felker <dalias@libc.org> wrote:
> > > > >
> > > > > On Wed, Dec 16, 2020 at 06:43:15PM -0500, Jesse DeGuire wrote:
> > > > > > Hey everyone,
> > > > > >
> > > > > > I'm working on putting together a Clang-based toolchain to use with
> > > > > > Microchip PIC32 (MIPS32) and SAM (Cortex-M) microcontrollers as an
> > > > > > alternative to their paid XC32 toolchain and I'd like to use Musl as
> > > > > > the C library. Currently, I'm trying to get it to build for a few
> > > > > > different Cortex-M devices and have found that Musl builds fine for
> > > > > > ARMv7-M, but not for ARMv6-M or v8-M Baseline because Musl uses
> > > > > > instructions not supported on the Thumb ISA subset used by v6-M and
> > > > > > v8-M Baseline devices. I'm using the v1.2.1 tag version of Musl, but
> > > > > > can easily switch to HEAD if needed. I am using a Python script to
> > > > > > build Musl (and eventually the rest of the toolchain), which you can
> > > > > > see on GitHub at the following link. It's a bit of a mess at the
> > > > > > moment, but the build_musl() function is what I'm currently using to
> > > > > > build Musl.
> > > > >
> > > > > I had assumed the thumb1[-ish?] subset wouldn't be interesting, but if
> > > > > it is, let's have a look.
> > > > >
> > > > > > https://github.com/jdeguire/buildPic32Clang/blob/master/buildPic32Clang.py
> > > > > >
> > > > > > Anyway, I have managed to get Musl to build for v6-M, v7-M, and v8-M
> > > > > > Baseline and have attached a diff to this email. If you like, I can go
> > > > > > into more detail as to why I made the changes I made; however, many
> > > > > > changes were merely the result of my attempts to correct errors
> > > > > > reported by Clang due to it encountering instruction sequences not
> > > > > > supported on ARMv6-M.
> > > > >
> > > > > Are there places where clang's linker is failing to make substitutions
> > > > > that the GNU one can do, that would make this simpler? For example I
> > > > > know the GNU one can replace bx rn by mov pc,rn if needed (based on a
> > > > > relocation the assembler emits on the insn).
> > > > >
> > > > > > A number of errors were simply the result of
> > > > > > ARMv6-M requiring one to use the "S" variant of an instruction that
> > > > > > sets status flags (such as "ADDS" vs "ADD" or "MOVS" vs "MOV"). A few
> > > > > > files I had to change from a "lower case s" to a "capital-S" file so
> > > > > > that I could use macros to check for either the Thumb1 ISA
> > > > > > ("__thumb2__ || !__thumb__") or for an M-Profile device
> > > > > > ("!__ARM_ARCH_ISA_ARM").
> > > > >
> > > > > Is __ARM_ARCH_ISA_ARM universally available (even on old compilers)?
> > > > > If not this may need an alternate detection. But I'd like to drop as
> > > > > much as possible and just make the code compatible rather than having
> > > > > 2 versions of it. I don't think there are any places where the
> > > > > performance or size is at all relevant.
> > > > >
> > > > > > The changes under
> > > > > > "src/thread/arm/__set_thread_area.c" are different in that I made
> > > > > > those because I don't believe Cortex-M devices could handle what was
> > > > > > there (no M-Profile device has Coprocessor 15, for example) and so I
> > > > >
> > > > > Unless this is an ISA level that can't be executed on a normal (non-M)
> > > > > ARM profile, it still needs all the backends that might be needed and
> > > > > runtime selection of which to use. This is okay. I believe Linux for
> > > > > nommu ARM has a syscall for get_tp, which is rather awful but probably
> > > > > needs to be added as a backend. The right way to do this would have
> > > > > been with trap-and-emulate (of cp15) I think...
> > > >
> > > > Linux emulates mrc 15 on old -A cores but they decided not to on -M
> > > > for some reason. BTW, the syscall is called get_tls.
> > > >
> > > > Is there any option other than supporting the get_tls syscall? Even if
> > > > someone puts in the effort to wire up the trap-and-emulate backend,
> > > > musl linked executables will still only run on new kernels.
> > > >
> > > > I took the trap-and-emulate approach in Apex RTOS to avoid opening
> > > > this can of worms. It's the only missing link for musl on armv7-m.
> > > > Everything else works beautifully.
> > >
> > > Another consideration is qemu-user: Currently it aborts when
> > > encountering an mrc 15 instruction while emulating armv7-m. I guess
> > > that would probably also be solved by supporting the syscall.
> > >
> > > Patrick
> >
> > ARMv6-M and v8-M.base do not support the MRC instruction at all. Could
> > that play into why Linux and qemu bail?
> >
> > Jesse
>
> Sorry, I missed this reply.
>
> qemu-user refuses to translate the instruction because cp15 is not
> implemented on armv7-m, exactly the same issue as is being discussed
> here. If you run the same executable but tell qemu to emulate an A
> profile core instead it happily runs it.
>
> Linux will probably kill the executable with SIGILL or something like
> that (I haven't tried, just guessing).
>
> It's related to this discussion as changing musl to use the syscall
> will likely result in qemu-user working too.
>
> I would personally prefer to see a solution which doesn't use the
> syscall. It's possible to implement the trap-and-emulate much more
> efficiently than the syscall as it can quite easily be done without
> preserving any more registers than the core pushes on exception entry
> anyway. https://github.com/apexrtos/apex/blob/master/sys/arch/arm/v7m/emulate.S
> is what I came up with. That implementation could be even tighter as
> it can never run from handler mode, so the stack detection at the
> beginning is unnecessary. However, I haven't considered v6-m or v8-m.
>
> trap-and-emulate also gracefully degrades when running the same
> executable on A vs M cores.
>
> Patrick
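
For readers unfamiliar with the trap-and-emulate approach, the core of
such a handler is recognizing the faulting instruction and writing the
thread pointer into the stacked register. The sketch below is a
host-runnable illustration only and is not taken from the Apex code
linked above. It assumes the 32-bit Thumb encoding of
"mrc p15,0,Rt,c13,c0,3" and the standard M-profile exception frame
(r0-r3, r12, lr, pc, xPSR); all function and struct names are
hypothetical, and it only handles destination registers that live in
that frame.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* "mrc p15,0,Rt,c13,c0,3" in Thumb: first halfword 0xEE1D, second
 * halfword (Rt << 12) | 0x0F70. */
#define MRC_TP_HW1       0xEE1Du
#define MRC_TP_HW2_MASK  0x0FFFu
#define MRC_TP_HW2_MATCH 0x0F70u

/* Registers the core stacks automatically on exception entry. */
struct exc_frame {
	uint32_t r[4];              /* r0-r3 */
	uint32_t r12, lr, pc, xpsr;
};

/* Returns the destination register number (0-15) if the halfwords
 * encode the thread-pointer read, or -1 for any other instruction. */
static int decode_mrc_tp(uint16_t hw1, uint16_t hw2)
{
	if (hw1 != MRC_TP_HW1 || (hw2 & MRC_TP_HW2_MASK) != MRC_TP_HW2_MATCH)
		return -1;
	return hw2 >> 12;
}

/* Emulate the instruction against a stacked frame. Returns 0 if it was
 * handled, -1 if it is a different instruction or targets a register
 * that is not part of the stacked frame. */
static int emulate_mrc_tp(struct exc_frame *f, uint16_t hw1, uint16_t hw2,
                          uint32_t tp)
{
	assert(f != NULL);
	int rt = decode_mrc_tp(hw1, hw2);

	if (rt >= 0 && rt <= 3)
		f->r[rt] = tp;
	else if (rt == 12)
		f->r12 = tp;
	else
		return -1;

	f->pc += 4;   /* resume after the 32-bit instruction */
	return 0;
}
```

A real handler would read the two halfwords from the stacked pc and
would also need a path for destination registers outside the frame.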

Any thoughts on what's shown in this patch? For your RTOS and v7m/v8m,
I'm thinking you'd be able to get the behavior you want by setting the
HWCAP_TLS flag early in your startup code. For my purposes, I plan to
use the syscall because I intend to eventually make a "baremetal" arch
in Musl that turns syscalls into simple function calls. Therefore, I'd
clear the flag in my startup code.
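
To sketch what "syscalls as simple function calls" could mean for the
thread pointer, here is a hypothetical dispatcher. Nothing like it
exists in the patch; baremetal_syscall() and current_tp are invented
names, and the single global only makes sense for a single-threaded
demo. The two numbers are the ARM private set_tls and get_tls syscall
numbers, the latter being the 0xf0006 the patch loads into r7.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define ARM_NR_SET_TLS 0xf0005L
#define ARM_NR_GET_TLS 0xf0006L   /* value used by __a_gettp_syscall_m */

/* Hypothetical storage for the thread pointer (single thread only). */
static uintptr_t current_tp;

/* Stand-in for "svc 0": dispatch on the syscall number with an
 * ordinary function call. Unknown numbers return -ENOSYS, following
 * the Linux convention of negative errno values. */
static long baremetal_syscall(long nr, long a0)
{
	/* The pointer travels through a long, so it must fit in one. */
	assert(sizeof(long) >= sizeof(void *));

	switch (nr) {
	case ARM_NR_SET_TLS:
		current_tp = (uintptr_t)a0;
		return 0;
	case ARM_NR_GET_TLS:
		return (long)current_tp;
	default:
		return -ENOSYS;
	}
}
```

With something along these lines in place, the get_tls path chosen
when HWCAP_TLS is clear would reduce to a function call and a load.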

-Jesse

[-- Attachment #2: musl_cortexm_v3.diff --]
[-- Type: application/octet-stream, Size: 14402 bytes --]

diff --git a/arch/arm/atomic_arch.h b/arch/arm/atomic_arch.h
index 9e3937cc..54b743bb 100644
--- a/arch/arm/atomic_arch.h
+++ b/arch/arm/atomic_arch.h
@@ -27,16 +27,6 @@ static inline int a_sc(volatile int *p, int v)
 	return !r;
 }
 
-#if __ARM_ARCH_7A__ || __ARM_ARCH_7R__ ||  __ARM_ARCH >= 7
-
-#define a_barrier a_barrier
-static inline void a_barrier()
-{
-	__asm__ __volatile__ ("dmb ish" : : : "memory");
-}
-
-#endif
-
 #define a_pre_llsc a_barrier
 #define a_post_llsc a_barrier
 
@@ -62,13 +52,22 @@ static inline int a_cas(volatile int *p, int t, int s)
 
 #endif
 
-#ifndef a_barrier
 #define a_barrier a_barrier
+#if __ARM_ARCH_7A__ || __ARM_ARCH_7R__ || __ARM_ARCH >= 7 || __ARM_ARCH_PROFILE == 'M'
+
+static inline void a_barrier()
+{
+	__asm__ __volatile__ ("dmb ish" : : : "memory");
+}
+
+#else
+
 static inline void a_barrier()
 {
 	register uintptr_t ip __asm__("ip") = __a_barrier_ptr;
 	__asm__ __volatile__( BLX " ip" : "+r"(ip) : : "memory", "cc", "lr" );
 }
+
 #endif
 
 #define a_crash a_crash
diff --git a/arch/arm/crt_arch.h b/arch/arm/crt_arch.h
index 99508b1d..66080422 100644
--- a/arch/arm/crt_arch.h
+++ b/arch/arm/crt_arch.h
@@ -3,13 +3,15 @@ __asm__(
 ".global " START " \n"
 ".type " START ",%function \n"
 START ": \n"
-"	mov fp, #0 \n"
-"	mov lr, #0 \n"
+"	movs a3, #0 \n"
+"	mov fp, a3 \n"
+"	mov lr, a3 \n"
 "	ldr a2, 1f \n"
 "	add a2, pc, a2 \n"
 "	mov a1, sp \n"
-"2:	and ip, a1, #-16 \n"
-"	mov sp, ip \n"
+"2:	subs a3, #16 \n"
+"	ands a1, a3 \n"
+"	mov sp, a1 \n"
 "	bl " START "_c \n"
 ".weak _DYNAMIC \n"
 ".hidden _DYNAMIC \n"
diff --git a/arch/arm/pthread_arch.h b/arch/arm/pthread_arch.h
index e689ea21..9155b9a4 100644
--- a/arch/arm/pthread_arch.h
+++ b/arch/arm/pthread_arch.h
@@ -1,5 +1,5 @@
 #if ((__ARM_ARCH_6K__ || __ARM_ARCH_6KZ__ || __ARM_ARCH_6ZK__) && !__thumb__) \
- || __ARM_ARCH_7A__ || __ARM_ARCH_7R__ || __ARM_ARCH >= 7
+ || __ARM_ARCH_7A__ || __ARM_ARCH_7R__ || (__ARM_ARCH >= 7 && __ARM_ARCH_PROFILE != 'M')
 
 static inline pthread_t __pthread_self()
 {
diff --git a/crt/arm/crtn.s b/crt/arm/crtn.s
index dc020f92..547e64b7 100644
--- a/crt/arm/crtn.s
+++ b/crt/arm/crtn.s
@@ -1,9 +1,9 @@
 .syntax unified
 
 .section .init
-	pop {r0,lr}
-	bx lr
+	pop {r0,r1}
+	bx r1
 
 .section .fini
-	pop {r0,lr}
-	bx lr
+	pop {r0,r1}
+	bx r1
diff --git a/src/ldso/arm/tlsdesc.S b/src/ldso/arm/tlsdesc.S
index 3ae133c9..33216200 100644
--- a/src/ldso/arm/tlsdesc.S
+++ b/src/ldso/arm/tlsdesc.S
@@ -12,13 +12,13 @@ __tlsdesc_static:
 .hidden __tlsdesc_dynamic
 .type __tlsdesc_dynamic,%function
 __tlsdesc_dynamic:
-	push {r2,r3,ip,lr}
+	push {r2,r3,r4,lr}
 	ldr r1,[r0]
 	ldr r2,[r1,#4]  // r2 = offset
 	ldr r1,[r1]     // r1 = modid
 
 #if ((__ARM_ARCH_6K__ || __ARM_ARCH_6KZ__ || __ARM_ARCH_6ZK__) && !__thumb__) \
- || __ARM_ARCH_7A__ || __ARM_ARCH_7R__ || __ARM_ARCH >= 7
+ || __ARM_ARCH_7A__ || __ARM_ARCH_7R__ || (__ARM_ARCH >= 7 && __ARM_ARCH_PROFILE != 'M')
 	mrc p15,0,r0,c13,c0,3
 #else
 	ldr r0,1f
@@ -36,19 +36,28 @@ __tlsdesc_dynamic:
 	bx r0
 #endif
 #endif
+#if defined(__thumb2__)  ||  !defined(__thumb__)
 	ldr r3,[r0,#-4] // r3 = dtv
-	ldr ip,[r3,r1,LSL #2]
-	sub r0,ip,r0
+	ldr r4,[r3,r1,LSL #2]
+	sub r0,r4,r0
+#else
+	mov r4,r0
+	subs r4,#4
+	ldr r3,[r4]
+	lsls r4,r1,#2
+	ldr r4,[r3,r4]
+	subs r0,r4,r0
+#endif
 	add r0,r0,r2    // r0 = r3[r1]-r0+r2
 #if __ARM_ARCH >= 5
-	pop {r2,r3,ip,pc}
+	pop {r2,r3,r4,pc}
 #else
-	pop {r2,r3,ip,lr}
+	pop {r2,r3,r4,lr}
 	bx lr
 #endif
 
 #if ((__ARM_ARCH_6K__ || __ARM_ARCH_6KZ__ || __ARM_ARCH_6ZK__) && !__thumb__) \
- || __ARM_ARCH_7A__ || __ARM_ARCH_7R__ || __ARM_ARCH >= 7
+ || __ARM_ARCH_7A__ || __ARM_ARCH_7R__ || (__ARM_ARCH >= 7 && __ARM_ARCH_PROFILE != 'M')
 #else
 	.align 2
 1:	.word __a_gettp_ptr - 2b
diff --git a/src/process/arm/vfork.s b/src/process/arm/vfork.s
index d7ec41b3..b6f0260e 100644
--- a/src/process/arm/vfork.s
+++ b/src/process/arm/vfork.s
@@ -3,7 +3,7 @@
 .type vfork,%function
 vfork:
 	mov ip, r7
-	mov r7, 190
+	movs r7, 190
 	svc 0
 	mov r7, ip
 	.hidden __syscall_ret
diff --git a/src/setjmp/arm/longjmp.S b/src/setjmp/arm/longjmp.S
index 8df0b819..a2641b92 100644
--- a/src/setjmp/arm/longjmp.S
+++ b/src/setjmp/arm/longjmp.S
@@ -7,16 +7,32 @@ _longjmp:
 longjmp:
 	mov ip,r0
 	movs r0,r1
+#if defined(__thumb2__)  ||  !defined(__thumb__)
 	moveq r0,#1
 	ldmia ip!, {v1,v2,v3,v4,v5,v6,sl,fp}
 	ldmia ip!, {r2,lr}
 	mov sp,r2
-
+#else
+	bne 4f
+	movs r0,#1
+4:	mov r1,ip
+	adds r1,#16
+	ldmia r1!, {r2-r7}
+	mov lr,r7
+	mov sp,r6
+	mov r11,r5
+	mov r10,r4
+	mov r9,r3
+	mov r8,r2
+	mov ip,r1
+	subs r1,#40
+	ldmia r1!, {r4-r7}
+#endif
 	adr r1,1f
 	ldr r2,1f
 	ldr r1,[r1,r2]
 
-#if __ARM_ARCH < 8
+#if __ARM_ARCH_PROFILE != 'M' && __ARM_ARCH < 8
 	tst r1,#0x260
 	beq 3f
 	// HWCAP_ARM_FPA
@@ -24,14 +40,15 @@ longjmp:
 	beq 2f
 	ldc p2, cr4, [ip], #48
 #endif
-2:	tst r1,#0x40
+2:	movs r2,#0x40
+	tst r1,r2
 	beq 2f
 	.fpu vfp
 	vldmia ip!, {d8-d15}
 	.fpu softvfp
 	.eabi_attribute 10, 0
 	.eabi_attribute 27, 0
-#if __ARM_ARCH < 8
+#if __ARM_ARCH_PROFILE != 'M' && __ARM_ARCH < 8
 	// HWCAP_ARM_IWMMXT
 2:	tst r1,#0x200
 	beq 3f
diff --git a/src/setjmp/arm/setjmp.S b/src/setjmp/arm/setjmp.S
index 45731d22..7ca51886 100644
--- a/src/setjmp/arm/setjmp.S
+++ b/src/setjmp/arm/setjmp.S
@@ -8,17 +8,28 @@
 __setjmp:
 _setjmp:
 setjmp:
+#if defined(__thumb2__)  ||  !defined(__thumb__)
 	mov ip,r0
 	stmia ip!,{v1,v2,v3,v4,v5,v6,sl,fp}
 	mov r2,sp
 	stmia ip!,{r2,lr}
-	mov r0,#0
-
+#else
+	stmia r0!,{r4-r7}
+	mov r1,r8
+	mov r2,r9
+	mov r3,r10
+	mov r4,r11
+	mov r5,sp
+	mov r6,lr
+	stmia r0!,{r1-r6}
+	mov ip,r0
+#endif
+	movs r0,#0
 	adr r1,1f
 	ldr r2,1f
 	ldr r1,[r1,r2]
 
-#if __ARM_ARCH < 8
+#if __ARM_ARCH_PROFILE != 'M' && __ARM_ARCH < 8
 	tst r1,#0x260
 	beq 3f
 	// HWCAP_ARM_FPA
@@ -26,14 +37,15 @@ setjmp:
 	beq 2f
 	stc p2, cr4, [ip], #48
 #endif
-2:	tst r1,#0x40
+2:	movs r2,#0x40
+	tst r1,r2
 	beq 2f
 	.fpu vfp
 	vstmia ip!, {d8-d15}
 	.fpu softvfp
 	.eabi_attribute 10, 0
 	.eabi_attribute 27, 0
-#if __ARM_ARCH < 8
+#if __ARM_ARCH_PROFILE != 'M' && __ARM_ARCH < 8
 	// HWCAP_ARM_IWMMXT
 2:	tst r1,#0x200
 	beq 3f
diff --git a/src/signal/arm/restore.s b/src/signal/arm/restore.s
index fb086d9b..2b7621b1 100644
--- a/src/signal/arm/restore.s
+++ b/src/signal/arm/restore.s
@@ -4,12 +4,12 @@
 .hidden __restore
 .type __restore,%function
 __restore:
-	mov r7,#119
+	movs r7,#119
 	swi 0x0
 
 .global __restore_rt
 .hidden __restore_rt
 .type __restore_rt,%function
 __restore_rt:
-	mov r7,#173
+	movs r7,#173
 	swi 0x0
diff --git a/src/signal/arm/sigsetjmp.s b/src/signal/arm/sigsetjmp.s
index 69ebbf49..8ef51de3 100644
--- a/src/signal/arm/sigsetjmp.s
+++ b/src/signal/arm/sigsetjmp.s
@@ -9,16 +9,20 @@ __sigsetjmp:
 	bne 1f
 	b setjmp
 
-1:	str lr,[r0,#256]
-	str r4,[r0,#260+8]
+1:	mov r2,lr
+	adds r0,#200
+	str r2,[r0,#56]
+	str r4,[r0,#60+8]
 	mov r4,r0
 
 	bl setjmp
 
 	mov r1,r0
 	mov r0,r4
-	ldr lr,[r0,#256]
-	ldr r4,[r0,#260+8]
+	ldr r2,[r0,#56]
+	mov lr,r2
+	ldr r4,[r0,#60+8]
+	subs r0,#200
 
 .hidden __sigsetjmp_tail
 	b __sigsetjmp_tail
diff --git a/src/string/arm/memcpy.S b/src/string/arm/memcpy.S
index 869e3448..2eb28eec 100644
--- a/src/string/arm/memcpy.S
+++ b/src/string/arm/memcpy.S
@@ -43,6 +43,8 @@
  * building as thumb 2 and big-endian.
  */
 
+#if defined(__thumb2__)  ||  !defined(__thumb__)
+
 .syntax unified
 
 .global memcpy
@@ -477,3 +479,4 @@ copy_last_3_and_return:
 	ldmfd   sp!, {r0, r4, lr}
 	bx      lr
 
+#endif /* defined(__thumb2__)  ||  !defined(__thumb__) */
diff --git a/src/string/arm/memcpy_thumb1.c b/src/string/arm/memcpy_thumb1.c
new file mode 100755
index 00000000..23571e00
--- /dev/null
+++ b/src/string/arm/memcpy_thumb1.c
@@ -0,0 +1,5 @@
+#if defined(__thumb__)  &&  !defined(__thumb2__)
+
+#include "../memcpy.c"
+
+#endif
\ No newline at end of file
diff --git a/src/thread/arm/__set_thread_area.c b/src/thread/arm/__set_thread_area.c
index 09de65aa..99ce5f41 100644
--- a/src/thread/arm/__set_thread_area.c
+++ b/src/thread/arm/__set_thread_area.c
@@ -6,27 +6,50 @@
 #define HWCAP_TLS (1 << 15)
 
 extern hidden const unsigned char
-	__a_barrier_oldkuser[], __a_barrier_v6[], __a_barrier_v7[],
-	__a_cas_v6[], __a_cas_v7[],
-	__a_gettp_cp15[];
+	__a_barrier_oldkuser[], __a_barrier_v6[], __a_barrier_v7[], __a_barrier_m[],
+	__a_cas_v6[], __a_cas_v7[], __a_cas_m[], __a_cas_intmask_m[],
+	__a_gettp_cp15[], __a_gettp_cp15_m[], __a_gettp_syscall_m[];
 
 #define __a_barrier_kuser 0xffff0fa0
 #define __a_barrier_oldkuser (uintptr_t)__a_barrier_oldkuser
 #define __a_barrier_v6 (uintptr_t)__a_barrier_v6
 #define __a_barrier_v7 (uintptr_t)__a_barrier_v7
+#define __a_barrier_m (uintptr_t)__a_barrier_m
 
 #define __a_cas_kuser 0xffff0fc0
 #define __a_cas_v6 (uintptr_t)__a_cas_v6
 #define __a_cas_v7 (uintptr_t)__a_cas_v7
+#define __a_cas_m (uintptr_t)__a_cas_m
+#define __a_cas_intmask_m (uintptr_t)__a_cas_intmask_m
 
 #define __a_gettp_kuser 0xffff0fe0
 #define __a_gettp_cp15 (uintptr_t)__a_gettp_cp15
+#define __a_gettp_cp15_m (uintptr_t)__a_gettp_cp15_m
+#define __a_gettp_syscall_m (uintptr_t)__a_gettp_syscall_m
 
 extern hidden uintptr_t __a_barrier_ptr, __a_cas_ptr, __a_gettp_ptr;
 
 int __set_thread_area(void *p)
 {
-#if !__ARM_ARCH_7A__ && !__ARM_ARCH_7R__ && __ARM_ARCH < 7
+#if __ARM_ARCH_PROFILE == 'M'
+	__a_cas_ptr = __a_cas_m;
+	__a_barrier_ptr = __a_barrier_m;
+
+	if (__hwcap & HWCAP_TLS) {
+		__a_gettp_ptr = __a_gettp_cp15_m;
+	} else {
+		size_t *aux;
+		__a_gettp_ptr = __a_gettp_syscall_m;
+		for (aux=libc.auxv; *aux; aux+=2) {
+			if (*aux != AT_PLATFORM) continue;
+			const char *s = (void *)aux[1];
+			if (s[0]=='v' && s[1]=='6') {
+				__a_cas_ptr = __a_cas_intmask_m;
+				break;
+			}
+		}
+	}
+#elif !__ARM_ARCH_7A__ && !__ARM_ARCH_7R__ && __ARM_ARCH < 7
 	if (__hwcap & HWCAP_TLS) {
 		size_t *aux;
 		__a_cas_ptr = __a_cas_v7;
diff --git a/src/thread/arm/__unmapself.c b/src/thread/arm/__unmapself.c
new file mode 100755
index 00000000..9afc4780
--- /dev/null
+++ b/src/thread/arm/__unmapself.c
@@ -0,0 +1,21 @@
+#if __ARM_ARCH_PROFILE != 'M'
+
+#include "pthread_impl.h"
+
+void __unmapself(void *base, size_t size)
+{
+	register void *r0 __asm__("r0") = base;
+	register size_t r1 __asm__("r1") = size;
+	__asm__ __volatile__ (
+	"	movs r7,#91 \n"
+	"	svc 0 \n"
+	"	movs r7,#1 \n"
+	"	svc 0 \n"
+	:: "r"(r0), "r"(r1));
+}
+
+#else
+
+#include "../__unmapself.c"
+
+#endif
diff --git a/src/thread/arm/__unmapself.s b/src/thread/arm/__unmapself.s
deleted file mode 100644
index 29c2d07b..00000000
--- a/src/thread/arm/__unmapself.s
+++ /dev/null
@@ -1,9 +0,0 @@
-.syntax unified
-.text
-.global __unmapself
-.type   __unmapself,%function
-__unmapself:
-	mov r7,#91
-	svc 0
-	mov r7,#1
-	svc 0
diff --git a/src/thread/arm/atomics.s b/src/thread/arm/atomics.s
index da50508d..900572dc 100644
--- a/src/thread/arm/atomics.s
+++ b/src/thread/arm/atomics.s
@@ -11,6 +11,8 @@ __a_barrier_dummy:
 .hidden __a_barrier_oldkuser
 .type __a_barrier_oldkuser,%function
 __a_barrier_oldkuser:
+	.arch armv6
+	.arm
 	push {r0,r1,r2,r3,ip,lr}
 	mov r1,r0
 	mov r2,sp
@@ -25,6 +27,7 @@ __a_barrier_oldkuser:
 .type __a_barrier_v6,%function
 __a_barrier_v6:
 	.arch armv6t2
+	.arm
 	mcr p15,0,r0,c7,c10,5
 	bx lr
 
@@ -33,24 +36,38 @@ __a_barrier_v6:
 .type __a_barrier_v7,%function
 __a_barrier_v7:
 	.arch armv7-a
+	.arm
 	dmb ish
 	bx lr
 
+.global __a_barrier_m
+.hidden __a_barrier_m
+.type __a_barrier_m,%function
+__a_barrier_m:
+	.arch armv6-m
+	.thumb
+	dmb
+	bx lr
+
 .global __a_cas_dummy
 .hidden __a_cas_dummy
 .type __a_cas_dummy,%function
 __a_cas_dummy:
+	.arch armv7-a
+	.arm
 	mov r3,r0
 	ldr r0,[r2]
 	subs r0,r3,r0
-	streq r1,[r2]
-	bx lr
+	bne 1f
+	str r1,[r2]
+1:	bx lr
 
 .global __a_cas_v6
 .hidden __a_cas_v6
 .type __a_cas_v6,%function
 __a_cas_v6:
 	.arch armv6t2
+	.arm
 	mov r3,r0
 	mcr p15,0,r0,c7,c10,5
 1:	ldrex r0,[r2]
@@ -66,6 +83,7 @@ __a_cas_v6:
 .type __a_cas_v7,%function
 __a_cas_v7:
 	.arch armv7-a
+	.arm
 	mov r3,r0
 	dmb ish
 1:	ldrex r0,[r2]
@@ -76,13 +94,72 @@ __a_cas_v7:
 	dmb ish
 	bx lr
 
+.global __a_cas_m
+.hidden __a_cas_m
+.type __a_cas_m,%function
+__a_cas_m:
+	.arch armv7-m
+	.thumb
+	mov r3,r0
+	dmb
+1:	ldrex r0,[r2]
+	subs r0,r3,r0
+	bne 2f              /* mismatch: return nonzero instead of spinning */
+	strex r0,r1,[r2]
+	tst r0,r0
+	bne 1b
+2:	dmb
+	bx lr
+
+.global __a_cas_intmask_m
+.hidden __a_cas_intmask_m
+.type __a_cas_intmask_m,%function
+__a_cas_intmask_m:
+	.arch armv6-m
+	.thumb
+	mov r3,r0
+	dmb
+	cpsid i
+	ldr r0,[r2]
+	subs r0,r3,r0
+	bne 1f              /* mismatch: skip the store, never loop while masked */
+	str r1,[r2]
+1:	cpsie i
+	dmb
+	bx lr
+
 .global __a_gettp_cp15
 .hidden __a_gettp_cp15
 .type __a_gettp_cp15,%function
 __a_gettp_cp15:
+	.arch armv6
+	.arm
 	mrc p15,0,r0,c13,c0,3
 	bx lr
 
+.global __a_gettp_cp15_m
+.hidden __a_gettp_cp15_m
+.type __a_gettp_cp15_m,%function
+__a_gettp_cp15_m:
+	.arch armv7-m
+	.thumb
+	mrc p15,0,r0,c13,c0,3
+	bx lr
+
+.global __a_gettp_syscall_m
+.hidden __a_gettp_syscall_m
+.type __a_gettp_syscall_m,%function
+__a_gettp_syscall_m:
+	.arch armv6-m
+	.thumb
+	push {r7}
+	movs r7,#0xf
+	lsls r7,r7,#16
+	adds r7,#6          /* ARM get_tls syscall (0xf0006) */
+	svc 0
+	pop {r7}
+	bx lr
+
 /* Tag this file with minimum ISA level so as not to affect linking. */
 .object_arch armv4t
 .eabi_attribute 6,2
diff --git a/src/thread/arm/clone.s b/src/thread/arm/clone.s
index bb0965da..33c2e59b 100644
--- a/src/thread/arm/clone.s
+++ b/src/thread/arm/clone.s
@@ -4,24 +4,26 @@
 .hidden __clone
 .type   __clone,%function
 __clone:
-	stmfd sp!,{r4,r5,r6,r7}
-	mov r7,#120
+	push {r4,r5,r6,r7}
+	movs r7,#120
 	mov r6,r3
 	mov r5,r0
 	mov r0,r2
-	and r1,r1,#-16
+	movs r2,#0
+	subs r2,#16
+	ands r1,r2
 	ldr r2,[sp,#16]
 	ldr r3,[sp,#20]
 	ldr r4,[sp,#24]
 	svc 0
 	tst r0,r0
 	beq 1f
-	ldmfd sp!,{r4,r5,r6,r7}
+	pop {r4,r5,r6,r7}
 	bx lr
 
 1:	mov r0,r6
 	bl 3f
-2:	mov r7,#1
+2:	movs r7,#1
 	svc 0
 	b 2b
 
diff --git a/src/thread/arm/syscall_cp.s b/src/thread/arm/syscall_cp.s
index e607dd42..421e64f4 100644
--- a/src/thread/arm/syscall_cp.s
+++ b/src/thread/arm/syscall_cp.s
@@ -11,7 +11,7 @@
 .type __syscall_cp_asm,%function
 __syscall_cp_asm:
 	mov ip,sp
-	stmfd sp!,{r4,r5,r6,r7}
+	push {r4,r5,r6,r7}
 __cp_begin:
 	ldr r0,[r0]
 	cmp r0,#0
@@ -19,11 +19,12 @@ __cp_begin:
 	mov r7,r1
 	mov r0,r2
 	mov r1,r3
-	ldmfd ip,{r2,r3,r4,r5,r6}
+	mov r2,ip
+	ldmfd r2,{r2,r3,r4,r5,r6}
 	svc 0
 __cp_end:
-	ldmfd sp!,{r4,r5,r6,r7}
+	pop {r4,r5,r6,r7}
 	bx lr
 __cp_cancel:
-	ldmfd sp!,{r4,r5,r6,r7}
+	pop {r4,r5,r6,r7}
 	b __cancel

Thread overview: 17+ messages
2020-12-16 23:43 Jesse DeGuire
2020-12-17  0:23 ` Rich Felker
2020-12-17  4:55   ` Patrick Oppenlander
2020-12-17  5:10     ` Patrick Oppenlander
2020-12-18  7:17       ` Jesse DeGuire
2020-12-22 21:43         ` Patrick Oppenlander
2021-01-06  3:24           ` Jesse DeGuire [this message]
2021-01-06  4:01             ` Patrick Oppenlander
2021-01-13 23:51               ` Jesse DeGuire
2020-12-18  7:15   ` Jesse DeGuire
2020-12-18 17:30     ` Rich Felker
2020-12-21 23:58       ` Jesse DeGuire
2020-12-22  1:39         ` Rich Felker
2020-12-18 19:38   ` Markus Wichmann
2020-12-18 20:34     ` Rich Felker
2020-12-22  0:00       ` Jesse DeGuire
2020-12-22  1:40         ` Rich Felker
