mailing list of musl libc
 help / color / mirror / code / Atom feed
From: Jesse DeGuire <jesse.a.deguire@gmail.com>
To: Rich Felker <dalias@libc.org>
Cc: musl@lists.openwall.com
Subject: Re: [musl] Musl on Cortex-M Devices
Date: Fri, 18 Dec 2020 02:15:20 -0500	[thread overview]
Message-ID: <CALqyXLhqNtZ55Nbs62j8yKCQ0mvCzCazSCCF2MTYuhSWUwF6tQ@mail.gmail.com> (raw)
In-Reply-To: <20201217002348.GO534@brightrain.aerifal.cx>

[-- Attachment #1: Type: text/plain, Size: 23878 bytes --]

On Wed, Dec 16, 2020 at 7:23 PM Rich Felker <dalias@libc.org> wrote:
>
> On Wed, Dec 16, 2020 at 06:43:15PM -0500, Jesse DeGuire wrote:
> > Hey everyone,
> >
> > I'm working on putting together a Clang-based toolchain to use with
> > Microchip PIC32 (MIPS32) and SAM (Cortex-M) microcontrollers as an
> > alternative to their paid XC32 toolchain and I'd like to use Musl as
> > the C library. Currently, I'm trying to get it to build for a few
> > different Cortex-M devices and have found that Musl builds fine for
> > ARMv7-M, but not for ARMv6-M or v8-M Baseline because Musl uses
> > instructions not supported on the Thumb ISA subset used by v6-M and
> > v8-M Baseline devices. I'm using the v1.2.1 tag version of Musl, but
> > can easily switch to HEAD if needed. I am using a Python script to
> > build Musl (and eventually the rest of the toolchain), which you can
> > see on GitHub at the following link. It's a bit of a mess at the
> > moment, but the build_musl() function is what I'm currently using to
> > build Musl.
>
> I had assumed the thumb1[-ish?] subset wouldn't be interesting, but if
> it is, let's have a look.
>
> > https://github.com/jdeguire/buildPic32Clang/blob/master/buildPic32Clang.py
> >
> > Anyway, I have managed to get Musl to build for v6-M, v7-M, and v8-M
> > Baseline and have attached a diff to this email. If you like, I can go
> > into more detail as to why I made the changes I made; however, many
> > changes were merely the result of my attempts to correct errors
> > reported by Clang due to it encountering instruction sequences not
> > supported on ARMv6-M.
>
> Are there places where clang's linker is failing to make substitutions
> that the GNU one can do, that would make this simpler? For example I
> know the GNU one can replace bx rn by mov pc,rn if needed (based on a
> relocation the assembler emits on the insn).

The main issue I ran into is that ARMv6-M and v8-M.base have many
instructions that can accept only registers R0-R7 as input, so some
extra care or shuffling is required. Many of these were pretty easily
worked around.

> > A number of errors were simply the result of
> > ARMv6-M requiring one to use the "S" variant of an instruction that
> > sets status flags (such as "ADDS" vs "ADD" or "MOVS" vs "MOV"). A few
> > files I had to change from a "lower case s" to a "capital-S" file so
> > that I could use macros to check for either the Thumb1 ISA
> > ("__thumb2__ || !__thumb__") or for an M-Profile device
> > ("!__ARM_ARCH_ISA_ARM").
>
> Is __ARM_ARCH_ISA_ARM universally available (even on old compilers)?
> If not this may need an alternate detection. But I'd like to drop as
> much as possible and just make the code compatible rather than having
> 2 versions of it. I don't think there are any places where the
> performance or size is at all relevant.

I hadn't thought enough about that until your reply, so I used
Compiler Explorer to at least get a general idea of when that macro
was supported. Clang 3.5 supports it, but Clang 3.4.1 does not. GCC
5.4 supports it, but 4.64 (the previous version available on Compiler
Explorer) does not. With that, I tried to come up with something to
detect M-Profile devices and have updated my changes with it.

   #if __ARM_ARCH_6M__ || __ARM_ARCH_7M__ || __ARM_ARCH_7EM__ \
     || (__ARM_ARCH > 7 && !__ARM_ARCH_ISA_ARM)

This presumes that the __ARM_ARCH_xx__ macros were supported for as
long as that variant was supported and that any compiler that supports
an ARM arch greater than 7 is new enough to have the
__ARM_ARCH_ISA_ARM macro, but this should cover a wider range of
toolchains.

I also did merge some of the alternate paths like you suggested.

> > The changes under
> > "src/thread/arm/__set_thread_area.c" are different in that I made
> > those because I don't believe Cortex-M devices could handle what was
> > there (no M-Profile device has Coprocessor 15, for example) and so I
>
> Unless this is an ISA level that can't be executed on a normal (non-M)
> ARM profile, it still needs all the backends that might be needed and
> runtime selection of which to use. This is okay. I believe Linux for
> nommu ARM has a syscall for get_tp, which is rather awful but probably
> needs to be added as a backend. The right way to do this would have
> been with trap-and-emulate (of cp15) I think...

Out of curiosity, what makes that syscall awful?

> > sort of punted for now and figured that a user's fault handler could
> > handle those "_kuser" calls given that there probably won't be a
> > kernel to do so.
>
> I don't think you can use the kuser_helper versions here because (1)
> they're at ARM addresses, not thumb ones, and (2) they're in the
> address range that Cortex M uses for special stuff.

Ah, right you are! It turns out that the range 0xE0100000 and above is
a "vendor system region", presumably for vendors to add their own
system control or debug features on top of what ARM already provides
and so one cannot assume that the region is empty.

> > Unfortunately, I'm not far enough yet in my project to build code that
> > would actually run on one of the micros I have with me, so I haven't
> > yet been able to properly try out these changes. Also, I should
> > mention that my eventual plan is to make a "baremetal" layer that
> > would provide stubs to handle syscalls or thread-related things that
> > you wouldn't normally have on a microcontroller, but I want to get
> > Musl building without that first because I think I'll be able to
> > utilize a lot of what's already present when I get to that point.
> > Hopefully, what's here is still somewhat useful as a starting point
> > should Musl want to support Cortex-M. I'll try to update this diff as
> > a result of feedback or my own eventual testing (I'm a long way from
> > that, though!).
>
> Thanks for the update. Please try to attach as UTF-8 rather than
> UTF-16 next time so it's more readable/accessible. I had to guess it
> was UTF-16 and switch the mime encoding headers to read it.

Oops, sorry about that. I must have accidentally created the diff with
PowerShell instead of my normal WSL Bash terminal.

> Further comments inline below:
>
> > diff --git a/arch/arm/atomic_arch.h b/arch/arm/atomic_arch.h
> > index 9e3937cc..21f102ad 100644
> > --- a/arch/arm/atomic_arch.h
> > +++ b/arch/arm/atomic_arch.h
> > @@ -64,11 +64,18 @@ static inline int a_cas(volatile int *p, int t, int s)
> >
> >  #ifndef a_barrier
> >  #define a_barrier a_barrier
> > +#if __ARM_ARCH_ISA_ARM
> >  static inline void a_barrier()
> >  {
> >       register uintptr_t ip __asm__("ip") = __a_barrier_ptr;
> >       __asm__ __volatile__( BLX " ip" : "+r"(ip) : : "memory", "cc", "lr" );
> >  }
> > +#else
> > +static inline void a_barrier()
> > +{
> > +     __asm__ __volatile__ ("dmb" : : : "memory");
> > +}
> > +#endif
> >  #endif
>
> There's a condition above with this code already; you just need to
> make it used for an additional case.

Fixed. The original code was nested inside an ifdef that also would
have used LDREX and STREX--neither of which ARMv6-M supports--so I
moved that out and added the extra check.

> >  #define a_crash a_crash
> > diff --git a/arch/arm/crt_arch.h b/arch/arm/crt_arch.h
> > index 99508b1d..c48b14b0 100644
> > --- a/arch/arm/crt_arch.h
> > +++ b/arch/arm/crt_arch.h
> > @@ -3,13 +3,25 @@ __asm__(
> >  ".global " START " \n"
> >  ".type " START ",%function \n"
> >  START ": \n"
> > +#if defined(__thumb2__)  ||  !defined(__thumb__)
> >  "    mov fp, #0 \n"
> >  "    mov lr, #0 \n"
> > +#else
> > +"    movs a3, #0 \n"
> > +"    mov fp, a3 \n"
> > +"    mov lr, a3 \n"
> > +#endif
> >  "    ldr a2, 1f \n"
> >  "    add a2, pc, a2 \n"
> >  "    mov a1, sp \n"
> > +#if defined(__thumb2__)  ||  !defined(__thumb__)
> >  "2:  and ip, a1, #-16 \n"
> >  "    mov sp, ip \n"
> > +#else
> > +"2:  subs a3, #16 \n"
> > +"    ands a1, a3 \n"
> > +"    mov sp, a1 \n"
> > +#endif
> >  "    bl " START "_c \n"
> >  ".weak _DYNAMIC \n"
> >  ".hidden _DYNAMIC \n"
>
> This can just use the new code unconditionally.

Done.

> > diff --git a/arch/arm/pthread_arch.h b/arch/arm/pthread_arch.h
> > index e689ea21..a89fb554 100644
> > --- a/arch/arm/pthread_arch.h
> > +++ b/arch/arm/pthread_arch.h
> > @@ -1,5 +1,5 @@
> > -#if ((__ARM_ARCH_6K__ || __ARM_ARCH_6KZ__ || __ARM_ARCH_6ZK__) && !__thumb__) \
> > - || __ARM_ARCH_7A__ || __ARM_ARCH_7R__ || __ARM_ARCH >= 7
> > +#if __ARM_ARCH_ISA_ARM && (((__ARM_ARCH_6K__ || __ARM_ARCH_6KZ__ || __ARM_ARCH_6ZK__) && !__thumb__) \
> > + || __ARM_ARCH_7A__ || __ARM_ARCH_7R__ || __ARM_ARCH >= 7)
>
> That's probably not the right condition.

Can you provide more detail as I'm missing what you mean? The intent
was to prevent this check from succeeding for any M-Profile device so
that they fall back to the __a gettp_ptr solution. I did update this
check to what I described at the top of this reply.

> >  static inline pthread_t __pthread_self()
> >  {
> > diff --git a/configure b/configure
> > old mode 100755
> > new mode 100644
> > diff --git a/crt/arm/crtn.S b/crt/arm/crtn.S
> > index dc020f92..83e20da2 100644
> > --- a/crt/arm/crtn.S
> > +++ b/crt/arm/crtn.S
> > @@ -1,9 +1,17 @@
> >  .syntax unified
> >
> >  .section .init
> > +#if __ARM_ARCH_ISA_ARM
> >       pop {r0,lr}
> >       bx lr
> > +#else
> > +     pop {r0,pc}
> > +#endif
> >
> >  .section .fini
> > +#if __ARM_ARCH_ISA_ARM
> >       pop {r0,lr}
> >       bx lr
> > +#else
> > +     pop {r0,pc}
> > +#endif
>
> Ideally the linker could handle this.

The issue here is that ARMv6-M (and I believe v8-M.base) cannot POP
into LR. Can this sequence replace LR with R1? My current version POPs
into R1 and moves that into LR because I wasn't sure if that was
important for the caller.

> > diff --git a/src/ldso/arm/tlsdesc.S b/src/ldso/arm/tlsdesc.S
> > index 3ae133c9..1843d97d 100644
> > --- a/src/ldso/arm/tlsdesc.S
> > +++ b/src/ldso/arm/tlsdesc.S
> > @@ -12,13 +12,19 @@ __tlsdesc_static:
> >  .hidden __tlsdesc_dynamic
> >  .type __tlsdesc_dynamic,%function
> >  __tlsdesc_dynamic:
> > +#if defined(__thumb2__)  ||  !defined(__thumb__)
> >       push {r2,r3,ip,lr}
> > +#else
> > +     push {r2-r4,lr}
> > +     mov r2,ip
> > +     push {r2}
> > +#endif
>
> AFAICT ip is not special here. If it's a pain to push/pop, just use a
> different register.

Gotcha. It looks like there isn't really a good reason for the
original to use IP, either, so I could change that and merge a couple
of those alternate code paths. If you don't mind a few extra
instructions, I think I can get rid of all of the (__thumb2__ ||
!__thumb__) checks in there.

> >       ldr r1,[r0]
> >       ldr r2,[r1,#4]  // r2 = offset
> >       ldr r1,[r1]     // r1 = modid
> >
> > -#if ((__ARM_ARCH_6K__ || __ARM_ARCH_6KZ__ || __ARM_ARCH_6ZK__) && !__thumb__) \
> > - || __ARM_ARCH_7A__ || __ARM_ARCH_7R__ || __ARM_ARCH >= 7
> > +#if __ARM_ARCH_ISA_ARM && (((__ARM_ARCH_6K__ || __ARM_ARCH_6KZ__ || __ARM_ARCH_6ZK__) && !__thumb__) \
> > + || __ARM_ARCH_7A__ || __ARM_ARCH_7R__ || __ARM_ARCH >= 7)
> >       mrc p15,0,r0,c13,c0,3
> >  #else
> >       ldr r0,1f
> > @@ -36,19 +42,34 @@ __tlsdesc_dynamic:
> >       bx r0
> >  #endif
> >  #endif
> > +#if defined(__thumb2__)  ||  !defined(__thumb__)
> >       ldr r3,[r0,#-4] // r3 = dtv
> >       ldr ip,[r3,r1,LSL #2]
> >       sub r0,ip,r0
> > +#else
> > +     mov r4,r0
> > +     subs r4,#4
> > +     ldr r3,[r4]
> > +     lsls r4,r1,#2
> > +     ldr r4,[r3,r4]
> > +     subs r0,r4,r0
> > +#endif
> >       add r0,r0,r2    // r0 = r3[r1]-r0+r2
> > +#if defined(__thumb2__)  ||  !defined(__thumb__)
> >  #if __ARM_ARCH >= 5
> >       pop {r2,r3,ip,pc}
> >  #else
> >       pop {r2,r3,ip,lr}
> >       bx lr
> >  #endif
> > +#else
> > +     pop {r2}
> > +     mov ip,r2
> > +     pop {r2-r4,pc}
> > +#endif
> >
> > -#if ((__ARM_ARCH_6K__ || __ARM_ARCH_6KZ__ || __ARM_ARCH_6ZK__) && !__thumb__) \
> > - || __ARM_ARCH_7A__ || __ARM_ARCH_7R__ || __ARM_ARCH >= 7
> > +#if __ARM_ARCH_ISA_ARM && (((__ARM_ARCH_6K__ || __ARM_ARCH_6KZ__ || __ARM_ARCH_6ZK__) && !__thumb__) \
> > + || __ARM_ARCH_7A__ || __ARM_ARCH_7R__ || __ARM_ARCH >= 7)
> >  #else
> >       .align 2
> >  1:   .word __a_gettp_ptr - 2b
> > diff --git a/src/process/arm/vfork.s b/src/process/arm/vfork.s
> > index d7ec41b3..b6f0260e 100644
> > --- a/src/process/arm/vfork.s
> > +++ b/src/process/arm/vfork.s
> > @@ -3,7 +3,7 @@
> >  .type vfork,%function
> >  vfork:
> >       mov ip, r7
> > -     mov r7, 190
> > +     movs r7, 190
> >       svc 0
> >       mov r7, ip
> >       .hidden __syscall_ret
>
> OK.
>
> > diff --git a/src/setjmp/arm/longjmp.S b/src/setjmp/arm/longjmp.S
> > index 8df0b819..745c9ba2 100644
> > --- a/src/setjmp/arm/longjmp.S
> > +++ b/src/setjmp/arm/longjmp.S
> > @@ -5,6 +5,7 @@
> >  .type longjmp,%function
> >  _longjmp:
> >  longjmp:
> > +#if defined(__thumb2__)  ||  !defined(__thumb__)
> >       mov ip,r0
> >       movs r0,r1
> >       moveq r0,#1
> > @@ -16,7 +17,7 @@ longjmp:
> >       ldr r2,1f
> >       ldr r1,[r1,r2]
> >
> > -#if __ARM_ARCH < 8
> > +#if __ARM_ARCH_ISA_ARM  &&  __ARM_ARCH < 8
> >       tst r1,#0x260
> >       beq 3f
> >       // HWCAP_ARM_FPA
> > @@ -31,7 +32,7 @@ longjmp:
> >       .fpu softvfp
> >       .eabi_attribute 10, 0
> >       .eabi_attribute 27, 0
> > -#if __ARM_ARCH < 8
> > +#if __ARM_ARCH_ISA_ARM  &&  __ARM_ARCH < 8
> >       // HWCAP_ARM_IWMMXT
> >  2:   tst r1,#0x200
> >       beq 3f
> > @@ -42,6 +43,36 @@ longjmp:
> >       ldcl p1, cr14, [ip], #8
> >       ldcl p1, cr15, [ip], #8
> >  #endif
> > +#else
> > +     mov ip,r0
> > +     movs r0,r1
> > +     bne 4f
> > +     movs r0,#1
> > +4:   mov r1,ip
> > +     adds r1,#16
> > +     ldmia r1!, {r2-r7}
> > +     mov lr,r7
> > +     mov sp,r6
> > +     mov r11,r5
> > +     mov r10,r4
> > +     mov r9,r3
> > +     mov r8,r2
> > +     mov ip,r1
> > +     subs r1,#40
> > +     ldmia r1!, {r4-r7}
> > +
> > +     adr r1,1f
> > +     ldr r2,1f
> > +     ldr r1,[r1,r2]
> > +     movs r2,#0x40
> > +     tst r1,r2
> > +     beq 2f
> > +     .fpu vfp
> > +     vldmia ip!, {d8-d15}
> > +     .fpu softvfp
> > +     .eabi_attribute 10, 0
> > +     .eabi_attribute 27, 0
> > +#endif
> >  2:
> >  3:   bx lr
>
> This should probably be unified somehow.

I was able to partially merge this.

> > diff --git a/src/setjmp/arm/setjmp.S b/src/setjmp/arm/setjmp.S
> > index 45731d22..440c166d 100644
> > --- a/src/setjmp/arm/setjmp.S
> > +++ b/src/setjmp/arm/setjmp.S
> > @@ -8,6 +8,7 @@
> >  __setjmp:
> >  _setjmp:
> >  setjmp:
> > +#if defined(__thumb2__)  ||  !defined(__thumb__)
> >       mov ip,r0
> >       stmia ip!,{v1,v2,v3,v4,v5,v6,sl,fp}
> >       mov r2,sp
> > @@ -18,7 +19,7 @@ setjmp:
> >       ldr r2,1f
> >       ldr r1,[r1,r2]
> >
> > -#if __ARM_ARCH < 8
> > +#if __ARM_ARCH_ISA_ARM  &&  __ARM_ARCH < 8
> >       tst r1,#0x260
> >       beq 3f
> >       // HWCAP_ARM_FPA
> > @@ -33,7 +34,7 @@ setjmp:
> >       .fpu softvfp
> >       .eabi_attribute 10, 0
> >       .eabi_attribute 27, 0
> > -#if __ARM_ARCH < 8
> > +#if __ARM_ARCH_ISA_ARM  &&  __ARM_ARCH < 8
> >       // HWCAP_ARM_IWMMXT
> >  2:   tst r1,#0x200
> >       beq 3f
> > @@ -44,6 +45,30 @@ setjmp:
> >       stcl p1, cr14, [ip], #8
> >       stcl p1, cr15, [ip], #8
> >  #endif
> > +#else
> > +     stmia r0!,{r4-r7}
> > +     mov r1,r8
> > +     mov r2,r9
> > +     mov r3,r10
> > +     mov r4,r11
> > +     mov r5,sp
> > +     mov r6,lr
> > +     stmia r0!,{r1-r6}
> > +     mov ip,r0
> > +     movs r0,#0
> > +
> > +     adr r1,1f
> > +     ldr r2,1f
> > +     ldr r1,[r1,r2]
> > +     movs r2,#0x40
> > +     tst r1,r2
> > +     beq 2f
> > +     .fpu vfp
> > +     vstmia ip!, {d8-d15}
> > +     .fpu softvfp
> > +     .eabi_attribute 10, 0
> > +     .eabi_attribute 27, 0
> > +#endif
> >  2:
> >  3:   bx lr
>
> Same here.

I partially merged this as well.

> > diff --git a/src/signal/arm/restore.s b/src/signal/arm/restore.s
> > index fb086d9b..2b7621b1 100644
> > --- a/src/signal/arm/restore.s
> > +++ b/src/signal/arm/restore.s
> > @@ -4,12 +4,12 @@
> >  .hidden __restore
> >  .type __restore,%function
> >  __restore:
> > -     mov r7,#119
> > +     movs r7,#119
> >       swi 0x0
> >
> >  .global __restore_rt
> >  .hidden __restore_rt
> >  .type __restore_rt,%function
> >  __restore_rt:
> > -     mov r7,#173
> > +     movs r7,#173
> >       swi 0x0
>
> Probably OK, unles gdb does something ridiculous wanting the exact
> insn sequence...
>
> > diff --git a/src/signal/arm/sigsetjmp.S b/src/signal/arm/sigsetjmp.S
> > index 69ebbf49..85a71666 100644
> > --- a/src/signal/arm/sigsetjmp.S
> > +++ b/src/signal/arm/sigsetjmp.S
> > @@ -9,16 +9,36 @@ __sigsetjmp:
> >       bne 1f
> >       b setjmp
> >
> > +#if defined(__thumb2__)  ||  !defined(__thumb__)
> >  1:   str lr,[r0,#256]
> >       str r4,[r0,#260+8]
> > +#else
> > +1:   mov ip,r2
> > +     mov r2,lr
> > +     adds r0,#200
> > +     str r2,[r0,#56]
> > +     str r4,[r0,#60+8]
> > +     subs r0,#200
> > +     mov r2,ip
> > +#endif
> >       mov r4,r0
> >
> >       bl setjmp
> >
> >       mov r1,r0
> >       mov r0,r4
> > +#if defined(__thumb2__)  ||  !defined(__thumb__)
> >       ldr lr,[r0,#256]
> >       ldr r4,[r0,#260+8]
> > +#else
> > +     mov ip,r2
> > +     adds r0,#200
> > +     ldr r2,[r0,#56]
> > +     mov lr,r2
> > +     ldr r4,[r0,#60+8]
> > +     subs r0,#200
> > +     mov r2,ip
> > +#endif
>
> Ideally should be unified.

Done.

> >  .hidden __sigsetjmp_tail
> >       b __sigsetjmp_tail
> > diff --git a/src/string/arm/memcpy.S b/src/string/arm/memcpy.S
> > index 869e3448..c5d17533 100644
> > --- a/src/string/arm/memcpy.S
> > +++ b/src/string/arm/memcpy.S
> > @@ -43,6 +43,8 @@
> >   * building as thumb 2 and big-endian.
> >   */
> >
> > +#if defined(__thumb2__)  ||  !defined(__thumb__)
> > +
> >  .syntax unified
> >
> >  .global memcpy
> > @@ -477,3 +479,4 @@ copy_last_3_and_return:
> >       ldmfd   sp!, {r0, r4, lr}
> >       bx      lr
> >
> > +#endif // defined(__thumb2__)  ||  !defined(__thumb__)
>
> The logic to enable the C memcpy seems to be missing.

I wasn't actually sure of the best way to do this. In my current
version, I sort of followed the lead of the stuff in src/math/arm and
made it a C file that will either conditionally have the ARM version
as inline assembly or include the generic C version. Is there a better
way? The diff didn't capture this because of the rename; should I
attach it separately?

> > diff --git a/src/thread/arm/__set_thread_area.c b/src/thread/arm/__set_thread_area.c
> > index 09de65aa..cb5d757d 100644
> > --- a/src/thread/arm/__set_thread_area.c
> > +++ b/src/thread/arm/__set_thread_area.c
> > @@ -26,7 +26,11 @@ extern hidden uintptr_t __a_barrier_ptr, __a_cas_ptr, __a_gettp_ptr;
> >
> >  int __set_thread_area(void *p)
> >  {
> > -#if !__ARM_ARCH_7A__ && !__ARM_ARCH_7R__ && __ARM_ARCH < 7
> > +#if !__ARM_ARCH_ISA_ARM
> > +     __a_gettp_ptr = __a_gettp_kuser;
> > +     __a_cas_ptr = __a_cas_kuser;
> > +     __a_barrier_ptr = __a_barrier_kuser;
> > +#elif !__ARM_ARCH_7A__ && !__ARM_ARCH_7R__ && __ARM_ARCH < 7
> >       if (__hwcap & HWCAP_TLS) {
> >               size_t *aux;
> >               __a_cas_ptr = __a_cas_v7;
>
> HWCAP needs to be honored here. You also need to add the new logic to
> use the syscall for getting thread pointer, I think.

I updated this to use the "get_tls" syscall (0xf0006) for M-Profile
devices. The barrier and cas pointers point to the default dummy
functions.

I'm not entirely sure how I should handle HWCAP. What should I do if
TLS is or isn't set? If it isn't set, the original code uses those
'kuser' addresses that you pointed out aren't usable and will bail out
if the version at the special address isn't correct. Right now, I have
mine set to call a_crash() if TLS is not set in HWCAP because that
seems to be the closest I can come to the current functionality.

> > diff --git a/src/thread/arm/__unmapself.s b/src/thread/arm/__unmapself.s
> > index 29c2d07b..fe0406e3 100644
> > --- a/src/thread/arm/__unmapself.s
> > +++ b/src/thread/arm/__unmapself.s
> > @@ -3,7 +3,7 @@
> >  .global __unmapself
> >  .type   __unmapself,%function
> >  __unmapself:
> > -     mov r7,#91
> > +     movs r7,#91
> >       svc 0
> > -     mov r7,#1
> > +     movs r7,#1
> >       svc 0
>
> OK, but this file can't be used on nommu anyway. Instead the generic C
> version has to be used.

LIke memcpy above (and the stuffi in src/math/arm), I made this a C
file that will either have this code as inline assembly or will
include the generic C version. The diff didn't capture this, should I
attach this separately?

> > diff --git a/src/thread/arm/atomics.S b/src/thread/arm/atomics.S
> > index da50508d..d49809d9 100644
> > --- a/src/thread/arm/atomics.S
> > +++ b/src/thread/arm/atomics.S
> > @@ -1,3 +1,5 @@
> > +#if __ARM_ARCH_ISA_ARM
> > +
> >  .syntax unified
> >  .text
> >
> > @@ -104,3 +106,5 @@ __a_cas_ptr:
> >  .hidden __a_gettp_ptr
> >  __a_gettp_ptr:
> >       .word __a_gettp_cp15
> > +
> > +#endif
>
> This file should not be conditional. If it can't be assembled in
> default mode it just needs ISA level override directives. Which
> functions are actually called is selected at runtime.

I removed the condition so that everything in here is always built and
added directives as needed.

I did run into an issue, though. Is there a way to restore the default
architecture after setting it using ".arch <arch>"? Something like
".set push" and ".set pop" in MIPS assembly? I didn't realize what the
"scope" of those .arch directives were until it occurred to me that
one of the functions shouldn't have built for ARMv6-M. I rearranged
the functions so that the ones without .arch directives come first,
which did take care of the problem for now. Trying ".arch" by itself
gets me an error from Clang.

> > diff --git a/src/thread/arm/clone.S b/src/thread/arm/clone.S
> > index bb0965da..70d29f70 100644
> > --- a/src/thread/arm/clone.S
> > +++ b/src/thread/arm/clone.S
> > @@ -4,24 +4,30 @@
> >  .hidden __clone
> >  .type   __clone,%function
> >  __clone:
> > -     stmfd sp!,{r4,r5,r6,r7}
> > -     mov r7,#120
> > +     push {r4,r5,r6,r7}
> > +     movs r7,#120
> >       mov r6,r3
> >       mov r5,r0
> >       mov r0,r2
> > +#if defined(__thumb2__)  ||  !defined(__thumb__)
> >       and r1,r1,#-16
> > +#else
> > +     movs r2,#0
> > +     subs r2,#16
> > +     ands r1,r2
> > +#endif
> >       ldr r2,[sp,#16]
> >       ldr r3,[sp,#20]
> >       ldr r4,[sp,#24]
> >       svc 0
> >       tst r0,r0
> >       beq 1f
> > -     ldmfd sp!,{r4,r5,r6,r7}
> > +     pop {r4,r5,r6,r7}
> >       bx lr
> >
> >  1:   mov r0,r6
> >       bl 3f
> > -2:   mov r7,#1
> > +2:   movs r7,#1
> >       svc 0
> >       b 2b
>
> I think this can all be unconditional, no #if's.

Changed to remove the #ifs.

> > diff --git a/src/thread/arm/syscall_cp.S b/src/thread/arm/syscall_cp.S
> > index e607dd42..f53cbb3a 100644
> > --- a/src/thread/arm/syscall_cp.S
> > +++ b/src/thread/arm/syscall_cp.S
> > @@ -11,7 +11,7 @@
> >  .type __syscall_cp_asm,%function
> >  __syscall_cp_asm:
> >       mov ip,sp
> > -     stmfd sp!,{r4,r5,r6,r7}
> > +     push {r4,r5,r6,r7}
> >  __cp_begin:
> >       ldr r0,[r0]
> >       cmp r0,#0
> > @@ -19,11 +19,16 @@ __cp_begin:
> >       mov r7,r1
> >       mov r0,r2
> >       mov r1,r3
> > +#if defined(__thumb2__)  ||  !defined(__thumb__)
> >       ldmfd ip,{r2,r3,r4,r5,r6}
> > +#else
> > +     mov r2,ip
> > +     ldmfd r2,{r2,r3,r4,r5,r6}
> > +#endif
> >       svc 0
> >  __cp_end:
> > -     ldmfd sp!,{r4,r5,r6,r7}
> > +     pop {r4,r5,r6,r7}
> >       bx lr
> >  __cp_cancel:
> > -     ldmfd sp!,{r4,r5,r6,r7}
> > +     pop {r4,r5,r6,r7}
> >       b __cancel
> > diff --git a/tools/install.sh b/tools/install.sh
> > old mode 100755
> > new mode 100644
>
> This can be unconditional, and maybe a register other than ip can be
> used to make it simpler..?
>
> Rich

Removed the condition.

Thanks for your help!
Jesse

[-- Attachment #2: musl_cortexm_v2.diff --]
[-- Type: application/octet-stream, Size: 11640 bytes --]

diff --git a/arch/arm/atomic_arch.h b/arch/arm/atomic_arch.h
index 9e3937cc..fa8a8573 100644
--- a/arch/arm/atomic_arch.h
+++ b/arch/arm/atomic_arch.h
@@ -27,16 +27,6 @@ static inline int a_sc(volatile int *p, int v)
 	return !r;
 }
 
-#if __ARM_ARCH_7A__ || __ARM_ARCH_7R__ ||  __ARM_ARCH >= 7
-
-#define a_barrier a_barrier
-static inline void a_barrier()
-{
-	__asm__ __volatile__ ("dmb ish" : : : "memory");
-}
-
-#endif
-
 #define a_pre_llsc a_barrier
 #define a_post_llsc a_barrier
 
@@ -62,13 +52,22 @@ static inline int a_cas(volatile int *p, int t, int s)
 
 #endif
 
-#ifndef a_barrier
 #define a_barrier a_barrier
+#if __ARM_ARCH_6M__ || __ARM_ARCH_7A__ || __ARM_ARCH_7R__ ||  __ARM_ARCH >= 7
+
+static inline void a_barrier()
+{
+	__asm__ __volatile__ ("dmb ish" : : : "memory");
+}
+
+#else
+
 static inline void a_barrier()
 {
 	register uintptr_t ip __asm__("ip") = __a_barrier_ptr;
 	__asm__ __volatile__( BLX " ip" : "+r"(ip) : : "memory", "cc", "lr" );
 }
+
 #endif
 
 #define a_crash a_crash
diff --git a/arch/arm/crt_arch.h b/arch/arm/crt_arch.h
index 99508b1d..66080422 100644
--- a/arch/arm/crt_arch.h
+++ b/arch/arm/crt_arch.h
@@ -3,13 +3,15 @@ __asm__(
 ".global " START " \n"
 ".type " START ",%function \n"
 START ": \n"
-"	mov fp, #0 \n"
-"	mov lr, #0 \n"
+"	movs a3, #0 \n"
+"	mov fp, a3 \n"
+"	mov lr, a3 \n"
 "	ldr a2, 1f \n"
 "	add a2, pc, a2 \n"
 "	mov a1, sp \n"
-"2:	and ip, a1, #-16 \n"
-"	mov sp, ip \n"
+"2:	subs a3, #16 \n"
+"	ands a1, a3 \n"
+"	mov sp, a1 \n"
 "	bl " START "_c \n"
 ".weak _DYNAMIC \n"
 ".hidden _DYNAMIC \n"
diff --git a/arch/arm/pthread_arch.h b/arch/arm/pthread_arch.h
index e689ea21..ab8f2af2 100644
--- a/arch/arm/pthread_arch.h
+++ b/arch/arm/pthread_arch.h
@@ -1,5 +1,6 @@
-#if ((__ARM_ARCH_6K__ || __ARM_ARCH_6KZ__ || __ARM_ARCH_6ZK__) && !__thumb__) \
- || __ARM_ARCH_7A__ || __ARM_ARCH_7R__ || __ARM_ARCH >= 7
+#if !(__ARM_ARCH_6M__ || __ARM_ARCH_7M__ || __ARM_ARCH_7EM__ || (__ARM_ARCH > 7 && !__ARM_ARCH_ISA_ARM)) \
+ && (((__ARM_ARCH_6K__ || __ARM_ARCH_6KZ__ || __ARM_ARCH_6ZK__) && !__thumb__) \
+ || __ARM_ARCH_7A__ || __ARM_ARCH_7R__ || __ARM_ARCH >= 7)
 
 static inline pthread_t __pthread_self()
 {
diff --git a/crt/arm/crtn.s b/crt/arm/crtn.s
index dc020f92..d751f8cd 100644
--- a/crt/arm/crtn.s
+++ b/crt/arm/crtn.s
@@ -1,9 +1,11 @@
 .syntax unified
 
 .section .init
-	pop {r0,lr}
+	pop {r0,r1}
+	mov lr,r1
 	bx lr
 
 .section .fini
-	pop {r0,lr}
+	pop {r0,r1}
+	mov lr,r1
 	bx lr
diff --git a/src/ldso/arm/tlsdesc.S b/src/ldso/arm/tlsdesc.S
index 3ae133c9..f3fcdf43 100644
--- a/src/ldso/arm/tlsdesc.S
+++ b/src/ldso/arm/tlsdesc.S
@@ -12,13 +12,18 @@ __tlsdesc_static:
 .hidden __tlsdesc_dynamic
 .type __tlsdesc_dynamic,%function
 __tlsdesc_dynamic:
+#if defined(__thumb2__)  ||  !defined(__thumb__)
 	push {r2,r3,ip,lr}
+#else
+	push {r2-r4,lr}
+#endif
 	ldr r1,[r0]
 	ldr r2,[r1,#4]  // r2 = offset
 	ldr r1,[r1]     // r1 = modid
 
-#if ((__ARM_ARCH_6K__ || __ARM_ARCH_6KZ__ || __ARM_ARCH_6ZK__) && !__thumb__) \
- || __ARM_ARCH_7A__ || __ARM_ARCH_7R__ || __ARM_ARCH >= 7
+#if !(__ARM_ARCH_6M__ || __ARM_ARCH_7M__ || __ARM_ARCH_7EM__ || (__ARM_ARCH > 7 && !__ARM_ARCH_ISA_ARM)) \
+ && (((__ARM_ARCH_6K__ || __ARM_ARCH_6KZ__ || __ARM_ARCH_6ZK__) && !__thumb__) \
+ || __ARM_ARCH_7A__ || __ARM_ARCH_7R__ || __ARM_ARCH >= 7)
 	mrc p15,0,r0,c13,c0,3
 #else
 	ldr r0,1f
@@ -36,19 +41,33 @@ __tlsdesc_dynamic:
 	bx r0
 #endif
 #endif
+#if defined(__thumb2__)  ||  !defined(__thumb__)
 	ldr r3,[r0,#-4] // r3 = dtv
 	ldr ip,[r3,r1,LSL #2]
 	sub r0,ip,r0
+#else
+	mov r4,r0
+	subs r4,#4
+	ldr r3,[r4]
+	lsls r4,r1,#2
+	ldr r4,[r3,r4]
+	subs r0,r4,r0
+#endif
 	add r0,r0,r2    // r0 = r3[r1]-r0+r2
+#if defined(__thumb2__)  ||  !defined(__thumb__)
 #if __ARM_ARCH >= 5
 	pop {r2,r3,ip,pc}
 #else
 	pop {r2,r3,ip,lr}
 	bx lr
 #endif
+#else
+	pop {r2-r4,pc}
+#endif
 
-#if ((__ARM_ARCH_6K__ || __ARM_ARCH_6KZ__ || __ARM_ARCH_6ZK__) && !__thumb__) \
- || __ARM_ARCH_7A__ || __ARM_ARCH_7R__ || __ARM_ARCH >= 7
+#if !(__ARM_ARCH_6M__ || __ARM_ARCH_7M__ || __ARM_ARCH_7EM__ || (__ARM_ARCH > 7 && !__ARM_ARCH_ISA_ARM)) \
+ && (((__ARM_ARCH_6K__ || __ARM_ARCH_6KZ__ || __ARM_ARCH_6ZK__) && !__thumb__) \
+ || __ARM_ARCH_7A__ || __ARM_ARCH_7R__ || __ARM_ARCH >= 7)
 #else
 	.align 2
 1:	.word __a_gettp_ptr - 2b
diff --git a/src/process/arm/vfork.s b/src/process/arm/vfork.s
index d7ec41b3..b6f0260e 100644
--- a/src/process/arm/vfork.s
+++ b/src/process/arm/vfork.s
@@ -3,7 +3,7 @@
 .type vfork,%function
 vfork:
 	mov ip, r7
-	mov r7, 190
+	movs r7, 190
 	svc 0
 	mov r7, ip
 	.hidden __syscall_ret
diff --git a/src/setjmp/arm/longjmp.S b/src/setjmp/arm/longjmp.S
index 8df0b819..0241011a 100644
--- a/src/setjmp/arm/longjmp.S
+++ b/src/setjmp/arm/longjmp.S
@@ -5,18 +5,36 @@
 .type longjmp,%function
 _longjmp:
 longjmp:
+#if defined(__thumb2__)  ||  !defined(__thumb__)
 	mov ip,r0
 	movs r0,r1
 	moveq r0,#1
 	ldmia ip!, {v1,v2,v3,v4,v5,v6,sl,fp}
 	ldmia ip!, {r2,lr}
 	mov sp,r2
-
+#else
+	mov ip,r0
+	movs r0,r1
+	bne 4f
+	movs r0,#1
+4:	mov r1,ip
+	adds r1,#16
+	ldmia r1!, {r2-r7}
+	mov lr,r7
+	mov sp,r6
+	mov r11,r5
+	mov r10,r4
+	mov r9,r3
+	mov r8,r2
+	mov ip,r1
+	subs r1,#40
+	ldmia r1!, {r4-r7}
+#endif
 	adr r1,1f
 	ldr r2,1f
 	ldr r1,[r1,r2]
 
-#if __ARM_ARCH < 8
+#if !__ARM_ARCH_6M__ && !__ARM_ARCH_7M__ && !__ARM_ARCH_7EM__ && __ARM_ARCH < 8
 	tst r1,#0x260
 	beq 3f
 	// HWCAP_ARM_FPA
@@ -24,14 +42,15 @@ longjmp:
 	beq 2f
 	ldc p2, cr4, [ip], #48
 #endif
-2:	tst r1,#0x40
+2:	movs r2,#0x40
+	tst r1,r2
 	beq 2f
 	.fpu vfp
 	vldmia ip!, {d8-d15}
 	.fpu softvfp
 	.eabi_attribute 10, 0
 	.eabi_attribute 27, 0
-#if __ARM_ARCH < 8
+#if !__ARM_ARCH_6M__ && !__ARM_ARCH_7M__ && !__ARM_ARCH_7EM__ && __ARM_ARCH < 8
 	// HWCAP_ARM_IWMMXT
 2:	tst r1,#0x200
 	beq 3f
diff --git a/src/setjmp/arm/setjmp.S b/src/setjmp/arm/setjmp.S
index 45731d22..98e71068 100644
--- a/src/setjmp/arm/setjmp.S
+++ b/src/setjmp/arm/setjmp.S
@@ -8,17 +8,28 @@
 __setjmp:
 _setjmp:
 setjmp:
+#if defined(__thumb2__)  ||  !defined(__thumb__)
 	mov ip,r0
 	stmia ip!,{v1,v2,v3,v4,v5,v6,sl,fp}
 	mov r2,sp
 	stmia ip!,{r2,lr}
-	mov r0,#0
-
+#else
+	stmia r0!,{r4-r7}
+	mov r1,r8
+	mov r2,r9
+	mov r3,r10
+	mov r4,r11
+	mov r5,sp
+	mov r6,lr
+	stmia r0!,{r1-r6}
+	mov ip,r0
+#endif
+	movs r0,#0
 	adr r1,1f
 	ldr r2,1f
 	ldr r1,[r1,r2]
 
-#if __ARM_ARCH < 8
+#if !__ARM_ARCH_6M__ && !__ARM_ARCH_7M__ && !__ARM_ARCH_7EM__ && __ARM_ARCH < 8
 	tst r1,#0x260
 	beq 3f
 	// HWCAP_ARM_FPA
@@ -26,14 +37,15 @@ setjmp:
 	beq 2f
 	stc p2, cr4, [ip], #48
 #endif
-2:	tst r1,#0x40
+2:	movs r2,#0x40
+	tst r1,r2
 	beq 2f
 	.fpu vfp
 	vstmia ip!, {d8-d15}
 	.fpu softvfp
 	.eabi_attribute 10, 0
 	.eabi_attribute 27, 0
-#if __ARM_ARCH < 8
+#if !__ARM_ARCH_6M__ && !__ARM_ARCH_7M__ && !__ARM_ARCH_7EM__ && __ARM_ARCH < 8
 	// HWCAP_ARM_IWMMXT
 2:	tst r1,#0x200
 	beq 3f
diff --git a/src/signal/arm/restore.s b/src/signal/arm/restore.s
index fb086d9b..2b7621b1 100644
--- a/src/signal/arm/restore.s
+++ b/src/signal/arm/restore.s
@@ -4,12 +4,12 @@
 .hidden __restore
 .type __restore,%function
 __restore:
-	mov r7,#119
+	movs r7,#119
 	swi 0x0
 
 .global __restore_rt
 .hidden __restore_rt
 .type __restore_rt,%function
 __restore_rt:
-	mov r7,#173
+	movs r7,#173
 	swi 0x0
diff --git a/src/signal/arm/sigsetjmp.s b/src/signal/arm/sigsetjmp.s
index 69ebbf49..48ebe7cb 100644
--- a/src/signal/arm/sigsetjmp.s
+++ b/src/signal/arm/sigsetjmp.s
@@ -9,16 +9,26 @@ __sigsetjmp:
 	bne 1f
 	b setjmp
 
-1:	str lr,[r0,#256]
-	str r4,[r0,#260+8]
+1:	mov ip,r2
+	mov r2,lr
+	adds r0,#200
+	str r2,[r0,#56]
+	str r4,[r0,#60+8]
+	subs r0,#200
+	mov r2,ip
 	mov r4,r0
 
 	bl setjmp
 
 	mov r1,r0
 	mov r0,r4
-	ldr lr,[r0,#256]
-	ldr r4,[r0,#260+8]
+	mov ip,r2
+	adds r0,#200
+	ldr r2,[r0,#56]
+	mov lr,r2
+	ldr r4,[r0,#60+8]
+	subs r0,#200
+	mov r2,ip
 
 .hidden __sigsetjmp_tail
 	b __sigsetjmp_tail
diff --git a/src/thread/arm/__set_thread_area.c b/src/thread/arm/__set_thread_area.c
index 09de65aa..adccd2ce 100644
--- a/src/thread/arm/__set_thread_area.c
+++ b/src/thread/arm/__set_thread_area.c
@@ -8,7 +8,7 @@
 extern hidden const unsigned char
 	__a_barrier_oldkuser[], __a_barrier_v6[], __a_barrier_v7[],
 	__a_cas_v6[], __a_cas_v7[],
-	__a_gettp_cp15[];
+	__a_gettp_cp15[], __a_gettp_syscall[];
 
 #define __a_barrier_kuser 0xffff0fa0
 #define __a_barrier_oldkuser (uintptr_t)__a_barrier_oldkuser
@@ -26,7 +26,13 @@ extern hidden uintptr_t __a_barrier_ptr, __a_cas_ptr, __a_gettp_ptr;
 
 int __set_thread_area(void *p)
 {
-#if !__ARM_ARCH_7A__ && !__ARM_ARCH_7R__ && __ARM_ARCH < 7
+#if __ARM_ARCH_6M__ || __ARM_ARCH_7M__ || __ARM_ARCH_7EM__ || (__ARM_ARCH > 7 && !__ARM_ARCH_ISA_ARM)
+	if (__hwcap & HWCAP_TLS) {
+		__a_gettp_ptr = __a_gettp_syscall;
+	} else {
+		a_crash();
+	}
+#elif !__ARM_ARCH_7A__ && !__ARM_ARCH_7R__ && __ARM_ARCH < 7
 	if (__hwcap & HWCAP_TLS) {
 		size_t *aux;
 		__a_cas_ptr = __a_cas_v7;
diff --git a/src/thread/arm/__unmapself.s b/src/thread/arm/__unmapself.s
index 29c2d07b..fe0406e3 100644
--- a/src/thread/arm/__unmapself.s
+++ b/src/thread/arm/__unmapself.s
@@ -3,7 +3,7 @@
 .global __unmapself
 .type   __unmapself,%function
 __unmapself:
-	mov r7,#91
+	movs r7,#91
 	svc 0
-	mov r7,#1
+	movs r7,#1
 	svc 0
diff --git a/src/thread/arm/atomics.s b/src/thread/arm/atomics.s
index da50508d..af3b60a2 100644
--- a/src/thread/arm/atomics.s
+++ b/src/thread/arm/atomics.s
@@ -7,10 +7,35 @@
 __a_barrier_dummy:
 	bx lr
 
+.global __a_cas_dummy
+.hidden __a_cas_dummy
+.type __a_cas_dummy,%function
+__a_cas_dummy:
+	mov r3,r0
+	ldr r0,[r2]
+	subs r0,r3,r0
+	bne 1f
+	str r1,[r2]
+1:	bx lr
+
+.global __a_gettp_cp15
+.hidden __a_gettp_cp15
+.type __a_gettp_cp15,%function
+__a_gettp_syscall:
+	push {r7}
+	movs r7,#0xf
+	lsls r7,r7,#16
+	adds r7,#6          /* ARM get_tls syscall (0xf0006) */
+	svc 0
+	pop {r7}
+	bx lr
+
 .global __a_barrier_oldkuser
 .hidden __a_barrier_oldkuser
 .type __a_barrier_oldkuser,%function
 __a_barrier_oldkuser:
+	.arch armv6
+	.arm
 	push {r0,r1,r2,r3,ip,lr}
 	mov r1,r0
 	mov r2,sp
@@ -36,16 +61,6 @@ __a_barrier_v7:
 	dmb ish
 	bx lr
 
-.global __a_cas_dummy
-.hidden __a_cas_dummy
-.type __a_cas_dummy,%function
-__a_cas_dummy:
-	mov r3,r0
-	ldr r0,[r2]
-	subs r0,r3,r0
-	streq r1,[r2]
-	bx lr
-
 .global __a_cas_v6
 .hidden __a_cas_v6
 .type __a_cas_v6,%function
@@ -80,6 +95,8 @@ __a_cas_v7:
 .hidden __a_gettp_cp15
 .type __a_gettp_cp15,%function
 __a_gettp_cp15:
+	.arch armv6
+	.arm
 	mrc p15,0,r0,c13,c0,3
 	bx lr
 
diff --git a/src/thread/arm/clone.s b/src/thread/arm/clone.s
index bb0965da..33c2e59b 100644
--- a/src/thread/arm/clone.s
+++ b/src/thread/arm/clone.s
@@ -4,24 +4,26 @@
 .hidden __clone
 .type   __clone,%function
 __clone:
-	stmfd sp!,{r4,r5,r6,r7}
-	mov r7,#120
+	push {r4,r5,r6,r7}
+	movs r7,#120
 	mov r6,r3
 	mov r5,r0
 	mov r0,r2
-	and r1,r1,#-16
+	movs r2,#0
+	subs r2,#16
+	ands r1,r2
 	ldr r2,[sp,#16]
 	ldr r3,[sp,#20]
 	ldr r4,[sp,#24]
 	svc 0
 	tst r0,r0
 	beq 1f
-	ldmfd sp!,{r4,r5,r6,r7}
+	pop {r4,r5,r6,r7}
 	bx lr
 
 1:	mov r0,r6
 	bl 3f
-2:	mov r7,#1
+2:	movs r7,#1
 	svc 0
 	b 2b
 
diff --git a/src/thread/arm/syscall_cp.s b/src/thread/arm/syscall_cp.s
index e607dd42..421e64f4 100644
--- a/src/thread/arm/syscall_cp.s
+++ b/src/thread/arm/syscall_cp.s
@@ -11,7 +11,7 @@
 .type __syscall_cp_asm,%function
 __syscall_cp_asm:
 	mov ip,sp
-	stmfd sp!,{r4,r5,r6,r7}
+	push {r4,r5,r6,r7}
 __cp_begin:
 	ldr r0,[r0]
 	cmp r0,#0
@@ -19,11 +19,12 @@ __cp_begin:
 	mov r7,r1
 	mov r0,r2
 	mov r1,r3
-	ldmfd ip,{r2,r3,r4,r5,r6}
+	mov r2,ip
+	ldmfd r2,{r2,r3,r4,r5,r6}
 	svc 0
 __cp_end:
-	ldmfd sp!,{r4,r5,r6,r7}
+	pop {r4,r5,r6,r7}
 	bx lr
 __cp_cancel:
-	ldmfd sp!,{r4,r5,r6,r7}
+	pop {r4,r5,r6,r7}
 	b __cancel

  parent reply	other threads:[~2020-12-18  7:15 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-12-16 23:43 Jesse DeGuire
2020-12-17  0:23 ` Rich Felker
2020-12-17  4:55   ` Patrick Oppenlander
2020-12-17  5:10     ` Patrick Oppenlander
2020-12-18  7:17       ` Jesse DeGuire
2020-12-22 21:43         ` Patrick Oppenlander
2021-01-06  3:24           ` Jesse DeGuire
2021-01-06  4:01             ` Patrick Oppenlander
2021-01-13 23:51               ` Jesse DeGuire
2020-12-18  7:15   ` Jesse DeGuire [this message]
2020-12-18 17:30     ` Rich Felker
2020-12-21 23:58       ` Jesse DeGuire
2020-12-22  1:39         ` Rich Felker
2020-12-18 19:38   ` Markus Wichmann
2020-12-18 20:34     ` Rich Felker
2020-12-22  0:00       ` Jesse DeGuire
2020-12-22  1:40         ` Rich Felker

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CALqyXLhqNtZ55Nbs62j8yKCQ0mvCzCazSCCF2MTYuhSWUwF6tQ@mail.gmail.com \
    --to=jesse.a.deguire@gmail.com \
    --cc=dalias@libc.org \
    --cc=musl@lists.openwall.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/musl/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).