Hi everyone,

Here's attempt three at Musl on Arm M-profile devices. I think I've
incorporated all of the suggestions in this email thread. Let me know
your thoughts. More specific responses are below.

On Mon, Dec 21, 2020 at 8:39 PM Rich Felker wrote:
>
> On Mon, Dec 21, 2020 at 06:58:47PM -0500, Jesse DeGuire wrote:
> > On Fri, Dec 18, 2020 at 12:30 PM Rich Felker wrote:
> > > If it lacks LDREX and STREX how do you implement atomic? I don't see
> > > where you're adding any alternative, so is v6-m support
> > > non-functional? That rather defeats the purpose of doing anything to
> > > support it...
> >
> > Correct, I haven't yet added an alternative. Arm's answer--and what we
> > generally do in the embedded world--is to disable interrupts using
> > "cpsid", do your thing, then re-enable interrupts with "cpsie". This
> > could be done with a new "__a_cas_v6m" variant that I'd add to
> > atomics.s. This still won't work for Linux because the "cps(ie|id)"
> > instruction is effectively a no-op if it is executed in an
> > unprivileged context (meaning you can't trap and emulate it). You'd be
> > looking at another system call if you really wanted v6-m Linux. That
> > said, this could let Musl work on v6-m in a bare metal or RTOS
> > environment, which I think Musl would be great for, and so I'd still
> > work on adding support for it. Also, not all v6-m devices support a
> > privilege model and run as though everything is privileged.
> > ARMv8-M.base is similar to v6-m with LDREX and STREX and so that could
> > have full support.
>
> I'm not sure what the right answer for this is and whether it makes
> support suitable for upstream or not at this point. We should probably
> investigate further. If LDREX/STREX are trappable we could just use
> them and require the kernel to trap and emulate but that's kinda ugly
> (llsc-type atomics are much harder to emulate correctly than a cas
> primitive).

LDREX/STREX should cause a fault on v6m that can be trapped and handled.
It's unfortunate that CPSID/CPSIE are ignored because trapping on those
seems like it'd be easier for whoever has to handle them. My current
solution is to use the HWCAP_TLS flag on M-profile devices to indicate
how to handle this. If it's set, then the code will use STREX, LDREX,
and MRC with the assumption that they will be trapped and emulated. If
the flag is cleared, then the code will use the ARM get_tls syscall
instead of MRC and will use interrupt masking only if the platform
(aux[AT_PLATFORM]) indicates a "v6" device. This does mean that I'm
overloading an already overloaded flag, but I'm not yet sure how else to
handle this. I can easily not set the flag in my bare metal environment
and get the behavior I want, but I'm not sure if this is sufficient for
nommu Linux users.

> > > With M profile support, though, AIUI it's possible that you have the
> > > atomics but not the thread pointer. You should not assume that lack of
> > > HWCAP_TLS implies lack of the atomics; rather you just have to assume
> > > the atomics are present, I think, or use some other means of detection
> > > with fallback to interrupt masking (assuming these devices have no
> > > kernel/user distinction that prevents you from masking interrupts).
> > > HWCAP_TLS should probably be probed just so you don't assume the
> > > syscall exists in case a system omits it and does trap-and-emulate for
> > > the instruction instead.
> >
> > I think I'm starting to understand this, which is good because it's
> > looking like my startup code for the micros will need to properly set
> > HWCAP before Musl can be used. I assume I'll need to set that
> > 'aux{"AT_PLATFORM"}' to "v6" or "v7" as well to make this runtime
> > detection work properly. I'll have to figure out if "v6m" and "v7m"
> > are supported values for the platform. I may have more questions in
> > the future as I try actually implementing something.
>
> Yes that sounds right.
> There are other aux vector entries that have to
> be set correctly too for startup code, particularly AT_PHDR for
> __init_tls to find the program headers (and for dl_iterate_phdr to
> work). On some archs AT_HWCAP and AT_PLATFORM are also needed for
> detection of features. AT_MINSIGSTKSZ is needed if the signal frame
> size is variable and may exceed the default one defined in the macro.
> AT_RANDOM is desirable for hardening but not mandatory. AT_EXECFN is
> used as a fallback for program_invocation_name if auxv[0] is missing.
> AT_SYSINFO_EHDR is used to offer vdso but is optional. And AT_*ID and
> AT_SECURE are used to control behavior under suid (not trust
> environment, etc.).

Good to know, thanks.

On Tue, Dec 22, 2020 at 4:44 PM Patrick Oppenlander wrote:
>
> On Fri, Dec 18, 2020 at 6:17 PM Jesse DeGuire wrote:
> >
> > On Thu, Dec 17, 2020 at 12:10 AM Patrick Oppenlander
> > wrote:
> > >
> > > On Thu, Dec 17, 2020 at 3:55 PM Patrick Oppenlander
> > > wrote:
> > > >
> > > > On Thu, Dec 17, 2020 at 11:24 AM Rich Felker wrote:
> > > > >
> > > > > On Wed, Dec 16, 2020 at 06:43:15PM -0500, Jesse DeGuire wrote:
> > > > > > Hey everyone,
> > > > > >
> > > > > > I'm working on putting together a Clang-based toolchain to use with
> > > > > > Microchip PIC32 (MIPS32) and SAM (Cortex-M) microcontrollers as an
> > > > > > alternative to their paid XC32 toolchain and I'd like to use Musl as
> > > > > > the C library. Currently, I'm trying to get it to build for a few
> > > > > > different Cortex-M devices and have found that Musl builds fine for
> > > > > > ARMv7-M, but not for ARMv6-M or v8-M Baseline because Musl uses
> > > > > > instructions not supported on the Thumb ISA subset used by v6-M and
> > > > > > v8-M Baseline devices. I'm using the v1.2.1 tag version of Musl, but
> > > > > > can easily switch to HEAD if needed.
> > > > > > I am using a Python script to
> > > > > > build Musl (and eventually the rest of the toolchain), which you can
> > > > > > see on GitHub at the following link. It's a bit of a mess at the
> > > > > > moment, but the build_musl() function is what I'm currently using to
> > > > > > build Musl.
> > > > >
> > > > > I had assumed the thumb1[-ish?] subset wouldn't be interesting, but if
> > > > > it is, let's have a look.
> > > > >
> > > > > > https://github.com/jdeguire/buildPic32Clang/blob/master/buildPic32Clang.py
> > > > > >
> > > > > > Anyway, I have managed to get Musl to build for v6-M, v7-M, and v8-M
> > > > > > Baseline and have attached a diff to this email. If you like, I can go
> > > > > > into more detail as to why I made the changes I made; however, many
> > > > > > changes were merely the result of my attempts to correct errors
> > > > > > reported by Clang due to it encountering instruction sequences not
> > > > > > supported on ARMv6-M.
> > > > >
> > > > > Are there places where clang's linker is failing to make substitutions
> > > > > that the GNU one can do, that would make this simpler? For example I
> > > > > know the GNU one can replace bx rn by mov pc,rn if needed (based on a
> > > > > relocation the assembler emits on the insn).
> > > > >
> > > > > > A number of errors were simply the result of
> > > > > > ARMv6-M requiring one to use the "S" variant of an instruction that
> > > > > > sets status flags (such as "ADDS" vs "ADD" or "MOVS" vs "MOV"). A few
> > > > > > files I had to change from a "lower case s" to a "capital-S" file so
> > > > > > that I could use macros to check for either the Thumb1 ISA
> > > > > > ("__thumb2__ || !__thumb__") or for an M-Profile device
> > > > > > ("!__ARM_ARCH_ISA_ARM").
> > > > >
> > > > > Is __ARM_ARCH_ISA_ARM universally available (even on old compilers)?
> > > > > If not this may need an alternate detection.
> > > > > But I'd like to drop as
> > > > > much as possible and just make the code compatible rather than having
> > > > > 2 versions of it. I don't think there are any places where the
> > > > > performance or size is at all relevant.
> > > > >
> > > > > > The changes under
> > > > > > "src/thread/arm/__set_thread_area.c" are different in that I made
> > > > > > those because I don't believe Cortex-M devices could handle what was
> > > > > > there (no M-Profile device has Coprocessor 15, for example) and so I
> > > > >
> > > > > Unless this is an ISA level that can't be executed on a normal (non-M)
> > > > > ARM profile, it still needs all the backends that might be needed and
> > > > > runtime selection of which to use. This is okay. I believe Linux for
> > > > > nommu ARM has a syscall for get_tp, which is rather awful but probably
> > > > > needs to be added as a backend. The right way to do this would have
> > > > > been with trap-and-emulate (of cp15) I think...
> > > >
> > > > Linux emulates mrc 15 on old -A cores but they decided not to on -M
> > > > for some reason. BTW, the syscall is called get_tls.
> > > >
> > > > Is there any option other than supporting the get_tls syscall? Even if
> > > > someone puts in the effort to wire up the trap-and-emulate backend,
> > > > musl linked executables will still only run on new kernels.
> > > >
> > > > I took the trap-and-emulate approach in Apex RTOS to avoid opening
> > > > this can of worms. It's the only missing link for musl on armv7-m.
> > > > Everything else works beautifully.
> > >
> > > Another consideration is qemu-user: Currently it aborts when
> > > encountering an mrc 15 instruction while emulating armv7-m. I guess
> > > that would probably also be solved by supporting the syscall.
> > >
> > > Patrick
> >
> > ARMv6-M and v8-M.base do not support the MRC instruction at all. Could
> > that play into why Linux and qemu bail?
> >
> > Jesse
>
> Sorry, I missed this reply.
>
> qemu-user refuses to translate the instruction because cp15 is not
> implemented on armv7-m, exactly the same issue as is being discussed
> here. If you run the same executable but tell qemu to emulate an A
> profile core instead it happily runs it.
>
> Linux will probably kill the executable with SIGILL or something like
> that (I haven't tried, just guessing).
>
> It's related to this discussion as changing musl to use the syscall
> will likely result in qemu-user working too.
>
> I would personally prefer to see a solution which doesn't use the
> syscall. It's possible to implement the trap-and-emulate much more
> efficiently than the syscall as it can quite easily be done without
> preserving any more registers than the core pushes on exception entry
> anyway. https://github.com/apexrtos/apex/blob/master/sys/arch/arm/v7m/emulate.S
> is what I came up with. That implementation could be even tighter as
> it can never run from handler mode, so the stack detection at the
> beginning is unnecessary. However, I haven't considered v6-m or v8-m.
>
> trap-and-emulate also gracefully degrades when running the same
> executable on A vs M cores.
>
> Patrick

Any thoughts on what's shown in this patch? For your RTOS and v7m/v8m,
I'm thinking you'd be able to get the behavior you want by setting the
HWCAP_TLS flag early in your startup code. For my purposes, I plan to
use the syscall because I intend to eventually make a "baremetal" arch
in Musl that turns syscalls into simple function calls. Therefore, I'd
clear the flag in my startup code.

-Jesse
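
For reference, the HWCAP_TLS-based selection described earlier in this
message could be sketched in C roughly as follows. This is only an
illustration of the decision logic for M-profile builds, not the code
from the patch. The names choose_backends, struct backends, the enum
labels, and SKETCH_HWCAP_TLS are all made up for this example, and
matching the platform string by a "v6" prefix is an assumption about how
"indicates a v6 device" might be tested. Falling back to LDREX/STREX when
the flag is clear and the platform is not "v6" is likewise an assumption.

```c
#include <string.h>

/* Bit 15 of AT_HWCAP is HWCAP_TLS on ARM Linux; the SKETCH_ prefix
 * marks this as a local name used only for illustration. */
#define SKETCH_HWCAP_TLS (1UL << 15)

/* Hypothetical backend labels; the patch may structure this differently. */
enum tp_backend  { TP_MRC_CP15, TP_GET_TLS_SYSCALL };
enum cas_backend { CAS_LDREX_STREX, CAS_IRQ_MASK };

struct backends {
    enum tp_backend tp;
    enum cas_backend cas;
};

/* hwcap is the value of aux[AT_HWCAP]; platform is the string that
 * aux[AT_PLATFORM] points to (NULL if startup code provided none). */
static struct backends choose_backends(unsigned long hwcap, const char *platform)
{
    struct backends b;

    if (hwcap & SKETCH_HWCAP_TLS) {
        /* Flag set: use MRC and LDREX/STREX directly and rely on the
         * environment to trap and emulate whatever the core lacks. */
        b.tp = TP_MRC_CP15;
        b.cas = CAS_LDREX_STREX;
    } else {
        /* Flag clear: the thread pointer comes from the get_tls syscall. */
        b.tp = TP_GET_TLS_SYSCALL;
        /* Assumption: a "v6" prefix identifies an M-profile core without
         * LDREX/STREX, so CAS masks interrupts (cpsid/cpsie) instead. */
        if (platform && strncmp(platform, "v6", 2) == 0)
            b.cas = CAS_IRQ_MASK;
        else
            b.cas = CAS_LDREX_STREX;
    }
    return b;
}
```

With this shape, bare-metal startup code that leaves the flag clear and
reports "v6" would get the syscall plus interrupt masking, while an RTOS
that traps and emulates would set the flag and get the plain
instructions.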