On Tue, Oct 4, 2022 at 12:46 PM Rich Felker wrote:
> The atomics in musl implement the "POSIX memory model" which is much
> simpler to understand and less error-prone than the C11 one (with the
> tradeoff being that it admits a lot less optimization for
> performance), and is a valid implementation choice for the C11 one. It
> has only one relationship, "synchronizes memory", that all
> synchronization primitives and atomics entail.

Mmmm, maybe I'm weird, but I find it significantly easier to understand when code uses the standard atomics, because there is copious information available about that model -- what it means, the real-world implications of those semantics, and the correct instruction sequences to properly implement them on various architectures. Memory and concurrency models are _really_ _hard_ no matter what (as I think this thread demonstrates), and having a standardized model to base things on is a huge advantage.

If musl's model was "C11 atomics, but we only use seq_cst operations", that would be wonderful...but it's not. It's something different -- with different guarantees, and different implications, and thus requires developers to do unique analysis.

> Atomics in musl are implemented
> entirely in asm, because the compilers do not get theirs right and do
> not support the runtime selection of methods necessary for some of the
> archs we support (especially 32-bit arm and sh).

Even if you need to provide a custom implementation to work around compiler issues on some platforms, IMO it'd still be an improvement to mirror the standard API/semantics -- and to use the compiler support on all the platforms where it does work.

Though, I do believe it ought to DTRT on ARM32 Linux targets. When targeting older CPUs that don't guarantee LL/SC availability, the compiler will generate a function call to a libgcc function. That library function then calls the kernel-provided kuser_helper cmpxchg and barrier functions.
(gcc/libgcc/config/arm/linux-atomic.c for the libgcc side). The choice of instruction sequence used to implement the atomics is then handled purely by the kernel helper. This design _should_ be correct for all ARM CPUs, but with a bit of overhead if running on a modern CPU (because operations like fetch_add get implemented on top of cmpxchg). But, I dunno, perhaps there are bugs.

I've never looked at the situation on SuperH...but going by the GCC manual's description of -matomic-model...yikes...that does look like a complete mess of a situation all around.
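
To make the overhead point concrete, here is a rough sketch of what "fetch_add on top of cmpxchg" means. This is not libgcc's actual code: fetch_add_via_cmpxchg is a made-up name, and I'm using C11 <stdatomic.h> as a stand-in for the kernel helper purely for illustration. The point is only that a single atomic add turns into a load plus a compare-and-swap retry loop.

```c
#include <stdatomic.h>

/* Hypothetical helper (illustration only, not taken from libgcc):
 * an atomic fetch-and-add built from nothing but compare-and-swap,
 * which is the shape forced on you when cmpxchg is the only primitive. */
static int fetch_add_via_cmpxchg(atomic_int *p, int v)
{
    /* Initial guess at the current value; relaxed is enough here
     * because the CAS below validates it. */
    int old = atomic_load_explicit(p, memory_order_relaxed);

    /* If *p no longer equals 'old', the CAS fails and writes the
     * current value back into 'old', so the next iteration retries
     * with fresh data. The non-_explicit form is seq_cst. */
    while (!atomic_compare_exchange_weak(p, &old, old + v))
        ;

    return old; /* value before the add, like atomic_fetch_add */
}
```

On a CPU with native atomic instructions, atomic_fetch_add would normally compile to something tighter than this loop, which is where the extra cost comes from.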