On Tue, Oct 4, 2022 at 12:46 PM Rich Felker <dalias@libc.org> wrote:  
> The atomics in musl implement the "POSIX memory model" which is much
> simpler to understand and less error-prone than the C11 one (with the
> tradeoff being that it admits a lot less optimization for
> performance), and is a valid implementation choice for the C11 one. It
> has only one relationship, "synchronizes memory", that all
> synchronization primitives and atomics entail.

Mmmm, maybe I'm weird, but I find code significantly easier to understand when it uses the standard atomics, because there is copious information available about that model -- what it means, the real-world implications of those semantics, and the correct instruction sequences to implement them on various architectures. Memory and concurrency models are _really_ _hard_ no matter what (as I think this thread demonstrates), and having a standardized model to base things on is a huge advantage. If musl's model were "C11 atomics, but we only use seq_cst operations", that would be wonderful... but it's not. It's something different -- with different guarantees and different implications -- and thus requires developers to do their own unique analysis.
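
To make "C11 atomics, but only seq_cst operations" concrete, here is a minimal sketch of that style. The names `payload`, `ready`, `publish` and `try_consume` are hypothetical and are not taken from musl. It relies only on the fact that the unsuffixed `<stdatomic.h>` generic functions default to `memory_order_seq_cst`:

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical example state, not musl code. */
static int payload;        /* plain, non-atomic data */
static atomic_bool ready;  /* static storage, so it starts out false */

/* Writer side: fill in the data, then set the flag.
 * atomic_store() without an explicit order is a seq_cst store. */
static void publish(int value)
{
    payload = value;
    atomic_store(&ready, true);
}

/* Reader side: atomic_load() without an explicit order is a seq_cst load.
 * If it observes true, the earlier write to payload is visible as well. */
static bool try_consume(int *out)
{
    if (!atomic_load(&ready))
        return false;
    *out = payload;
    return true;
}
```

Code written this way can be reasoned about with nothing but the published C11 rules, which is the advantage I mean.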

> Atomics in musl are implemented
> entirely in asm, because the compilers do not get theirs right and do
> not support the runtime selection of methods necessary for some of the
> archs we support (especially 32-bit arm and sh).

Even if you need to provide a custom implementation to work around compiler issues on some platforms, IMO it'd still be an improvement to mirror the standard API/semantics -- and to use the compiler support on all the platforms where it does work.

Though, I do believe the compiler ought to DTRT on ARM32 Linux targets. When targeting older CPUs that don't guarantee LL/SC availability, it will generate a call to a libgcc function, which in turn calls the kernel-provided kuser_helper cmpxchg and barrier functions (see gcc/libgcc/config/arm/linux-atomic.c for the libgcc side). Which instruction sequence actually implements the atomics is then decided purely by the kernel helper. This design _should_ be correct for all ARM CPUs, at the cost of a bit of overhead when running on a modern CPU (because operations like fetch_add get implemented on top of cmpxchg). But, I dunno, perhaps there are bugs.
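
For anyone unfamiliar with what "fetch_add on top of cmpxchg" looks like, here is a rough sketch using the standard C11 API. This is not the libgcc code; `fetch_add_via_cas` is a hypothetical name, and the kernel helper is stood in for by `atomic_compare_exchange_weak`:

```c
#include <stdatomic.h>

/* Hypothetical helper, not the libgcc implementation: emulate fetch_add
 * when the only available primitive is compare-and-swap. Returns the value
 * the object held immediately before the addition. */
static unsigned fetch_add_via_cas(atomic_uint *obj, unsigned delta)
{
    unsigned old = atomic_load(obj);

    /* On failure, atomic_compare_exchange_weak writes the value it actually
     * found into 'old', so each retry recomputes old + delta from fresh data. */
    while (!atomic_compare_exchange_weak(obj, &old, old + delta)) {
        /* retry */
    }
    return old;
}
```

The retry loop is where the extra overhead comes from: a CPU with a native atomic add (or an inline LL/SC sequence) does the same thing without the call and without the separate load.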

I've never looked at the situation on SuperH...but going by the GCC manual's description of -matomic-model...yikes...that does look like a complete mess of a situation all around.