On Tue, Oct 4, 2022 at 12:46 PM Rich Felker <dalias@libc.org> wrote:

> The atomics in musl implement the "POSIX memory model" which is much
> simpler to understand and less error-prone than the C11 one (with the
> tradeoff being that it admits a lot less optimization for
> performance), and is a valid implementation choice for the C11 one. It
> has only one relationship, "synchronizes memory", that all
> synchronization primitives and atomics entail.


Mmmm, maybe I'm weird, but I find it significantly easier to understand
when code uses the standard atomics, because there is copious information
available about that model -- what it means, the real-world implications of
those semantics, and the correct instruction sequences to properly
implement them on various architectures. Memory and concurrency models are
_really_ _hard_ no matter what (as I think this thread demonstrates), and
having a standardized model to base things on is a huge advantage. If
musl's model were "C11 atomics, but we only use seq_cst operations", that
would be wonderful...but it's not. It's something different -- with
different guarantees, and different implications, and thus requires
developers to do unique analysis.
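
For concreteness, here is roughly what I mean by "C11 atomics, but only
seq_cst operations". This is a minimal sketch, not musl code: the names
publish, try_consume, ready, and payload are made up for illustration. The
plain atomic_store/atomic_load forms from <stdatomic.h> are seq_cst by
default, so no explicit memory_order argument appears anywhere.

```c
#include <stdatomic.h>

/* Hypothetical names, for illustration only. */
static atomic_int ready;   /* zero-initialized: nothing published yet */
static int payload;        /* plain data, guarded by the 'ready' flag */

/* Writer: store the data, then set the flag with a seq_cst store. */
void publish(int value)
{
    payload = value;
    atomic_store(&ready, 1);        /* seq_cst by default */
}

/* Reader: a seq_cst load of the flag; if it is set, the earlier write
 * to 'payload' is guaranteed to be visible. Returns 1 on success. */
int try_consume(int *out)
{
    if (atomic_load(&ready)) {      /* seq_cst by default */
        *out = payload;
        return 1;
    }
    return 0;
}
```

Anyone reading this can reason about it purely from the published C11
rules, without needing to know anything specific to the library.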

> Atomics in musl are implemented
> entirely in asm, because the compilers do not get theirs right and do
> not support the runtime selection of methods necessary for some of the
> archs we support (especially 32-bit arm and sh).


Even if you need to provide a custom implementation to work around compiler
issues on some platforms, IMO it'd still be an improvement to mirror the
standard API/semantics -- and to use the compiler support on all the
platforms where it does work.

Though, I do believe the compiler ought to DTRT on ARM32 Linux targets. When
targeting older CPUs that don't guarantee LLSC availability, the compiler
will generate a function call to a libgcc function. That library function
then calls the kernel-provided kuser_helper cmpxchg and barrier functions
(see gcc/libgcc/config/arm/linux-atomic.c for the libgcc side). Then, which
instruction sequence is used to implement the atomics is handled purely by
the kernel helper. This design _should_ be correct for all ARM CPUs, but
with a bit of overhead if running on a modern CPU (because operations like
fetch_add get implemented on top of cmpxchg). But, I dunno, perhaps there
are bugs.
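
To illustrate the overhead I mean: when the only available primitive is a
compare-and-swap (as with the kernel helper), fetch_add ends up as a retry
loop. The following is a minimal sketch in standard C11, not the actual
libgcc code; fetch_add_via_cas is a hypothetical name.

```c
#include <stdatomic.h>

/* Hypothetical helper: fetch-and-add built only from compare-exchange.
 * Returns the value the object held before the addition. */
unsigned fetch_add_via_cas(atomic_uint *p, unsigned delta)
{
    unsigned old = atomic_load(p);

    /* On failure, atomic_compare_exchange_weak writes the current
     * value of *p back into 'old', so each retry uses fresh data. */
    while (!atomic_compare_exchange_weak(p, &old, old + delta)) {
        /* another thread changed *p (or a spurious failure): retry */
    }
    return old;
}
```

On a CPU with native LL/SC or atomic add instructions, a direct
implementation avoids the extra load and the indirection of the loop.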

I've never looked at the situation on SuperH...but going by the GCC
manual's description of -matomic-model...yikes...that does look like a
complete mess of a situation all around.