Even if you need to provide a custom implementation to work around compiler issues on some platforms, IMO it'd still be an improvement to mirror the standard API/semantics, and to use the compiler support on all the platforms where it does work.
Though, I do believe it ought to DTRT on ARM32 Linux targets. When targeting older CPUs that don't guarantee LL/SC (ldrex/strex) availability, the compiler emits a call to a libgcc function instead of inline instructions. That library function in turn calls the kernel-provided kuser_helper cmpxchg and memory-barrier functions (see gcc/libgcc/config/arm/linux-atomic.c for the libgcc side). Which instruction sequence actually implements the atomic operation is therefore decided entirely by the kernel helper. This design _should_ be correct on all ARM CPUs, at the cost of a bit of overhead on modern CPUs, because operations like fetch_add get built on top of cmpxchg rather than implemented directly. But, I dunno, perhaps there are bugs.
I've never looked at the situation on SuperH, but going by the GCC manual's description of -matomic-model... yikes, that does look like a complete mess all around.