It came to my attention as part of reviewing and testing the or1k port (actually I was aware of this issue before but somehow missed it for powerpc) that using "r"(ptr) input constraints for objects modified by inline asm is not valid unless the asm block is volatile-qualified:

https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html#Volatile

However, the better way to tell the compiler that pointed-to data will be accessed/modified is to use a "+m"(*ptr) constraint:

https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html#Clobbers

Presently, we're using the latter on x86[_64] and arm, the former on mips and microblaze, and neither on powerpc. Thus powerpc is clearly broken.

I've attached a patch (untested) that should fix powerpc and also make the mips and microblaze versions consistent, but before applying it, I want to make sure that using only the "+m"(*ptr) approach (plus "memory" in the clobber list, of course) is actually sufficient.

My impression is that there is only one scenario where the semantics might differ. Consider:

{
	int x;
	__asm__ __volatile__ ( ... : : "r"(&x) : "memory" );
}

vs.

{
	int x;
	__asm__ ( ... : "+m"(x) : : "memory" );
}

In the first case, the asm being volatile forces it to be emitted. In the second case, since the address of x has not leaked and since the asm is not "needed" (by the AST) except for possibly writing x, whose lifetime is immediately ending, it seems like the compiler could choose to omit the entire block, thereby also omitting any other side effects (e.g. synchronizing memory between cores) that the asm might have.

I'm not aware of anywhere in musl where this scenario would arise (using atomics on an object whose lifetime is ending), but perhaps making the asm volatile everywhere is actually more correct semantically anyway.

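
For concreteness, here is a minimal sketch of what the two operand styles might look like on x86. The helper names store_r, fetch_add_m and demo_both_forms are hypothetical and not taken from musl or the attached patch, and the non-x86 branches fall back to GCC's __atomic builtins only so that the example builds on other targets.

```c
#include <assert.h>

/* Form 1: the pointer is passed as a plain register input. The compiler is
 * not told which object the asm touches, so the block relies on being
 * volatile and on the "memory" clobber. */
static inline void store_r(volatile int *p, int v)
{
#if defined(__x86_64__) || defined(__i386__)
	__asm__ __volatile__ (
		"movl %1, (%0)"
		: : "r"(p), "r"(v) : "memory" );
#else
	/* Assumption: builtin fallback so the sketch compiles elsewhere. */
	__atomic_store_n(p, v, __ATOMIC_SEQ_CST);
#endif
}

/* Form 2: the object itself is a read-write memory operand, so the access
 * is explicit in the constraints. It is also marked volatile here, as in
 * the "volatile everywhere" option discussed in this mail. */
static inline int fetch_add_m(volatile int *p, int v)
{
#if defined(__x86_64__) || defined(__i386__)
	__asm__ __volatile__ (
		"lock ; xadd %0, %1"
		: "+r"(v), "+m"(*p) : : "memory" );
	return v; /* xadd leaves the previous value of *p in v */
#else
	return __atomic_fetch_add(p, v, __ATOMIC_SEQ_CST);
#endif
}

/* Exercise both helpers on a local object and return its final value. */
int demo_both_forms(void)
{
	int x = 5;
	store_r(&x, 10);
	int old = fetch_add_m(&x, 3);
	assert(old == 10);
	return x;
}
```

Dropping __volatile__ from fetch_add_m gives the second case above, where the compiler would in principle be free to discard the block if neither v nor *p is used afterwards.
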
On the other hand, if LTO caused such optimizations as in the second case above to be applied for the last unlock before freeing a reference-counted object, I think omitting the atomics and the memory synchronization would actually be the correct optimizing behavior -- nothing else can see the object at this point, so it's acceptable for the unlock to be a nop.

If we do opt for volatile asm everywhere, should we also use the "+m"(*ptr) operand form just for the sake of being explicit that the asm accesses the atomic object?

Rich