It came to my attention as part of reviewing and testing the or1k port
(actually I was aware of this issue before but somehow missed it for
powerpc) that using "r"(ptr) input constraints for object modified by
inline asm is not valid, unless the asm block is volatile-qualified:

https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html#Volatile

However, the better way to tell the compiler that pointed-to data will
be accessed/modified is to use a type "+m"(*ptr) constraint:

https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html#Clobbers

Presently, we're using the latter on x86[_64] and arm, the former on
mips and microblaze, and neither on powerpc. Thus powerpc is clearly
broken. I've attached a patch (untested) that should fix powerpc and
also make the mips and microblaze versions consistent, but before
applying it, I want to make sure that using only the "+m"(*ptr)
approach (plus "memory" in the clobberlist, of course) is actually
sufficient.

My impression is that there is only one scenario where the semantics
might differ. Consider:

{
	int x;
	__asm__ __volatile__ ( ... : : "r"(&x) : "memory" );
}

vs.

{
	int x;
	__asm__ ( ... : "+m"(x) : : "memory" );
}

In the first case, the asm being volatile forces it to be emitted. In
the second case, since the address of x has not leaked and since the
asm is not "needed" (by the AST) except for possibly writing x, whose
lifetime is immediately ending, it seems like the compiler could chose
to omit the entire block, thereby also omitting any other side effects
(e.g. synchronizing memory between cores) that the asm might have.

I'm not aware of anywhere in musl where this scenario would arise
(using atomics on an object whose lifetime is ending) but perhaps
making the asm volatile everywhere is actually more correct
semantically anyway. On the other hand, if LTO caused this such
optimizations as in case 2 above to be applied for the last unlock
before freeing a reference-counted object, I think omitting the
atomics and the memory-synchronization would actually be the correct
optimizing behavior -- nothing else can see the object at this point,
so it's acceptable for the unlock to be a nop.

If we do opt for volatile asm everywhere, should we also use the
"+m"(ptr) operand form just for the sake of being explicit that the
asm accesses the atomic object?

Rich