Hi,

> There is an alternate algorithm for pthread_once that doesn't require
> a barrier in the common case, which I've considered implementing. But
> it does need efficient access to thread-local storage. At one time,
> this was a kinda bad assumption (especially legacy mips is horribly
> slow at TLS) but nowadays it's probably the right choice to make, and
> we should check that out again...

1. Can we move the dmb to after we read the value of control? Like this:

    int __pthread_once(pthread_once_t *control, void (*init)(void))
    {
        /* Return immediately if init finished before, but ensure that
         * effects of the init routine are visible to the caller. */
        if (*(volatile int *)control == 2) {
            // a_barrier();
            return 0;
        }
        a_barrier();
        return __pthread_once_full(control, init);
    }

2. Can we use 'ldar' instead of dmb here? I see musl already uses
   'stlxr' in a_sc. Like this:

    static inline int load(volatile int *p)
    {
        int v;
        __asm__ __volatile__ ("ldar %w0,%1" : "=r"(v) : "Q"(*p));
        return v;
    }

    if (load((volatile int *)control) == 2) {
        return 0;
    }
    ...

Chuang Yin
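
For comparison, the idea in question 2 can also be written in portable C11 atomics. This is only a minimal sketch, not musl's code: the names once_sketch_t, once_sketch and sketch_init are hypothetical, the state values 0/1/2 are assumed to mean "not started / running / done", and the slow path simply spins, where a real implementation would sleep and would have to handle cancellation. On AArch64, compilers typically lower an acquire load to an ldar-family instruction, so the fast path needs no separate dmb.

```c
#include <stdatomic.h>

/* Hypothetical control block: 0 = not started, 1 = init running, 2 = done. */
typedef struct { atomic_int state; } once_sketch_t;

/* Hypothetical init routine, used only to observe the behaviour. */
static int sketch_init_calls;
static int sketch_value;
static void sketch_init(void) { sketch_init_calls++; sketch_value = 42; }

static int once_sketch(once_sketch_t *c, void (*init)(void))
{
    int expected = 0;

    /* Fast path: the acquire load pairs with the release store below,
     * so a caller that sees 2 also sees the effects of init(). */
    if (atomic_load_explicit(&c->state, memory_order_acquire) == 2)
        return 0;

    /* Slow path: exactly one thread wins the 0 -> 1 transition. */
    if (atomic_compare_exchange_strong_explicit(&c->state, &expected, 1,
            memory_order_acquire, memory_order_relaxed)) {
        init();
        /* Publish completion together with init's side effects. */
        atomic_store_explicit(&c->state, 2, memory_order_release);
        return 0;
    }

    /* Assumption: losers busy-wait here; no futex wait, no cancellation. */
    while (atomic_load_explicit(&c->state, memory_order_acquire) != 2)
        ;
    return 0;
}
```

Calling once_sketch twice on the same zero-initialized control block runs sketch_init only once; the second call returns through the acquire-load fast path.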