Hello Alexander,

On Tue, 20 Jun 2017 17:41:35 +0300 (MSK) Alexander Monakov wrote:

> > +	/* We can only go into wait, if we know that somebody holds the
> > +	   lock and will eventually wake us up, again. */
> > +	if (current < 0) {
> > +		__futexwait(l, current, 1);
>
> Here you reuse the value of 'current' after resuming after a
> futex-wait. But if the lock is under contention, with high
> probability other threads have modified the lock while we were
> sleeping, so the value of 'current' is likely stale, and the
> following a_cas is pretty much guaranteed to fail. Wouldn't it be
> better to reload current = *l after resuming?

Not in my experience. The benchmarks that you can find in the
references that I have given clearly show that this strategy is
better than the current lock strategy (and than a lot of variants
that I tried). These benchmarks were performed on two architectures
that are, I think, representative: x86 for one that locks the bus,
and arm for one that uses a monitor. (And the gain on arm is
larger.)

From all that I found, the real cost here is the bus transfer (the
caches are probably not up to date, anyhow), so doing a CAS instead
of a plain load is not much more expensive. On the other hand, if we
guess "current" correctly, we are done.

Also, this guess is not as bad as it might appear. Coming out of the
wait should usually happen after a wakeup, where the current lock
holder has just left the CS. So the educated guess for that
situation, namely that the lock is now free and that there is one
thread less in the CS, works quite well. And if it doesn't, and all
other threads of the CS are in kernel sleep, the next iteration does
the trick.

So in the usual case you'd trade one forced load plus one CAS against
a single CAS in the good case, and two CAS in the bad one. And after
coming back from a kernel sleep, you have other things to worry about
(reloading all your data, e.g.).

Again, I'd like to emphasize that the performance of this part is not
very important. I don't know of a program that could hammer so hard
on a lock that uses this code inside musl that a lot of threads would
be stuck in that congestion loop.

Thanks

Jens

-- 
:: INRIA Nancy Grand Est ::: Camus ::::::: ICube/ICPS :::
:: ::::::::::::::: office Strasbourg : +33 368854536   ::
:: :::::::::::::::::::::: gsm France : +33 651400183   ::
:: ::::::::::::::: gsm international : +49 15737185122 ::
:: http://icube-icps.unistra.fr/index.php/Jens_Gustedt ::
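P.S.: To make this concrete, here is a simplified sketch of the
congestion loop in question (not the literal patch code). It assumes
the value encoding where *l holds INT_MIN+n while the lock is taken
with n threads in the CS, and plain n while it is free with n threads
congesting; a_cas and __futexwait are the helpers from the patch, the
name lock_slow is made up for the sketch, and the thread is supposed
to have already added itself to the count atomically before entering:

#include <limits.h>

static void lock_slow(volatile int *l, int current)
{
	for (;;) {
		/* We can only go into wait if somebody holds the lock
		   and will eventually wake us up, again. */
		if (current < 0) {
			__futexwait(l, current, 1);
			/* The educated guess instead of a forced load:
			   the waker has just left the CS, so the lock
			   should now be free (the sign drops out) with
			   one thread less in the CS. */
			current -= INT_MIN + 1;
		}
		/* Try to take the lock with the guessed value. On
		   success this is the single CAS of the good case; on
		   failure the CAS already returns the fresh value for
		   the next iteration, so no extra load is needed. */
		int val = a_cas(l, current, INT_MIN + current);
		if (val == current) return;
		current = val;
	}
}

In the good case that is exactly one CAS; in the bad case the failed
CAS doubles as the reload you are asking about, so reloading with
current = *l before it would only add another bus transfer.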