mailing list of musl libc
 help / color / mirror / code / Atom feed
* New optimized normal-type mutex?
@ 2015-05-21 23:44 Rich Felker
  2015-05-22  7:30 ` Jens Gustedt
  2015-07-29 12:09 ` Joakim Sindholt
  0 siblings, 2 replies; 20+ messages in thread
From: Rich Felker @ 2015-05-21 23:44 UTC (permalink / raw)
  To: musl

I realized (thanks to conversations with wm4) that it's possible to
optimize normal-type mutexes a lot to eliminate spurious futex wakes
and reduce the number of atomic ops/barriers under contention.

The concept is to store the full mutex state in a single int:

bits 0-29: waiter count
bit 30: waiter flag
bit 31: lock bit

Then unlock is:

old = a_fetch_and(p, 0x3fffffff); // new op
if (old & 0x40000000) wake(p,1);

Trylock is:

old = *p;
if (old < 0) {
	a_barrier();
	old = *p;
}
if (old >= 0 && a_cas(p, old, old|INT_MIN)) return 0;

Lock is:

while (spins--) {
	if (!trylock()) return 0;
	a_spin();
}
while ((old = *p) < 0) {
	new = old+1 | 0x40000000;
	if (a_cas(p, old, new)==old) break;
}
for(;;) {
	while ((old = *p) < 0) {
		futex_wait(p, old);
	}
	new = old-1;
	if (new) new |= 0x40000000; // set waiters flag if any other waiters
	new |= INT_MIN;
	if (a_cas(p, old, new)==old) return 0;
}

The waiter flag marks that unlocking needs to send a wake. It is set
whenever a new waiter arrives, and when a waiter that consumed a wake
observes that there are more waiters left to be woken. The flag is
notably clear when the same thread which just released the mutex
happens to obtain it again before an existing waiter does (when
hammering the mutex); this is where we should see huge savings, since
the subsequent unlock(s) will not produce additional futex wakes which
(1) incur syscall overhead, and (2) cause additional waiters which
cannot yet get the mutex to wake up, try to lock it, and go back to
waiting again. Further:

The code path for unlock has exactly one atomic.

The code path for trylock has at least one barrier and at most (only
on success) one atomic, but may have a spurious barrier if a stale
value was initially read and the operation subsequently succeeds.

The code path for lock on contention (trylock failed and thus produced
no atomic changes, only barriers/spins), contains one atomic cas to
set the waiter state (versus 2 in the current musl implementation),
whatever the kernel does in futex, and one more atomic cas to finally
take the lock (versus 2 in current musl).

Are these changes worthwhile and worth having additional custom code
for the normal-type mutex? I'm not sure. We should probably do an
out-of-tree test implementation to measure differences then consider
integrating in musl if it helps.

If it does look helpful to musl, it may make sense to use this code as
the implementation for __lock/__unlock (implementation-internal locks)
and thereby shrink them all from int[2] to int[1], and then have the
pthread mutex code tail-call to these functions for the normal-mutex
case.

Even if it's not a performance help for musl, the above design may be
very nice as a third-party mutex library since it handles
self-synchronized destruction, minimizes spurious futex wakes, and
fits the whole lock in a single int.

Rich


^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2015-08-04  5:49 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-05-21 23:44 New optimized normal-type mutex? Rich Felker
2015-05-22  7:30 ` Jens Gustedt
2015-05-22  7:51   ` Rich Felker
2015-07-29 12:09 ` Joakim Sindholt
2015-07-29 22:11   ` Jens Gustedt
2015-07-29 23:30     ` Rich Felker
2015-07-29 23:49       ` Jens Gustedt
2015-07-30  0:10         ` Rich Felker
2015-07-30  8:07           ` Jens Gustedt
2015-07-30  9:10             ` Jens Gustedt
2015-07-30  9:36               ` Alexander Monakov
2015-07-30 10:00                 ` Jens Gustedt
2015-07-30 11:37                   ` Alexander Monakov
2015-07-30 13:46                     ` Rich Felker
2015-07-30 16:07                       ` Jens Gustedt
2015-08-03 16:36                         ` Alexander Monakov
2015-08-03 19:43                           ` Jens Gustedt
2015-08-03 20:05                             ` Isaac Dunham
2015-08-04  5:49                               ` Jens Gustedt
2015-07-30 13:45             ` Rich Felker

Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/musl/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).