mailing list of musl libc
 help / color / mirror / code / Atom feed
* [musl] Spin logic flaw in mutexes, etc.
@ 2024-10-04 16:25 Rich Felker
  0 siblings, 0 replies; only message in thread
From: Rich Felker @ 2024-10-04 16:25 UTC (permalink / raw)
  To: musl

I realized while reviewing how we opportunistically spin trying to
obtain locks that there's a problem with the logic: it was intended
not to spin if there are already waiters (this is both a fairness
consideration and a wasted-cycles one), but it can only see waiters
that have already enterred the futex-wait phase. Waiters which are
spinning are invisible. This leads to a situation where a large number
of threads arriving at the same time hammering a lock will all spin.

Just incrementing the waiter count before the spin is an obvious
remedy, but by itself that would lead to the unlocking thread
performing a useless and costly futex wake.

I think we could redefine the waiter count to be adjusted by 1, so
that a value of 1 would mean there's a spinning waiter, and N>1 would
mean N-1 waiters potentially in futex wait that require a wake.

With this approach, I think it would work to have the caller remember
that it performed an additional waiters increment that needs to be
reverted, or just do logic for transitioning 2->0 rather than 2->1. I
thought this could be put in __wait, but I believe all the important
callers actually use __timedwait which does not integrate the waiters
increment/decrement logic.

There might be other code paths that exit early where the waiter
accounting needs to be reverted too though. For example I see one on
line 80 of pthread_mutex_timedlock.c, but I think having this check is
erroneous anyway, since the earier call to pthread_mutex_trylock would
necessarily have caught this condition in the absence of
state-corrupting UB.

I'm not sure if we should try to avoid ever returning the waiters
count to 0 between futex waits, which can happen now if there's only
one waiter. Returning the count momentarily to zero allows a newly
arriving waiter to spin (stealing the opportunity to take the lock).
However stealing the lock is always possible before spinning anyway,
anyway due to the trylock, and I don't think any complete mitigation
for this is possible.

The affected functions seem to be:

- pthread_barrier_wait
- pthread_mutex_timedwait
- pthread_rwlock_timedrdlock
- pthread_rwlock_timedwrlock
- sem_timedwait

and of course their callers. A similar issue applies to the internal
__lock, but as it spins a lot fewer times and is called only in more
controlled contexts, I'm not sure if it matters. The same mitigations
can probably be applied to it if needed.

Rich

^ permalink raw reply	[flat|nested] only message in thread

only message in thread, other threads:[~2024-10-04 16:25 UTC | newest]

Thread overview: (only message) (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-10-04 16:25 [musl] Spin logic flaw in mutexes, etc Rich Felker

Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/musl/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).