* [musl] Spin logic flaw in mutexes, etc.
@ 2024-10-04 16:25 Rich Felker
From: Rich Felker @ 2024-10-04 16:25 UTC (permalink / raw)
To: musl
I realized while reviewing how we opportunistically spin trying to
obtain locks that there's a problem with the logic: it was intended
not to spin if there are already waiters (this is both a fairness
consideration and a wasted-cycles one), but it can only see waiters
that have already entered the futex-wait phase. Waiters that are
spinning are invisible. This leads to a situation where a large number
of threads arriving at the same time hammering a lock will all spin.
Just incrementing the waiter count before the spin is an obvious
remedy, but by itself that would lead to the unlocking thread
performing a useless and costly futex wake.
I think we could redefine the waiter count to be adjusted by 1, so
that a value of 1 would mean there's a spinning waiter, and N>1 would
mean N-1 waiters potentially in futex wait that require a wake.
With this approach, I think it would work to have the caller remember
that it performed an additional waiters increment that needs to be
reverted, or just do logic for transitioning 2->0 rather than 2->1. I
thought this could be put in __wait, but I believe all the important
callers actually use __timedwait which does not integrate the waiters
increment/decrement logic.
There might be other code paths that exit early where the waiter
accounting needs to be reverted too though. For example I see one on
line 80 of pthread_mutex_timedlock.c, but I think having this check is
erroneous anyway, since the earlier call to pthread_mutex_trylock would
necessarily have caught this condition in the absence of
state-corrupting UB.
I'm not sure if we should try to avoid ever returning the waiters
count to 0 between futex waits, which can happen now if there's only
one waiter. Returning the count momentarily to zero allows a newly
arriving waiter to spin (stealing the opportunity to take the lock).
However, stealing the lock is always possible before spinning anyway
due to the trylock, and I don't think any complete mitigation for this
is possible.
The affected functions seem to be:
- pthread_barrier_wait
- pthread_mutex_timedlock
- pthread_rwlock_timedrdlock
- pthread_rwlock_timedwrlock
- sem_timedwait
and of course their callers. A similar issue applies to the internal
__lock, but as it spins a lot fewer times and is called only in more
controlled contexts, I'm not sure if it matters. The same mitigations
can probably be applied to it if needed.
Rich