* [musl] Memory lock needed for semaphores?
From: Markus Wichmann @ 2024-08-04  7:29 UTC
To: musl

Hi all,

in light of the recent thread about barriers, I looked at the
semaphore implementation again with an eye to the sudden-unmap issue.
To recap, the issue is that multiple threads may interact with the
same pshared semaphore, and in particular one thread might unmap a
semaphore that another thread is currently still interacting with.

Is that an issue with semaphores? I'm thinking of the following
scenario:

Start: One thread of another process is blocked at the semaphore
(sem->__val = {INT_MIN, 1, 0}).

Thread 1: Calls sem_post(), runs until just after the a_cas() before
being preempted (sem->__val = {1, 1, 0}, but this thread saw val < 0
and waiters <= 1, so a broadcast wake will issue once the thread
resumes).

Thread 2: Calls sem_trywait(), sets sem->__val = {0, 1, 0}.
Thread 2: Calls sem_post(), sets sem->__val = {1, 1, 0}, does not wake.
Thread 2: Unmaps the semaphore.
Thread 1: Resumes. SYS_futex returns EFAULT. The other process remains
sleeping.

Am I missing something that makes this impossible? If not, do we maybe
need a vmlock for posting pshared semaphores? That would probably be
the easiest fix.

Ciao,
Markus
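For concreteness, here is a minimal sketch of the kind of program the
scenario assumes: a pshared semaphore in a shared mapping, with a
waiter in a forked process. The calls are shown sequentially from one
thread, so this replay is benign as written; the race needs the first
sem_post() to run on a separate thread and stall between its a_cas()
and its futex wake, which cannot be forced from portable userspace.

	/* Illustrative sketch only. The state comments describe the racy
	 * interleaving from the scenario above, not this sequential replay. */
	#include <semaphore.h>
	#include <sys/mman.h>
	#include <unistd.h>

	int main(void)
	{
		sem_t *sem = mmap(0, sizeof *sem, PROT_READ|PROT_WRITE,
		                  MAP_SHARED|MAP_ANONYMOUS, -1, 0);
		if (sem == MAP_FAILED) return 1;
		sem_init(sem, 1, 0);       /* pshared, initial value 0 */

		if (fork() == 0) {
			sem_wait(sem);     /* blocks: __val = {INT_MIN, 1, 0} */
			_exit(0);
		}
		sleep(1);                  /* crude: let the child block */
		sem_post(sem);             /* "thread 1": wake still pending */
		sem_trywait(sem);          /* "thread 2": __val = {0, 1, 0} */
		sem_post(sem);             /* "thread 2": issues no wake */
		munmap(sem, sizeof *sem);  /* unmap while a wake is pending */
		return 0;
	}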
* Re: [musl] Memory lock needed for semaphores?
From: Rich Felker @ 2024-08-04 13:31 UTC
To: Markus Wichmann; +Cc: musl, Alexey Izbyshev

On Sun, Aug 04, 2024 at 09:29:03AM +0200, Markus Wichmann wrote:
> Hi all,
>
> in light of the recent thread about barriers, I looked at the
> semaphore implementation again with an eye to the sudden-unmap
> issue. To recap, the issue is that multiple threads may interact
> with the same pshared semaphore, and in particular one thread might
> unmap a semaphore that another thread is currently still interacting
> with.
>
> Is that an issue with semaphores? I'm thinking of the following
> scenario:
>
> Start: One thread of another process is blocked at the semaphore
> (sem->__val = {INT_MIN, 1, 0}).
>
> Thread 1: Calls sem_post(), runs until just after the a_cas() before
> being preempted (sem->__val = {1, 1, 0}, but this thread saw val < 0
> and waiters <= 1, so a broadcast wake will issue once the thread
> resumes).
>
> Thread 2: Calls sem_trywait(), sets sem->__val = {0, 1, 0}.
> Thread 2: Calls sem_post(), sets sem->__val = {1, 1, 0}, does not
> wake.
> Thread 2: Unmaps the semaphore.
> Thread 1: Resumes. SYS_futex returns EFAULT. The other process
> remains sleeping.
>
> Am I missing something that makes this impossible? If not, do we
> maybe need a vmlock for posting pshared semaphores? That would
> probably be the easiest fix.

This is not possible and would be a misuse of it even if it were.
sem_post is required to be AS-safe, so it cannot take any sort of
lock.

I think the issue you report is real, but provided that's the case,
it's a flaw in the logic to be relying on the first sem_post to
perform the wake the second logically needs to perform. I haven't yet
looked into what the cost of leaving the waiters flag set would be.

There are kinda 3 approaches I can see exploring to fix this, and I'm
not sure which are low-cost or even valid yet:

1. Have post only clear the waiters flag if the count is <1 (not <=1).
2. Have trywait re-add a waiters flag if it sees waiters>0.
3. Have post perform the wake if it saw waiters>0 even if the waiters
   bit was not set.

I'll look at this more later. I've CC'd Alexey Izbyshev, who
introduced the current waiter accounting.

Rich
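For readers without the musl source at hand, the sem_post() logic
under discussion looks roughly as follows. This is an abridged sketch
reconstructed from the patch quoted later in the thread (the overflow
check is omitted, and a_cas() and __wake() are musl internals), not
the verbatim source:

	/* __val[0]: value, with the sign bit doubling as the waiters flag;
	 * __val[1]: waiter count; __val[2]: pshared flag. Abridged sketch. */
	int sem_post(sem_t *sem)
	{
		int val, new, waiters, priv = sem->__val[2];
		do {
			val = sem->__val[0];
			waiters = sem->__val[1];
			new = val + 1;              /* bump value, keep waiters bit */
			if (waiters <= 1)
				new &= ~0x80000000; /* clear waiters bit; approach 1 would use <1 */
		} while (a_cas(sem->__val, val, new) != val);
		/* preemption or a signal here is the window the scenario exploits */
		if (val<0) __wake(sem->__val, waiters>1 ? 1 : -1, priv);
		return 0;
	}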
* Re: [musl] Memory lock needed for semaphores?
From: Rich Felker @ 2024-08-04 22:40 UTC
To: Markus Wichmann; +Cc: musl, Alexey Izbyshev

On Sun, Aug 04, 2024 at 09:31:10AM -0400, Rich Felker wrote:
> On Sun, Aug 04, 2024 at 09:29:03AM +0200, Markus Wichmann wrote:
> > Hi all,
> >
> > in light of the recent thread about barriers, I looked at the
> > semaphore implementation again with an eye to the sudden-unmap
> > issue. To recap, the issue is that multiple threads may interact
> > with the same pshared semaphore, and in particular one thread
> > might unmap a semaphore that another thread is currently still
> > interacting with.
> >
> > Is that an issue with semaphores? I'm thinking of the following
> > scenario:
> >
> > Start: One thread of another process is blocked at the semaphore
> > (sem->__val = {INT_MIN, 1, 0}).
> >
> > Thread 1: Calls sem_post(), runs until just after the a_cas()
> > before being preempted (sem->__val = {1, 1, 0}, but this thread
> > saw val < 0 and waiters <= 1, so a broadcast wake will issue once
> > the thread resumes).
> >
> > Thread 2: Calls sem_trywait(), sets sem->__val = {0, 1, 0}.
> > Thread 2: Calls sem_post(), sets sem->__val = {1, 1, 0}, does not
> > wake.
> > Thread 2: Unmaps the semaphore.
> > Thread 1: Resumes. SYS_futex returns EFAULT. The other process
> > remains sleeping.
> >
> > Am I missing something that makes this impossible? If not, do we
> > maybe need a vmlock for posting pshared semaphores? That would
> > probably be the easiest fix.
>
> This is not possible and would be a misuse of it even if it were.
> sem_post is required to be AS-safe, so it cannot take any sort of
> lock.
>
> I think the issue you report is real, but provided that's the case,
> it's a flaw in the logic to be relying on the first sem_post to
> perform the wake the second logically needs to perform. I haven't
> yet looked into what the cost of leaving the waiters flag set would
> be.
>
> There are kinda 3 approaches I can see exploring to fix this, and
> I'm not sure which are low-cost or even valid yet:
>
> 1. Have post only clear the waiters flag if the count is <1 (not
>    <=1).
> 2. Have trywait re-add a waiters flag if it sees waiters>0.
> 3. Have post perform the wake if it saw waiters>0 even if the
>    waiters bit was not set.
>
> I'll look at this more later. I've CC'd Alexey Izbyshev, who
> introduced the current waiter accounting.

From the original thread:

https://www.openwall.com/lists/musl/2022/11/22/3

On Tue, Nov 22, 2022 at 08:41:44AM +0300, Alexey Izbyshev wrote:
> ...
> * WAITERS_BIT is set and waiters > 1: don't reset WAITERS_BIT since
>   it's likely that some waiters will remain (and it's always fine to
>   err on the side of preserving the WAITERS_BIT); wake a single
>   waiter.
>
> * WAITERS_BIT is set and waiters <= 1: reset WAITERS_BIT and wake
>   all waiters. In non-racy cases only a single waiter will be woken.
>
> * WAITERS_BIT is unset: don't wake anybody. Even if there are some
>   waiters, another sem_post (that reset the WAITERS_BIT) is
>   responsible for waking them.

The third case here is the one we have a problem with: it's possible
that the "another sem_post" responsible for waking them can't do so
because the object is unmapped before it gets a chance to.

I don't see any harm in breaking it down into 2 cases:

3a. WAITERS_BIT is unset and waiter count is zero: don't wake anybody.

3b. WAITERS_BIT is unset and waiter count is nonzero: since newly
arriving waiters always set the waiters bit, this can only happen if
another sem_post cleared the waiters bit after the waiter arrived.
That sem_post should perform a broadcast wake as part of clearing the
waiters bit, so either it has not yet made forward progress to the
wake, or the wake was sent but the waiter has not yet woken to
decrement itself from the waiter count and attempt to decrement the
semaphore value. Re-issuing the broadcast wake will ensure, in the
former case, that the wake actually gets sent, and that it happens
before sem_post returns. I think that solves the problem.

The only harm here is that, for patterns like repeatedly calling
sem_post to increment the semaphore value by more than 1, it's quite
plausible that a waiter has not yet taken itself off the waiter count,
and that two or more wake syscalls will be made where one would have
sufficed. This happened prior to commit
159d1f6c02569091c7a48bdb2e2e824b844a1902 anyway, where the old
condition to wake was:

	if (val<0 || waiters) __wake(sem->__val, 1, priv);

so I don't think we need to be concerned about it.

We could *probably* get by with skipping the waiters>0 condition for
non-pshared semaphores:

	if (val<0 || (!priv && waiters)) __wake(sem->__val, 1, priv);

however, I think there's a variant of the bug reported here in regards
to AS-safety, where a sem_post called from a signal handler that
executes between the atomic and the futex wake won't perform another
wake, even though it may need to in order to trigger forward progress.

Even with the extra wake, there's still an AS-safety issue of sorts
where it's possible for the post to happen (value to be updated)
without a wake, if a long-running signal handler interrupts sem_post.
However, aside from the missed wake in the case of recursive sem_post,
the situation of a signal handler interrupting sem_post doesn't seem
to be a significant behavioral difference: it's hard to construct
cases where you notice the posted value without sem_getvalue, which
has some allowance to be racy. I don't see a way to fully fix this
without FUTEX_WAKE_OP having a CAS operation (which it sadly omitted)
or just blocking signals in the cases where we're going to issue a
wake (which I guess may be an option, since it's already making a
syscall anyway).

Rich
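As a rough illustration of that last option, signal blocking could
look like the hypothetical wrapper below. This is not the fix that was
ultimately adopted, and inside libc the masking would only be done on
the paths that actually issue a wake, rather than unconditionally as
here:

	/* Hypothetical sketch: with handlers deferred, no signal handler
	 * can run between the atomic post and the futex wake, so a
	 * recursive sem_post cannot observe the updated value while the
	 * wake is still pending. */
	#include <pthread.h>
	#include <semaphore.h>
	#include <signal.h>

	int sem_post_masked(sem_t *sem)
	{
		sigset_t all, old;
		int r;
		sigfillset(&all);
		pthread_sigmask(SIG_SETMASK, &all, &old); /* defer handlers */
		r = sem_post(sem);                        /* update + wake as one unit */
		pthread_sigmask(SIG_SETMASK, &old, 0);    /* handlers run after wake */
		return r;
	}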
* Re: [musl] Memory lock needed for semaphores?
From: Markus Wichmann @ 2024-08-07  7:15 UTC
To: musl; +Cc: Alexey Izbyshev

Hi all,

an idea I had in the shower just now: Why don't we set the number of
threads to awaken equal to the new semaphore value? I mean, once we
determine there may be waiters and we have to wake some of them, isn't
that the natural number of threads to wake up?

The thing is, my example doesn't stop at two threads. There can be any
number of threads stuck between the CAS and the wake in sem_post().
And yet, it might be only one sem_post() that gets to make a wake
call, so that one call has to wake up all threads that might get the
semaphore.

So if there are 100 waiters, and we just set the semaphore value to 3
(OK, INT_MIN + 3), then we wake up 3 threads. Because we can get in
this state only if there are another 2 threads that haven't awoken any
of the waiters, or the waiters haven't decremented the semaphore yet.
Of course, there might be another thread just chomping at the bit to
wake up another 2 threads, but that thread also might not get to
perform the wake at all.

I know that currently we only set the val argument to __futexwake to 1
or -1, but nothing in the interface says we have to.

Ciao,
Markus
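Concretely, the proposal would amount to something like the following
change to the tail of sem_post(), shown purely for illustration (new
still carries the waiters bit at this point, so it would need masking,
here with SEM_VALUE_MAX, before being used as a wake count):

	-	if (val<0) __wake(sem->__val, waiters>1 ? 1 : -1, priv);
	+	if (val<0 || waiters) __wake(sem->__val, new & SEM_VALUE_MAX, priv);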
* Re: [musl] Memory lock needed for semaphores?
From: Rich Felker @ 2024-08-10  3:02 UTC
To: Markus Wichmann; +Cc: musl, Alexey Izbyshev

On Wed, Aug 07, 2024 at 09:15:55AM +0200, Markus Wichmann wrote:
> Hi all,
>
> an idea I had in the shower just now: Why don't we set the number of
> threads to awaken equal to the new semaphore value? I mean, once we
> determine there may be waiters and we have to wake some of them,
> isn't that the natural number of threads to wake up?

If you do that, then performing N posts in a row will yield N*(N+1)/2
wakes, when you only wanted N wakes. Imagine a producer-consumer setup
with 100 consumers and a producer posting 10 new things to be
consumed. You get a thundering herd of waiters waking up vying for
significantly more slots than are actually available.

> The thing is, my example doesn't stop at two threads. There can be
> any number of threads stuck between the CAS and the wake in
> sem_post(). And yet, it might be only one sem_post() that gets to
> make a wake call, so that one call has to wake up all threads that
> might get the semaphore.

At least in the trivial extension of your example case to N threads
stuck posting, the unmapping is invalid and the whole example breaks
down. This is because the only way the unmapping is valid is by the
unmapping thread being able to know it's the last user of the mapped
semaphore. If you have other concurrent post operations, then you
haven't synchronized being the last user, and thus can't unmap.

In your original example, the unmapping thread synchronized knowing it
was the last user by virtue of its own wait succeeding, as a result of
post modifying the semaphore value. You can probably modify the
example by waiting N times rather than just 1 time, so that you've
consumed all N posts to conclude that no additional users of the
semaphore exist. But then the semaphore value is back at 0, and with
the proposed changes we discussed, you would issue a futex wake
correctly when posting it again.

> So if there are 100 waiters, and we just set the semaphore value to
> 3 (OK, INT_MIN + 3), then we wake up 3 threads. Because we can get
> in this state only if there are another 2 threads that haven't
> awoken any of the waiters, or the waiters haven't decremented the
> semaphore yet. Of course, there might be another thread just
> chomping at the bit to wake up another 2 threads, but that thread
> also might not get to perform the wake at all.
>
> I know that currently we only set the val argument to __futexwake to
> 1 or -1, but nothing in the interface says we have to.

Indeed, it just doesn't seem to be useful to set it to a different
value.

Rich
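To spell out the arithmetic: if no waiter manages to decrement the
semaphore between posts, the k-th of N consecutive posts sees the
value reach k and requests k wakes, so the posts collectively request
1 + 2 + ... + N = N*(N+1)/2 wake-ups. For the example above (10 posts,
100 consumers), that is 55 wake-ups contending for 10 available slots,
versus the 10 that a one-wake-per-post scheme issues.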
* Re: [musl] Memory lock needed for semaphores?
From: Rich Felker @ 2024-08-10 20:47 UTC
To: Markus Wichmann; +Cc: musl, Alexey Izbyshev

[-- Attachment #1: Type: text/plain, Size: 5958 bytes --]

On Sun, Aug 04, 2024 at 06:40:24PM -0400, Rich Felker wrote:
> On Sun, Aug 04, 2024 at 09:31:10AM -0400, Rich Felker wrote:
> > On Sun, Aug 04, 2024 at 09:29:03AM +0200, Markus Wichmann wrote:
> > > Hi all,
> > >
> > > in light of the recent thread about barriers, I looked at the
> > > semaphore implementation again with an eye to the sudden-unmap
> > > issue. To recap, the issue is that multiple threads may interact
> > > with the same pshared semaphore, and in particular one thread
> > > might unmap a semaphore that another thread is currently still
> > > interacting with.
> > >
> > > Is that an issue with semaphores? I'm thinking of the following
> > > scenario:
> > >
> > > Start: One thread of another process is blocked at the semaphore
> > > (sem->__val = {INT_MIN, 1, 0}).
> > >
> > > Thread 1: Calls sem_post(), runs until just after the a_cas()
> > > before being preempted (sem->__val = {1, 1, 0}, but this thread
> > > saw val < 0 and waiters <= 1, so a broadcast wake will issue
> > > once the thread resumes).
> > >
> > > Thread 2: Calls sem_trywait(), sets sem->__val = {0, 1, 0}.
> > > Thread 2: Calls sem_post(), sets sem->__val = {1, 1, 0}, does
> > > not wake.
> > > Thread 2: Unmaps the semaphore.
> > > Thread 1: Resumes. SYS_futex returns EFAULT. The other process
> > > remains sleeping.
> > >
> > > Am I missing something that makes this impossible? If not, do we
> > > maybe need a vmlock for posting pshared semaphores? That would
> > > probably be the easiest fix.
> >
> > This is not possible and would be a misuse of it even if it were.
> > sem_post is required to be AS-safe, so it cannot take any sort of
> > lock.
> >
> > I think the issue you report is real, but provided that's the
> > case, it's a flaw in the logic to be relying on the first sem_post
> > to perform the wake the second logically needs to perform. I
> > haven't yet looked into what the cost of leaving the waiters flag
> > set would be.
> >
> > There are kinda 3 approaches I can see exploring to fix this, and
> > I'm not sure which are low-cost or even valid yet:
> >
> > 1. Have post only clear the waiters flag if the count is <1 (not
> >    <=1).
> > 2. Have trywait re-add a waiters flag if it sees waiters>0.
> > 3. Have post perform the wake if it saw waiters>0 even if the
> >    waiters bit was not set.
> >
> > I'll look at this more later. I've CC'd Alexey Izbyshev, who
> > introduced the current waiter accounting.
>
> From the original thread:
>
> https://www.openwall.com/lists/musl/2022/11/22/3
>
> On Tue, Nov 22, 2022 at 08:41:44AM +0300, Alexey Izbyshev wrote:
> > ...
> > * WAITERS_BIT is set and waiters > 1: don't reset WAITERS_BIT
> >   since it's likely that some waiters will remain (and it's always
> >   fine to err on the side of preserving the WAITERS_BIT); wake a
> >   single waiter.
> >
> > * WAITERS_BIT is set and waiters <= 1: reset WAITERS_BIT and wake
> >   all waiters. In non-racy cases only a single waiter will be
> >   woken.
> >
> > * WAITERS_BIT is unset: don't wake anybody. Even if there are some
> >   waiters, another sem_post (that reset the WAITERS_BIT) is
> >   responsible for waking them.
>
> The third case here is the one we have a problem with: it's possible
> that the "another sem_post" responsible for waking them can't do so
> because the object is unmapped before it gets a chance to.
>
> I don't see any harm in breaking it down into 2 cases:
>
> 3a. WAITERS_BIT is unset and waiter count is zero: don't wake
> anybody.
>
> 3b. WAITERS_BIT is unset and waiter count is nonzero: since newly
> arriving waiters always set the waiters bit, this can only happen if
> another sem_post cleared the waiters bit after the waiter arrived.
> That sem_post should perform a broadcast wake as part of clearing
> the waiters bit, so either it has not yet made forward progress to
> the wake, or the wake was sent but the waiter has not yet woken to
> decrement itself from the waiter count and attempt to decrement the
> semaphore value. Re-issuing the broadcast wake will ensure, in the
> former case, that the wake actually gets sent, and that it happens
> before sem_post returns. I think that solves the problem.
>
> The only harm here is that, for patterns like repeatedly calling
> sem_post to increment the semaphore value by more than 1, it's quite
> plausible that a waiter has not yet taken itself off the waiter
> count, and that two or more wake syscalls will be made where one
> would have sufficed. This happened prior to commit
> 159d1f6c02569091c7a48bdb2e2e824b844a1902 anyway, where the old
> condition to wake was:
>
> 	if (val<0 || waiters) __wake(sem->__val, 1, priv);
>
> so I don't think we need to be concerned about it.
>
> We could *probably* get by with skipping the waiters>0 condition for
> non-pshared semaphores:
>
> 	if (val<0 || (!priv && waiters)) __wake(sem->__val, 1, priv);
>
> however, I think there's a variant of the bug reported here in
> regards to AS-safety, where a sem_post called from a signal handler
> that executes between the atomic and the futex wake won't perform
> another wake, even though it may need to in order to trigger forward
> progress.
>
> Even with the extra wake, there's still an AS-safety issue of sorts
> where it's possible for the post to happen (value to be updated)
> without a wake, if a long-running signal handler interrupts
> sem_post. However, aside from the missed wake in the case of
> recursive sem_post, the situation of a signal handler interrupting
> sem_post doesn't seem to be a significant behavioral difference:
> it's hard to construct cases where you notice the posted value
> without sem_getvalue, which has some allowance to be racy. I don't
> see a way to fully fix this without FUTEX_WAKE_OP having a CAS
> operation (which it sadly omitted) or just blocking signals in the
> cases where we're going to issue a wake (which I guess may be an
> option, since it's already making a syscall anyway).

How does the attached look?

Rich

[-- Attachment #2: 0001-fix-lost-or-delayed-wakes-in-sem_post-under-certain-.patch --]
[-- Type: text/plain, Size: 2103 bytes --]

From 882aedf6a13891f887d20f6a4184a13e94793b84 Mon Sep 17 00:00:00 2001
From: Rich Felker <dalias@aerifal.cx>
Date: Sat, 10 Aug 2024 16:30:28 -0400
Subject: [PATCH] fix lost or delayed wakes in sem_post under certain race
 conditions

if sem_post is interrupted between clearing the waiters bit from the
semaphore value and performing the futex wake operation, subsequent
calls to sem_post will not perform a wake operation unless a new
waiter has arrived. usually, this is at most a minor nuisance, since
the original wake operation will eventually happen.

however, it's possible that the wake is delayed indefinitely if
interrupted by a signal handler, or that the address the wake needs
to be performed on is no longer mapped if the semaphore was a
process-shared one that has since been unmapped but has a waiter on a
different mapping of the same semaphore. this can happen when another
thread using the same mapping "steals the post" atomically before
actually becoming a second waiter, deduces from success that it was
the last user of the semaphore mapping, then re-posts and unmaps the
semaphore mapping. this scenario was described in a report by Markus
Wichmann.

instead of checking only the waiters bit, also check the waiter count
that was sampled before the atomic post operation, and perform the
wake if it's nonzero. this will not produce any additional wakes
under non-race conditions, since the waiters bit only becomes zero
when targeting a single waiter for wake. checking both was already
the behavior prior to commit 159d1f6c02569091c7a48bdb2e2e824b844a1902.
---
 src/thread/sem_post.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/thread/sem_post.c b/src/thread/sem_post.c
index 5c2517f2..1c8f763c 100644
--- a/src/thread/sem_post.c
+++ b/src/thread/sem_post.c
@@ -16,6 +16,6 @@ int sem_post(sem_t *sem)
 		if (waiters <= 1)
 			new &= ~0x80000000;
 	} while (a_cas(sem->__val, val, new) != val);
-	if (val<0) __wake(sem->__val, waiters>1 ? 1 : -1, priv);
+	if (val<0 || waiters) __wake(sem->__val, waiters>1 ? 1 : -1, priv);
 	return 0;
 }
-- 
2.21.0
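Replaying the original scenario against this patch, for illustration:
thread 2's second sem_post() reads sem->__val = {0, 1, 0}, so val is
not negative but the sampled waiter count is 1. The new condition
therefore fires, and since waiters>1 is false, thread 2 itself issues
the broadcast wake before it can reach the munmap. Thread 1's later
futex call may still return EFAULT, but the wake it was racing to
deliver has already been sent, and the blocked process wakes.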
* Re: [musl] Memory lock needed for semaphores?
From: Markus Wichmann @ 2024-08-11  4:40 UTC
To: musl; +Cc: Alexey Izbyshev

On Sat, Aug 10, 2024 at 04:47:20PM -0400, Rich Felker wrote:
> How does the attached look?

Looks good to me. I had imagined an extension of the scenario to more
threads, but thinking about it further, I noticed that in that
scenario the semaphore value is never more than 1. And in the scenario
where it does get larger, the unmap is invalid, as you said in the
other mail.

So yeah, looks good to me.

Ciao,
Markus