mailing list of musl libc
* [musl] sem_post() can miss waiters
@ 2022-11-21  5:14 Alexey Izbyshev
  2022-11-22  0:09 ` Rich Felker
  0 siblings, 1 reply; 7+ messages in thread
From: Alexey Izbyshev @ 2022-11-21  5:14 UTC (permalink / raw)
  To: musl

[-- Attachment #1: Type: text/plain, Size: 4403 bytes --]

Hi,

While reading musl's POSIX semaphore implementation I realized that its 
accounting of waiters is incorrect (after commit 
88c4e720317845a8e01aee03f142ba82674cd23d) due to a combination of the 
following factors:

* sem_post relies on a potentially stale waiters count when deciding 
whether to wake a waiter.
* The presence of waiters is not reflected in the semaphore value when 
the semaphore is unlocked.

Here is an example scenario with 4 threads:

* initial state
   - val = 1, waiters = 0
* T1
   - enters sem_post
   - reads val = 1, waiters = 0, is preempted
   - val = 1, waiters = 0
* T2
   - sem_wait
   - another sem_wait (blocks)
   - val = -1, waiters = 1
* T3
   - sem_wait (blocks)
   - val = -1, waiters = 2
* T4
   - sem_post (wakes T2)
   - val = 1, waiters = 2
* T1
   - continues in sem_post
   - a_cas(1, 2) succeeds
   - T3 is NOT woken because sem_post thinks there are no waiters
   - val = 2, waiters = 1
* T2
   - wakes and completes sem_wait
   - val = 1, waiters = 1

So in the end T3 remains blocked even though the semaphore is unlocked.

Here are two ideas for how this can be fixed without reintroducing the 
problem of accessing a potentially destroyed semaphore after it's 
unlocked.

* Idea 1
   - use a proper WAITERS_BIT in the semaphore state instead of just a 
single value and preserve it in sem_post by default
   - in sem_post, when we notice that waiters count is zero but 
WAITERS_BIT is set, reset WAITERS_BIT, but wake all potential waiters 
afterwards.

This ensures that any new waiters that could arrive after we read 
waiters count are taken into account (and they will proceed to lock the 
semaphore and potentially set WAITERS_BIT again after they are woken).

The sem_post body would look like this (the full patch is attached):

int val, new, waiters, priv = sem->__val[2];
do {
     val = sem->__val[0];
     waiters = sem->__val[1];
     if ((val & SEM_VALUE_MAX) == SEM_VALUE_MAX) {
         errno = EOVERFLOW;
         return -1;
     }
     new = val + 1;
     if (!waiters)
         new &= ~0x80000000;
} while (a_cas(sem->__val, val, new) != val);
if (val<0) __wake(sem->__val, waiters ? 1 : -1, priv);

The main downside of this approach is that a FUTEX_WAKE system call will 
be performed each time the semaphore transitions from the "maybe have 
waiters" state to the "no waiters" state, and in practice this wake will 
be useless in most cases.

* Idea 2
   - use a proper WAITERS_BIT as above
   - also add a new DETECT_NEW_WAITERS_BIT
   - in sem_post, when we notice that waiters count is zero, WAITERS_BIT 
is set and DETECT_NEW_WAITERS_BIT is not set, atomically transition into 
"waiters detection" state (DETECT_NEW_WAITERS_BIT is set, WAITERS_BIT is 
unset) *without unlocking the semaphore*. Then recheck waiters count 
again, and if it's still zero, try to atomically transition to "no 
waiters" state (both bits are unset) while also unlocking the semaphore. 
If this succeeds, we know that no new waiters arrived before we unlocked 
the semaphore and hence no wake up is needed. Regardless of whether we 
succeeded, we always leave "waiters detection" state before exiting the 
CAS loop.
   - in sem_post, wake a single waiter if at least one of WAITERS_BIT or 
DETECT_NEW_WAITERS_BIT is set. This ensures no waiters are missed in 
sem_post calls racing with the sem_post currently responsible for 
waiters detection.

The sem_post body would look like this (the full patch is attached):

int val, new, priv = sem->__val[2], sync = 0;
for (;;) {
     int cnt;
     val = sem->__val[0];
     cnt = val & SEM_VALUE_MAX;
     if (cnt < SEM_VALUE_MAX) {
         if (!sync && val < 0 && !(val & 0x40000000) && !sem->__val[1]) {
             new = cnt | 0x40000000;
             if (a_cas(sem->__val, val, new) != val)
                 continue;
             val = new;
             sync = 1;
         }
         new = val + 1;
     } else {
         new = val;
     }
     if (sync) {
         if (sem->__val[1])
             new |= 0x80000000;
         new &= ~0x40000000;
     }
     if (a_cas(sem->__val, val, new) == val)
         break;
}
if ((val & SEM_VALUE_MAX) == SEM_VALUE_MAX) {
     errno = EOVERFLOW;
     return -1;
}
if (new < 0 || (new & 0x40000000)) __wake(sem->__val, 1, priv);

Compared to the first approach, there is just an extra a_cas instead of 
a system call, though we lose one bit in SEM_VALUE_MAX.

Hope this helps,
Alexey

[-- Attachment #2: sem-post-detect-new-waiters.diff --]
[-- Type: text/x-diff; name=sem-post-detect-new-waiters.diff, Size: 3680 bytes --]

diff --git a/include/limits.h b/include/limits.h
index 53a27b9d..c4d3f42b 100644
--- a/include/limits.h
+++ b/include/limits.h
@@ -66,7 +66,7 @@
 #define PTHREAD_KEYS_MAX 128
 #define PTHREAD_STACK_MIN 2048
 #define PTHREAD_DESTRUCTOR_ITERATIONS 4
-#define SEM_VALUE_MAX 0x7fffffff
+#define SEM_VALUE_MAX 0x3fffffff
 #define SEM_NSEMS_MAX 256
 #define DELAYTIMER_MAX 0x7fffffff
 #define MQ_PRIO_MAX 32768
diff --git a/src/thread/sem_getvalue.c b/src/thread/sem_getvalue.c
index d9d83071..c0b7762d 100644
--- a/src/thread/sem_getvalue.c
+++ b/src/thread/sem_getvalue.c
@@ -1,8 +1,9 @@
 #include <semaphore.h>
+#include <limits.h>
 
 int sem_getvalue(sem_t *restrict sem, int *restrict valp)
 {
 	int val = sem->__val[0];
-	*valp = val < 0 ? 0 : val;
+	*valp = val & SEM_VALUE_MAX;
 	return 0;
 }
diff --git a/src/thread/sem_post.c b/src/thread/sem_post.c
index 31e3293d..5051ec97 100644
--- a/src/thread/sem_post.c
+++ b/src/thread/sem_post.c
@@ -1,17 +1,38 @@
 #include <semaphore.h>
+#include <limits.h>
 #include "pthread_impl.h"
 
 int sem_post(sem_t *sem)
 {
-	int val, waiters, priv = sem->__val[2];
-	do {
+	int val, new, priv = sem->__val[2], sync = 0;
+	for (;;) {
+		int cnt;
 		val = sem->__val[0];
-		waiters = sem->__val[1];
-		if (val == SEM_VALUE_MAX) {
-			errno = EOVERFLOW;
-			return -1;
+		cnt = val & SEM_VALUE_MAX;
+		if (cnt < SEM_VALUE_MAX) {
+			if (!sync && val < 0 && !(val & 0x40000000) && !sem->__val[1]) {
+				new = cnt | 0x40000000;
+				if (a_cas(sem->__val, val, new) != val)
+					continue;
+				val = new;
+				sync = 1;
+			}
+			new = val + 1;
+		} else {
+			new = val;
 		}
-	} while (a_cas(sem->__val, val, val+1+(val<0)) != val);
-	if (val<0 || waiters) __wake(sem->__val, 1, priv);
+		if (sync) {
+			if (sem->__val[1])
+				new |= 0x80000000;
+			new &= ~0x40000000;
+		}
+		if (a_cas(sem->__val, val, new) == val)
+			break;
+	}
+	if ((val & SEM_VALUE_MAX) == SEM_VALUE_MAX) {
+		errno = EOVERFLOW;
+		return -1;
+	}
+	if (new < 0 || (new & 0x40000000)) __wake(sem->__val, 1, priv);
 	return 0;
 }
diff --git a/src/thread/sem_timedwait.c b/src/thread/sem_timedwait.c
index 58d3ebfe..51f6a474 100644
--- a/src/thread/sem_timedwait.c
+++ b/src/thread/sem_timedwait.c
@@ -1,4 +1,5 @@
 #include <semaphore.h>
+#include <limits.h>
 #include "pthread_impl.h"
 
 static void cleanup(void *p)
@@ -13,14 +14,18 @@ int sem_timedwait(sem_t *restrict sem, const struct timespec *restrict at)
 	if (!sem_trywait(sem)) return 0;
 
 	int spins = 100;
-	while (spins-- && sem->__val[0] <= 0 && !sem->__val[1]) a_spin();
+	while (spins-- && !(sem->__val[0] & SEM_VALUE_MAX) && !sem->__val[1])
+		a_spin();
 
 	while (sem_trywait(sem)) {
-		int r;
+		int r, t, val = sem->__val[0];
+		if (val & SEM_VALUE_MAX)
+			continue;
 		a_inc(sem->__val+1);
-		a_cas(sem->__val, 0, -1);
+		t = val | 0x80000000;
+		a_cas(sem->__val, val, t);
 		pthread_cleanup_push(cleanup, (void *)(sem->__val+1));
-		r = __timedwait_cp(sem->__val, -1, CLOCK_REALTIME, at, sem->__val[2]);
+		r = __timedwait_cp(sem->__val, t, CLOCK_REALTIME, at, sem->__val[2]);
 		pthread_cleanup_pop(1);
 		if (r) {
 			errno = r;
diff --git a/src/thread/sem_trywait.c b/src/thread/sem_trywait.c
index 04edf46b..beb435da 100644
--- a/src/thread/sem_trywait.c
+++ b/src/thread/sem_trywait.c
@@ -1,12 +1,12 @@
 #include <semaphore.h>
+#include <limits.h>
 #include "pthread_impl.h"
 
 int sem_trywait(sem_t *sem)
 {
 	int val;
-	while ((val=sem->__val[0]) > 0) {
-		int new = val-1-(val==1 && sem->__val[1]);
-		if (a_cas(sem->__val, val, new)==val) return 0;
+	while ((val=sem->__val[0]) & SEM_VALUE_MAX) {
+		if (a_cas(sem->__val, val, val-1)==val) return 0;
 	}
 	errno = EAGAIN;
 	return -1;

[-- Attachment #3: sem-post-wake-all.diff --]
[-- Type: text/x-diff; name=sem-post-wake-all.diff, Size: 2795 bytes --]

diff --git a/src/thread/sem_getvalue.c b/src/thread/sem_getvalue.c
index d9d83071..c0b7762d 100644
--- a/src/thread/sem_getvalue.c
+++ b/src/thread/sem_getvalue.c
@@ -1,8 +1,9 @@
 #include <semaphore.h>
+#include <limits.h>
 
 int sem_getvalue(sem_t *restrict sem, int *restrict valp)
 {
 	int val = sem->__val[0];
-	*valp = val < 0 ? 0 : val;
+	*valp = val & SEM_VALUE_MAX;
 	return 0;
 }
diff --git a/src/thread/sem_post.c b/src/thread/sem_post.c
index 31e3293d..ca8d663c 100644
--- a/src/thread/sem_post.c
+++ b/src/thread/sem_post.c
@@ -1,17 +1,21 @@
 #include <semaphore.h>
+#include <limits.h>
 #include "pthread_impl.h"
 
 int sem_post(sem_t *sem)
 {
-	int val, waiters, priv = sem->__val[2];
+	int val, new, waiters, priv = sem->__val[2];
 	do {
 		val = sem->__val[0];
 		waiters = sem->__val[1];
-		if (val == SEM_VALUE_MAX) {
+		if ((val & SEM_VALUE_MAX) == SEM_VALUE_MAX) {
 			errno = EOVERFLOW;
 			return -1;
 		}
-	} while (a_cas(sem->__val, val, val+1+(val<0)) != val);
-	if (val<0 || waiters) __wake(sem->__val, 1, priv);
+		new = val + 1;
+		if (!waiters)
+			new &= ~0x80000000;
+	} while (a_cas(sem->__val, val, new) != val);
+	if (val<0) __wake(sem->__val, waiters ? 1 : -1, priv);
 	return 0;
 }
diff --git a/src/thread/sem_timedwait.c b/src/thread/sem_timedwait.c
index 58d3ebfe..aa67376c 100644
--- a/src/thread/sem_timedwait.c
+++ b/src/thread/sem_timedwait.c
@@ -1,4 +1,5 @@
 #include <semaphore.h>
+#include <limits.h>
 #include "pthread_impl.h"
 
 static void cleanup(void *p)
@@ -13,14 +14,15 @@ int sem_timedwait(sem_t *restrict sem, const struct timespec *restrict at)
 	if (!sem_trywait(sem)) return 0;
 
 	int spins = 100;
-	while (spins-- && sem->__val[0] <= 0 && !sem->__val[1]) a_spin();
+	while (spins-- && !(sem->__val[0] & SEM_VALUE_MAX) && !sem->__val[1])
+		a_spin();
 
 	while (sem_trywait(sem)) {
-		int r;
+		int r, priv = sem->__val[2];
 		a_inc(sem->__val+1);
-		a_cas(sem->__val, 0, -1);
+		a_cas(sem->__val, 0, 0x80000000);
 		pthread_cleanup_push(cleanup, (void *)(sem->__val+1));
-		r = __timedwait_cp(sem->__val, -1, CLOCK_REALTIME, at, sem->__val[2]);
+		r = __timedwait_cp(sem->__val, 0x80000000, CLOCK_REALTIME, at, priv);
 		pthread_cleanup_pop(1);
 		if (r) {
 			errno = r;
diff --git a/src/thread/sem_trywait.c b/src/thread/sem_trywait.c
index 04edf46b..beb435da 100644
--- a/src/thread/sem_trywait.c
+++ b/src/thread/sem_trywait.c
@@ -1,12 +1,12 @@
 #include <semaphore.h>
+#include <limits.h>
 #include "pthread_impl.h"
 
 int sem_trywait(sem_t *sem)
 {
 	int val;
-	while ((val=sem->__val[0]) > 0) {
-		int new = val-1-(val==1 && sem->__val[1]);
-		if (a_cas(sem->__val, val, new)==val) return 0;
+	while ((val=sem->__val[0]) & SEM_VALUE_MAX) {
+		if (a_cas(sem->__val, val, val-1)==val) return 0;
 	}
 	errno = EAGAIN;
 	return -1;

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2023-02-22 18:12 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-11-21  5:14 [musl] sem_post() can miss waiters Alexey Izbyshev
2022-11-22  0:09 ` Rich Felker
2022-11-22  5:41   ` A. Wilcox
2022-11-22  5:41   ` Alexey Izbyshev
2022-12-14  1:48     ` Rich Felker
2022-12-14 10:29       ` Alexey Izbyshev
2023-02-22 18:12         ` Rich Felker

Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/musl/
