* sem_getvalue conformance considerations
@ 2014-08-27  2:33 Rich Felker
  2014-08-27  7:05 ` Jens Gustedt
  0 siblings, 1 reply; 19+ messages in thread
From: Rich Felker @ 2014-08-27  2:33 UTC (permalink / raw)
To: musl

POSIX says:

    "The sem_getvalue() function shall update the location referenced by
    the sval argument to have the value of the semaphore referenced by
    sem without affecting the state of the semaphore. The updated value
    represents an actual semaphore value that occurred at some
    unspecified time during the call, but it need not be the actual
    value of the semaphore when it is returned to the calling process."

musl currently implements this as:

    int val = sem->__val[0];
    *valp = val < 0 ? 0 : val;

However, I don't think this is correct. The "semaphore value" read from
__val[0] is not the formal semaphore value, but an implementation
detail: while there are waiters, after a post there is a window where
__val[0] is positive (to allow a waiter to take the semaphore) despite
the formal value never becoming positive. sem_post is documented as:

    "If the semaphore value resulting from this operation is positive,
    then no threads were blocked waiting for the semaphore to become
    unlocked; the semaphore value is simply incremented."

    "If the value of the semaphore resulting from this operation is
    zero, then one of the threads blocked waiting for the semaphore
    shall be allowed to return successfully from its call to
    sem_wait()."

So the formal "semaphore value" should always be zero while there are
more waiters than posts, even if those waiters have not yet woken to
take the semaphore. It seems at least semi-possible to observe incorrect
behavior with the current implementation:

Example 1: Initially two waiters. One call to sem_post followed by
sem_getvalue should read a value of 0, not 1.

In principle we could compensate for this by declaring the value zero if
there are any waiters. But then:

Example 2: Initially one waiter. Two calls to sem_post followed by
sem_getvalue should read a value of 1, not 0.

What if we try to get fancy and subtract waiters from __val[0]?
Unfortunately we can't necessarily read __val[0] and waiters (__val[1])
atomically together, so it's possible that one is outdated by the time
we read the other, such that the resulting difference is not the correct
formal semaphore value at any time during the sem_getvalue call.

The one easy cop-out: we could claim the current behavior is conforming
anyway, since there is no way to prove that a thread is actually blocked
on a semaphore. (This is in contrast to cond vars, where having unlocked
the mutex proves they've formally become waiters.) The state "blocked on
semaphore" is fundamentally indistinguishable from the state "about to
enter sem_wait". Thus, we could say that, formally, there are never any
waiters; sem_post always makes the semaphore value positive, and a
would-be waiter formally arrives and decrements the semaphore value at
some time after the post makes the value positive.

Of course this is a really pathetic solution, but is there any way we
can provide real, consistent, meaningful results from sem_getvalue? Or
is the whole concept of sem_getvalue just so messed up that this is the
best possible way to handle it?

Rich

^ permalink raw reply	[flat|nested] 19+ messages in thread
* Re: sem_getvalue conformance considerations
From: Jens Gustedt @ 2014-08-27  7:05 UTC (permalink / raw)
To: musl

On Tuesday, 26 Aug 2014 at 22:33 -0400, Rich Felker wrote:
> What if we try to get fancy and subtract waiters from __val[0]?
> Unfortunately we can't necessarily read __val[0] and waiters
> (__val[1]) atomically together,

Doing the correct thing is always fancy :)

Sure, this depends on the architecture, but where it is possible we
should just do that; it is the semantically correct value.

On i386 and its follow-ups a 64-bit atomic read should always be
possible, and if I remember correctly the arm arch that I touched once
had such a thing, too.

For once, the gcc builtin extensions have easy-to-use feature-test
macros. If sem_t were changed to a union with an alternative 64-bit
__vall member, something along the lines of:

    #if defined(__GCC_HAVE_SYNC_COMPARE_AND_SWAP_8) && defined(__ATOMIC_ACQUIRE)
    uint64_t combined = __atomic_load_n(&s->__vall[0], __ATOMIC_ACQUIRE);
    #else
    ... fallback ...
    #endif

> so it's possible that one is outdated
> by the time we read the other, such that the resulting difference is
> not the correct formal semaphore value at any time during the
> sem_getvalue call.

On archs where an atomic read of these two values together is not
possible, this is the best approximation that you can get. On those
archs there is simply no precise moment in time for that feature,
because the sequence points are not synchronized between the different
threads. Nobody can ask you to return an exact value for a concept that
is not well defined.

Jens

--
:: INRIA Nancy Grand Est ::: AlGorille ::: ICube/ICPS :::
:: ::::::::::::::: office Strasbourg : +33 368854536   ::
:: :::::::::::::::::::::: gsm France : +33 651400183   ::
:: ::::::::::::::: gsm international : +49 15737185122 ::
:: http://icube-icps.unistra.fr/index.php/Jens_Gustedt ::
* Re: sem_getvalue conformance considerations
From: Rich Felker @ 2014-08-27  7:43 UTC (permalink / raw)
To: musl

On Wed, Aug 27, 2014 at 09:05:41AM +0200, Jens Gustedt wrote:
> Am Dienstag, den 26.08.2014, 22:33 -0400 schrieb Rich Felker:
> > What if we try to get fancy and subtract waiters from __val[0]?
> > Unfortunately we can't necessarily read __val[0] and waiters
> > (__val[1]) atomically together,
>
> Doing the correct thing is always fancy :)
> Sure that this depends on the architecture, but where this is possible
> we should just do that, this is the semantically correct value.
>
> On i386 and follow ups 64bit atomic read should always be possible,
> and if I remember correctly the arm arch that I touched once had such
> a thing, too.

Yes, I'm aware that 64-bit atomic read may exist on some archs. (Note:
this does not include i386; 8-byte atomic read was not possible until at
least the i586 generation, and our "i386" baseline is really "i486", the
first model with cmpxchg, which is mandatory for working pthread
primitives.) But as one of musl's big general principles is providing
uniform behavior across archs, I'd rather not implement something where
the behavior is going to differ like that based on a feature.

> > so it's possible that one is outdated
> > by the time we read the other, such that the resulting difference is
> > not the correct formal semaphore value at any time during the
> > sem_getvalue call.
>
> On arch where atomic read of these two values together is not
> possible, this is the best approximation that you can get. On these
> archs there is simply no precise moment in time for that feature
> because the sequence points are not synchronized between the different
> threads. Nobody can ask you to return an exact value for a concept
> that is not well defined.

I'm not entirely convinced there's not a solution. There may be
sufficient information to determine whether or not there are waiters
without a 64-bit atomic read. Let V be the implementation semaphore
value (__val[0]) and W the waiter count (__val[1]). After observing a
nonzero V, W cannot increase without V first reaching zero. So if we
read V first, then W, the value of W read will be less than or equal to
the value of W at the time V was read.

This seems to be sufficient for the semantics I thought were right.
However, I'm doubtful of them too. :-) Even if we know the number of
waiters exactly at the time the value is read, that's not sufficient to
assign a formal value to the semaphore, because these waiters could race
to return EINTR or ETIMEDOUT, or act upon cancellation, before they
consume the post. In this case, sem_getvalue would have reported an
observably incorrect value:

Example: Initially 2 waiters. The posting thread posts 3 times, calls
sem_getvalue and sees a value of 1, calls pthread_cancel on both
waiters, then calls sem_getvalue again and sees a value of 3, despite no
additional posts having happened.

The only easy way around this problem is the current behavior: having
sem_getvalue treat waiters as not-having-arrived-yet. The other solution
I see, which would allow sem_getvalue to report waiters, would be to
ensure that waiters always do a final sem_trywait after observing an
error, and ignore the error if the trywait succeeds. However doing this
with cancellation is not easy; it would require a longjmp, which would
require adding setjmp overhead to each sem_wait.

Of course if __timedwait could return ECANCELED rather than invoking
cancellation handlers, that would make things a lot nicer, and it's
something I've wanted to be able to do for a long time, so perhaps we
can revisit this issue once that's implemented... :)

Rich
* Re: sem_getvalue conformance considerations
From: Alexander Monakov @ 2014-08-27 10:43 UTC (permalink / raw)
To: musl

Why wouldn't the following design work?

val[0] is sem value if >= 0, negated waiters count otherwise
val[1] is wakeup count, incremented before futex wake, tested and
decremented by waiters returning from futex wait

Alexander
* Re: sem_getvalue conformance considerations
From: Alexander Monakov @ 2014-08-27 13:32 UTC (permalink / raw)
To: musl

On Wed, 27 Aug 2014, Alexander Monakov wrote:
> Why wouldn't the following design work?
>
> val[0] is sem value if >= 0, negated waiters count otherwise
> val[1] is wakeup count, incremented before futex wake, tested and decremented
> by waiters returning from futex wait

Unless I'm missing something, the above can simplify sem ops (sorry,
eliding some details in the following pseudocode):

trywait:
	val = sem->val[0];
	while (val > 0) {
		oldval = val;
		if ((val = a_cas(sem->val, val, val-1)) == oldval)
			return 0;
	}
	errno = EAGAIN;
	return -1;

wait:
	if (atomic_fetch_and_decrement(sem->val) > 0)
		return 0;
	while (!(futex_wait(sem->val+1, 0) && errno == EINTR)) {
		wakecnt = sem->val[1];
		while (wakecnt > 0) {
			oldwcnt = wakecnt;
			if ((wakecnt = a_cas(sem->val+1, wakecnt, wakecnt-1)) == oldwcnt)
				return 0;
		}
	}
	return -1;

post:
	val = sem->val[0];
	do {
		if (val == SEM_VALUE_MAX) {
			errno = EOVERFLOW;
			return -1;
		}
		oldval = val;
	} while ((val = a_cas(sem->val, val, val+1)) != oldval);
	if (val < 0) {
		a_inc(sem->val+1);
		futex_wake(sem->val+1, 1);
	}
	return 0;

Alexander
* Re: sem_getvalue conformance considerations
From: Alexander Monakov @ 2014-08-27 19:06 UTC (permalink / raw)
To: musl

On IRC Rich noted that I've too conveniently elided cancellation, so
here's how I think the cancellation handler should look:

	val = sem->val[0];
	while (val < 0) {
		/* We believe we are a waiter that no sem_post has "seen". */
		oldval = val;
		if ((val = a_cas(sem->val, val, val+1)) == oldval)
			return;
	}
	/* We are a waiter, and a non-negative val[0] indicates that one
	 * sem_post noticed us. Consume the corresponding wake count
	 * increment. */
	a_dec(sem->val+1);

Alexander
* Re: sem_getvalue conformance considerations
From: Alexander Monakov @ 2014-08-27 21:06 UTC (permalink / raw)
To: musl

On Wed, 27 Aug 2014, Alexander Monakov wrote:
> On IRC Rich noted that I've too conveniently elided cancellation, so here's how
> I think cancellation handler should look like:

s/cancellation handler/"unwait" procedure/: returning with EINTR is
botched in my earlier mail and should employ the same procedure to undo
the state change.

> val = sem->val[0];
> while (val < 0) {
>	/* We believe we are a waiter that no sem_post has "seen". */
>	oldval = val;
>	if ((val = a_cas(sem->val, val, val+1)) == oldval)
>		return;
> }
> /* We are a waiter, and a non-negative val[0] indicates that one sem_post
>  * noticed us. Consume the corresponding wake count increment. */
> a_dec(sem->val+1);

A plain atomic decrement on the wake count like the above is not
correct. However:

1. I think it's correct as long as no new waiters appear while we are
performing unwait. If new waiters are matched by new posts it's OK; we
only care if the number of new waiters is higher than the number of new
posts, as then we risk causing a missed wake for one of them, or setting
the wake count to -1 if they all consume wakes before we do the
increment.

2. Thus it needs a fancy atomic op: decrement val[1] if val[0] is still
equal to the "val" we retrieved earlier; otherwise, retry from the
beginning.

Furthermore, doing the above in a truly atomic fashion might not be
required! Isn't it okay to decrement the wake count, and if we observe
new waiters, increment it back and cause a futex wake? Thus:

retry:
	val = sem->val[0];
	while (val < 0) {
		/* We believe we are a waiter that no sem_post has "seen". */
		oldval = val;
		if ((val = a_cas(sem->val, val, val+1)) == oldval)
			return;
	}
	/* We are a waiter, and a non-negative val[0] indicates that one
	 * sem_post noticed us. Consume the corresponding wake count
	 * increment. */
	a_dec(sem->val+1);
	if (val > sem->val[0]) {
		/* New waiters appeared. Avoid causing a missed wake and
		 * restart. We don't enter here if more new posts are posted
		 * than new waiters come. */
		a_inc(sem->val+1);
		futex_wake(sem->val+1, 1);
		goto retry;
	}

Alexander
* Re: sem_getvalue conformance considerations
From: Alexander Monakov @ 2014-08-28 20:47 UTC (permalink / raw)
To: musl

Thanks (again) to Rich for showing on IRC that that still was not
correct, with at least two issues:

1. With one waiter, unwait causes an in-progress post to be lost
(perhaps workaroundable by consuming the post from unwait);

2. It is invalid to update val[1] non-atomically with val[0] in
sem_post, as the semaphore may be destroyed in between by a waiter that
noticed the post via unwait.

Alexander
* Re: sem_getvalue conformance considerations
From: Alexander Monakov @ 2014-08-29 22:51 UTC (permalink / raw)
To: musl

Hi,

To begin, I have a few questions regarding cancellation (please excuse
deviating from the topic for a while):

1. It looks non-conforming not to act on pending cancellation in
uncontended sem_wait (or any cancellation point that does not invoke a
syscall, for that matter; nsz pointed out that many instances should
already be handled, but pthread_join for instance seems to miss explicit
timedwait). A bug?

2. I had trouble understanding cancel_handler. What is re-raising
SIGCANCEL at the end for?

3. My understanding is that for deferred cancellation, ideally you'd
want SIGCANCEL to be delivered only as interruptions to syscalls; that
would eliminate some scaffolding around syscall_cp, including the IP
range check in cancel_handler, correct?

4. If the above could be provided, it would be possible to guarantee
that pthread cancellation cleanup executes in a signal handler context
only for asynchronous cancellation. I'd like that for my sem unwait.
Too bad?

======

Trying to approach a formal description of my proposed sem ops
implementation, here's what I can do now:

1. Sem ops begin execution by performing an atomic read or
read-modify-write on val[0]. The sequence of atomic accesses establishes
an order between threads doing the ops (with respect to a given
semaphore).

2. Returning from futex wait with a timeout or interrupt also accesses
val[0] and is included in that order.

3. The following invariant is maintained: val[0] is equal to the
semaphore value minus the number of current waiters (as given by the
linear sequence of events established above).

4. sem_post begins by performing an atomic increment on val[0]; if the
previous value was negative, there was at least one waiter, which is now
discounted from val[0]; we must arrange that one waiter eventually wakes
up and proceeds.

4.1. Note: when sem_post increments negative val[0], we need that one
waiter returns successfully from sem_wait; if they were to exit
exceptionally (eintr, timeout, cancel), they would need to increment
val[0] yet again, but this would allow other threads to observe an
incrementing sem_getvalue despite no new posts, as Rich noted in his
email.

4.2. We arrange that exactly one waiter proceeds by atomically
incrementing val[1] and futex-waking it; since we do that non-atomically
with the val[0] update, we must guarantee that no waiters can proceed to
destroy the semaphore between the val[0] and val[1] updates in sem_post.

5. sem_wait begins by atomically decrementing val[0]; if the previous
value was positive, we have simply successfully decremented the
semaphore; otherwise, we have become a waiter and must suspend.

5.1. sem_wait futex-waits on val[1]; on normal wakeups, it tries to
atomically-decrement-if-positive val[1] and leaves if successful.

FIXME: we are reading from val[1] even though val[0] was increased long
ago and the semaphore may now be positive; is there a justification for
that, i.e. why can't other threads assume the semaphore is free to be
destroyed?

5.2. On eintr/timeout/cancel, a waiter must decide whether it may just
discount itself as a waiter, or must deal with a racing post; in case
there is a racing post, the waiter must either consume it and return
normally, or discount itself as a waiter nevertheless and ensure that,
rather than causing a wake, the post increases the semaphore value
beyond zero.

It seems there are two ways of doing it. We either prefer to consume a
pending post if available (a), or to propagate eintr/cancel (b).

5.2.a. Begin by incrementing-if-negative val[0]. If successful, we have
discounted ourselves as a waiter, so we don't need to care about
concurrent posts (there are fewer pending posts than waiters) and may
propagate eintr/timeout/cancel. If unsuccessful, there is a pending
post. Consume it and cause a normal return from sem_wait (as if the
interruption did not happen). Can't easily cause a normal return from
sem_wait if acting upon cancellation.

5.2.b. Begin by incrementing val[0] unconditionally. If it was negative,
we have discounted ourselves as a waiter; otherwise, there is a pending
post, and our increment makes it look as if the post is increasing an
uncontended semaphore (out of sequence, so other threads may observe
oddly increasing values); however there is a pending increase of val[1]
and we must undo it. Wait for val[1] to become positive by futex-waiting
and then decrement it. This looks more hairy and causes non-conforming
behavior, but is easier to execute from a signal handler context for
cancellation.

Does that help? Thoughts?

Alexander
* Re: sem_getvalue conformance considerations
From: Rich Felker @ 2014-08-30  5:12 UTC (permalink / raw)
To: musl

On Sat, Aug 30, 2014 at 02:51:35AM +0400, Alexander Monakov wrote:
> Hi,
>
> To begin, I have a few questions regarding cancellation (please excuse
> deviating from topic for a while):
>
> 1. It looks non-conforming not to act on pending cancellation in uncontended
> sem_wait (or any cancellation point that does not invoke a syscall, for that
> matter; nsz pointed out that many instances should already be handled, but
> pthread_join for instance seems to miss explicit timedwait). A bug?

Probably so.

> 2. I had trouble understanding cancel_handler. What is re-raising
> SIGCANCEL at the end for?

It's for the case where the signal arrives during a signal handler that
does not itself contain cancellation points, but which has interrupted a
cancellation point. In this case, we want cancellation to re-trigger
after the signal handler returns. This is achieved by re-raising the
signal, but blocking the signal in the ucontext that cancel_handler is
about to return to (if it weren't blocked there, re-raising it would
make an infinite loop). When the next-level-up signal handler returns,
it unblocks the signal again and it gets delivered, and cancel_handler
runs again and can evaluate whether it's at a cancellation point.

> 3. My understanding is that for deferred cancellation, ideally you'd want
> SIGCANCEL to be delivered only as interruptions to syscalls --- that would
> eliminate some scaffolding around syscall_cp including IP range check in
> cancel_handler, correct?

No, that's impossible to eliminate. Any solution without it has a race
condition where cancellation that arrives after the check-before-syscall
but before actually entering the syscall gets lost. And that's even
assuming an interrupting signal can be used for cancellation, which is
false. An interrupting signal would also interrupt blocking syscalls
which are not cancellation points, thereby messing up program semantics.

> 4. If the above could be provided, it would be possible to guarantee that
> pthread cancellation cleanup is executing in a signal handler context only for
> asynchronous cancellation. I'd like that for my sem unwait. Too bad?

Not possible. But I've already described the cancellation mode I want to
add whose observable behavior is like this.

I'll address the actual semaphore ideas later.

Rich
* Re: sem_getvalue conformance considerations
From: Alexander Monakov @ 2014-09-01 17:50 UTC (permalink / raw)
To: musl

Hi,

If there's interest, my basic model file for semaphores in Promela/Spin
is pasted below. Perhaps if I had started doing this earlier it would
have helped to avoid some mistakes.

// spin -a sem.pml && gcc -O2 -DSAFETY pan.c && ./a.out

typedef sem_t {
	int value, waiters; // Behavior
	int val0, val1;     // Implementation
};
sem_t sem;

#define sem_invariants ( \
	(sem.value >= 0 && sem.waiters >= 0) && \
	(sem.value == 0 || sem.waiters == 0) && \
	sem.val0 == sem.value - sem.waiters && \
	sem.val1 >= 0)

bool done;

active proctype monitor()
{
	do
	:: d_step { !sem_invariants -> assert(0); }
	:: done -> break;
	od
}

inline sem_trywait(retval)
{
	// d_step sequences are not preemptible
	if
	:: d_step { sem.val0 > 0 -> sem.val0--; sem.value--; retval = 0; }
	:: else { retval = -1; }
	fi
}

inline sem_post()
{
	int v;
	d_step {
		v = sem.val0;
		sem.val0++;
		if
		:: v >= 0 -> sem.value++;
		:: v < 0 -> sem.waiters--;
		fi
	}
	if
	:: v < 0 -> sem.val1++;
	:: else
	fi
}

inline sem_wait(interruptible, retval)
{
	int v;
	d_step {
		retval = 0;
		v = sem.val0;
		sem.val0--;
		if
		:: v > 0 -> sem.value--;
		:: v <= 0 -> sem.waiters++;
		fi
	}
	if
	:: v <= 0 ->
		if // non-deterministic if interruptible && sem.val1 > 0
		:: d_step { sem.val1 > 0 -> sem.val1--; }
		:: interruptible ->
			d_step {
				v = sem.val0;
				if
				:: v < 0 -> sem.val0++; sem.waiters--; retval = -1;
				:: else
				fi
			}
			if
			:: v >= 0 -> d_step { sem.val1 > 0; sem.val1--; }
			:: else
			fi
		fi
	:: else
	fi
}

int n_posts, n_waits, n_waitfails;

proctype waiter(bool interruptible)
{
	int retval;
	n_waits++;
	sem_wait(interruptible, retval);
	n_waitfails = n_waitfails - retval;
}

proctype poster()
{
	n_posts++;
	sem_post();
}

#define NPROCMAX 4

init {
	int n_procs = NPROCMAX;
	do // start a non-deterministic number of posters
	:: n_procs > 0 -> run poster(); n_procs--;
	:: 1 -> break;
	od;
	do // ditto for waiters
	:: n_procs > 0 -> run waiter(false); n_procs--;
	:: 1 -> break;
	od;
	do
	:: n_procs > 0 -> run waiter(true); n_procs--;
	:: 1 -> break;
	od;
	timeout; // wait until quiescent state
	assert(sem.val1 == 0 && sem.val0 == n_posts + n_waitfails - n_waits);
	n_procs = sem.waiters;
	do
	:: n_procs > 0 -> run poster(); n_procs--;
	:: else -> break;
	od;
	timeout; // wait; there should be no processes except "monitor"
	assert(sem.val1 == 0 && sem.val0 == n_posts + n_waitfails - n_waits);
	assert(sem.waiters == 0);
	done = true;
}
* semaphore redesign
From: Alexander Monakov @ 2015-02-27 23:21 UTC (permalink / raw)
To: musl

Hello,

As new cancellation has landed (except that __timedwait fails to
propagate ECANCELED in addition to ETIMEDOUT and EINTR), I'm happy to
post the semaphore redesign. We discussed this implementation with Rich
on IRC once, and presently I'm not aware of any issues (finally!), but
testing and performance comparison still need to be done.

This implementation aims to solve the problem with sem_getvalue that
started this thread, while also making the fast (uncontended) paths
leaner. There were some explanations scattered in the old thread; I'll
try to provide a summary.

Conceptually, a semaphore has a non-negative value and, when the value
is zero, a non-negative waiter count. The new implementation stores the
value minus the number of waiters in val[0], and in val[1] it stores the
"wake count": the number of waiters that were discounted from val[0] in
sem_post; val[1] is futex-woken in sem_post, and futex-waited on in
sem_wait.

A thread that entered sem_wait and became a waiter by decrementing
non-positive val[0] may leave either by consuming a post (decrementing a
positive val[1]), or, if an error condition such as timeout or
cancellation has been propagated from the futex wait, by discounting
itself as a waiter by incrementing a negative val[0]. Conversely, if
after encountering an error a waiter observes non-negative val[0], it
means that another thread already discounted it from the waiters when
doing a sem_post, so the waiter proceeds to consume the post and
suppress the error.

Since any leaving waiter must either decrement positive val[1] or
increment negative val[0], accessing val[1] non-atomically with val[0]
in sem_post does not lead to potential use of a destroyed semaphore.

At Rich's request, I'm posting this as plain C source code rather than a
diff. The implementation of sem_getvalue stays the same.

Alexander

#include <semaphore.h>
#include "pthread_impl.h"

int sem_post(sem_t *sem)
{
	int val;
	do {
		val = sem->__val[0];
		if (val == SEM_VALUE_MAX) {
			errno = EOVERFLOW;
			return -1;
		}
	} while (val != a_cas(sem->__val, val, val+1));
	if (val < 0) {
		int priv = sem->__val[2];
		a_inc(sem->__val+1);
		__wake(sem->__val+1, 1, priv);
	}
	return 0;
}

int sem_trywait(sem_t *sem)
{
	int val;
	do {
		val = sem->__val[0];
		if (val <= 0) {
			errno = EAGAIN;
			return -1;
		}
	} while (val != a_cas(sem->__val, val, val-1));
	return 0;
}

static void dummy(void *arg)
{
}

int sem_timedwait(sem_t *restrict sem, const struct timespec *restrict at)
{
	pthread_testcancel();

	// Do not spin if already contended (val0 is negative)
	for (int spins = 100; spins && sem->__val[0] == 0; spins--)
		a_spin();

	if (a_fetch_add(sem->__val, -1) > 0)
		return 0;

	int priv = sem->__val[2], e = 0, ee, cs;
	pthread_setcancelstate(PTHREAD_CANCEL_MASKED, &cs);
	do {
		ee = __timedwait(sem->__val+1, 0, CLOCK_REALTIME, at, dummy, 0, priv);
		int wak = sem->__val[1];
		if (wak > 0 && wak == a_cas(sem->__val+1, wak, wak-1))
			goto done;
	} while (!ee || ee == EINTR);

	// Upon returning from wait with an error, either discount ourselves
	// as a waiter by incrementing negative val0, and propagate the error,
	// or consume a racing post if val0 is non-negative, and suppress the
	// error.
	for (;;) {
		int val = sem->__val[0];
		if (val < 0 && val == a_cas(sem->__val, val, val+1))
			break;
		int wak = sem->__val[1];
		if (wak > 0 && wak == a_cas(sem->__val+1, wak, wak-1))
			goto done;
		a_spin();
	}
	e = ee;
done:
	pthread_setcancelstate(cs, 0);
	if (!e)
		return 0;
	if (e == ECANCELED) {
		pthread_testcancel();
		pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, 0);
	}
	errno = e;
	return -1;
}
* Re: semaphore redesign
From: Rich Felker @ 2015-02-28 15:42 UTC (permalink / raw)
To: musl

On Sat, Feb 28, 2015 at 02:21:22AM +0300, Alexander Monakov wrote:
> Hello,
>
> As new cancellation has landed (except that __timedwait fails to propagate
> ECANCELED in addition to ETIMEDOUT and EINTR), I'm happy to post semaphore
> redesign. We discussed this implementation with Rich on IRC once, and
> presently I'm not aware of any issues (finally!), but still testing and
> performance comparison need to be done.

Thanks! I'm attaching a performance testing program I used in the past.
Its time measurement is not very precise, but it should show large-scale
differences. It measures the number of messages that can be sent back
and forth between two threads in 2 seconds, by cancelling the two
threads after sleep(2).

Some other things we should test for performance:

- Do something like the attached but with multiple threads contending
  on semaphores rather than uncontended message passing.

- Time/cycles per call for sem_post and sem_trywait and uncontended
  sem_wait -- this looks like it should be an obvious win though.

- Perhaps something to hammer posts and trywait -- it's not clear that
  this would model any real-world behavior, but it could show something
  interesting anyway.

I doubt we'll see any measurable differences in usage cases where the
futex syscall is expected; it should dominate there.

Rich

[-- Attachment #2: sem_bench.c --]

#include <stdio.h>
#include <unistd.h>
#include <pthread.h>
#include <semaphore.h>

static sem_t sem[2];
static long count[2];

static void *func(void *arg)
{
	long my_id = (long)arg;
	for (;;) {
		sem_wait(&sem[my_id]);
		count[my_id]++;
		sem_post(&sem[1-my_id]);
	}
	return NULL;
}

int main(void)
{
	void *ret;
	pthread_t t1, t2;
	sem_init(&sem[0], 0, 1);
	sem_init(&sem[1], 0, 0);
	pthread_create(&t1, NULL, func, (void *)0);
	pthread_create(&t2, NULL, func, (void *)1);
	sleep(2);
	pthread_cancel(t1);
	pthread_cancel(t2);
	pthread_join(t1, &ret);
	pthread_join(t2, &ret);
	printf("count=%ld\n", count[0] + count[1]);
	return 0;
}
* Re: semaphore redesign
From: Alexander Monakov @ 2015-03-01 18:54 UTC (permalink / raw)
To: musl

A few more things from the IRC discussion.

Fields in the semaphore struct should be made volatile.

The new implementation eliminates the last call site passing a non-dummy
cleanup argument to __timedwait.

The new sem_timedwait shows worse performance in Rich's sem_bench on my
desktop, I think primarily because it no longer has a sem_trywait
preceding the spin loop, plus the loop itself is also a bit leaner. My
proposal is to use a different spin count somewhere in the 150-200 range
instead of 100.

Alexander
* Re: semaphore redesign
  2015-02-27 23:21 ` semaphore redesign Alexander Monakov
  2015-02-28 15:42 ` Rich Felker
@ 2015-03-01 17:30 ` Szabolcs Nagy
  2015-03-01 17:50   ` Szabolcs Nagy
  2015-03-01 18:24   ` Alexander Monakov
  1 sibling, 2 replies; 19+ messages in thread
From: Szabolcs Nagy @ 2015-03-01 17:30 UTC (permalink / raw)
To: musl

* Alexander Monakov <amonakov@ispras.ru> [2015-02-28 02:21:22 +0300]:
> int sem_post(sem_t *sem)
> {
> 	int val;
> 	do {
> 		val = sem->__val[0];
> 		if (val == SEM_VALUE_MAX) {
> 			errno = EOVERFLOW;
> 			return -1;

as discussed on irc, the early return here without a barrier is not ok
(it is a hard-to-observe corner case, i add the comment here so it does
not get forgotten)

http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap04.html#tag_04_11

> 		}
> 	} while (val != a_cas(sem->__val, val, val+1));
> 	if (val < 0) {
> 		int priv = sem->__val[2];
> 		a_inc(sem->__val+1);
> 		__wake(sem->__val+1, 1, priv);
> 	}
> 	return 0;
> }
>
> int sem_trywait(sem_t *sem)
> {
> 	int val;
> 	do {
> 		val = sem->__val[0];
> 		if (val <= 0) {
> 			errno = EAGAIN;
> 			return -1;

likewise

> 		}
> 	} while (val != a_cas(sem->__val, val, val-1));
> 	return 0;
> }
* Re: semaphore redesign
  2015-03-01 17:30 ` Szabolcs Nagy
@ 2015-03-01 17:50   ` Szabolcs Nagy
  2015-03-02 22:40     ` Alexander Monakov
  1 sibling, 1 reply; 19+ messages in thread
From: Szabolcs Nagy @ 2015-03-01 17:50 UTC (permalink / raw)
To: musl

* Szabolcs Nagy <nsz@port70.net> [2015-03-01 18:30:49 +0100]:
> * Alexander Monakov <amonakov@ispras.ru> [2015-02-28 02:21:22 +0300]:
> > int sem_post(sem_t *sem)
> > {
> > 	int val;
> > 	do {
> > 		val = sem->__val[0];
> > 		if (val == SEM_VALUE_MAX) {
> > 			errno = EOVERFLOW;
> > 			return -1;
>
> as discussed on irc early return here without a barrier is not ok
> (it is a hard to observe corner case, i add the comment here so
> it does not get forgotten)
>
> http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap04.html#tag_04_11

sorry, the code is ok (applications cannot rely on the barrier in case
of failure); it can lead to surprising results if the application uses
relaxed atomics, but it's not a conformance issue
* Re: semaphore redesign
  2015-03-01 17:50 ` Szabolcs Nagy
@ 2015-03-02 22:40   ` Alexander Monakov
  2015-03-02 22:45     ` Rich Felker
  0 siblings, 1 reply; 19+ messages in thread
From: Alexander Monakov @ 2015-03-02 22:40 UTC (permalink / raw)
To: musl

On Sun, 1 Mar 2015, Szabolcs Nagy wrote:
> sorry
>
> the code is ok (applications cannot rely on the barrier in case of
> failure), it can lead to surprising results if the application
> uses relaxed atomics, but it's not a conformance issue

There was some follow-up on IRC with the conclusion, as I understood
it, that even though the rest of memory may be unsynchronized after
returning an error, the memory holding the semaphore itself needs to be
synchronized (otherwise the decision to return an error might have been
based on stale memory).

Does sem_getvalue need to synchronize memory as well? I think it
should, but the current implementation does not.

Alexander
* Re: semaphore redesign
  2015-03-02 22:40 ` Alexander Monakov
@ 2015-03-02 22:45   ` Rich Felker
  0 siblings, 0 replies; 19+ messages in thread
From: Rich Felker @ 2015-03-02 22:45 UTC (permalink / raw)
To: musl

On Tue, Mar 03, 2015 at 01:40:37AM +0300, Alexander Monakov wrote:
> On Sun, 1 Mar 2015, Szabolcs Nagy wrote:
> > sorry
> >
> > the code is ok (applications cannot rely on the barrier in case of
> > failure), it can lead to surprising results if the application
> > uses relaxed atomics, but it's not a conformance issue
>
> There was some follow-up on IRC with the conclusion, as I understood
> it, that even though the rest of memory may be unsynchronized after
> returning an error, the memory holding the semaphore itself needs to
> be synchronized (otherwise the decision to return an error might have
> been based on stale memory).

Right. The exemption from formally synchronizing memory in the case of
errors does not give license to falsely report errors due to failure
of the implementation to detect that it's in a non-error state. That
would just be an implementation bug.

> Does sem_getvalue need to synchronize memory as well? I think it
> should, but the current implementation does not.

sem_getvalue is required to return a value that was valid at some time
during the call. There's no formal requirement to "synchronize memory"
in the POSIX sense AFAIK, but if a barrier is required to meet the
requirement for the value, it should use one.

Rich
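[Editor's note: the "value valid at some time during the call" requirement can be met with a single synchronized read. The sketch below is illustrative only, not musl's code: `sem_getvalue_sketch` and `internal_val` are hypothetical names, and C11 atomics replace musl's internal primitives. Clamping negative internal counts to 0 follows the convention from the earlier sem_getvalue discussion, where a negative stored value encodes the presence of waiters.]

```c
#include <stdatomic.h>

/* Read the semaphore's internal count with a synchronizing (seq_cst)
   load, so the value cannot be stale relative to operations that
   happened before the call, and report the formal value: negative
   internal counts (waiters present) read as 0. */
static int sem_getvalue_sketch(atomic_int *internal_val, int *sval)
{
	int v = atomic_load(internal_val);  /* seq_cst by default */
	*sval = v < 0 ? 0 : v;
	return 0;
}
```

A relaxed load here could legally return an arbitrarily old value on a weakly ordered machine, which is the concern raised above.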
* Re: semaphore redesign
  2015-03-01 17:30 ` Szabolcs Nagy
  2015-03-01 17:50   ` Szabolcs Nagy
@ 2015-03-01 18:24 ` Alexander Monakov
  1 sibling, 0 replies; 19+ messages in thread
From: Alexander Monakov @ 2015-03-01 18:24 UTC (permalink / raw)
To: musl

On Sun, 1 Mar 2015, Szabolcs Nagy wrote:
> * Alexander Monakov <amonakov@ispras.ru> [2015-02-28 02:21:22 +0300]:
> > int sem_post(sem_t *sem)
> > {
> > 	int val;
> > 	do {
> > 		val = sem->__val[0];
> > 		if (val == SEM_VALUE_MAX) {
> > 			errno = EOVERFLOW;
> > 			return -1;
>
> as discussed on irc early return here without a barrier is not ok
> (it is a hard to observe corner case, i add the comment here so
> it does not get forgotten)

We further discussed that to fix it, one can either recheck the value
after a barrier in the error path and restart from the beginning if it
changed, or always proceed to the CAS (with the value changed only when
not leading to an error) and handle errors after the cas-retry loop.
The following code implements the latter approach. I strongly prefer
it for sem_trywait, where I think EAGAIN is relatively common. For
sem_post it's not so clear-cut for me, as EOVERFLOW should be extremely
rare, but it still helps to get good code layout from the compiler
(otherwise GCC lays out the error return path inside the cas loop).

int sem_trywait(sem_t *sem)
{
	int val;
	do val = sem->__val[0];
	while (val != a_cas(sem->__val, val, val-!!(val>0)));
	if (val > 0) return 0;
	errno = EAGAIN;
	return -1;
}

int sem_post(sem_t *sem)
{
	int val;
	do val = sem->__val[0];
	while (val != a_cas(sem->__val, val, val+!!(val<SEM_VALUE_MAX)));
	if (val < 0) {
		int priv = sem->__val[2];
		a_inc(sem->__val+1);
		__wake(sem->__val+1, 1, priv);
	}
	if (val < SEM_VALUE_MAX) return 0;
	errno = EOVERFLOW;
	return -1;
}
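[Editor's note: the trick above works because in the error case the CAS writes back the unchanged value, so the retry loop still terminates, and the error is diagnosed afterwards from the observed old value. A single-threaded model of the two updates makes this easy to check; `trywait_sketch` and `post_sketch` are made-up names, a_cas is replaced by C11 atomics, and the waiter wakeup is reduced to a counter.]

```c
#include <stdatomic.h>
#include <errno.h>
#include <limits.h>

#define VALUE_MAX INT_MAX  /* stand-in for the real SEM_VALUE_MAX */

/* Branch-free trywait: when v <= 0 the CAS stores v unchanged and
   succeeds; the EAGAIN error is reported after the loop. */
static int trywait_sketch(atomic_int *val)
{
	int v;
	do v = atomic_load(val);
	while (!atomic_compare_exchange_weak(val, &v, v - !!(v > 0)));
	if (v > 0) return 0;
	errno = EAGAIN;
	return -1;
}

/* Branch-free post: at VALUE_MAX the CAS stores the value unchanged;
   a negative old value means waiters were present, so one is woken
   (modeled here by bumping a counter). */
static int post_sketch(atomic_int *val, int *wakeups)
{
	int v;
	do v = atomic_load(val);
	while (!atomic_compare_exchange_weak(val, &v, v + !!(v < VALUE_MAX)));
	if (v < 0)
		(*wakeups)++;  /* stand-in for a_inc + __wake */
	if (v < VALUE_MAX) return 0;
	errno = EOVERFLOW;
	return -1;
}
```

Note that !!(v > 0) is 1 exactly when a real decrement is wanted and 0 otherwise, so the error path never perturbs the stored value.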
end of thread, other threads: [~2015-03-02 22:45 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-08-27  2:33 sem_getvalue conformance considerations Rich Felker
2014-08-27  7:05 ` Jens Gustedt
2014-08-27  7:43 ` Rich Felker
2014-08-27 10:43 ` Alexander Monakov
2014-08-27 13:32 ` Alexander Monakov
2014-08-27 19:06 ` Alexander Monakov
2014-08-27 21:06 ` Alexander Monakov
2014-08-28 20:47 ` Alexander Monakov
2014-08-29 22:51 ` Alexander Monakov
2014-08-30  5:12 ` Rich Felker
2014-09-01 17:50 ` Alexander Monakov
2015-02-27 23:21 ` semaphore redesign Alexander Monakov
2015-02-28 15:42 ` Rich Felker
2015-03-01 18:54 ` Alexander Monakov
2015-03-01 17:30 ` Szabolcs Nagy
2015-03-01 17:50 ` Szabolcs Nagy
2015-03-02 22:40 ` Alexander Monakov
2015-03-02 22:45 ` Rich Felker
2015-03-01 18:24 ` Alexander Monakov
Code repositories for project(s) associated with this public inbox:

	https://git.vuxu.org/mirror/musl/

This is a public inbox; see mirroring instructions for how to clone and
mirror all data and code used for this inbox, as well as URLs for NNTP
newsgroup(s).