From mboxrd@z Thu Jan 1 00:00:00 1970 X-Msuck: nntp://news.gmane.org/gmane.linux.lib.musl.general/7104 Path: news.gmane.org!not-for-mail From: Alexander Monakov Newsgroups: gmane.linux.lib.musl.general Subject: semaphore redesign Date: Sat, 28 Feb 2015 02:21:22 +0300 (MSK) Message-ID: References: <20140827023338.GA21076@brightrain.aerifal.cx> <1409123141.4476.18.camel@eris.loria.fr> <20140827074310.GK12888@brightrain.aerifal.cx> Reply-To: musl@lists.openwall.com NNTP-Posting-Host: plane.gmane.org Mime-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-Trace: ger.gmane.org 1425079303 11744 80.91.229.3 (27 Feb 2015 23:21:43 GMT) X-Complaints-To: usenet@ger.gmane.org NNTP-Posting-Date: Fri, 27 Feb 2015 23:21:43 +0000 (UTC) To: musl@lists.openwall.com Original-X-From: musl-return-7117-gllmg-musl=m.gmane.org@lists.openwall.com Sat Feb 28 00:21:42 2015 Return-path: Envelope-to: gllmg-musl@m.gmane.org Original-Received: from mother.openwall.net ([195.42.179.200]) by plane.gmane.org with smtp (Exim 4.69) (envelope-from ) id 1YRUE1-0004rR-CJ for gllmg-musl@m.gmane.org; Sat, 28 Feb 2015 00:21:41 +0100 Original-Received: (qmail 27889 invoked by uid 550); 27 Feb 2015 23:21:38 -0000 Mailing-List: contact musl-help@lists.openwall.com; run by ezmlm Precedence: bulk List-Post: List-Help: List-Unsubscribe: List-Subscribe: Original-Received: (qmail 27834 invoked from network); 27 Feb 2015 23:21:33 -0000 In-Reply-To: User-Agent: Alpine 2.11 (LNX 23 2013-08-11) Xref: news.gmane.org gmane.linux.lib.musl.general:7104 Archived-At: Hello, As new cancellation has landed (except that __timedwait fails to propagate ECANCELED in addition to ETIMEDOUT and EINTR), I'm happy to post semaphore redesign. We discussed this implementation with Rich on IRC once, and presently I'm not aware of any issues (finally!), but still testing and performance comparison need to be done. This implementation aims to solve the problem with sem_getvalue that started this thread, while also making fast (uncontended) paths leaner. There were some explanations scattered in the old thread; I'll try to provide a summary. Conceptually, a semaphore has a non-negative value and when the value is zero, a non-negative waiter count. The new implementation stores the value minus number of waiters in val[0], and in val[1] it stores the "wake count": the number of waiters that were discounted from val[0] in sem_post; val[1] is futex-woken in sem_post, and futex-waited on in sem_wait. A thread that entered sem_wait and became a waiter by decrementing non-positive val[0] may leave either by consuming a post (decrementing a positive val[1]), or, if error condition such as timeout or cancellation has been propagated from futex wait, by discounting itself as a waiter by incrementing a negative val[0]. Conversely, if after encountering an error a waiter observes non-negative val[0], it means that another thread already discounted it from waiters when doing a sem_post, so the waiter proceeds to consume the post and suppress the error. Since any leaving waiter must either decrement positive val[1] or increment negative val[0], accessing val[1] non-atomically with val[0] in sem_post does not lead to potential use of destroyed semaphore. At Rich's request, I'm posting this as plain C source code rather than a diff. The implementation of sem_getvalue stays the same. Alexander #include #include "pthread_impl.h" int sem_post(sem_t *sem) { int val; do { val = sem->__val[0]; if (val == SEM_VALUE_MAX) { errno = EOVERFLOW; return -1; } } while (val != a_cas(sem->__val, val, val+1)); if (val < 0) { int priv = sem->__val[2]; a_inc(sem->__val+1); __wake(sem->__val+1, 1, priv); } return 0; } int sem_trywait(sem_t *sem) { int val; do { val = sem->__val[0]; if (val <= 0) { errno = EAGAIN; return -1; } } while (val != a_cas(sem->__val, val, val-1)); return 0; } static void dummy(void *arg) { } int sem_timedwait(sem_t *restrict sem, const struct timespec *restrict at) { pthread_testcancel(); // Do not spin if already contended (val0 is negative) for (int spins = 100; spins && sem->__val[0] == 0; spins--) a_spin(); if (a_fetch_add(sem->__val, -1) > 0) return 0; int priv = sem->__val[2], e = 0, ee, cs; pthread_setcancelstate(PTHREAD_CANCEL_MASKED, &cs); do { ee = __timedwait(sem->__val+1, 0, CLOCK_REALTIME, at, dummy, 0, priv); int wak = sem->__val[1]; if (wak > 0 && wak == a_cas(sem->__val+1, wak, wak-1)) goto done; } while (!ee || ee == EINTR); // Upon returning from wait with an error, either discount ourselves as a waiter // by incrementing negative val0, and propagate error, or consume a racing post // if val0 is non-negative, and suppress error. for (;;) { int val = sem->__val[0]; if (val < 0 && val == a_cas(sem->__val, val, val+1)) break; int wak = sem->__val[1]; if (wak > 0 && wak == a_cas(sem->__val+1, wak, wak-1)) goto done; a_spin(); } e = ee; done: pthread_setcancelstate(cs, 0); if (!e) return 0; if (e == ECANCELED) { pthread_testcancel(); pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, 0); } errno = e; return -1; }