From mboxrd@z Thu Jan  1 00:00:00 1970
From: Rich Felker
Newsgroups: gmane.linux.lib.musl.general
Subject: Re: Resuming work on new semaphore
Date: Thu, 23 Apr 2015 12:06:24 -0400
Message-ID: <20150423160624.GF17573@brightrain.aerifal.cx>
References: <20150402152642.GW6817@brightrain.aerifal.cx>
 <20150402231457.GC6817@brightrain.aerifal.cx>
 <20150405190214.GF6817@brightrain.aerifal.cx>
 <20150405202314.GG6817@brightrain.aerifal.cx>
Reply-To: musl@lists.openwall.com
To: musl@lists.openwall.com
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
User-Agent: Mutt/1.5.21 (2010-09-15)

I'm going to try to summarize some of the issues that have been
discussed on IRC since this.

On Sun, Apr 12, 2015 at 01:22:34AM +0300, Alexander Monakov wrote:
> On Mon, 6 Apr 2015, Alexander Monakov wrote:
> > One other thing to consider.
> > In the absence of concurrent operations on the semaphore, the
> > return value of sem_getvalue should be equal to the number of times
> > sem_trywait will indicate success when called repeatedly. So if the
> > implementation performs post-stealing in trywait, it should return
> > the higher bound as semaphore value. Likewise for timedwait.
>
> If we accept the above, it follows that in the new implementation
> getvalue should return not max(0, val[0] + val[1]), but rather
> max(0, val[0]) + val[1].

Indeed. But then max(0, val[0]) + val[1] can overflow SEM_VALUE_MAX
unless we prevent it, which takes some work, but I think it's
possible.

> int sem_post(sem_t *sem)
> {
> 	int val;
> 	do val = sem->__val[0];
> 	while (val != a_cas(sem->__val, val, val+!!(val<SEM_VALUE_MAX)));
> 	if (val < 0) {
> 		int priv = sem->__val[2];
> 		a_inc(sem->__val+1);
> 		__wake(sem->__val+1, 1, priv);
> 	}
> 	if (val < SEM_VALUE_MAX) return 0;
> 	errno = EOVERFLOW;
> 	return -1;
> }

The first observation we made was that this checks val<SEM_VALUE_MAX
only after the cas loop has already run, so the overflow case performs
a useless (value-preserving) cas before reporting the error. Moving
the check into the loop gives this (my1a):

int sem_post(sem_t *sem)
{
	int val = sem->__val[0];
	val -= val==SEM_VALUE_MAX;
	while (a_cas(sem->__val, val, val+1) != val) {
		if ((val = sem->__val[0]) == SEM_VALUE_MAX) {
			errno = EOVERFLOW;
			return -1;
		}
	}
	if (val < 0) {
		int priv = sem->__val[2];
		a_inc(sem->__val+1);
		__wake(sem->__val+1, 1, priv);
	}
	return 0;
}

or this (my1b):

int sem_post(sem_t *sem)
{
	int old, val = sem->__val[0];
	val -= val==SEM_VALUE_MAX;
	while ((old = a_cas(sem->__val, val, val+1)) != val) {
		if ((val = old) == SEM_VALUE_MAX) {
			errno = EOVERFLOW;
			return -1;
		}
	}
	if (val < 0) {
		int priv = sem->__val[2];
		a_inc(sem->__val+1);
		__wake(sem->__val+1, 1, priv);
	}
	return 0;
}

The latter saves the result of a_cas to avoid an extra load, but I
don't think it makes any significant difference, and it might be seen
as uglier.
However, neither of those addresses the overflow issue, which I've
tried to address here:

#define VAL0_MAX ((SEM_VALUE_MAX+1)/2)
#define VAL1_MAX (SEM_VALUE_MAX/2)

int sem_post(sem_t *sem)
{
	int val = sem->__val[0];
	val -= val==VAL0_MAX;
	while (a_cas(sem->__val, val, val+1) != val) {
		if ((val = sem->__val[0]) == VAL0_MAX) {
			int tmp = sem->__val[1];
			if (tmp >= VAL1_MAX) {
				errno = EOVERFLOW;
				return -1;
			}
			if (a_cas(sem->__val+1, tmp, tmp+1) == tmp) {
				return 0;
			}
			val--;
		}
	}
	if (val < 0) {
		int priv = sem->__val[2];
		a_inc(sem->__val+1);
		__wake(sem->__val+1, 1, priv);
	}
	return 0;
}

This is code whose idea was discussed on IRC but not yet presented, so
it may have significant bugs. The idea is to limit the main sem value
component and the wake count separately, each to half the max. Once
val[0] hits VAL0_MAX, further posts take the form of wakes for
nonexistent waiters (which are okay but more costly). This allows the
total observed value to reach all the way up to SEM_VALUE_MAX. If this
happens, waiters will consume all of val[0] first, and the wakes will
all remain pending until val[0] reaches 0. At that point, new waiters
will decrement val[0] to a negative value (indicating a waiter),
attempt a futex wait, fail because there are wakes pending, consume
one of the wakes, and exit. (Note: this useless futex wait can be
optimized out by reordering the do-while loop body in sem_timedwait.)

During this state, there is a race window where val[1] can exceed
VAL1_MAX -- if a post happens after a new waiter decrements val[0] but
before it consumes a wake from val[1], a concurrent post will
increment val[0] back to 0 and increment val[1] unconditionally.
However, the magnitude of such overshoot is bounded by the number of
tasks, which is necessarily bounded by INT_MAX/4, which is less than
VAL1_MAX, so no integer overflow can happen here (except in the case
of async-killed waiters).

Does this all sound correct?

Rich