musl libc mailing list
From: Rich Felker <dalias@libc.org>
To: musl@lists.openwall.com
Subject: Re: Resuming work on new semaphore
Date: Thu, 23 Apr 2015 12:06:24 -0400
Message-ID: <20150423160624.GF17573@brightrain.aerifal.cx>
In-Reply-To: <alpine.LNX.2.11.1504120104090.8190@monopod.intra.ispras.ru>

I'm going to try to summarize some of the issues that have been
discussed on IRC since this message.

On Sun, Apr 12, 2015 at 01:22:34AM +0300, Alexander Monakov wrote:
> On Mon, 6 Apr 2015, Alexander Monakov wrote:
> > One other thing to consider.  In the absence of concurrent operations on the
> > semaphore, return value of sem_getvalue should be equal to the number of times
> > sem_trywait will indicate success when called repeatedly.  So if the
> > implementation performs post-stealing in trywait, it should return the higher
> > bound as semaphore value.  Likewise for timedwait.
> 
> If we accept the above, it follows that in the new implementation getvalue
> should return not max(0, val[0] + val[1]), but rather max(0, val[0]) + val[1].

Indeed. But then max(0, val[0]) + val[1] can overflow SEM_VALUE_MAX
unless we prevent it, which takes some work, but I think it's
possible.

> int sem_post(sem_t *sem)
> {
> 	int val;
> 	do val = sem->__val[0];
> 	while (val != a_cas(sem->__val, val, val+!!(val<SEM_VALUE_MAX)));
> 	if (val < 0) {
> 		int priv = sem->__val[2];
> 		a_inc(sem->__val+1);
> 		__wake(sem->__val+1, 1, priv);
> 	}
> 	if (val < SEM_VALUE_MAX) return 0;
> 	errno = EOVERFLOW;
> 	return -1;
> }

The first observation we made was that this checks val<SEM_VALUE_MAX
twice in the success path for an extra useless branch. It can be fixed
by something like this (my1):

int sem_post(sem_t *sem)
{
	int val = sem->__val[0];
	val -= val==SEM_VALUE_MAX;
	while (a_cas(sem->__val, val, val+1) != val) {
		if ((val = sem->__val[0]) == SEM_VALUE_MAX) {
			errno = EOVERFLOW;
			return -1;
		}
	}
	if (val < 0) {
		int priv = sem->__val[2];
		a_inc(sem->__val+1);
		__wake(sem->__val+1, 1, priv);
	}
	return 0;
}

or this (my1b):

int sem_post(sem_t *sem)
{
	int old, val = sem->__val[0];
	val -= val==SEM_VALUE_MAX;
	while ((old = a_cas(sem->__val, val, val+1)) != val) {
		if ((val = old) == SEM_VALUE_MAX) {
			errno = EOVERFLOW;
			return -1;
		}
	}
	if (val < 0) {
		int priv = sem->__val[2];
		a_inc(sem->__val+1);
		__wake(sem->__val+1, 1, priv);
	}
	return 0;
}

The latter saves the result of a_cas to prevent an extra load, but I
don't think it makes any significant difference and it might be seen
as uglier.

However neither of those address the overflow issue, which I've tried
to address here:

#define VAL0_MAX (1+(SEM_VALUE_MAX-1)/2) /* ceil; (SEM_VALUE_MAX+1)/2 would overflow at INT_MAX */
#define VAL1_MAX (SEM_VALUE_MAX/2)

int sem_post(sem_t *sem)
{
	int val = sem->__val[0];
	val -= val==VAL0_MAX;
	while (a_cas(sem->__val, val, val+1) != val) {
		if ((val = sem->__val[0]) == VAL0_MAX) {
			int tmp = sem->__val[1];
			if (tmp >= VAL1_MAX) {
				errno = EOVERFLOW;
				return -1;
			}
			if (a_cas(sem->__val+1, tmp, tmp+1) == tmp) {
				return 0;
			}
			val--;
		}
	}
	if (val < 0) {
		int priv = sem->__val[2];
		a_inc(sem->__val+1);
		__wake(sem->__val+1, 1, priv);
	}
	return 0;
}

This code implements an idea that was discussed on IRC but has not
been presented before, so it may have significant bugs. The idea is
to limit the main sem value
component and the wake count separately to half the max. Once val[0]
hits VAL0_MAX, further posts will be in the form of wakes for
nonexistent waiters (which are ok but more costly). This allows the
total observed value to reach all the way up to SEM_VALUE_MAX.

If this happens, waiters will consume all of val[0] first, and the
wakes will all remain pending until val[0] reaches 0. At that point,
new waiters will decrement val[0] to a negative value (indicating a
waiter), attempt a futex wait, fail because there are wakes pending,
consume one of the wakes, and exit.

(Note: this useless futex wait can be optimized out by reordering the
do-while loop body in sem_timedwait.)

During this state, there is a race window where val[1] can exceed
VAL1_MAX: if a post happens after a new waiter decrements val[0] but
before that waiter consumes a wake from val[1], the post will
increment val[0] back to 0 and increment val[1] unconditionally.
However, the magnitude of such overshoot is bounded by the number of
tasks, which is in turn bounded by INT_MAX/4, which is less than
VAL1_MAX; so no integer overflow can happen here (except in the case
of async-killed waiters).

Does this all sound correct?

Rich


