On Friday, 05.09.2014, 12:45 -0400, Rich Felker wrote:
> On Fri, Sep 05, 2014 at 11:27:52AM +0200, Jens Gustedt wrote:
> > again, off the top of my head, I remember two places with "ad hoc"
> > locking conventions that we touched in the last weeks: cv and
> > pthread_once_t. Both encode waiters in a non-optimal way: cv only
> > has an estimate that "somebody might be waiting", and
> > pthread_once_t uses a static `waiters` variable for all instances.
>
> Yes, this is bad, but pthread_once_t would be fine with a waiters
> flag in the futex int. I'll look at improving that later. But since
> the number of pthread_once_t objects in the whole program is
> finite/bounded, it's really irrelevant to performance.
>
> > > but it's a bad idea to have the waiters count inside the futex
> > > int. This is because every time the futex int changes, you risk
> > > FUTEX_WAIT returning with EAGAIN. Under a high load of waiters
> > > arriving/leaving, it might be impossible to _ever_ start a wait;
> > > all threads may just spin. The futex int should be something
> > > that doesn't change except for a very small finite number of
> > > changes (like adding a waiter flag, not a waiter count) before
> > > you actually have a chance of successfully getting the lock or
> > > whatever.
> >
> > First, I think you overestimate the trouble that we can have with
> > that.
>
> Actually I don't. It follows from the massive performance problems I
> had with the original cv implementation, where wait was "write your
> own tid then futex wait" and signal was "write your own tid then
> futex wake". This is trivially correct and avoids the
> sequence-number wrapping issues of the current pshared
> implementation, but it had a HUGE number of spurious wakes:
> something like 3:1 (spurious:genuine) for several threads hammering
> the cv.

The situation would be different here. A thread would update the
waiters part of the int only once, and then attempt the futex wait in
a loop (retrying while it returns EAGAIN) without changing the word
again; see the first sketch at the end of this mail.

> On typical CPUs, there's a window of 500+ cycles between the atomic
> operation in userspace and rechecking it in kernelspace. This is
> plenty of time for another core to cause your futex wait to fail
> with EAGAIN.
>
> > In any case, for the two use cases I mention above this problem
> > isn't very relevant, I think. For cv this is the internal lock;
> > there will be at most one "waiting" thread on it (it has to hold
> > the mutex) and perhaps several signaling threads, but it should be
> > rare for it to be contended by more than two threads at a time. In
> > the current version the "may-have-waiters" flag may trigger some
> > superfluous wake-ups, I think.
>
> This is possibly true, but it would need some analysis, and thinking
> about it now is distracting me from getting other things done that
> we want to get done now. ;-)
>
> > And pthread_once_t is of minor performance importance, anyhow.
>
> I don't see how a may-have-waiters approach has any cost for
> pthread_once_t unless some of the waiters are cancelled. And that's
> a pathological case that's not worth optimizing.

Right, and at least here such an approach would help to avoid the
ugly static "waiters" variable; see the second sketch below.

Jens

-- 
:: INRIA Nancy Grand Est ::: AlGorille ::: ICube/ICPS :::
:: ::::::::::::::: office Strasbourg : +33 368854536   ::
:: :::::::::::::::::::::: gsm France : +33 651400183   ::
:: ::::::::::::::: gsm international : +49 15737185122 ::
:: http://icube-icps.unistra.fr/index.php/Jens_Gustedt ::
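
PS: to make the loop I describe above concrete, here is a minimal
sketch. It is not musl's actual code; the encoding (low bit = lock
held, one high bit = sticky "may-have-waiters" flag) and all names
are made up for illustration:

#include <unistd.h>
#include <sys/syscall.h>
#include <linux/futex.h>

static void futex_wait(volatile int *p, int val)
{
        /* Fails with EAGAIN when *p != val at wait time, and with
           EINTR on a signal; callers just reload and retry. */
        syscall(SYS_futex, p, FUTEX_WAIT_PRIVATE, val, 0, 0, 0);
}

static void futex_wake(volatile int *p, int n)
{
        syscall(SYS_futex, p, FUTEX_WAKE_PRIVATE, n, 0, 0, 0);
}

#define LOCKED  1
#define WAITERS 0x40000000

void lock(volatile int *l)
{
        if (__sync_bool_compare_and_swap(l, 0, LOCKED)) return;
        /* Publish the flag exactly once; after this the waiter never
           writes the word again until it can actually take the lock. */
        __sync_fetch_and_or(l, WAITERS);
        for (;;) {
                int v = *l;
                if (!(v & LOCKED)) {
                        if (__sync_bool_compare_and_swap(l, v, v | LOCKED))
                                return;
                } else {
                        /* EAGAIN if *l changed since we read v; the
                           loop just reloads v and tries again. */
                        futex_wait(l, v);
                }
        }
}

void unlock(volatile int *l)
{
        /* Drop the lock bit; wake one thread if the flag is set. */
        if (__sync_fetch_and_and(l, ~LOCKED) & WAITERS)
                futex_wake(l, 1);
}

The flag is sticky, so once it is set every unlock issues a wake;
those wakes can be superfluous, which is exactly the cost of the
"may-have-waiters" approach discussed above. In exchange, the word is
only changed by lock/unlock transitions, never by waiters arriving,
so arriving waiters cannot keep kicking each other out of FUTEX_WAIT.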
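
And here is a similar sketch of how a waiters flag in the futex int
could look for pthread_once_t; again the state encoding (0 = not yet
run, 1 = in progress, 2 = done, 3 = in progress with waiters) is
invented for illustration and is not what musl ships. It reuses the
futex_wait/futex_wake wrappers from the sketch above and, as Rich
says, ignores cancellation of waiters:

#include <limits.h> /* INT_MAX */

void once(volatile int *control, void (*init)(void))
{
        for (;;) {
                int v = __sync_val_compare_and_swap(control, 0, 1);
                if (v == 2) return;           /* init already ran */
                if (v == 0) {                 /* we won the race */
                        init();
                        /* Full barrier so init's effects are visible
                           before anyone can observe state 2
                           (__sync_lock_test_and_set alone is only an
                           acquire barrier). */
                        __sync_synchronize();
                        /* Wake only if some thread announced itself
                           by moving the state from 1 to 3. */
                        if (__sync_lock_test_and_set(control, 2) == 3)
                                futex_wake(control, INT_MAX);
                        return;
                }
                /* v is 1 or 3: set the waiters flag at most once. */
                if (v == 1 && !__sync_bool_compare_and_swap(control, 1, 3))
                        continue;             /* lost a race; reassess */
                futex_wait(control, 3);       /* EAGAIN once it is 2 */
        }
}

The 1 -> 3 transition happens at most once per run, and after it no
waiter writes the word again, so here too arriving waiters cannot
prevent each other from entering FUTEX_WAIT, and there is no static
"waiters" variable shared across instances.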