On Thursday, 14.08.2014, at 12:58 -0400, Rich Felker wrote:
> On Thu, Aug 14, 2014 at 06:27:21PM +0200, Jens Gustedt wrote:
> > On Thursday, 14.08.2014, at 10:41 -0400, Rich Felker wrote:
> > > Thus I'm skeptical of trying an approach like this when it would be
> > > easier, and likely less costly on the common usage cases, just to
> > > remove requeue and always use broadcast wakes. I modified your test
> > > case for the bug to use a process-shared cv (using broadcast wake),
> > > and as expected, the test runs with no failure.
> > 
> > You shouldn't draw too many conclusions from the fact that it works in
> > that case. This still interacts heavily with the waiters count, and thus
> > a signaling thread that comes after such a broadcast might wake up a
> > thread that it shouldn't.
> > 
> > (But I didn't do a full analysis of that situation.)
> 
> In the process-shared case, broadcast just increments the sequence
> number and wakes all futex waiters. It's very simple.
> 
> Formally, there's no such thing as waking up a thread you shouldn't,
> since spurious wakes are always allowed.
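
To make that concrete, here is a minimal sketch of such a broadcast. It is
not musl's actual code: struct shared_cv and shared_cv_broadcast are names
made up for illustration, and I assume Linux and that _Atomic int has the
size and layout of a plain int.

```c
#define _GNU_SOURCE
#include <limits.h>
#include <stdatomic.h>
#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical process-shared cv: only a sequence number that waiters
 * sleep on. */
struct shared_cv {
    _Atomic int seq;
};

/* Bump the sequence number, then wake every thread sleeping on it.
 * Returns the number of waiters woken, or -1 on error. */
static long shared_cv_broadcast(struct shared_cv *cv)
{
    atomic_fetch_add(&cv->seq, 1);
    /* No FUTEX_PRIVATE_FLAG: the futex word may live in shared memory. */
    return syscall(SYS_futex, &cv->seq, FUTEX_WAKE, INT_MAX, NULL, NULL, 0);
}
```

With no thread sleeping on cv->seq the wake simply returns 0. The increment
is what makes any later futex wait on a stale value return at once.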

I meant an internal wake, not one that returns to user space: one that
might wake up a thread that started waiting after the corresponding
signaling thread entered its call. But the sequence numbers are probably
sufficient here to ensure that, at that point, this is already a thread
that has the right to wake up.

I am just getting sort of paranoid about that stuff :)

> The current implementation
> has a lot of potential for spurious wakes but they don't happen except
> in certain situations:
> 
> - If a futex wait gets interrupted by a signal, the wait will always
>   terminate after the signal handler returns if any intervening
>   signals or broadcasts happened (except in the case of a full
>   wraparound of the sequence number, i.e. exactly 2^32 cv signals
>   while stuck in a signal handler, which I don't know how to fix, but
>   it would be easy to write a test for this) even if the signal was
>   already received by another waiter.
> 
> - If the sequence number gets incremented by a signal before the
>   initial futex wait, the waiter will return immediately; this can
>   happen to multiple waiters even for just one signal.
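
The second case can be modelled in a single thread, since a futex wait
refuses to sleep when the word no longer holds the expected value. This is
only a sketch under assumptions: wait_on_seq and one_signal_two_waiters are
hypothetical helpers, not musl functions, and the "waiters" are just two
snapshots taken before one increment.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdatomic.h>
#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Sleep on *seq only if it still holds `snapshot`. Returns 0 after a real
 * wake, or the errno value (EAGAIN if the value had already changed).
 * Caution: with a matching value and nobody to wake us, this blocks. */
static int wait_on_seq(_Atomic int *seq, int snapshot)
{
    long r = syscall(SYS_futex, seq, FUTEX_WAIT, snapshot, NULL, NULL, 0);
    return r == -1 ? errno : 0;
}

/* Two waiters snapshot seq, a single signal bumps it, then both try to
 * sleep. Returns 1 if both waits bailed out immediately with EAGAIN. */
static int one_signal_two_waiters(void)
{
    _Atomic int seq = 7;
    int snap_a = atomic_load(&seq);
    int snap_b = atomic_load(&seq);

    atomic_fetch_add(&seq, 1);   /* the one and only signal */

    return wait_on_seq(&seq, snap_a) == EAGAIN
        && wait_on_seq(&seq, snap_b) == EAGAIN;
}
```

Both waits see a changed value, so one signal lets two waiters through.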
> 
> Really sequence numbers are the wrong tool here, but they were
> introduced because the previous approach (having each waiter write its
> own tid, and futex wait comparing that tid) led to pathologically bad
> performance under heavy waiter arrival where waiters were constantly
> returning because another waiter was almost always able to write its
> tid before the first one could block on the futex. I'd like to have a
> better solution, but I can't think of any.

I don't think they are too bad, actually. They help to distinguish two
phases for a waiting thread. In the first, it has released the mutex
and no signal or broadcast has been issued yet. A thread should never
attempt to relock the mutex and/or return to user space during that
phase.

The second phase begins after such a signal or broadcast: from then on
any wakeup could be legitimate, and in the worst case is merely spurious.
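
As a rough illustration of the two phases (a sketch only; phase_of and
seq_after are hypothetical names, and I assume a 32-bit sequence number
that the waiter snapshots before it releases the mutex):

```c
#include <stdint.h>

enum waiter_phase { PHASE_UNSIGNALED, PHASE_SIGNALED };

/* A waiter is in the first phase as long as the sequence number still
 * equals the value it saw before releasing the mutex. */
static enum waiter_phase phase_of(uint32_t snapshot, uint32_t current)
{
    return current == snapshot ? PHASE_UNSIGNALED : PHASE_SIGNALED;
}

/* Sequence value after n increments, with 32-bit wraparound. */
static uint32_t seq_after(uint32_t start, uint64_t n)
{
    return (uint32_t)(start + n);
}
```

One increment moves the waiter into the second phase. Exactly 2^32
increments bring the number back to the snapshot, so phase_of() reports
the first phase again; that is the wraparound case Rich mentions above.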

Jens

