Am Donnerstag, den 14.08.2014, 12:58 -0400 schrieb Rich Felker: > On Thu, Aug 14, 2014 at 06:27:21PM +0200, Jens Gustedt wrote: > > Am Donnerstag, den 14.08.2014, 10:41 -0400 schrieb Rich Felker: > > > Thus I'm skeptical of trying an approach like this when it would be > > > easier, and likely less costly on the common usage cases, just to > > > remove requeue and always use broadcast wakes. I modified your test > > > case for the bug to use a process-shared cv (using broadcast wake), > > > and as expected, the test runs with no failure. > > > > You shouldn't draw much conclusion from the fact that it works in that > > case. This still heavily interacts with the waiters count and thus a > > signaling thread that comes after such a broadcast might wake up a > > thread that it shouldn't. > > > > (But I didn't do a full analysis of that situation.) > > In the process-shared case, broadcast just increments the sequence > number and wakes all futex waiters. It's very simple. > > Formally, there's no such thing as waking up a thread you shouldn't, > since spurious wakes are always allowed. I meant an internal wake, not one that returns to user space, and that might wake up a thread that came into waiting after the corresponding signaling thread entered his call. But sequence numbers here probably are sufficient to ensure that at that point this is already a thread that has the right to wakeup. I am just getting sort of paranoid on that stuff :) > The current implementation > has a lot of potential for spurious wakes but they don't happen except > in certain situations: > > - If a futex wait gets interrupted by a signal, the wait will always > terminate after the signal handler returns if any intervening > signals or broadcasts happened (except in the case of a full > wraparound of the sequence number, i.e. exactly 2<<32 cv signals > while stuck in a signal handler, which I don't know how to fix, but > it would be easy to write a test for this) even if the signal was > already received by another waiter. > > - If the sequence number gets incremented by a signal before the > initial futex wait, the waiter will return immediately; this can > happen to multiple waiters even for just one signal. > > Really sequence numbers are the wrong tool here, but they were > introduced because the previous approach (having each waiter write its > own tid, and futex wait comparing that tid) lead to pathologically bad > performance under heavy waiter arrival where waiters were constantly > returning because another waiter was almost always able to write its > tid before the first one could block on the futex. I'd like to have a > better solution, but I can't think of any. I don't think they are too bad, actually. They help to distinguish two phases for a waiting thread. In the first, he has released the mutex and no signal or broadcast has been issued. A thread should never attempt to relock the mutex and/or return to user space during that phase. And then the second phase after such a signal or broadcast, where any wakeup could be legitimate and in the worst case just be spurious. Jens -- :: INRIA Nancy Grand Est ::: AlGorille ::: ICube/ICPS ::: :: ::::::::::::::: office Strasbourg : +33 368854536 :: :: :::::::::::::::::::::: gsm France : +33 651400183 :: :: ::::::::::::::: gsm international : +49 15737185122 :: :: http://icube-icps.unistra.fr/index.php/Jens_Gustedt ::