From mboxrd@z Thu Jan  1 00:00:00 1970
X-Msuck: nntp://news.gmane.org/gmane.linux.lib.musl.general/5847
Path: news.gmane.org!not-for-mail
From: Rich Felker <dalias@libc.org>
Newsgroups: gmane.linux.lib.musl.general
Subject: Re: My current understanding of cond var access restrictions
Date: Thu, 14 Aug 2014 12:58:17 -0400
Message-ID: <20140814165817.GD12888@brightrain.aerifal.cx>
References: <20140813212358.GA25429@brightrain.aerifal.cx>
 <1407972025.4951.73.camel@eris.loria.fr>
 <20140814061009.GA6599@brightrain.aerifal.cx>
 <1408003204.4951.92.camel@eris.loria.fr>
 <20140814144110.GY12888@brightrain.aerifal.cx>
 <1408033641.4951.116.camel@eris.loria.fr>
Reply-To: musl@lists.openwall.com
NNTP-Posting-Host: plane.gmane.org
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Trace: ger.gmane.org 1408035517 24110 80.91.229.3 (14 Aug 2014 16:58:37 GMT)
X-Complaints-To: usenet@ger.gmane.org
NNTP-Posting-Date: Thu, 14 Aug 2014 16:58:37 +0000 (UTC)
To: musl@lists.openwall.com
Original-X-From: musl-return-5853-gllmg-musl=m.gmane.org@lists.openwall.com Thu Aug 14 18:58:31 2014
Return-path: <musl-return-5853-gllmg-musl=m.gmane.org@lists.openwall.com>
Envelope-to: gllmg-musl@plane.gmane.org
Original-Received: from mother.openwall.net ([195.42.179.200])
	by plane.gmane.org with smtp (Exim 4.69)
	(envelope-from <musl-return-5853-gllmg-musl=m.gmane.org@lists.openwall.com>)
	id 1XHyMA-0001mI-In
	for gllmg-musl@plane.gmane.org; Thu, 14 Aug 2014 18:58:30 +0200
Original-Received: (qmail 1349 invoked by uid 550); 14 Aug 2014 16:58:29 -0000
Mailing-List: contact musl-help@lists.openwall.com; run by ezmlm
Precedence: bulk
List-Post: <mailto:musl@lists.openwall.com>
List-Help: <mailto:musl-help@lists.openwall.com>
List-Unsubscribe: <mailto:musl-unsubscribe@lists.openwall.com>
List-Subscribe: <mailto:musl-subscribe@lists.openwall.com>
Original-Received: (qmail 1338 invoked from network); 14 Aug 2014 16:58:29 -0000
Content-Disposition: inline
In-Reply-To: <1408033641.4951.116.camel@eris.loria.fr>
User-Agent: Mutt/1.5.21 (2010-09-15)
Original-Sender: Rich Felker <dalias@aerifal.cx>
Xref: news.gmane.org gmane.linux.lib.musl.general:5847
Archived-At: <http://permalink.gmane.org/gmane.linux.lib.musl.general/5847>

On Thu, Aug 14, 2014 at 06:27:21PM +0200, Jens Gustedt wrote:
> On Thursday, 14.08.2014 at 10:41 -0400, Rich Felker wrote:
> > Thus I'm skeptical of trying an approach like this when it would be
> > easier, and likely less costly on the common usage cases, just to
> > remove requeue and always use broadcast wakes. I modified your test
> > case for the bug to use a process-shared cv (using broadcast wake),
> > and as expected, the test runs with no failure.
> 
> You shouldn't draw much conclusion from the fact that it works in that
> case. This still heavily interacts with the waiters count and thus a
> signaling thread that comes after such a broadcast might wake up a
> thread that it shouldn't.
> 
> (But I didn't do a full analysis of that situation.)

In the process-shared case, broadcast just increments the sequence
number and wakes all futex waiters. It's very simple.
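
To make that concrete, here is a rough sketch of the idea, not the
actual musl source. The struct layout and the names pshared_cv,
pshared_cv_wait and pshared_cv_broadcast are made up for illustration,
and the mutex handling around the wait is left out. It assumes Linux
and the GCC/Clang __atomic builtins:

```c
#define _GNU_SOURCE
#include <limits.h>
#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical layout; the object lives in memory shared between
 * processes. */
struct pshared_cv {
    unsigned seq;  /* sequence number, doubles as the futex word */
};

/* Block until woken, unless seq already differs from the value the
 * waiter sampled earlier. Returns the raw futex result: 0 on a wake,
 * -1 with errno set otherwise (EAGAIN if seq had already changed). */
static int pshared_cv_wait(struct pshared_cv *cv, unsigned seen)
{
    /* No FUTEX_PRIVATE_FLAG: waiters may be in other processes. */
    return (int)syscall(SYS_futex, &cv->seq, FUTEX_WAIT, seen,
                        NULL, NULL, 0);
}

/* Bump the sequence number, then wake every futex waiter.
 * Returns the number of waiters the kernel reports as woken. */
static int pshared_cv_broadcast(struct pshared_cv *cv)
{
    __atomic_fetch_add(&cv->seq, 1, __ATOMIC_SEQ_CST);
    return (int)syscall(SYS_futex, &cv->seq, FUTEX_WAKE, INT_MAX,
                        NULL, NULL, 0);
}
```

A waiter would sample seq while still holding the mutex, unlock, and
then call the wait function with the sampled value; any broadcast in
between makes the futex wait fail immediately instead of blocking.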

Formally, there's no such thing as waking up a thread you shouldn't,
since spurious wakes are always allowed. The current implementation
has a lot of potential for spurious wakes but they don't happen except
in certain situations:

- If a futex wait gets interrupted by a signal (in the POSIX signal
  sense), the wait will always terminate after the signal handler
  returns if any cv signals or broadcasts happened in the meantime,
  even if the cv signal was already received by another waiter. The
  one exception is a full wraparound of the sequence number, i.e.
  exactly 1<<32 cv signals while stuck in the signal handler; I don't
  know how to fix that, but it would be easy to write a test for it.

- If the sequence number gets incremented by a signal before the
  initial futex wait, the waiter will return immediately; this can
  happen to multiple waiters even for just one signal.
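
Both cases follow from the futex wait only comparing the current
sequence number against the value the waiter sampled. A toy
single-threaded model (no real futex involved; seq_model and the
helper names are invented for this sketch, and a 32-bit counter is
assumed) shows one signal releasing two waiters, and the wraparound
case going unnoticed:

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model of the cv's sequence number; not real musl code. */
struct seq_model {
    uint32_t seq;
};

/* A waiter records the sequence number before it tries to block. */
static uint32_t waiter_sample(const struct seq_model *cv)
{
    return cv->seq;
}

/* Model n cv signals as n increments of the 32-bit counter,
 * which wraps modulo 2^32. */
static void model_signal_n(struct seq_model *cv, uint64_t n)
{
    cv->seq += (uint32_t)n;
}

/* FUTEX_WAIT semantics: the caller blocks only if the futex word
 * still equals the expected value. */
static bool would_block(const struct seq_model *cv, uint32_t expected)
{
    return cv->seq == expected;
}
```

If two waiters sample the same value and a single signal arrives
before either reaches the futex wait, would_block() is false for both
of them. After exactly 1<<32 signals the counter is back at the
sampled value, so would_block() is true again and the waiter cannot
tell that anything happened.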

Really, sequence numbers are the wrong tool here, but they were
introduced because the previous approach (having each waiter write its
own tid, and having the futex wait compare against that tid) led to
pathologically bad performance under heavy waiter arrival: waiters
were constantly returning because another waiter was almost always
able to write its tid before the first one could block on the futex.
I'd like to have a better solution, but I can't think of any.

Rich