From: Rich Felker
Newsgroups: gmane.linux.lib.musl.general
Subject: Re: My current understanding of cond var access restrictions
Date: Thu, 14 Aug 2014 12:58:17 -0400
Message-ID: <20140814165817.GD12888@brightrain.aerifal.cx>
In-Reply-To: <1408033641.4951.116.camel@eris.loria.fr>
Reply-To: musl@lists.openwall.com
To: musl@lists.openwall.com

On Thu, Aug 14, 2014 at 06:27:21PM +0200, Jens Gustedt wrote:
> On Thursday, 14 Aug 2014, at 10:41 -0400, Rich Felker wrote:
> > Thus I'm skeptical of trying an approach like this when it would be
> > easier, and likely less costly on the common usage cases, just to
> > remove requeue and always use broadcast wakes. I modified your test
> > case for the bug to use a process-shared cv (using broadcast wake),
> > and as expected, the test runs with no failure.
>
> You shouldn't draw much conclusion from the fact that it works in that
> case. This still heavily interacts with the waiters count and thus a
> signaling thread that comes after such a broadcast might wake up a
> thread that it shouldn't.
>
> (But I didn't do a full analysis of that situation.)

In the process-shared case, broadcast just increments the sequence
number and wakes all futex waiters. It's very simple.

Formally, there's no such thing as waking up a thread you shouldn't,
since spurious wakes are always allowed. The current implementation
has a lot of potential for spurious wakes but they don't happen except
in certain situations:

- If a futex wait gets interrupted by a signal, the wait will always
  terminate after the signal handler returns if any intervening cv
  signals or broadcasts happened (except in the case of a full
  wraparound of the sequence number, i.e. exactly 2^32 cv signals
  while stuck in a signal handler, which I don't know how to fix, but
  it would be easy to write a test for this) even if the cv signal was
  already received by another waiter.

- If the sequence number gets incremented by a cv signal before the
  initial futex wait, the waiter will return immediately; this can
  happen to multiple waiters even for just one signal.

Really sequence numbers are the wrong tool here, but they were
introduced because the previous approach (having each waiter write its
own tid, and futex wait comparing that tid) led to pathologically bad
performance under heavy waiter arrival where waiters were constantly
returning because another waiter was almost always able to write its
tid before the first one could block on the futex.
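
To make the two situations above concrete, here is a minimal
single-threaded model of the sequence-number check. It is only a
sketch, not musl's code: the names (model_cv, cv_sample, cv_bump,
cv_would_block and the two demo functions) are hypothetical, the
sequence counter is assumed to be 32 bits wide, and the futex
comparison is modeled with a plain atomic load instead of a real
FUTEX_WAIT call.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a process-shared cv; not musl's actual layout. */
struct model_cv {
    _Atomic uint32_t seq;   /* bumped by every cv signal or broadcast */
};

/* A waiter samples the sequence number before it tries to block. */
static uint32_t cv_sample(struct model_cv *cv)
{
    return atomic_load(&cv->seq);
}

/* Models n cv signals/broadcasts. A real implementation would follow
 * each bump with a futex wake of one waiter or of all waiters. */
static void cv_bump(struct model_cv *cv, uint32_t n)
{
    atomic_fetch_add(&cv->seq, n);
}

/* Models the futex wait comparison: the caller only goes to sleep if
 * the word still holds the value it sampled earlier. */
static bool cv_would_block(struct model_cv *cv, uint32_t sampled)
{
    return atomic_load(&cv->seq) == sampled;
}

/* One cv signal lands after n waiters sampled seq but before any of
 * them blocked. Returns how many waiters return without blocking. */
int early_returns_after_one_signal(int n_waiters)
{
    assert(n_waiters >= 0);

    struct model_cv cv;
    atomic_init(&cv.seq, 0);

    uint32_t sampled = cv_sample(&cv);  /* every waiter sees the same value */
    cv_bump(&cv, 1);                    /* a single cv signal */

    int returned = 0;
    for (int i = 0; i < n_waiters; i++)
        if (!cv_would_block(&cv, sampled))
            returned++;
    return returned;
}

/* Exactly 2^32 bumps bring seq back to the sampled value, so the
 * waiter cannot tell that anything happened and would block again. */
bool wraparound_hides_wakes(void)
{
    struct model_cv cv;
    atomic_init(&cv.seq, 0);

    uint32_t sampled = cv_sample(&cv);
    cv_bump(&cv, UINT32_MAX);           /* 2^32 - 1 bumps ... */
    cv_bump(&cv, 1);                    /* ... plus one more: 2^32 total */
    return cv_would_block(&cv, sampled);
}
```

In this model early_returns_after_one_signal(3) counts all three
waiters as returning for a single signal, and wraparound_hides_wakes()
reports that a waiter would block again after exactly 2^32 bumps.
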
I'd like to have a better solution, but I can't think of any.

Rich