From: Rich Felker
To: musl@lists.openwall.com
Subject: Re: My current understanding of cond var access restrictions
Date: Thu, 14 Aug 2014 11:36:15 -0400
Message-ID: <20140814153615.GB12888@brightrain.aerifal.cx>
In-Reply-To: <20140814144110.GY12888@brightrain.aerifal.cx>

On Thu, Aug 14, 2014 at 10:41:10AM -0400, Rich Felker wrote:
> On Thu, Aug 14, 2014 at 10:00:04AM +0200, Jens Gustedt wrote:
> > On Thursday, 14.08.2014, at 02:10 -0400, Rich Felker wrote:
> > > I think I have an informal proof sketch that this is necessary unless
> > > we abandon requeue:
> > >
> > ...
> >
> > > With that in mind, I'd like to look for ways we can fix the bogus
> > > waiter accounting for the mutex that seems to be the source of the bug
> > > you found. One "obvious" (but maybe bad/wrong?) solution would be to
> > > put the count on the mutex at the time of waiting (rather than moving
> > > it there as part of broadcast), so that decrementing the mutex waiter
> > > count is always the right thing to do in unwait.
> >
> > sounds like a good idea, at least for correctness
> >
> > > Of course this
> > > possibly results in lots of spurious futex wakes to the mutex (every
> > > time it's unlocked while there are waiters on the cv, which could be a
> > > lot).
> >
> > If we'd be more careful in not spreading too many wakes where we
> > shouldn't, there would perhaps not be "a lot" of such wakeups.
>
> Well this is different from the wake-after-release that you dislike.
> It's a wake on a necessarily-valid object that just doesn't have any
> actual waiters right now because its potential-waiters are still
> waiting on the cv.
>
> However I think it may be costly (one syscall per unlock) in
> applications where mutex is used to protect state that's frequently
> modified but where the predicate associated with the cv only rarely
> changes (and thus signaling is rare and cv waiters wait around a long
> time). In what's arguably the common case (a reasonable number of
> waiters as opposed to thousands of waiters on a 4-core box) just
> waking all waiters on broadcast would be a lot less expensive.
>
> Thus I'm skeptical of trying an approach like this when it would be
> easier, and likely less costly on the common usage cases, just to
> remove requeue and always use broadcast wakes. I modified your test
> case for the bug to use a process-shared cv (using broadcast wake),
> and as expected, the test runs with no failure.

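The cost argument quoted above can be made concrete with a toy
syscall-count model. This is only a sketch under stated assumptions,
not musl code: the function names are hypothetical, and the
broadcast case assumes the worst case where every woken waiter but one
fails to take the mutex, sleeps on it, and is later woken by an unlock.

```c
#include <assert.h>

/* Toy cost model only (hypothetical names, not musl code).
 * It counts futex syscalls under simplifying assumptions. */

/* Scheme A: cv waiters are counted on the mutex as soon as they wait,
 * and requeue is kept. Assumed costs:
 *   - one spurious futex wake per unlock while waiters sit on the cv,
 *   - one requeue syscall at broadcast,
 *   - one wake per remaining waiter as the mutex is handed along. */
static long cost_premoved_count(long waiters, long unlocks_between_broadcasts)
{
    assert(waiters >= 0 && unlocks_between_broadcasts >= 0);
    if (waiters == 0)
        return 0;                       /* nobody to wake, no syscalls */
    long spurious = unlocks_between_broadcasts;
    long broadcast = 1;
    long handoff = waiters - 1;
    return spurious + broadcast + handoff;
}

/* Scheme B: no requeue, broadcast wakes everyone. Assumed worst case:
 *   - one futex wake syscall for the broadcast,
 *   - all but one waiter go back to sleep on the mutex (one wait each),
 *   - each of those is later woken by an unlock (one wake each). */
static long cost_broadcast_wake(long waiters)
{
    assert(waiters >= 0);
    if (waiters == 0)
        return 0;
    return 1 + 2 * (waiters - 1);
}
```

Under these assumptions, cost_premoved_count(4, 1000) is 1004 while
cost_broadcast_wake(4) is 7, which matches the "frequently modified
state, rarely changing predicate" concern. With thousands of waiters and
few unlocks the comparison flips: cost_premoved_count(1000, 10) is 1010
against cost_broadcast_wake(1000) at 1999.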
A really ugly hack that might solve the problem: adaptively switch to a
less efficient mode the first time a different mutex is used with the
cv. It could either switch to pre-moving wait counts to the mutex, or
revert to broadcast wakes.

Rich
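A minimal sketch of the adaptive fallback described above, assuming the
cv remembers the first mutex it was used with and sets a sticky flag
once a different one shows up. The struct and function names are
hypothetical, and this is single-threaded illustration only: a real
implementation would need atomic updates and would have to decide which
fallback (pre-moved counts or broadcast wakes) the flag selects.

```c
#include <stddef.h>

/* Hypothetical bookkeeping, not musl's pthread_cond_t layout. */
struct toy_cv {
    const void *first_mutex;  /* mutex seen on the first wait, NULL if none */
    int fallback;             /* sticky: set once a different mutex is seen */
};

/* Record the mutex used for a wait on this cv.
 * Returns 1 while the efficient (requeue) mode may still be used,
 * 0 once the cv has switched to the less efficient fallback mode. */
static int toy_cv_note_mutex(struct toy_cv *cv, const void *mutex)
{
    if (cv->first_mutex == NULL)
        cv->first_mutex = mutex;        /* first waiter fixes the mutex */
    else if (cv->first_mutex != mutex)
        cv->fallback = 1;               /* different mutex: switch for good */
    return !cv->fallback;
}
```

The flag never resets in this sketch, so a cv that has ever been paired
with two mutexes stays in the slow mode; whether that is acceptable is
part of what makes the idea a hack.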