From: Rich Felker
Subject: Re: My current understanding of cond var access restrictions
Date: Thu, 14 Aug 2014 10:41:10 -0400
To: musl@lists.openwall.com
Message-ID: <20140814144110.GY12888@brightrain.aerifal.cx>
In-Reply-To: <1408003204.4951.92.camel@eris.loria.fr>

On Thu, Aug 14, 2014 at 10:00:04AM +0200, Jens Gustedt wrote:
> On Thursday, 14.08.2014, 02:10 -0400, Rich Felker wrote:
> > I think I have an informal proof sketch that this is necessary
> > unless we abandon requeue:
>
> > ...
>
> > With that in mind, I'd like to look for ways we can fix the bogus
> > waiter accounting for the mutex that seems to be the source of the
> > bug you found. One "obvious" (but maybe bad/wrong?) solution would
> > be to put the count on the mutex at the time of waiting (rather
> > than moving it there as part of broadcast), so that decrementing
> > the mutex waiter count is always the right thing to do in unwait.
>
> sounds like a good idea, at least for correctness
>
> > Of course this possibly results in lots of spurious futex wakes to
> > the mutex (every time it's unlocked while there are waiters on the
> > cv, which could be a lot).
>
> If we'd be more careful about not spreading too many wakes where we
> shouldn't, there would perhaps not be "a lot" of such wakeups.

Well, this is different from the wake-after-release that you dislike.
It's a wake on a necessarily-valid object that just doesn't have any
actual waiters right now, because its potential waiters are still
waiting on the cv.

However, I think it may be costly (one syscall per unlock) in
applications where the mutex is used to protect state that's
frequently modified but where the predicate associated with the cv
only rarely changes (and thus signaling is rare and cv waiters wait
around a long time). In what's arguably the common case (a reasonable
number of waiters, as opposed to thousands of waiters on a 4-core
box), just waking all waiters on broadcast would be a lot less
expensive.
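To make that concrete, here's a rough sketch of what the accounting
would look like. This is not musl's actual code or data layout; all
the names (struct fields, futex wrappers, etc.) are made up for
illustration, the mutex is reduced to a trivial spinlock, and error
handling is omitted:

/* Sketch only: names and layout are illustrative, not musl's. */
#include <limits.h>
#include <stdatomic.h>
#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>

static void futex_wait(atomic_int *a, int val)
{
	syscall(SYS_futex, a, FUTEX_WAIT, val, 0, 0, 0);
}
static void futex_wake(atomic_int *a, int n)
{
	syscall(SYS_futex, a, FUTEX_WAKE, n, 0, 0, 0);
}

struct mutex { atomic_int lock; atomic_int waiters; };
struct cond  { atomic_int seq; };

static void mutex_lock(struct mutex *m)
{
	/* Stand-in spinlock; real contention handling is omitted. */
	int zero = 0;
	while (!atomic_compare_exchange_weak(&m->lock, &zero, 1))
		zero = 0;
}

static void mutex_unlock(struct mutex *m)
{
	atomic_store(&m->lock, 0);
	/* The cost: this wake fires on every unlock while any thread is
	 * still blocked on the cv, even though none of those "waiters"
	 * can make progress on the mutex yet. */
	if (atomic_load(&m->waiters))
		futex_wake(&m->lock, 1);
}

static void cond_wait(struct cond *c, struct mutex *m)
{
	int seq = atomic_load(&c->seq);

	/* Charge the waiter to the MUTEX before unlocking, instead of
	 * moving counts over at broadcast time. */
	atomic_fetch_add(&m->waiters, 1);
	mutex_unlock(m);

	futex_wait(&c->seq, seq);  /* broadcast may requeue us below */

	mutex_lock(m);
	/* "Unwait": decrementing the mutex count is always correct,
	 * whether we were woken directly or requeued. */
	atomic_fetch_sub(&m->waiters, 1);
}

static void cond_broadcast(struct cond *c, struct mutex *m)
{
	atomic_fetch_add(&c->seq, 1);
	/* Wake one waiter, requeue the rest onto the mutex futex; the
	 * wait counts are already on the mutex, so nothing moves. */
	syscall(SYS_futex, &c->seq, FUTEX_REQUEUE, 1, INT_MAX, &m->lock, 0);
}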
Thus I'm skeptical of trying an approach like this when it would be
easier, and likely less costly in the common usage cases, simply to
remove requeue and always use broadcast wakes. I modified your test
case for the bug to use a process-shared cv (which uses a broadcast
wake; see the P.S. below), and as expected, the test runs with no
failure.

> > It would be nice if we had a separate field in the mutex (rather
> > than in the cv, as it is now) to store these on, and only move them
> > to the active waiters count at broadcast time, but I don't see any
> > way to get additional space in the mutex structure for this -- it's
> > full.
>
> I thought of such designs, too, but one major problem (besides the
> space) with it is that a mutex can be used by several cv's at a time.

Yes. It would improve the situation versus the above, but not
eliminate it, since in the case where a mutex is used with multiple
cv's, a broadcast on one of the cv's would move the entire wait count
to the active wait count.

> > > > 5. When can [timed]wait safely access the cv?
> > > >
> > > > Only before unlocking the mutex, unless the implementation
> > > > synchronizes with possible signaling threads, or with
> > > > destruction (and possibly unmapping). Otherwise, per the above,
> > > > it's possible that a signaling thread destroys the cv.
> > >
> > > so again this suggests an internal lock on the cv that would be
> > > used to synchronize between waiters and wakers?
> >
> > This argument applies even to process-shared cv's, and for them, no
> > allocation is possible,
>
> at least difficult, for sure
>
> this would need support to allocate some object in the kernel and to
> use that object shared between processes :(

And as I've mentioned before, this is presently not possible due to
security considerations: there's no way to make an object for which
the set of processes that can access it exactly matches the set that
can access the shared memory the cv lies in. It could be done with a
new futex command that returned such an object using the memory
address as a key, but that would be heavy and ugly, and not compatible
with any existing systems (so not appropriate for a mandatory
feature).

Rich
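P.S. For anyone wanting to reproduce the process-shared experiment
mentioned above: the change to the test case amounts to creating the
cv with the process-shared attribute, roughly as in the following
helper (the helper and its name are mine, not part of the original
test):

#include <pthread.h>

/* Create a cv with the process-shared attribute set; per the
 * discussion above, such a cv ends up using a broadcast wake rather
 * than requeue. */
static int make_pshared_cv(pthread_cond_t *cv)
{
	pthread_condattr_t a;
	int r = pthread_condattr_init(&a);
	if (r) return r;
	r = pthread_condattr_setpshared(&a, PTHREAD_PROCESS_SHARED);
	if (!r) r = pthread_cond_init(cv, &a);
	pthread_condattr_destroy(&a);
	return r;
}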