From: Rich Felker
Newsgroups: gmane.linux.lib.musl.general
Subject: Re: My current understanding of cond var access restrictions
Date: Thu, 14 Aug 2014 02:10:09 -0400
Message-ID: <20140814061009.GA6599@brightrain.aerifal.cx>
References: <20140813212358.GA25429@brightrain.aerifal.cx> <1407972025.4951.73.camel@eris.loria.fr>
In-Reply-To: <1407972025.4951.73.camel@eris.loria.fr>
To: musl@lists.openwall.com
Reply-To: musl@lists.openwall.com

On Thu, Aug 14, 2014 at 01:20:25AM +0200, Jens Gustedt wrote:
> > 4. When can signal and broadcast safely use the mutex?
> >
> > Not at all, unless it can block waiters from exiting the wait.
> > Any waiter could spontaneously exit the wait as a result of
> > cancellation, timeout, or a cv signal from another thread, and by
> > the above, it may be entitled to destroy the mutex.
>
> Are you suggesting that all waiters when coming back should first
> regain an internal lock on the cv?

I think I have an informal proof sketch that this is necessary unless
we abandon requeue:

If we want to be able to use the mutex in broadcast, which is needed
for requeue, then broadcast needs a lock that can block at least one
waiter from returning, and needs to confirm that at least one waiter
remains after the lock is obtained (otherwise it's easy -- there's no
work to do), so that the mutex is valid. (Note: Broadcast can
immediately release this lock if it determines that the calling thread
holds the mutex, since in this case, the mutex will be sufficient to
prevent any waiter from returning. But in general it needs to hold the
lock until requeue is performed.)

In order for this lock to block waiters from returning, any waiter
that woke possibly not under the control of broadcast/signal (i.e.
futex wait not returning 0) has to obtain the lock. (For safety
against application use of futexes that generates spurious wakes, it
might be best to just ignore the return value and always attempt to
get the lock.) This probably means it has to access the cv object
(unless it uses an object at another location whose address was
obtained before waiting), which in turn means that we have to track
references so that destroy can wait for all references to be released
before returning.

So I think we're stuck with something like the current implementation,
or abandoning requeue and just doing private cond vars the same as
process-shared ones.

This is actually somewhat reassuring -- it means I wasn't completely
insane when I came up with the current implementation a couple years
back. Or at least, if I was, the insane line of reasoning is at least
reproducible.
:-)

With that in mind, I'd like to look for ways we can fix the bogus
waiter accounting for the mutex that seems to be the source of the bug
you found. One "obvious" (but maybe bad/wrong?) solution would be to
put the count on the mutex at the time of waiting (rather than moving
it there as part of broadcast), so that decrementing the mutex waiter
count is always the right thing to do in unwait. Of course this
possibly results in lots of spurious futex wakes to the mutex (every
time it's unlocked while there are waiters on the cv, which could be a
lot).

It would be nice if we had a separate field in the mutex (rather than
in the cv, as it is now) to store these on, and only move them to the
active waiters count at broadcast time, but I don't see any way to get
additional space in the mutex structure for this -- it's full.

> > 5. When can [timed]wait safely access the cv?
> >
> > Only before unlocking the mutex, unless the implementation
> > synchronizes with possible signaling threads, or with destruction (and
> > possibly unmapping). Otherwise, per the above, it's possible that a
> > signaling thread destroys the cv.
>
> so again this suggests an internal lock on the cv that would be used
> to synchronize between waiters and wakers?

This argument applies even to process-shared cv's, and for them, no
allocation is possible, and I don't see a really good way to solve the
unmapping issue -- I think broadcast/signal would have to block
unmapping, and the last waiter to wake up would have to unblock it.
Maybe that's the right solution?

Rich
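
To make the proof sketch above concrete, here is a rough C model of
the three pieces it requires: an internal lock that every returning
waiter must pass through, a broadcast that confirms under that lock
that a waiter remains, and a destroy that waits for all references to
be released. This is only a sketch under stated assumptions, not
musl's implementation: struct toy_cv and every toy_* name are
hypothetical, a yield loop stands in for the futex wait, and no
requeue, timeout or cancellation is modelled.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

/* Hypothetical toy cv; layout and all toy_* names are illustrative only. */
struct toy_cv {
    atomic_int lock; /* internal lock: 0 free, 1 held */
    atomic_int refs; /* waiters that have not yet passed the lock */
    atomic_int seq;  /* bumped by broadcast to release waiters */
};

void toy_init(struct toy_cv *c) {
    atomic_init(&c->lock, 0);
    atomic_init(&c->refs, 0);
    atomic_init(&c->seq, 0);
}

static void toy_lock(struct toy_cv *c) { while (atomic_exchange(&c->lock, 1)) sched_yield(); }
static void toy_unlock(struct toy_cv *c) { atomic_store(&c->lock, 0); }

/* Caller holds m, as with pthread_cond_wait. */
void toy_wait(struct toy_cv *c, pthread_mutex_t *m) {
    int seq = atomic_load(&c->seq);
    atomic_fetch_add(&c->refs, 1);      /* take a reference before unlocking m */
    pthread_mutex_unlock(m);
    while (atomic_load(&c->seq) == seq) /* yield loop stands in for a futex wait */
        sched_yield();
    toy_lock(c);                        /* a waker holding the lock pins us here */
    atomic_fetch_sub(&c->refs, 1);      /* reference dropped under the lock */
    toy_unlock(c);                      /* last access to *c by this waiter */
    pthread_mutex_lock(m);
}

/* Wakes all current waiters; returns how many were seen under the lock. */
int toy_broadcast(struct toy_cv *c) {
    toy_lock(c);
    int n = atomic_load(&c->refs);
    if (n > 0) {
        /* n > 0 under the lock: some waiter cannot return yet, so its
           mutex still exists. A real requeue would use the mutex here. */
        atomic_fetch_add(&c->seq, 1);
    }
    toy_unlock(c);
    return n;
}

/* Returns only once no waiter can touch *c again. */
void toy_destroy(struct toy_cv *c) {
    for (;;) {
        toy_lock(c);
        int n = atomic_load(&c->refs);
        toy_unlock(c);
        if (n == 0) return;
        sched_yield();
    }
}

struct toy_demo_state { struct toy_cv cv; pthread_mutex_t m; atomic_int woken; };

static void *toy_waiter(void *p) {
    struct toy_demo_state *d = p;
    pthread_mutex_lock(&d->m);
    toy_wait(&d->cv, &d->m);
    atomic_fetch_add(&d->woken, 1);
    pthread_mutex_unlock(&d->m);
    return 0;
}

/* Starts n (at most 8) waiters, wakes them all, returns how many woke. */
int toy_demo(int n) {
    pthread_t t[8];
    struct toy_demo_state d;
    if (n > 8) n = 8;
    toy_init(&d.cv);
    atomic_init(&d.woken, 0);
    pthread_mutex_init(&d.m, 0);
    for (int i = 0; i < n; i++) {
        int rc = pthread_create(&t[i], 0, toy_waiter, &d);
        assert(rc == 0); (void)rc;
    }
    while (atomic_load(&d.cv.refs) < n) sched_yield(); /* all waiters registered */
    int seen = toy_broadcast(&d.cv);
    toy_destroy(&d.cv); /* blocks until every waiter has dropped its reference */
    for (int i = 0; i < n; i++) pthread_join(t[i], 0);
    pthread_mutex_destroy(&d.m);
    return seen == n ? atomic_load(&d.woken) : -1;
}
```

The reference is dropped while the internal lock is held, so a
broadcast that sees refs > 0 under the lock knows at least one waiter
is still pinned, and toy_destroy takes the same lock so it cannot
return while a waiter is between its decrement and its unlock. Calling
toy_demo(4) starts four waiters, broadcasts once, and destroys the cv
before joining them.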