From: Rich Felker
Newsgroups: gmane.linux.lib.musl.general
Subject: Planned robust mutex internals changes
Date: Fri, 31 Aug 2018 12:45:12 -0400
Message-ID: <20180831164512.GV1878@brightrain.aerifal.cx>
To: musl@lists.openwall.com

Recent debugging with Adelie Linux & POSIX conformance tests gave me a bit of a scare about the safety of the current robust mutex implementation. It turned out to be completely unrelated (kernel problem) but it did remind me that there's some mildly sketchy stuff going on right now.

In order to implement ENOTRECOVERABLE, the current implementation changes a bit of the mutex type field to indicate that it's recovered after EOWNERDEAD and will go into ENOTRECOVERABLE state if pthread_mutex_consistent is not called before unlocking. While it's only the thread that holds the lock that needs access to this information (except possibly for the sake of pthread_mutex_consistent choosing between EINVAL and EPERM for erroneous calls), the change to the type field is formally a data race with all other threads that perform any operation on the mutex. No individual bits race, and no write races are possible, so things are "okay" in some sense, but it's still not good.

What I plan to do is move this state to the mutex owner/lock field, which should be the only field of the object (aside from waiter count) that's mutable. Currently, the lock field looks like this:

bits 0-29: owner; 0 if unlocked or if owner died; -1 if unrecoverable
bit 30: owner died flag, also set if unrecoverable; otherwise 0
bit 31: new waiters flag

Ignoring bit 31, this means the possible values are 0 (unlocked), a tid for the owner, 0x40000000 after owner died, and 0x7fffffff when unrecoverable.

What I'd like to do is use 0x40000000|tid as the value for when the lock is held but pthread_mutex_consistent hasn't yet been called. Note that at the kernel level, bit 29 (0x20000000) is reserved not to be used as part of a tid, so this does not overlap with the special value 0x7fffffff. And there's still some room for extensibility, since we could use bit 31 along with values that don't indicate a mutex state that can be waited on.

With these changes, no fields of the mutex object will be mutable except for the lock state (owner, futex) and the waiter count.

Anyway I'm posting this now as notes/reminder and for comments in case others see a problem with the proposal or ideas for improvement. I'll probably hold off on doing any of this until after release.

Rich
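
For reference, the lock-word encoding described above could be sketched roughly as follows. This is only an illustration of the proposed layout, not musl's actual code: every macro, enum, and function name here is hypothetical, and the tid mask assumes (per the note about bit 29) that a tid never uses any bit above bit 28.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names; the bit layout follows the description above. */
#define LOCK_WAITERS       0x80000000u /* bit 31: new waiters flag */
#define LOCK_OWNER_DIED    0x40000000u /* bit 30: owner died flag */
#define LOCK_TID_MASK      0x1fffffffu /* assumed: bit 29 is never part of a tid */
#define LOCK_UNRECOVERABLE 0x7fffffffu /* owner field all ones (-1) plus bit 30 */

enum lock_state {
    LOCK_FREE,              /* 0 */
    LOCK_HELD,              /* tid */
    LOCK_DEAD_UNCLAIMED,    /* 0x40000000: owner died, nobody holds it */
    LOCK_HELD_INCONSISTENT, /* 0x40000000|tid: the proposed new value */
    LOCK_NOT_RECOVERABLE    /* 0x7fffffff */
};

/* Classify a lock word, ignoring the waiters flag in bit 31. */
static enum lock_state classify(uint32_t word)
{
    uint32_t v = word & ~LOCK_WAITERS;
    if (v == 0) return LOCK_FREE;
    if (v == LOCK_UNRECOVERABLE) return LOCK_NOT_RECOVERABLE;
    if (v == LOCK_OWNER_DIED) return LOCK_DEAD_UNCLAIMED;
    if (v & LOCK_OWNER_DIED) return LOCK_HELD_INCONSISTENT;
    return LOCK_HELD;
}

/* Build the proposed "held, not yet marked consistent" value for a tid. */
static uint32_t held_inconsistent(uint32_t tid)
{
    /* A valid tid is nonzero and fits below bit 29 (assumption). */
    assert(tid != 0 && (tid & ~LOCK_TID_MASK) == 0);
    return LOCK_OWNER_DIED | tid;
}
```

Because bit 29 is always clear in 0x40000000|tid, even the largest tid allowed under this assumption yields 0x5fffffff, which cannot be mistaken for the unrecoverable value 0x7fffffff.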