From mboxrd@z Thu Jan 1 00:00:00 1970 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on inbox.vuxu.org X-Spam-Level: X-Spam-Status: No, score=-3.3 required=5.0 tests=MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED,RCVD_IN_MSPIKE_H3,RCVD_IN_MSPIKE_WL autolearn=ham autolearn_force=no version=3.4.4 Received: (qmail 32754 invoked from network); 24 Sep 2020 16:14:23 -0000 Received: from mother.openwall.net (195.42.179.200) by inbox.vuxu.org with ESMTPUTF8; 24 Sep 2020 16:14:23 -0000 Received: (qmail 29971 invoked by uid 550); 24 Sep 2020 16:14:21 -0000 Mailing-List: contact musl-help@lists.openwall.com; run by ezmlm Precedence: bulk List-Post: List-Help: List-Unsubscribe: List-Subscribe: List-ID: Reply-To: musl@lists.openwall.com Received: (qmail 29953 invoked from network); 24 Sep 2020 16:14:20 -0000 Date: Thu, 24 Sep 2020 12:14:07 -0400 From: Rich Felker To: musl@lists.openwall.com Message-ID: <20200924161407.GL3265@brightrain.aerifal.cx> References: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.21 (2010-09-15) Subject: Re: [musl] Bug with priority inheritance and condition variables On Thu, Sep 24, 2020 at 03:58:17PM +0100, Edward Scott wrote: > Hello, > > There appears to be a bug when using priority inheritance in combination > with condition variables. I have some code that reproduces the bug: > > https://github.com/edward-scott/musl-prio-inherit-cv-bug > > Using git bisect I traced the origin of the bug to this commit: > > https://git.musl-libc.org/cgit/musl/commit/?id=54ca677983d47529bab8752315ac1a2b49888870 > > which is the commit that is described as "implement priority inheritance > mutexes". > > From my analysis it appears that _m_waiters is used by the > priority inheritance logic to maintain some state (as described in the > commit message) but that conflicts with some use of _m_waiters in the > condition variable implementation. I think this is entirely correct analysis. Thanks for catching this! > The consequence is that pthread_mutex_lock erroneously returns EDEADLK. OK, it took me a second to understand this part, because I thought it would be ENOTRECOVERABLE, but that's only for robust+PI mutexes. EDEADLK seems to be a consequence of succeeding but returning EBUSY, which is "wrong" but should only be able to happen with inconsistent state, as produced by pthread_cond_timedwait. > I don't understand the code well enough to produce a fix. I'll take a look. I'd like to just drop adusting the waiters count here and instead set the bit-31 may-have-waiters flag here, but I'm not sure that's right for all mutex types. It certainly can be made to do that just on PI mutexes if needed but having fewer special cases is preferable. > The demo code (a cut version of some production code) will reproduce the > failure. Commenting out the pthread_mutexattr_setprotocol call in > the iot_mutex_init function at the end of the thread.c file will cause the > code to work as intended (without priority inheritance). The code works > fine either way with the GNU lib. > > BTW can I recommend that the "magic numbers" used to represent mutex modes > be replaced at some point with defined constants as it would make the code > much easier to follow. Yes, it's been something I kinda wanted to do, but that would have obfuscated and cluttered the actual changes in development when it was being done. It might be time to go back and add some now that this code is mature. > This is my first post to this list so I hope this message is on the right > list and is helpful. Yep, this is fine. Thanks again! Rich