mailing list of musl libc
From: Rich Felker <dalias@libc.org>
To: musl@lists.openwall.com
Subject: Re: [musl] Bug with priority inheritance and condition variables
Date: Mon, 26 Oct 2020 15:49:05 -0400
Message-ID: <20201026194905.GM534@brightrain.aerifal.cx>
In-Reply-To: <20200924161407.GL3265@brightrain.aerifal.cx>

[-- Attachment #1: Type: text/plain, Size: 3258 bytes --]

On Thu, Sep 24, 2020 at 12:14:07PM -0400, Rich Felker wrote:
> On Thu, Sep 24, 2020 at 03:58:17PM +0100, Edward Scott wrote:
> > Hello,
> > 
> > There appears to be a bug when using priority inheritance in combination
> > with condition variables. I have some code that reproduces the bug:
> > 
> > https://github.com/edward-scott/musl-prio-inherit-cv-bug
> > 
> > Using git bisect I traced the origin of the bug to this commit:
> > 
> > https://git.musl-libc.org/cgit/musl/commit/?id=54ca677983d47529bab8752315ac1a2b49888870
> > 
> > which is the commit that is described as "implement priority inheritance
> > mutexes".
> > 
> > From my analysis it appears that _m_waiters is used by the
> > priority inheritance logic to maintain some state (as described in the
> > commit message) but that conflicts with some use of _m_waiters in the
> > condition variable implementation.
> 
> I think this is entirely correct analysis. Thanks for catching this!
> 
> > The consequence is that pthread_mutex_lock erroneously returns EDEADLK.
> 
> OK, it took me a second to understand this part, because I thought it
> would be ENOTRECOVERABLE, but that's only for robust+PI mutexes.
> EDEADLK seems to be a consequence of succeeding but returning EBUSY,
> which is "wrong" but should only be able to happen with inconsistent
> state, as produced by pthread_cond_timedwait.
> 
> > I don't understand the code well enough to produce a fix.
> 
> I'll take a look. I'd like to just drop adjusting the waiters count
> here and instead set the bit-31 may-have-waiters flag here, but I'm
> not sure that's right for all mutex types. It certainly can be made to
> do that just on PI mutexes if needed but having fewer special cases is
> preferable.
> 
> > The demo code (a cut version of some production code) will reproduce the
> > failure. Commenting out the pthread_mutexattr_setprotocol call in
> > the iot_mutex_init function at the end of the thread.c file will cause the
> > code to work as intended (without priority inheritance). The code works
> > fine either way with the GNU lib.
> > 
> > BTW can I recommend that the "magic numbers" used to represent mutex modes
> > be replaced at some point with defined constants as it would make the code
> > much easier to follow.
> 
> Yes, it's been something I kinda wanted to do, but that would have
> obfuscated and cluttered the actual changes in development when it was
> being done. It might be time to go back and add some now that this
> code is mature.
> 
> > This is my first post to this list so I hope this message is on the right
> > list and is helpful.
> 
> Yep, this is fine. Thanks again!

This took a while to get to, but here's my proposed patch. It drops
all modification of the waiters count in pthread_cond_timedwait in
favor of setting the "may have waiters" flag (bit 31 of the lock word)
whenever there's another waiter to be woken. At the time this is done,
the calling thread holds the mutex (except when re-locking it failed,
in which case either the mutex is unrecoverable or UB has already
occurred), so setting the flag guarantees that the thread will perform
a wake when it eventually unlocks the mutex.
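
To illustrate why setting the flag is enough, here is a toy model of
the idea. It is not musl's actual code. It assumes a lock word that is
0 when unlocked and otherwise holds the owner's tid, with bit 31
meaning "may have waiters". The names toy_mutex, toy_mark_waiters and
toy_unlock_needs_wake are made up for this sketch, and C11 atomics
stand in for musl's internal a_cas.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define MAY_HAVE_WAITERS 0x80000000u  /* bit 31 of the lock word */

/* Hypothetical stand-in for a mutex lock word: 0 means unlocked,
 * otherwise the owner's tid, optionally with bit 31 set. */
typedef struct { _Atomic uint32_t lock; } toy_mutex;

/* Mirrors the patched step: if the word names an owner and bit 31 is
 * clear (the "val>0" test on a signed int), try once to set bit 31.
 * A failed compare-and-swap is ignored, as in the patch. */
void toy_mark_waiters(toy_mutex *m)
{
	uint32_t val = atomic_load(&m->lock);
	if (val != 0 && !(val & MAY_HAVE_WAITERS))
		atomic_compare_exchange_strong(&m->lock, &val,
		                               val | MAY_HAVE_WAITERS);
}

/* Unlock: clear the word and report whether a wake would be needed,
 * i.e. whether bit 31 was set at the moment of the unlock. */
int toy_unlock_needs_wake(toy_mutex *m)
{
	uint32_t old = atomic_exchange(&m->lock, 0);
	return (old & MAY_HAVE_WAITERS) != 0;
}
```

In this model, marking a held lock word and then unlocking reports
that a wake is needed, while marking an unlocked word leaves it
untouched. No separate waiter count is involved.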

With the patch applied, your test program gets further along but still
hangs. I think the problem is the #if 0 block in threadpool.c; with
that changed to #if 1, it runs to completion.

Rich

[-- Attachment #2: cv_with_pi_mutex.diff --]
[-- Type: text/plain, Size: 757 bytes --]

diff --git a/src/thread/pthread_cond_timedwait.c b/src/thread/pthread_cond_timedwait.c
index d1501240..02858f7d 100644
--- a/src/thread/pthread_cond_timedwait.c
+++ b/src/thread/pthread_cond_timedwait.c
@@ -146,14 +146,13 @@ relock:
 
 	if (oldstate == WAITING) goto done;
 
-	if (!node.next) a_inc(&m->_m_waiters);
-
 	/* Unlock the barrier that's holding back the next waiter, and
 	 * either wake it or requeue it to the mutex. */
-	if (node.prev)
+	if (node.prev) {
+		int val = m->_m_lock;
+		if (val>0) a_cas(&m->_m_lock, val, val|0x80000000);
 		unlock_requeue(&node.prev->barrier, &m->_m_lock, m->_m_type & 128);
-	else
-		a_dec(&m->_m_waiters);
+	}
 
 	/* Since a signal was consumed, cancellation is not permitted. */
 	if (e == ECANCELED) e = 0;
