mailing list of musl libc
 help / color / mirror / code / Atom feed
From: Rich Felker <dalias@libc.org>
To: musl@lists.openwall.com
Subject: Re: Subject: [PATCH] pthread: Fix bug that pthread_create may cause priority inversion
Date: Mon, 9 Sep 2019 13:49:43 -0400	[thread overview]
Message-ID: <20190909174943.GN9017@brightrain.aerifal.cx> (raw)
In-Reply-To: <20190909145429.GG22009@port70.net>

[-- Attachment #1: Type: text/plain, Size: 2834 bytes --]

On Mon, Sep 09, 2019 at 04:54:29PM +0200, Szabolcs Nagy wrote:
> * zhaohang (F) <zhaohang14@huawei.com> [2019-09-09 13:57:36 +0000]:
> > diff --git a/src/thread/pthread_create.c b/src/thread/pthread_create.c
> > index 7d4dc2e..ae08c0f 100644
> > --- a/src/thread/pthread_create.c
> > +++ b/src/thread/pthread_create.c
> > @@ -181,15 +181,8 @@ static int start(void *p)
> >  {
> >         struct start_args *args = p;
> >         if (args->attr) {
> > -               pthread_t self = __pthread_self();
> > -               int ret = -__syscall(SYS_sched_setscheduler, self->tid,
> > -                       args->attr->_a_policy, &args->attr->_a_prio);
> > -               if (a_swap(args->perr, ret)==-2)
> > -                       __wake(args->perr, 1, 1);
> > -               if (ret) {
> > -                       self->detach_state = DT_DETACHED;
> > -                       __pthread_exit(0);
> > -               }
> > +               if (a_cas(args->perr, -1, -2) == -1)
> > +                       __wait(args->perr, 0, -2, 1);
> >         }
> >         __syscall(SYS_rt_sigprocmask, SIG_SETMASK, &args->sig_mask, 0, _NSIG/8);
> >         __pthread_exit(args->start_func(args->start_arg));
> > @@ -367,10 +360,14 @@ int __pthread_create(pthread_t *restrict res, const pthread_attr_t *restrict att
> >         }
> > 
> >         if (attr._a_sched) {
> > -               if (a_cas(&err, -1, -2)==-1)
> > -                       __wait(&err, 0, -2, 1);
> > -               ret = err;
> > -               if (ret) return ret;
> > +               ret = -__syscall(SYS_sched_setscheduler, new->tid, attr._a_policy, &attr._a_prio);
> > +               if (ret) {
> > +                       new->detach_state = DT_DETACHED;
> > +                       pthread_cancel(new);
> > +                       return ret;
> 
> the child has the cancel signal blocked so it will never act on the signal.

Also, pthread_create should not pull in cancellation. Aside from being
unnecessary amounts of code that increases lots of costs in static
linking (for example, cancellable syscall paths have to be used),
there's no reason to use cancellation for something like this where
it's not trying to work with arbitrary application code, just a fixed
piece of code that admits explicit negotiation of how to continue.

> but even if that's fixed, the detached child may not get scheduled to
> handle the signal for a long time and will take up stack/tid resources.

That's the side issue I noted which my third patch fixes.

> i think Rich already has a solution that will deal with these issues.

Yes, sorry for not posting it sooner. Attached are the drafts that I
plan to push soon. (If you see something wrong and they've already
been pushed, just let me know and I'll fix it.) Patch 2 is the one
that addresses the issue reported here.

Rich

[-- Attachment #2: 0001-fix-unsynchronized-decrement-of-thread-count-on-pthr.patch --]
[-- Type: text/plain, Size: 1127 bytes --]

From dd0a23dd9e157624347fdcc9dca452675b93da70 Mon Sep 17 00:00:00 2001
From: Rich Felker <dalias@aerifal.cx>
Date: Fri, 6 Sep 2019 15:52:00 -0400
Subject: [PATCH 1/4] fix unsynchronized decrement of thread count on
 pthread_create error

commit 8f11e6127fe93093f81a52b15bb1537edc3fc8af wrongly documented
that all changes to libc.threads_minus_1 were guarded by the thread
list lock, but the decrement for failed SYS_clone took place after the
thread list lock was released.
---
 src/thread/pthread_create.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/thread/pthread_create.c b/src/thread/pthread_create.c
index ebf61ded..edaf9a6e 100644
--- a/src/thread/pthread_create.c
+++ b/src/thread/pthread_create.c
@@ -356,13 +356,14 @@ int __pthread_create(pthread_t *restrict res, const pthread_attr_t *restrict att
 		new->prev = self;
 		new->next->prev = new;
 		new->prev->next = new;
+	} else {
+		libc.threads_minus_1--;
 	}
 	__tl_unlock();
 	__restore_sigs(&set);
 	__release_ptc();
 
 	if (ret < 0) {
-		libc.threads_minus_1--;
 		if (map) __munmap(map, size);
 		return EAGAIN;
 	}
-- 
2.21.0


[-- Attachment #3: 0002-set-explicit-scheduling-for-new-thread-from-calling-.patch --]
[-- Type: text/plain, Size: 3296 bytes --]

From 022f27d541af68f047d367ad9d93f941135555c1 Mon Sep 17 00:00:00 2001
From: Rich Felker <dalias@aerifal.cx>
Date: Fri, 6 Sep 2019 15:26:44 -0400
Subject: [PATCH 2/4] set explicit scheduling for new thread from calling
 thread, not self

if setting scheduling properties succeeds, the new thread may end up
with lower priority than the caller, and may be unable to continue
running due to another intermediate-priority thread. this produces a
priority inversion situation for the thread calling pthread_create,
since it cannot return until the new thread reports success.

originally, the parent was responsible for setting the new thread's
priority; commits b8742f32602add243ee2ce74d804015463726899 and
40bae2d32fd6f3ffea437fa745ad38a1fe77b27e changed it as part of
trimming down the pthread structure. since then, commit
04335d9260c076cf4d9264bd93dd3b06c237a639 partly reversed the changes,
but did not switch responsibilities back. do that now.
---
 src/thread/pthread_create.c | 33 ++++++++++++---------------------
 1 file changed, 12 insertions(+), 21 deletions(-)

diff --git a/src/thread/pthread_create.c b/src/thread/pthread_create.c
index edaf9a6e..edcdf041 100644
--- a/src/thread/pthread_create.c
+++ b/src/thread/pthread_create.c
@@ -172,22 +172,19 @@ void __do_cleanup_pop(struct __ptcb *cb)
 struct start_args {
 	void *(*start_func)(void *);
 	void *start_arg;
-	pthread_attr_t *attr;
-	volatile int *perr;
+	volatile int control;
 	unsigned long sig_mask[_NSIG/8/sizeof(long)];
 };
 
 static int start(void *p)
 {
 	struct start_args *args = p;
-	if (args->attr) {
-		pthread_t self = __pthread_self();
-		int ret = -__syscall(SYS_sched_setscheduler, self->tid,
-			args->attr->_a_policy, &args->attr->_a_prio);
-		if (a_swap(args->perr, ret)==-2)
-			__wake(args->perr, 1, 1);
-		if (ret) {
-			self->detach_state = DT_DETACHED;
+	int state = args->control;
+	if (state) {
+		if (a_cas(&args->control, 1, 2)==1)
+			__wait(&args->control, 0, 2, 1);
+		if (args->control) {
+			__pthread_self()->detach_state = DT_DETACHED;
 			__pthread_exit(0);
 		}
 	}
@@ -233,7 +230,6 @@ int __pthread_create(pthread_t *restrict res, const pthread_attr_t *restrict att
 		| CLONE_PARENT_SETTID | CLONE_CHILD_CLEARTID | CLONE_DETACHED;
 	pthread_attr_t attr = { 0 };
 	sigset_t set;
-	volatile int err = -1;
 
 	if (!libc.can_do_threads) return ENOSYS;
 	self = __pthread_self();
@@ -325,13 +321,7 @@ int __pthread_create(pthread_t *restrict res, const pthread_attr_t *restrict att
 	struct start_args *args = (void *)stack;
 	args->start_func = entry;
 	args->start_arg = arg;
-	if (attr._a_sched) {
-		args->attr = &attr;
-		args->perr = &err;
-	} else {
-		args->attr = 0;
-		args->perr = 0;
-	}
+	args->control = attr._a_sched ? 1 : 0;
 
 	/* Application signals (but not the synccall signal) must be
 	 * blocked before the thread list lock can be taken, to ensure
@@ -369,9 +359,10 @@ int __pthread_create(pthread_t *restrict res, const pthread_attr_t *restrict att
 	}
 
 	if (attr._a_sched) {
-		if (a_cas(&err, -1, -2)==-1)
-			__wait(&err, 0, -2, 1);
-		ret = err;
+		int ret = -__syscall(SYS_sched_setscheduler, new->tid,
+			attr._a_policy, &attr._a_prio);
+		if (a_swap(&args->control, ret ? 3 : 0)==2)
+			__wake(&args->control, 1, 1);
 		if (ret) return ret;
 	}
 
-- 
2.21.0


[-- Attachment #4: 0003-synchronously-clean-up-pthread_create-failure-due-to.patch --]
[-- Type: text/plain, Size: 3257 bytes --]

From 8a544ee3a2a75af278145b09531177cab4939b41 Mon Sep 17 00:00:00 2001
From: Rich Felker <dalias@aerifal.cx>
Date: Fri, 6 Sep 2019 16:17:44 -0400
Subject: [PATCH 3/4] synchronously clean up pthread_create failure due to
 scheduling errors

previously, when pthread_create failed due to inability to set
explicit scheduling according to the requested attributes, the nascent
thread was detached and made responsible for its own cleanup via the
standard pthread_exit code path. this left it consuming resources
potentially well after pthread_create returned, in a way that the
application could not see or mitigate, and unnecessarily exposed its
existence to the rest of the implementation via the global thread
list.

instead, attempt explicit scheduling early and reuse the failure path
for __clone failure if it fails. the nascent thread's exit futex is
not needed for unlocking the thread list, since the thread calling
pthread_create holds the thread list lock the whole time, so it can be
repurposed to ensure the thread has finished exiting. no pthread_exit
is needed, and freeing the stack, if needed, can happen just as it
would if __clone failed.
---
 src/thread/pthread_create.c | 31 ++++++++++++++++++-------------
 1 file changed, 18 insertions(+), 13 deletions(-)

diff --git a/src/thread/pthread_create.c b/src/thread/pthread_create.c
index edcdf041..5d00d765 100644
--- a/src/thread/pthread_create.c
+++ b/src/thread/pthread_create.c
@@ -184,8 +184,8 @@ static int start(void *p)
 		if (a_cas(&args->control, 1, 2)==1)
 			__wait(&args->control, 0, 2, 1);
 		if (args->control) {
-			__pthread_self()->detach_state = DT_DETACHED;
-			__pthread_exit(0);
+			__syscall(SYS_set_tid_address, &args->control);
+			return 0;
 		}
 	}
 	__syscall(SYS_rt_sigprocmask, SIG_SETMASK, &args->sig_mask, 0, _NSIG/8);
@@ -339,8 +339,21 @@ int __pthread_create(pthread_t *restrict res, const pthread_attr_t *restrict att
 	libc.threads_minus_1++;
 	ret = __clone((c11 ? start_c11 : start), stack, flags, args, &new->tid, TP_ADJ(new), &__thread_list_lock);
 
-	/* If clone succeeded, new thread must be linked on the thread
-	 * list before unlocking it, even if scheduling may still fail. */
+	/* All clone failures translate to EAGAIN. If explicit scheduling
+	 * was requested, attempt it before unlocking the thread list so
+	 * that the failed thread is never exposed and so that we can
+	 * clean up all transient resource usage before returning. */
+	if (ret < 0) {
+		ret = -EAGAIN;
+	} else if (attr._a_sched) {
+		ret = __syscall(SYS_sched_setscheduler,
+			new->tid, attr._a_policy, &attr._a_prio);
+		if (a_swap(&args->control, ret ? 3 : 0)==2)
+			__wake(&args->control, 1, 1);
+		if (ret)
+			__wait(&args->control, 0, 3, 0);
+	}
+
 	if (ret >= 0) {
 		new->next = self->next;
 		new->prev = self;
@@ -355,15 +368,7 @@ int __pthread_create(pthread_t *restrict res, const pthread_attr_t *restrict att
 
 	if (ret < 0) {
 		if (map) __munmap(map, size);
-		return EAGAIN;
-	}
-
-	if (attr._a_sched) {
-		int ret = -__syscall(SYS_sched_setscheduler, new->tid,
-			attr._a_policy, &attr._a_prio);
-		if (a_swap(&args->control, ret ? 3 : 0)==2)
-			__wake(&args->control, 1, 1);
-		if (ret) return ret;
+		return -ret;
 	}
 
 	*res = new;
-- 
2.21.0


  reply	other threads:[~2019-09-09 17:49 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-09-09 13:57 zhaohang (F)
2019-09-09 14:01 ` 答复: " zhaohang (F)
2019-09-09 14:54 ` Szabolcs Nagy
2019-09-09 17:49   ` Rich Felker [this message]
2019-09-11 13:38     ` 答复: [musl] " zhaohang (F)
2019-09-11 13:52       ` Rich Felker
2019-09-11 17:29         ` Rich Felker
2019-09-16  2:27           ` 答复: [musl] " zhaohang (F)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190909174943.GN9017@brightrain.aerifal.cx \
    --to=dalias@libc.org \
    --cc=musl@lists.openwall.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/musl/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).