mailing list of musl libc
 help / color / mirror / code / Atom feed
* Subject: [PATCH] pthread: Fix bug that pthread_create may cause priority inversion
@ 2019-09-09 13:57 zhaohang (F)
  2019-09-09 14:01 ` 答复: " zhaohang (F)
  2019-09-09 14:54 ` Szabolcs Nagy
  0 siblings, 2 replies; 8+ messages in thread
From: zhaohang (F) @ 2019-09-09 13:57 UTC (permalink / raw)
  To: musl

Thanks for your reply. Maybe this patch may work by switching to having the new thread just wait for the parent to tell it whether priority setup succeeded.

Subject: [PATCH] pthread: Fix bug that pthread_create may cause priority
 inversion

--------

In pthread_create, caller wait for new thread to set its prio, and
then new thread wake caller up. It may cause priority inversion when
caller waits for new thread and the new thread waits for another thread
whose prio is lower than caller.

Just let caller set prio and affinity of new thread to fix it.

Signed-off-by: Zhao Hang <zhaohang14@huawei.com>
---
 src/thread/pthread_create.c | 23 ++++++++++-------------
 1 file changed, 10 insertions(+), 13 deletions(-)

diff --git a/src/thread/pthread_create.c b/src/thread/pthread_create.c
index 7d4dc2e..ae08c0f 100644
--- a/src/thread/pthread_create.c
+++ b/src/thread/pthread_create.c
@@ -181,15 +181,8 @@ static int start(void *p)
 {
        struct start_args *args = p;
        if (args->attr) {
-               pthread_t self = __pthread_self();
-               int ret = -__syscall(SYS_sched_setscheduler, self->tid,
-                       args->attr->_a_policy, &args->attr->_a_prio);
-               if (a_swap(args->perr, ret)==-2)
-                       __wake(args->perr, 1, 1);
-               if (ret) {
-                       self->detach_state = DT_DETACHED;
-                       __pthread_exit(0);
-               }
+               if (a_cas(args->perr, -1, -2) == -1)
+                       __wait(args->perr, 0, -2, 1);
        }
        __syscall(SYS_rt_sigprocmask, SIG_SETMASK, &args->sig_mask, 0, _NSIG/8);
        __pthread_exit(args->start_func(args->start_arg));
@@ -367,10 +360,14 @@ int __pthread_create(pthread_t *restrict res, const pthread_attr_t *restrict att
        }

        if (attr._a_sched) {
-               if (a_cas(&err, -1, -2)==-1)
-                       __wait(&err, 0, -2, 1);
-               ret = err;
-               if (ret) return ret;
+               ret = -__syscall(SYS_sched_setscheduler, new->tid, attr._a_policy, &attr._a_prio);
+               if (ret) {
+                       new->detach_state = DT_DETACHED;
+                       pthread_cancel(new);
+                       return ret;
+               }
+               if (a_swap(&err, 0) == -2)
+                       __wake(&err, 1, 1);
        }

        *res = new;
--
2.7.4

-----邮件原件-----
发件人: Rich Felker [mailto:dalias@aerifal.cx] 代表 Rich Felker
发送时间: 2019年9月5日 21:34
收件人: zhaohang (F) <zhaohang14@huawei.com>
抄送: musl@lists.openwall.com
主题: Re: [musl] src/thread/pthread_create: Why prio of child thread is set by himself

On Thu, Sep 05, 2019 at 02:14:36AM +0000, zhaohang (F) wrote:
> In the function pthread_create, father thread will wait child if 
> attr._a_sched is set, after SYS_clone is finished.Child thread will 
> set his prio in entry 'start', and then wake father thread to 
> continue.
> 
> But consider this kind of situation, there are three threads: A with 
> prio 51, B with prio 30, and C with prio 20 created by A, and there is 
> only simplest sched policy 'FIFO'.
> 
> When system starts, A is running because A is higher than B, then A 
> uses pthread_create to create C. After C is cloned, A wait for C to 
> set prio and wake him up, but after C set his prio to 20, B will be 
> sched. And if B won't exit, A and C will never get sched, even if A is 
> higher than B. Maybe this is a kind of priority inversion.
> 
> So why prio of child is set by himself rather than father? If prio of 
> child is set by father, something will go wrong? Or other 
> considerations?

I think you're correct in your analysis of this problem; I'm going to look at it more in a bit to make sure.

Originally, pthread_create (in the caller) was responsible for setting priority; this changed in b8742f32602add243ee2ce74d804015463726899 and 40bae2d32fd6f3ffea437fa745ad38a1fe77b27e as part of trying to trim down the pthread structure and get init-time-only junk out of it.
However, 04335d9260c076cf4d9264bd93dd3b06c237a639 largely undid that already, and moved the extra start args to a struct on the new thread's stack so that it doesn't contribute to size/clutter in struct pthread. It should be easy to switch back to having the new thread just wait for the parent to tell it whether priority setup succeeded.

One related issue this also turned up is that exiting in detached state is probably a bad idea. Depending on priorities, the thread that failed to start could linger for a long time after pthread_create returns, potentially causing spurious transient resource exhaustion with no way to wait for it to subside. At some point we should probably switch from forcing detached exit to forcing joinable (or equivalent; forcing linking of pthread_join code is somewhat
undesirable) exit so that when a failed pthread_create returns it's not consuming any kernel task resources.

Thanks for the report.

Rich

^ permalink raw reply	[flat|nested] 8+ messages in thread

* 答复: Subject: [PATCH] pthread: Fix bug that pthread_create may cause priority inversion
  2019-09-09 13:57 Subject: [PATCH] pthread: Fix bug that pthread_create may cause priority inversion zhaohang (F)
@ 2019-09-09 14:01 ` zhaohang (F)
  2019-09-09 14:54 ` Szabolcs Nagy
  1 sibling, 0 replies; 8+ messages in thread
From: zhaohang (F) @ 2019-09-09 14:01 UTC (permalink / raw)
  To: musl

The version used by me is v1.1.22.

-----邮件原件-----
发件人: zhaohang (F) [mailto:zhaohang14@huawei.com] 
发送时间: 2019年9月9日 21:58
收件人: musl@lists.openwall.com
主题: [musl] Subject: [PATCH] pthread: Fix bug that pthread_create may cause priority inversion

Thanks for your reply. Maybe this patch may work by switching to having the new thread just wait for the parent to tell it whether priority setup succeeded.

Subject: [PATCH] pthread: Fix bug that pthread_create may cause priority  inversion

--------

In pthread_create, caller wait for new thread to set its prio, and then new thread wake caller up. It may cause priority inversion when caller waits for new thread and the new thread waits for another thread whose prio is lower than caller.

Just let caller set prio and affinity of new thread to fix it.

Signed-off-by: Zhao Hang <zhaohang14@huawei.com>
---
 src/thread/pthread_create.c | 23 ++++++++++-------------
 1 file changed, 10 insertions(+), 13 deletions(-)

diff --git a/src/thread/pthread_create.c b/src/thread/pthread_create.c index 7d4dc2e..ae08c0f 100644
--- a/src/thread/pthread_create.c
+++ b/src/thread/pthread_create.c
@@ -181,15 +181,8 @@ static int start(void *p)  {
        struct start_args *args = p;
        if (args->attr) {
-               pthread_t self = __pthread_self();
-               int ret = -__syscall(SYS_sched_setscheduler, self->tid,
-                       args->attr->_a_policy, &args->attr->_a_prio);
-               if (a_swap(args->perr, ret)==-2)
-                       __wake(args->perr, 1, 1);
-               if (ret) {
-                       self->detach_state = DT_DETACHED;
-                       __pthread_exit(0);
-               }
+               if (a_cas(args->perr, -1, -2) == -1)
+                       __wait(args->perr, 0, -2, 1);
        }
        __syscall(SYS_rt_sigprocmask, SIG_SETMASK, &args->sig_mask, 0, _NSIG/8);
        __pthread_exit(args->start_func(args->start_arg));
@@ -367,10 +360,14 @@ int __pthread_create(pthread_t *restrict res, const pthread_attr_t *restrict att
        }

        if (attr._a_sched) {
-               if (a_cas(&err, -1, -2)==-1)
-                       __wait(&err, 0, -2, 1);
-               ret = err;
-               if (ret) return ret;
+               ret = -__syscall(SYS_sched_setscheduler, new->tid, attr._a_policy, &attr._a_prio);
+               if (ret) {
+                       new->detach_state = DT_DETACHED;
+                       pthread_cancel(new);
+                       return ret;
+               }
+               if (a_swap(&err, 0) == -2)
+                       __wake(&err, 1, 1);
        }

        *res = new;
--
2.7.4

-----邮件原件-----
发件人: Rich Felker [mailto:dalias@aerifal.cx] 代表 Rich Felker
发送时间: 2019年9月5日 21:34
收件人: zhaohang (F) <zhaohang14@huawei.com>
抄送: musl@lists.openwall.com
主题: Re: [musl] src/thread/pthread_create: Why prio of child thread is set by himself

On Thu, Sep 05, 2019 at 02:14:36AM +0000, zhaohang (F) wrote:
> In the function pthread_create, father thread will wait child if 
> attr._a_sched is set, after SYS_clone is finished.Child thread will 
> set his prio in entry 'start', and then wake father thread to 
> continue.
> 
> But consider this kind of situation, there are three threads: A with 
> prio 51, B with prio 30, and C with prio 20 created by A, and there is 
> only simplest sched policy 'FIFO'.
> 
> When system starts, A is running because A is higher than B, then A 
> uses pthread_create to create C. After C is cloned, A wait for C to 
> set prio and wake him up, but after C set his prio to 20, B will be 
> sched. And if B won't exit, A and C will never get sched, even if A is 
> higher than B. Maybe this is a kind of priority inversion.
> 
> So why prio of child is set by himself rather than father? If prio of 
> child is set by father, something will go wrong? Or other 
> considerations?

I think you're correct in your analysis of this problem; I'm going to look at it more in a bit to make sure.

Originally, pthread_create (in the caller) was responsible for setting priority; this changed in b8742f32602add243ee2ce74d804015463726899 and 40bae2d32fd6f3ffea437fa745ad38a1fe77b27e as part of trying to trim down the pthread structure and get init-time-only junk out of it.
However, 04335d9260c076cf4d9264bd93dd3b06c237a639 largely undid that already, and moved the extra start args to a struct on the new thread's stack so that it doesn't contribute to size/clutter in struct pthread. It should be easy to switch back to having the new thread just wait for the parent to tell it whether priority setup succeeded.

One related issue this also turned up is that exiting in detached state is probably a bad idea. Depending on priorities, the thread that failed to start could linger for a long time after pthread_create returns, potentially causing spurious transient resource exhaustion with no way to wait for it to subside. At some point we should probably switch from forcing detached exit to forcing joinable (or equivalent; forcing linking of pthread_join code is somewhat
undesirable) exit so that when a failed pthread_create returns it's not consuming any kernel task resources.

Thanks for the report.

Rich

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Subject: [PATCH] pthread: Fix bug that pthread_create may cause priority inversion
  2019-09-09 13:57 Subject: [PATCH] pthread: Fix bug that pthread_create may cause priority inversion zhaohang (F)
  2019-09-09 14:01 ` 答复: " zhaohang (F)
@ 2019-09-09 14:54 ` Szabolcs Nagy
  2019-09-09 17:49   ` Rich Felker
  1 sibling, 1 reply; 8+ messages in thread
From: Szabolcs Nagy @ 2019-09-09 14:54 UTC (permalink / raw)
  To: musl

* zhaohang (F) <zhaohang14@huawei.com> [2019-09-09 13:57:36 +0000]:
> diff --git a/src/thread/pthread_create.c b/src/thread/pthread_create.c
> index 7d4dc2e..ae08c0f 100644
> --- a/src/thread/pthread_create.c
> +++ b/src/thread/pthread_create.c
> @@ -181,15 +181,8 @@ static int start(void *p)
>  {
>         struct start_args *args = p;
>         if (args->attr) {
> -               pthread_t self = __pthread_self();
> -               int ret = -__syscall(SYS_sched_setscheduler, self->tid,
> -                       args->attr->_a_policy, &args->attr->_a_prio);
> -               if (a_swap(args->perr, ret)==-2)
> -                       __wake(args->perr, 1, 1);
> -               if (ret) {
> -                       self->detach_state = DT_DETACHED;
> -                       __pthread_exit(0);
> -               }
> +               if (a_cas(args->perr, -1, -2) == -1)
> +                       __wait(args->perr, 0, -2, 1);
>         }
>         __syscall(SYS_rt_sigprocmask, SIG_SETMASK, &args->sig_mask, 0, _NSIG/8);
>         __pthread_exit(args->start_func(args->start_arg));
> @@ -367,10 +360,14 @@ int __pthread_create(pthread_t *restrict res, const pthread_attr_t *restrict att
>         }
> 
>         if (attr._a_sched) {
> -               if (a_cas(&err, -1, -2)==-1)
> -                       __wait(&err, 0, -2, 1);
> -               ret = err;
> -               if (ret) return ret;
> +               ret = -__syscall(SYS_sched_setscheduler, new->tid, attr._a_policy, &attr._a_prio);
> +               if (ret) {
> +                       new->detach_state = DT_DETACHED;
> +                       pthread_cancel(new);
> +                       return ret;

the child has the cancel signal blocked so it will never act on the signal.

but even if that's fixed, the detached child may not get scheduled to
handle the signal for a long time and will take up stack/tid resources.

i think Rich already has a solution that will deal with these issues.

> +               }
> +               if (a_swap(&err, 0) == -2)
> +                       __wake(&err, 1, 1);
>         }
> 
>         *res = new;


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Subject: [PATCH] pthread: Fix bug that pthread_create may cause priority inversion
  2019-09-09 14:54 ` Szabolcs Nagy
@ 2019-09-09 17:49   ` Rich Felker
  2019-09-11 13:38     ` 答复: [musl] " zhaohang (F)
  0 siblings, 1 reply; 8+ messages in thread
From: Rich Felker @ 2019-09-09 17:49 UTC (permalink / raw)
  To: musl

[-- Attachment #1: Type: text/plain, Size: 2834 bytes --]

On Mon, Sep 09, 2019 at 04:54:29PM +0200, Szabolcs Nagy wrote:
> * zhaohang (F) <zhaohang14@huawei.com> [2019-09-09 13:57:36 +0000]:
> > diff --git a/src/thread/pthread_create.c b/src/thread/pthread_create.c
> > index 7d4dc2e..ae08c0f 100644
> > --- a/src/thread/pthread_create.c
> > +++ b/src/thread/pthread_create.c
> > @@ -181,15 +181,8 @@ static int start(void *p)
> >  {
> >         struct start_args *args = p;
> >         if (args->attr) {
> > -               pthread_t self = __pthread_self();
> > -               int ret = -__syscall(SYS_sched_setscheduler, self->tid,
> > -                       args->attr->_a_policy, &args->attr->_a_prio);
> > -               if (a_swap(args->perr, ret)==-2)
> > -                       __wake(args->perr, 1, 1);
> > -               if (ret) {
> > -                       self->detach_state = DT_DETACHED;
> > -                       __pthread_exit(0);
> > -               }
> > +               if (a_cas(args->perr, -1, -2) == -1)
> > +                       __wait(args->perr, 0, -2, 1);
> >         }
> >         __syscall(SYS_rt_sigprocmask, SIG_SETMASK, &args->sig_mask, 0, _NSIG/8);
> >         __pthread_exit(args->start_func(args->start_arg));
> > @@ -367,10 +360,14 @@ int __pthread_create(pthread_t *restrict res, const pthread_attr_t *restrict att
> >         }
> > 
> >         if (attr._a_sched) {
> > -               if (a_cas(&err, -1, -2)==-1)
> > -                       __wait(&err, 0, -2, 1);
> > -               ret = err;
> > -               if (ret) return ret;
> > +               ret = -__syscall(SYS_sched_setscheduler, new->tid, attr._a_policy, &attr._a_prio);
> > +               if (ret) {
> > +                       new->detach_state = DT_DETACHED;
> > +                       pthread_cancel(new);
> > +                       return ret;
> 
> the child has the cancel signal blocked so it will never act on the signal.

Also, pthread_create should not pull in cancellation. Aside from being
unnecessary amounts of code that increases lots of costs in static
linking (for example, cancellable syscall paths have to be used),
there's no reason to use cancellation for something like this where
it's not trying to work with arbitrary application code, just a fixed
piece of code that admits explicit negotiation of how to continue.

> but even if that's fixed, the detached child may not get scheduled to
> handle the signal for a long time and will take up stack/tid resources.

That's the side issue I noted which my third patch fixes.

> i think Rich already has a solution that will deal with these issues.

Yes, sorry for not posting it sooner. Attached are the drafts that I
plan to push soon. (If you see something wrong and they've already
been pushed, just let me know and I'll fix it.) Patch 2 is the one
that addresses the issue reported here.

Rich

[-- Attachment #2: 0001-fix-unsynchronized-decrement-of-thread-count-on-pthr.patch --]
[-- Type: text/plain, Size: 1127 bytes --]

From dd0a23dd9e157624347fdcc9dca452675b93da70 Mon Sep 17 00:00:00 2001
From: Rich Felker <dalias@aerifal.cx>
Date: Fri, 6 Sep 2019 15:52:00 -0400
Subject: [PATCH 1/4] fix unsynchronized decrement of thread count on
 pthread_create error

commit 8f11e6127fe93093f81a52b15bb1537edc3fc8af wrongly documented
that all changes to libc.threads_minus_1 were guarded by the thread
list lock, but the decrement for failed SYS_clone took place after the
thread list lock was released.
---
 src/thread/pthread_create.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/thread/pthread_create.c b/src/thread/pthread_create.c
index ebf61ded..edaf9a6e 100644
--- a/src/thread/pthread_create.c
+++ b/src/thread/pthread_create.c
@@ -356,13 +356,14 @@ int __pthread_create(pthread_t *restrict res, const pthread_attr_t *restrict att
 		new->prev = self;
 		new->next->prev = new;
 		new->prev->next = new;
+	} else {
+		libc.threads_minus_1--;
 	}
 	__tl_unlock();
 	__restore_sigs(&set);
 	__release_ptc();
 
 	if (ret < 0) {
-		libc.threads_minus_1--;
 		if (map) __munmap(map, size);
 		return EAGAIN;
 	}
-- 
2.21.0


[-- Attachment #3: 0002-set-explicit-scheduling-for-new-thread-from-calling-.patch --]
[-- Type: text/plain, Size: 3296 bytes --]

From 022f27d541af68f047d367ad9d93f941135555c1 Mon Sep 17 00:00:00 2001
From: Rich Felker <dalias@aerifal.cx>
Date: Fri, 6 Sep 2019 15:26:44 -0400
Subject: [PATCH 2/4] set explicit scheduling for new thread from calling
 thread, not self

if setting scheduling properties succeeds, the new thread may end up
with lower priority than the caller, and may be unable to continue
running due to another intermediate-priority thread. this produces a
priority inversion situation for the thread calling pthread_create,
since it cannot return until the new thread reports success.

originally, the parent was responsible for setting the new thread's
priority; commits b8742f32602add243ee2ce74d804015463726899 and
40bae2d32fd6f3ffea437fa745ad38a1fe77b27e changed it as part of
trimming down the pthread structure. since then, commit
04335d9260c076cf4d9264bd93dd3b06c237a639 partly reversed the changes,
but did not switch responsibilities back. do that now.
---
 src/thread/pthread_create.c | 33 ++++++++++++---------------------
 1 file changed, 12 insertions(+), 21 deletions(-)

diff --git a/src/thread/pthread_create.c b/src/thread/pthread_create.c
index edaf9a6e..edcdf041 100644
--- a/src/thread/pthread_create.c
+++ b/src/thread/pthread_create.c
@@ -172,22 +172,19 @@ void __do_cleanup_pop(struct __ptcb *cb)
 struct start_args {
 	void *(*start_func)(void *);
 	void *start_arg;
-	pthread_attr_t *attr;
-	volatile int *perr;
+	volatile int control;
 	unsigned long sig_mask[_NSIG/8/sizeof(long)];
 };
 
 static int start(void *p)
 {
 	struct start_args *args = p;
-	if (args->attr) {
-		pthread_t self = __pthread_self();
-		int ret = -__syscall(SYS_sched_setscheduler, self->tid,
-			args->attr->_a_policy, &args->attr->_a_prio);
-		if (a_swap(args->perr, ret)==-2)
-			__wake(args->perr, 1, 1);
-		if (ret) {
-			self->detach_state = DT_DETACHED;
+	int state = args->control;
+	if (state) {
+		if (a_cas(&args->control, 1, 2)==1)
+			__wait(&args->control, 0, 2, 1);
+		if (args->control) {
+			__pthread_self()->detach_state = DT_DETACHED;
 			__pthread_exit(0);
 		}
 	}
@@ -233,7 +230,6 @@ int __pthread_create(pthread_t *restrict res, const pthread_attr_t *restrict att
 		| CLONE_PARENT_SETTID | CLONE_CHILD_CLEARTID | CLONE_DETACHED;
 	pthread_attr_t attr = { 0 };
 	sigset_t set;
-	volatile int err = -1;
 
 	if (!libc.can_do_threads) return ENOSYS;
 	self = __pthread_self();
@@ -325,13 +321,7 @@ int __pthread_create(pthread_t *restrict res, const pthread_attr_t *restrict att
 	struct start_args *args = (void *)stack;
 	args->start_func = entry;
 	args->start_arg = arg;
-	if (attr._a_sched) {
-		args->attr = &attr;
-		args->perr = &err;
-	} else {
-		args->attr = 0;
-		args->perr = 0;
-	}
+	args->control = attr._a_sched ? 1 : 0;
 
 	/* Application signals (but not the synccall signal) must be
 	 * blocked before the thread list lock can be taken, to ensure
@@ -369,9 +359,10 @@ int __pthread_create(pthread_t *restrict res, const pthread_attr_t *restrict att
 	}
 
 	if (attr._a_sched) {
-		if (a_cas(&err, -1, -2)==-1)
-			__wait(&err, 0, -2, 1);
-		ret = err;
+		int ret = -__syscall(SYS_sched_setscheduler, new->tid,
+			attr._a_policy, &attr._a_prio);
+		if (a_swap(&args->control, ret ? 3 : 0)==2)
+			__wake(&args->control, 1, 1);
 		if (ret) return ret;
 	}
 
-- 
2.21.0


[-- Attachment #4: 0003-synchronously-clean-up-pthread_create-failure-due-to.patch --]
[-- Type: text/plain, Size: 3257 bytes --]

From 8a544ee3a2a75af278145b09531177cab4939b41 Mon Sep 17 00:00:00 2001
From: Rich Felker <dalias@aerifal.cx>
Date: Fri, 6 Sep 2019 16:17:44 -0400
Subject: [PATCH 3/4] synchronously clean up pthread_create failure due to
 scheduling errors

previously, when pthread_create failed due to inability to set
explicit scheduling according to the requested attributes, the nascent
thread was detached and made responsible for its own cleanup via the
standard pthread_exit code path. this left it consuming resources
potentially well after pthread_create returned, in a way that the
application could not see or mitigate, and unnecessarily exposed its
existence to the rest of the implementation via the global thread
list.

instead, attempt explicit scheduling early and reuse the failure path
for __clone failure if it fails. the nascent thread's exit futex is
not needed for unlocking the thread list, since the thread calling
pthread_create holds the thread list lock the whole time, so it can be
repurposed to ensure the thread has finished exiting. no pthread_exit
is needed, and freeing the stack, if needed, can happen just as it
would if __clone failed.
---
 src/thread/pthread_create.c | 31 ++++++++++++++++++-------------
 1 file changed, 18 insertions(+), 13 deletions(-)

diff --git a/src/thread/pthread_create.c b/src/thread/pthread_create.c
index edcdf041..5d00d765 100644
--- a/src/thread/pthread_create.c
+++ b/src/thread/pthread_create.c
@@ -184,8 +184,8 @@ static int start(void *p)
 		if (a_cas(&args->control, 1, 2)==1)
 			__wait(&args->control, 0, 2, 1);
 		if (args->control) {
-			__pthread_self()->detach_state = DT_DETACHED;
-			__pthread_exit(0);
+			__syscall(SYS_set_tid_address, &args->control);
+			return 0;
 		}
 	}
 	__syscall(SYS_rt_sigprocmask, SIG_SETMASK, &args->sig_mask, 0, _NSIG/8);
@@ -339,8 +339,21 @@ int __pthread_create(pthread_t *restrict res, const pthread_attr_t *restrict att
 	libc.threads_minus_1++;
 	ret = __clone((c11 ? start_c11 : start), stack, flags, args, &new->tid, TP_ADJ(new), &__thread_list_lock);
 
-	/* If clone succeeded, new thread must be linked on the thread
-	 * list before unlocking it, even if scheduling may still fail. */
+	/* All clone failures translate to EAGAIN. If explicit scheduling
+	 * was requested, attempt it before unlocking the thread list so
+	 * that the failed thread is never exposed and so that we can
+	 * clean up all transient resource usage before returning. */
+	if (ret < 0) {
+		ret = -EAGAIN;
+	} else if (attr._a_sched) {
+		ret = __syscall(SYS_sched_setscheduler,
+			new->tid, attr._a_policy, &attr._a_prio);
+		if (a_swap(&args->control, ret ? 3 : 0)==2)
+			__wake(&args->control, 1, 1);
+		if (ret)
+			__wait(&args->control, 0, 3, 0);
+	}
+
 	if (ret >= 0) {
 		new->next = self->next;
 		new->prev = self;
@@ -355,15 +368,7 @@ int __pthread_create(pthread_t *restrict res, const pthread_attr_t *restrict att
 
 	if (ret < 0) {
 		if (map) __munmap(map, size);
-		return EAGAIN;
-	}
-
-	if (attr._a_sched) {
-		int ret = -__syscall(SYS_sched_setscheduler, new->tid,
-			attr._a_policy, &attr._a_prio);
-		if (a_swap(&args->control, ret ? 3 : 0)==2)
-			__wake(&args->control, 1, 1);
-		if (ret) return ret;
+		return -ret;
 	}
 
 	*res = new;
-- 
2.21.0


^ permalink raw reply	[flat|nested] 8+ messages in thread

* 答复: [musl] Subject: [PATCH] pthread: Fix bug that pthread_create may cause priority inversion
  2019-09-09 17:49   ` Rich Felker
@ 2019-09-11 13:38     ` zhaohang (F)
  2019-09-11 13:52       ` Rich Felker
  0 siblings, 1 reply; 8+ messages in thread
From: zhaohang (F) @ 2019-09-11 13:38 UTC (permalink / raw)
  To: musl

Thank you Rich for your patch. It helps me a lot.

But I find that 'return 0' is used to let child thread exit. In that case, a bad thing will happen that the return address of child thread maybe undefined, if caller set prio of child unsuccessfully.

For example, In my system of arm, PC is set artificially to force child thread to begin with "start" function, but LR(the return address if call 'return 0')  of child thread is undefined, so if something wrong happens when set prio, my system will crash.  

Maybe __syscall(SYS_exit) is a better idea?

-----邮件原件-----
发件人: Rich Felker [mailto:dalias@aerifal.cx] 代表 Rich Felker
发送时间: 2019年9月10日 1:50
收件人: musl@lists.openwall.com
主题: Re: [musl] Subject: [PATCH] pthread: Fix bug that pthread_create may cause priority inversion

On Mon, Sep 09, 2019 at 04:54:29PM +0200, Szabolcs Nagy wrote:
> * zhaohang (F) <zhaohang14@huawei.com> [2019-09-09 13:57:36 +0000]:
> > diff --git a/src/thread/pthread_create.c 
> > b/src/thread/pthread_create.c index 7d4dc2e..ae08c0f 100644
> > --- a/src/thread/pthread_create.c
> > +++ b/src/thread/pthread_create.c
> > @@ -181,15 +181,8 @@ static int start(void *p)  {
> >         struct start_args *args = p;
> >         if (args->attr) {
> > -               pthread_t self = __pthread_self();
> > -               int ret = -__syscall(SYS_sched_setscheduler, self->tid,
> > -                       args->attr->_a_policy, &args->attr->_a_prio);
> > -               if (a_swap(args->perr, ret)==-2)
> > -                       __wake(args->perr, 1, 1);
> > -               if (ret) {
> > -                       self->detach_state = DT_DETACHED;
> > -                       __pthread_exit(0);
> > -               }
> > +               if (a_cas(args->perr, -1, -2) == -1)
> > +                       __wait(args->perr, 0, -2, 1);
> >         }
> >         __syscall(SYS_rt_sigprocmask, SIG_SETMASK, &args->sig_mask, 0, _NSIG/8);
> >         __pthread_exit(args->start_func(args->start_arg));
> > @@ -367,10 +360,14 @@ int __pthread_create(pthread_t *restrict res, const pthread_attr_t *restrict att
> >         }
> > 
> >         if (attr._a_sched) {
> > -               if (a_cas(&err, -1, -2)==-1)
> > -                       __wait(&err, 0, -2, 1);
> > -               ret = err;
> > -               if (ret) return ret;
> > +               ret = -__syscall(SYS_sched_setscheduler, new->tid, attr._a_policy, &attr._a_prio);
> > +               if (ret) {
> > +                       new->detach_state = DT_DETACHED;
> > +                       pthread_cancel(new);
> > +                       return ret;
> 
> the child has the cancel signal blocked so it will never act on the signal.

Also, pthread_create should not pull in cancellation. Aside from being unnecessary amounts of code that increases lots of costs in static linking (for example, cancellable syscall paths have to be used), there's no reason to use cancellation for something like this where it's not trying to work with arbitrary application code, just a fixed piece of code that admits explicit negotiation of how to continue.

> but even if that's fixed, the detached child may not get scheduled to 
> handle the signal for a long time and will take up stack/tid resources.

That's the side issue I noted which my third patch fixes.

> i think Rich already has a solution that will deal with these issues.

Yes, sorry for not posting it sooner. Attached are the drafts that I plan to push soon. (If you see something wrong and they've already been pushed, just let me know and I'll fix it.) Patch 2 is the one that addresses the issue reported here.

Rich

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: 答复: [musl] Subject: [PATCH] pthread: Fix bug that pthread_create may cause priority inversion
  2019-09-11 13:38     ` 答复: [musl] " zhaohang (F)
@ 2019-09-11 13:52       ` Rich Felker
  2019-09-11 17:29         ` Rich Felker
  0 siblings, 1 reply; 8+ messages in thread
From: Rich Felker @ 2019-09-11 13:52 UTC (permalink / raw)
  To: musl

On Wed, Sep 11, 2019 at 01:38:38PM +0000, zhaohang (F) wrote:
> Thank you Rich for your patch. It helps me a lot.
> 
> But I find that 'return 0' is used to let child thread exit. In that
> case, a bad thing will happen that the return address of child
> thread maybe undefined, if caller set prio of child unsuccessfully.

The code in __clone is supposed to perform SYS_exit if the start
function returns; this actually matters for users of the public
clone() function, I think.

> For example, In my system of arm, PC is set artificially to force
> child thread to begin with "start" function, but LR(the return
> address if call 'return 0') of child thread is undefined, so if
> something wrong happens when set prio, my system will crash.

At one point this was broken for at least one arch (mips, I think?) so
maybe it's broken for arm too. I'll check.

> Maybe __syscall(SYS_exit) is a better idea?

If I can't confirm that the code in __clone is correct for all archs,
I'll make it explicitly do SYS_exit for now, and revisit after
release, since I don't want to risk introducing a nasty regression
like this. Thanks for catching it!

Rich

> -----邮件原件-----
> 发件人: Rich Felker [mailto:dalias@aerifal.cx] 代表 Rich Felker
> 发送时间: 2019年9月10日 1:50
> 收件人: musl@lists.openwall.com
> 主题: Re: [musl] Subject: [PATCH] pthread: Fix bug that pthread_create may cause priority inversion
> 
> On Mon, Sep 09, 2019 at 04:54:29PM +0200, Szabolcs Nagy wrote:
> > * zhaohang (F) <zhaohang14@huawei.com> [2019-09-09 13:57:36 +0000]:
> > > diff --git a/src/thread/pthread_create.c 
> > > b/src/thread/pthread_create.c index 7d4dc2e..ae08c0f 100644
> > > --- a/src/thread/pthread_create.c
> > > +++ b/src/thread/pthread_create.c
> > > @@ -181,15 +181,8 @@ static int start(void *p)  {
> > >         struct start_args *args = p;
> > >         if (args->attr) {
> > > -               pthread_t self = __pthread_self();
> > > -               int ret = -__syscall(SYS_sched_setscheduler, self->tid,
> > > -                       args->attr->_a_policy, &args->attr->_a_prio);
> > > -               if (a_swap(args->perr, ret)==-2)
> > > -                       __wake(args->perr, 1, 1);
> > > -               if (ret) {
> > > -                       self->detach_state = DT_DETACHED;
> > > -                       __pthread_exit(0);
> > > -               }
> > > +               if (a_cas(args->perr, -1, -2) == -1)
> > > +                       __wait(args->perr, 0, -2, 1);
> > >         }
> > >         __syscall(SYS_rt_sigprocmask, SIG_SETMASK, &args->sig_mask, 0, _NSIG/8);
> > >         __pthread_exit(args->start_func(args->start_arg));
> > > @@ -367,10 +360,14 @@ int __pthread_create(pthread_t *restrict res, const pthread_attr_t *restrict att
> > >         }
> > > 
> > >         if (attr._a_sched) {
> > > -               if (a_cas(&err, -1, -2)==-1)
> > > -                       __wait(&err, 0, -2, 1);
> > > -               ret = err;
> > > -               if (ret) return ret;
> > > +               ret = -__syscall(SYS_sched_setscheduler, new->tid, attr._a_policy, &attr._a_prio);
> > > +               if (ret) {
> > > +                       new->detach_state = DT_DETACHED;
> > > +                       pthread_cancel(new);
> > > +                       return ret;
> > 
> > the child has the cancel signal blocked so it will never act on the signal.
> 
> Also, pthread_create should not pull in cancellation. Aside from being unnecessary amounts of code that increases lots of costs in static linking (for example, cancellable syscall paths have to be used), there's no reason to use cancellation for something like this where it's not trying to work with arbitrary application code, just a fixed piece of code that admits explicit negotiation of how to continue.
> 
> > but even if that's fixed, the detached child may not get scheduled to 
> > handle the signal for a long time and will take up stack/tid resources.
> 
> That's the side issue I noted which my third patch fixes.
> 
> > i think Rich already has a solution that will deal with these issues.
> 
> Yes, sorry for not posting it sooner. Attached are the drafts that I plan to push soon. (If you see something wrong and they've already been pushed, just let me know and I'll fix it.) Patch 2 is the one that addresses the issue reported here.
> 
> Rich


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Re: 答复: [musl] Subject: [PATCH] pthread: Fix bug that pthread_create may cause priority inversion
  2019-09-11 13:52       ` Rich Felker
@ 2019-09-11 17:29         ` Rich Felker
  2019-09-16  2:27           ` 答复: [musl] " zhaohang (F)
  0 siblings, 1 reply; 8+ messages in thread
From: Rich Felker @ 2019-09-11 17:29 UTC (permalink / raw)
  To: musl

[-- Attachment #1: Type: text/plain, Size: 775 bytes --]

On Wed, Sep 11, 2019 at 09:52:00AM -0400, Rich Felker wrote:
> On Wed, Sep 11, 2019 at 01:38:38PM +0000, zhaohang (F) wrote:
> > Thank you Rich for your patch. It helps me a lot.
> > 
> > But I find that 'return 0' is used to let child thread exit. In that
> > case, a bad thing will happen that the return address of child
> > thread maybe undefined, if caller set prio of child unsuccessfully.
> 
> The code in __clone is supposed to perform SYS_exit if the start
> function returns; this actually matters for users of the public
> clone() function, I think.

I found the problem -- when clone.s is built as thumb, mov lr,pc is
invalid for saving the return address (it omits the thumb-mode bit).
I have a patch I'll push soon, attached. Thanks again for the report!

Rich

[-- Attachment #2: 0001-fix-code-path-where-child-function-returns-in-arm-__.patch --]
[-- Type: text/plain, Size: 1316 bytes --]

From 05870abeaac0588fb9115cfd11f96880a0af2108 Mon Sep 17 00:00:00 2001
From: Rich Felker <dalias@aerifal.cx>
Date: Wed, 11 Sep 2019 13:13:57 -0400
Subject: [PATCH 1/2] fix code path where child function returns in arm __clone
 built as thumb

mov lr,pc is not a valid way to save the return address in thumb mode
since it omits the thumb bit. use a chain of bl and bx to emulate blx.
this could be avoided by converting to a .S file with preprocessor
conditions to use blx if available, but the time cost here is
dominated by the syscall anyway.

while making this change, also remove the remnants of support for
pre-bx ISA levels. commit 9f290a49bf9ee247d540d3c83875288a7991699c
removed the hack from the parent code paths, but left the unnecessary
code in the child. keeping it would require rewriting two code paths
rather than one, and is useless for reasons described in that commit.
---
 src/thread/arm/clone.s | 10 +++-------
 1 file changed, 3 insertions(+), 7 deletions(-)

diff --git a/src/thread/arm/clone.s b/src/thread/arm/clone.s
index e16b1326..bb0965da 100644
--- a/src/thread/arm/clone.s
+++ b/src/thread/arm/clone.s
@@ -20,13 +20,9 @@ __clone:
 	bx lr
 
 1:	mov r0,r6
-	tst r5,#1
-	bne 1f
-	mov lr,pc
-	mov pc,r5
+	bl 3f
 2:	mov r7,#1
 	svc 0
-
-1:	mov lr,pc
-	bx r5
 	b 2b
+
+3:	bx r5
-- 
2.21.0


^ permalink raw reply	[flat|nested] 8+ messages in thread

* 答复: [musl] Re: 答复: [musl] Subject: [PATCH] pthread: Fix bug that pthread_create may cause priority inversion
  2019-09-11 17:29         ` Rich Felker
@ 2019-09-16  2:27           ` zhaohang (F)
  0 siblings, 0 replies; 8+ messages in thread
From: zhaohang (F) @ 2019-09-16  2:27 UTC (permalink / raw)
  To: musl

Thanks for your reply

-----邮件原件-----
发件人: Rich Felker [mailto:dalias@aerifal.cx] 代表 Rich Felker
发送时间: 2019年9月12日 1:29
收件人: musl@lists.openwall.com
主题: Re: [musl] Re: 答复: [musl] Subject: [PATCH] pthread: Fix bug that pthread_create may cause priority inversion

On Wed, Sep 11, 2019 at 09:52:00AM -0400, Rich Felker wrote:
> On Wed, Sep 11, 2019 at 01:38:38PM +0000, zhaohang (F) wrote:
> > Thank you Rich for your patch. It helps me a lot.
> > 
> > But I find that 'return 0' is used to let child thread exit. In that 
> > case, a bad thing will happen that the return address of child 
> > thread maybe undefined, if caller set prio of child unsuccessfully.
> 
> The code in __clone is supposed to perform SYS_exit if the start 
> function returns; this actually matters for users of the public
> clone() function, I think.

I found the problem -- when clone.s is built as thumb, mov lr,pc is invalid for saving the return address (it omits the thumb-mode bit).
I have a patch I'll push soon, attached. Thanks again for the report!

Rich

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2019-09-16  2:27 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-09-09 13:57 Subject: [PATCH] pthread: Fix bug that pthread_create may cause priority inversion zhaohang (F)
2019-09-09 14:01 ` 答复: " zhaohang (F)
2019-09-09 14:54 ` Szabolcs Nagy
2019-09-09 17:49   ` Rich Felker
2019-09-11 13:38     ` 答复: [musl] " zhaohang (F)
2019-09-11 13:52       ` Rich Felker
2019-09-11 17:29         ` Rich Felker
2019-09-16  2:27           ` 答复: [musl] " zhaohang (F)

Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/musl/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).