From mboxrd@z Thu Jan 1 00:00:00 1970 X-Msuck: nntp://news.gmane.org/gmane.linux.lib.musl.general/14628 Path: news.gmane.org!.POSTED.blaine.gmane.org!not-for-mail From: Rich Felker Newsgroups: gmane.linux.lib.musl.general Subject: Re: Subject: [PATCH] pthread: Fix bug that pthread_create may cause priority inversion Date: Mon, 9 Sep 2019 13:49:43 -0400 Message-ID: <20190909174943.GN9017@brightrain.aerifal.cx> References: <59FB1E003EF3A943BD6BAD197ABD4D6A2B5D55@dggemi524-mbx.china.huawei.com> <20190909145429.GG22009@port70.net> Reply-To: musl@lists.openwall.com Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="z3ovhOgMYmj8MRdq" Injection-Info: blaine.gmane.org; posting-host="blaine.gmane.org:195.159.176.226"; logging-data="233811"; mail-complaints-to="usenet@blaine.gmane.org" User-Agent: Mutt/1.5.21 (2010-09-15) To: musl@lists.openwall.com Original-X-From: musl-return-14644-gllmg-musl=m.gmane.org@lists.openwall.com Mon Sep 09 19:50:00 2019 Return-path: Envelope-to: gllmg-musl@m.gmane.org Original-Received: from mother.openwall.net ([195.42.179.200]) by blaine.gmane.org with smtp (Exim 4.89) (envelope-from ) id 1i7Nnb-000yhw-Ln for gllmg-musl@m.gmane.org; Mon, 09 Sep 2019 19:49:59 +0200 Original-Received: (qmail 7416 invoked by uid 550); 9 Sep 2019 17:49:56 -0000 Mailing-List: contact musl-help@lists.openwall.com; run by ezmlm Precedence: bulk List-Post: List-Help: List-Unsubscribe: List-Subscribe: List-ID: Original-Received: (qmail 7392 invoked from network); 9 Sep 2019 17:49:56 -0000 Content-Disposition: inline In-Reply-To: <20190909145429.GG22009@port70.net> Original-Sender: Rich Felker Xref: news.gmane.org gmane.linux.lib.musl.general:14628 Archived-At: --z3ovhOgMYmj8MRdq Content-Type: text/plain; charset=us-ascii Content-Disposition: inline On Mon, Sep 09, 2019 at 04:54:29PM +0200, Szabolcs Nagy wrote: > * zhaohang (F) [2019-09-09 13:57:36 +0000]: > > diff --git a/src/thread/pthread_create.c b/src/thread/pthread_create.c > > index 7d4dc2e..ae08c0f 100644 > > --- a/src/thread/pthread_create.c > > +++ b/src/thread/pthread_create.c > > @@ -181,15 +181,8 @@ static int start(void *p) > > { > > struct start_args *args = p; > > if (args->attr) { > > - pthread_t self = __pthread_self(); > > - int ret = -__syscall(SYS_sched_setscheduler, self->tid, > > - args->attr->_a_policy, &args->attr->_a_prio); > > - if (a_swap(args->perr, ret)==-2) > > - __wake(args->perr, 1, 1); > > - if (ret) { > > - self->detach_state = DT_DETACHED; > > - __pthread_exit(0); > > - } > > + if (a_cas(args->perr, -1, -2) == -1) > > + __wait(args->perr, 0, -2, 1); > > } > > __syscall(SYS_rt_sigprocmask, SIG_SETMASK, &args->sig_mask, 0, _NSIG/8); > > __pthread_exit(args->start_func(args->start_arg)); > > @@ -367,10 +360,14 @@ int __pthread_create(pthread_t *restrict res, const pthread_attr_t *restrict att > > } > > > > if (attr._a_sched) { > > - if (a_cas(&err, -1, -2)==-1) > > - __wait(&err, 0, -2, 1); > > - ret = err; > > - if (ret) return ret; > > + ret = -__syscall(SYS_sched_setscheduler, new->tid, attr._a_policy, &attr._a_prio); > > + if (ret) { > > + new->detach_state = DT_DETACHED; > > + pthread_cancel(new); > > + return ret; > > the child has the cancel signal blocked so it will never act on the signal. Also, pthread_create should not pull in cancellation. Aside from being unnecessary amounts of code that increases lots of costs in static linking (for example, cancellable syscall paths have to be used), there's no reason to use cancellation for something like this where it's not trying to work with arbitrary application code, just a fixed piece of code that admits explicit negotiation of how to continue. > but even if that's fixed, the detached child may not get scheduled to > handle the signal for a long time and will take up stack/tid resources. That's the side issue I noted which my third patch fixes. > i think Rich already has a solution that will deal with these issues. Yes, sorry for not posting it sooner. Attached are the drafts that I plan to push soon. (If you see something wrong and they've already been pushed, just let me know and I'll fix it.) Patch 2 is the one that addresses the issue reported here. Rich --z3ovhOgMYmj8MRdq Content-Type: text/plain; charset=us-ascii Content-Disposition: attachment; filename="0001-fix-unsynchronized-decrement-of-thread-count-on-pthr.patch" >From dd0a23dd9e157624347fdcc9dca452675b93da70 Mon Sep 17 00:00:00 2001 From: Rich Felker Date: Fri, 6 Sep 2019 15:52:00 -0400 Subject: [PATCH 1/4] fix unsynchronized decrement of thread count on pthread_create error commit 8f11e6127fe93093f81a52b15bb1537edc3fc8af wrongly documented that all changes to libc.threads_minus_1 were guarded by the thread list lock, but the decrement for failed SYS_clone took place after the thread list lock was released. --- src/thread/pthread_create.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/src/thread/pthread_create.c b/src/thread/pthread_create.c index ebf61ded..edaf9a6e 100644 --- a/src/thread/pthread_create.c +++ b/src/thread/pthread_create.c @@ -356,13 +356,14 @@ int __pthread_create(pthread_t *restrict res, const pthread_attr_t *restrict att new->prev = self; new->next->prev = new; new->prev->next = new; + } else { + libc.threads_minus_1--; } __tl_unlock(); __restore_sigs(&set); __release_ptc(); if (ret < 0) { - libc.threads_minus_1--; if (map) __munmap(map, size); return EAGAIN; } -- 2.21.0 --z3ovhOgMYmj8MRdq Content-Type: text/plain; charset=us-ascii Content-Disposition: attachment; filename="0002-set-explicit-scheduling-for-new-thread-from-calling-.patch" >From 022f27d541af68f047d367ad9d93f941135555c1 Mon Sep 17 00:00:00 2001 From: Rich Felker Date: Fri, 6 Sep 2019 15:26:44 -0400 Subject: [PATCH 2/4] set explicit scheduling for new thread from calling thread, not self if setting scheduling properties succeeds, the new thread may end up with lower priority than the caller, and may be unable to continue running due to another intermediate-priority thread. this produces a priority inversion situation for the thread calling pthread_create, since it cannot return until the new thread reports success. originally, the parent was responsible for setting the new thread's priority; commits b8742f32602add243ee2ce74d804015463726899 and 40bae2d32fd6f3ffea437fa745ad38a1fe77b27e changed it as part of trimming down the pthread structure. since then, commit 04335d9260c076cf4d9264bd93dd3b06c237a639 partly reversed the changes, but did not switch responsibilities back. do that now. --- src/thread/pthread_create.c | 33 ++++++++++++--------------------- 1 file changed, 12 insertions(+), 21 deletions(-) diff --git a/src/thread/pthread_create.c b/src/thread/pthread_create.c index edaf9a6e..edcdf041 100644 --- a/src/thread/pthread_create.c +++ b/src/thread/pthread_create.c @@ -172,22 +172,19 @@ void __do_cleanup_pop(struct __ptcb *cb) struct start_args { void *(*start_func)(void *); void *start_arg; - pthread_attr_t *attr; - volatile int *perr; + volatile int control; unsigned long sig_mask[_NSIG/8/sizeof(long)]; }; static int start(void *p) { struct start_args *args = p; - if (args->attr) { - pthread_t self = __pthread_self(); - int ret = -__syscall(SYS_sched_setscheduler, self->tid, - args->attr->_a_policy, &args->attr->_a_prio); - if (a_swap(args->perr, ret)==-2) - __wake(args->perr, 1, 1); - if (ret) { - self->detach_state = DT_DETACHED; + int state = args->control; + if (state) { + if (a_cas(&args->control, 1, 2)==1) + __wait(&args->control, 0, 2, 1); + if (args->control) { + __pthread_self()->detach_state = DT_DETACHED; __pthread_exit(0); } } @@ -233,7 +230,6 @@ int __pthread_create(pthread_t *restrict res, const pthread_attr_t *restrict att | CLONE_PARENT_SETTID | CLONE_CHILD_CLEARTID | CLONE_DETACHED; pthread_attr_t attr = { 0 }; sigset_t set; - volatile int err = -1; if (!libc.can_do_threads) return ENOSYS; self = __pthread_self(); @@ -325,13 +321,7 @@ int __pthread_create(pthread_t *restrict res, const pthread_attr_t *restrict att struct start_args *args = (void *)stack; args->start_func = entry; args->start_arg = arg; - if (attr._a_sched) { - args->attr = &attr; - args->perr = &err; - } else { - args->attr = 0; - args->perr = 0; - } + args->control = attr._a_sched ? 1 : 0; /* Application signals (but not the synccall signal) must be * blocked before the thread list lock can be taken, to ensure @@ -369,9 +359,10 @@ int __pthread_create(pthread_t *restrict res, const pthread_attr_t *restrict att } if (attr._a_sched) { - if (a_cas(&err, -1, -2)==-1) - __wait(&err, 0, -2, 1); - ret = err; + int ret = -__syscall(SYS_sched_setscheduler, new->tid, + attr._a_policy, &attr._a_prio); + if (a_swap(&args->control, ret ? 3 : 0)==2) + __wake(&args->control, 1, 1); if (ret) return ret; } -- 2.21.0 --z3ovhOgMYmj8MRdq Content-Type: text/plain; charset=us-ascii Content-Disposition: attachment; filename="0003-synchronously-clean-up-pthread_create-failure-due-to.patch" >From 8a544ee3a2a75af278145b09531177cab4939b41 Mon Sep 17 00:00:00 2001 From: Rich Felker Date: Fri, 6 Sep 2019 16:17:44 -0400 Subject: [PATCH 3/4] synchronously clean up pthread_create failure due to scheduling errors previously, when pthread_create failed due to inability to set explicit scheduling according to the requested attributes, the nascent thread was detached and made responsible for its own cleanup via the standard pthread_exit code path. this left it consuming resources potentially well after pthread_create returned, in a way that the application could not see or mitigate, and unnecessarily exposed its existence to the rest of the implementation via the global thread list. instead, attempt explicit scheduling early and reuse the failure path for __clone failure if it fails. the nascent thread's exit futex is not needed for unlocking the thread list, since the thread calling pthread_create holds the thread list lock the whole time, so it can be repurposed to ensure the thread has finished exiting. no pthread_exit is needed, and freeing the stack, if needed, can happen just as it would if __clone failed. --- src/thread/pthread_create.c | 31 ++++++++++++++++++------------- 1 file changed, 18 insertions(+), 13 deletions(-) diff --git a/src/thread/pthread_create.c b/src/thread/pthread_create.c index edcdf041..5d00d765 100644 --- a/src/thread/pthread_create.c +++ b/src/thread/pthread_create.c @@ -184,8 +184,8 @@ static int start(void *p) if (a_cas(&args->control, 1, 2)==1) __wait(&args->control, 0, 2, 1); if (args->control) { - __pthread_self()->detach_state = DT_DETACHED; - __pthread_exit(0); + __syscall(SYS_set_tid_address, &args->control); + return 0; } } __syscall(SYS_rt_sigprocmask, SIG_SETMASK, &args->sig_mask, 0, _NSIG/8); @@ -339,8 +339,21 @@ int __pthread_create(pthread_t *restrict res, const pthread_attr_t *restrict att libc.threads_minus_1++; ret = __clone((c11 ? start_c11 : start), stack, flags, args, &new->tid, TP_ADJ(new), &__thread_list_lock); - /* If clone succeeded, new thread must be linked on the thread - * list before unlocking it, even if scheduling may still fail. */ + /* All clone failures translate to EAGAIN. If explicit scheduling + * was requested, attempt it before unlocking the thread list so + * that the failed thread is never exposed and so that we can + * clean up all transient resource usage before returning. */ + if (ret < 0) { + ret = -EAGAIN; + } else if (attr._a_sched) { + ret = __syscall(SYS_sched_setscheduler, + new->tid, attr._a_policy, &attr._a_prio); + if (a_swap(&args->control, ret ? 3 : 0)==2) + __wake(&args->control, 1, 1); + if (ret) + __wait(&args->control, 0, 3, 0); + } + if (ret >= 0) { new->next = self->next; new->prev = self; @@ -355,15 +368,7 @@ int __pthread_create(pthread_t *restrict res, const pthread_attr_t *restrict att if (ret < 0) { if (map) __munmap(map, size); - return EAGAIN; - } - - if (attr._a_sched) { - int ret = -__syscall(SYS_sched_setscheduler, new->tid, - attr._a_policy, &attr._a_prio); - if (a_swap(&args->control, ret ? 3 : 0)==2) - __wake(&args->control, 1, 1); - if (ret) return ret; + return -ret; } *res = new; -- 2.21.0 --z3ovhOgMYmj8MRdq--