From: Rich Felker
Newsgroups: gmane.linux.lib.musl.general
Subject: Re: Reply: [musl] Subject: [PATCH] pthread: Fix bug that pthread_create may cause priority inversion
Date: Wed, 11 Sep 2019 09:52:00 -0400
Message-ID: <20190911135200.GV9017@brightrain.aerifal.cx>
In-Reply-To: <59FB1E003EF3A943BD6BAD197ABD4D6A2B7D7F@dggemi524-mbx.china.huawei.com>
To: musl@lists.openwall.com

On Wed, Sep 11, 2019 at 01:38:38PM +0000, zhaohang (F) wrote:
> Thank you Rich for your patch. It helps me a lot.
>
> But I find that 'return 0' is used to let the child thread exit. In that
> case, a bad thing will happen: the return address of the child
> thread may be undefined if the caller fails to set the child's priority.

The code in __clone is supposed to perform SYS_exit if the start
function returns; this actually matters for users of the public
clone() function, I think.

> For example, on my ARM system, PC is set artificially to force the
> child thread to begin with the "start" function, but LR (the return
> address used by 'return 0') of the child thread is undefined, so if
> something goes wrong when setting the priority, my system will crash.

At one point this was broken for at least one arch (mips, I think?) so
maybe it's broken for arm too. I'll check.

> Maybe __syscall(SYS_exit) is a better idea?

If I can't confirm that the code in __clone is correct for all archs,
I'll make it explicitly do SYS_exit for now, and revisit after
release, since I don't want to risk introducing a nasty regression
like this.

Thanks for catching it!
Rich

> -----Original Message-----
> From: Rich Felker [mailto:dalias@aerifal.cx] On Behalf Of Rich Felker
> Sent: September 10, 2019 1:50
> To: musl@lists.openwall.com
> Subject: Re: [musl] Subject: [PATCH] pthread: Fix bug that pthread_create may cause priority inversion
>
> On Mon, Sep 09, 2019 at 04:54:29PM +0200, Szabolcs Nagy wrote:
> > * zhaohang (F) [2019-09-09 13:57:36 +0000]:
> > > diff --git a/src/thread/pthread_create.c b/src/thread/pthread_create.c
> > > index 7d4dc2e..ae08c0f 100644
> > > --- a/src/thread/pthread_create.c
> > > +++ b/src/thread/pthread_create.c
> > > @@ -181,15 +181,8 @@ static int start(void *p)
> > >  {
> > >  	struct start_args *args = p;
> > >  	if (args->attr) {
> > > -		pthread_t self = __pthread_self();
> > > -		int ret = -__syscall(SYS_sched_setscheduler, self->tid,
> > > -			args->attr->_a_policy, &args->attr->_a_prio);
> > > -		if (a_swap(args->perr, ret)==-2)
> > > -			__wake(args->perr, 1, 1);
> > > -		if (ret) {
> > > -			self->detach_state = DT_DETACHED;
> > > -			__pthread_exit(0);
> > > -		}
> > > +		if (a_cas(args->perr, -1, -2) == -1)
> > > +			__wait(args->perr, 0, -2, 1);
> > >  	}
> > >  	__syscall(SYS_rt_sigprocmask, SIG_SETMASK, &args->sig_mask, 0, _NSIG/8);
> > >  	__pthread_exit(args->start_func(args->start_arg));
> > > @@ -367,10 +360,14 @@ int __pthread_create(pthread_t *restrict res, const pthread_attr_t *restrict att
> > >  	}
> > >
> > >  	if (attr._a_sched) {
> > > -		if (a_cas(&err, -1, -2)==-1)
> > > -			__wait(&err, 0, -2, 1);
> > > -		ret = err;
> > > -		if (ret) return ret;
> > > +		ret = -__syscall(SYS_sched_setscheduler, new->tid, attr._a_policy, &attr._a_prio);
> > > +		if (ret) {
> > > +			new->detach_state = DT_DETACHED;
> > > +			pthread_cancel(new);
> > > +			return ret;
> >
> > the child has the cancel signal blocked so it will never act on the signal.
>
> Also, pthread_create should not pull in cancellation.
> Aside from being an unnecessary amount of code that increases costs
> in static linking (for example, cancellable syscall paths have to be
> used), there's no reason to use cancellation for something like this
> where it's not trying to work with arbitrary application code, just a
> fixed piece of code that admits explicit negotiation of how to
> continue.
>
> > but even if that's fixed, the detached child may not get scheduled to
> > handle the signal for a long time and will take up stack/tid resources.
>
> That's the side issue I noted which my third patch fixes.
>
> > i think Rich already has a solution that will deal with these issues.
>
> Yes, sorry for not posting it sooner. Attached are the drafts that I
> plan to push soon. (If you see something wrong and they've already
> been pushed, just let me know and I'll fix it.) Patch 2 is the one
> that addresses the issue reported here.
>
> Rich
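
The "explicit negotiation" mentioned in the quoted message can be
sketched in portable C roughly as below. This is an illustration
only, not the attached patches and not musl's internals: struct task,
task_create, task_join, and the setup callback are hypothetical
names, and a mutex plus condition variable stand in for the
futex-based __wait/__wake on a control word. The creating thread
performs the setup step on the new thread (standing in for
sched_setscheduler), then tells the new thread whether to continue;
on failure the new thread simply returns, with no cancellation
involved.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>

/* All names here are hypothetical; this is not musl's code. */
enum verdict { PENDING, GO, ABORT };

struct task {
    pthread_t thread;
    pthread_mutex_t lock;
    pthread_cond_t cond;
    enum verdict control;          /* set once by the creating thread */
    void *(*start_func)(void *);
    void *start_arg;
};

static void task_free(struct task *t)
{
    pthread_cond_destroy(&t->cond);
    pthread_mutex_destroy(&t->lock);
    free(t);
}

/* Entry point of every new thread: block until the creator has decided. */
static void *start(void *p)
{
    struct task *t = p;

    pthread_mutex_lock(&t->lock);
    while (t->control == PENDING)
        pthread_cond_wait(&t->cond, &t->lock);
    enum verdict v = t->control;
    pthread_mutex_unlock(&t->lock);

    /* On ABORT, leave at once without running the application function. */
    if (v == ABORT)
        return NULL;
    return t->start_func(t->start_arg);
}

/*
 * Create a thread, run setup() on it from the creating thread, then tell
 * the new thread how to continue. setup() returns 0 or an errno value.
 */
int task_create(struct task **out, int (*setup)(pthread_t),
                void *(*fn)(void *), void *arg)
{
    assert(out && setup && fn);

    struct task *t = malloc(sizeof *t);
    if (!t)
        return ENOMEM;
    pthread_mutex_init(&t->lock, NULL);
    pthread_cond_init(&t->cond, NULL);
    t->control = PENDING;
    t->start_func = fn;
    t->start_arg = arg;

    int ret = pthread_create(&t->thread, NULL, start, t);
    if (ret) {
        task_free(t);
        return ret;
    }

    int err = setup(t->thread);

    pthread_mutex_lock(&t->lock);
    t->control = err ? ABORT : GO;
    pthread_cond_signal(&t->cond);
    pthread_mutex_unlock(&t->lock);

    if (err) {
        /* The aborted thread returns promptly; reclaim it here. */
        pthread_join(t->thread, NULL);
        task_free(t);
        return err;
    }
    *out = t;
    return 0;
}

/* Wait for a successfully created task and release its bookkeeping. */
int task_join(struct task *t, void **res)
{
    int ret = pthread_join(t->thread, res);
    task_free(t);
    return ret;
}

/* Example callbacks, also hypothetical. */
int setup_ok(pthread_t th) { (void)th; return 0; }
int setup_fail(pthread_t th) { (void)th; return EPERM; }
void *worker(void *arg) { *(int *)arg = 1; return arg; }
```

With these callbacks, task_create(&t, setup_ok, worker, &flag) starts
worker, while task_create(&t, setup_fail, worker, &flag) returns EPERM
without worker ever running. Joining the aborted thread inside
task_create is one way to avoid leaving a detached thread holding
stack and tid resources; a real libc has other constraints and may
resolve this differently.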