From: Florian Weimer
To: Rich Felker
Cc: musl@lists.openwall.com
Subject: Re: Possible design for global thread list
Date: Sun, 14 Oct 2018 19:32:54 +0200
Message-ID: <87bm7w5t89.fsf@mid.deneb.enyo.de>
In-Reply-To: <20181010163306.GO17110@brightrain.aerifal.cx> (Rich Felker's message of "Wed, 10 Oct 2018 12:33:06 -0400")

* Rich Felker:

> Of course, this futex wake is already used for pthread_join, which
> would need another mechanism. This is solved simply: pthread_exit can
> FUTEX_REQUEUE a waiting joiner to the thread-list lock. pthread_join
> then has to wait on (but need not acquire) the thread-list lock after
> waiting on the thread's own exit futex in order to ensure the exit has
> actually finished.
> This is potentially subject to long waits if the
> lock is under contention (lots of threads exiting or being created)
> and retaken before pthread_join gets to run, but the probability of
> collision can be made negligible (only possible under extremely rapid
> tid reuse) by using the tid of the exiting thread as the wait value.
> Alternatively, the tid of the joiner could be used, making collisions
> impossible, but setting up to do this is more complex.

I'm not sure this is compatible with existing software which rapidly
joins and creates many threads in succession, because it looks to me
as if pthread_join can return before the kernel resources of the
exiting thread are freed. As a result, applications will get seemingly
impossible EAGAIN failures from pthread_create, even though the
application never exceeds the thread limit.

Depending on kernel version and cgroups configuration, this race can
even be observed with the more usual join sequence, because the kernel
signals thread exit to user space too early.
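
The workload described above can be sketched roughly as follows. This
is an illustrative sketch only, not code from this thread: the names
noop and count_eagain are hypothetical. It keeps at most one extra
thread alive at any time, so any EAGAIN it counts cannot come from the
program genuinely exceeding a limit of two or more tasks. On a system
with a generous limit it would be expected to report zero; the race
should only become visible when the task limit (for example
RLIMIT_NPROC or a pids cgroup limit, both assumptions about the test
setup) is close to the number of tasks already running.

```c
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Thread body: return immediately so create/join cycles run back to back. */
static void *noop(void *arg)
{
    return arg;
}

/*
 * Create and join `iterations` threads, one at a time.
 *
 * Returns the number of pthread_create calls that failed with EAGAIN,
 * or -1 if pthread_create or pthread_join failed in any other way.
 * Because every thread is joined before the next one is created, the
 * process never has more than one extra thread from the application's
 * point of view.
 */
static long count_eagain(long iterations)
{
    long eagain = 0;

    for (long i = 0; i < iterations; i++) {
        pthread_t t;
        int err = pthread_create(&t, NULL, noop, NULL);

        if (err == EAGAIN) {
            /* No thread was created, so there is nothing to join. */
            eagain++;
            continue;
        }
        if (err != 0)
            return -1;
        if (pthread_join(t, NULL) != 0)
            return -1;
    }
    return eagain;
}
```

Calling count_eagain with a large iteration count under a tight task
limit and checking for a non-zero result is one way to look for the
early-return behaviour; a zero result does not show that the race is
absent.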