Date: Mon, 19 Sep 2022 11:29:37 -0400
From: Rich Felker
To: musl@lists.openwall.com
Message-ID: <20220919152937.GQ9709@brightrain.aerifal.cx>
Subject: Re: [musl] Illegal killlock skipping when transitioning to
 single-threaded state

On Wed, Sep 07, 2022 at 03:46:53AM +0300, Alexey Izbyshev wrote:
> Hi,
>
> While reading the pthread_exit() implementation I noticed that it can
> set "libc.need_locks" to -1 while still holding the killlock of the
> exiting thread:
>
>     if (!--libc.threads_minus_1) libc.need_locks = -1;
>
> If the remaining thread attempts to acquire the same killlock
> concurrently (which is valid for joinable threads), it works fine
> because LOCK() resets "libc.need_locks" only after a_cas():
>
>     int need_locks = libc.need_locks;
>     if (!need_locks) return;
>     /* fast path: INT_MIN for the lock, +1 for the congestion */
>     int current = a_cas(l, 0, INT_MIN + 1);
>     if (need_locks < 0) libc.need_locks = 0;
>     if (!current) return;
>
> However, because "libc.need_locks" is reset when using LOCK() for
> any other lock too, the following could happen (E is the
> exiting thread, R is the remaining thread):
>
>     E: libc.need_locks = -1
>     R: LOCK(unrelated_lock)
>     R: libc.need_locks = 0
>     R: UNLOCK(unrelated_lock)
>     R: LOCK(E->killlock)   // skips locking; now both R and E think
>                            // they are holding the lock
>     R: UNLOCK(E->killlock)
>     E: UNLOCK(E->killlock)
>
> The lack of mutual exclusion means that the tid reuse problem that
> killlock is supposed to prevent might occur.
>
> Moreover, when UNLOCK(E->killlock) is called concurrently by R and
> E, a_fetch_add() could be done twice if timing is such that both
> threads notice that "l[0] < 0":
>
>     /* Check l[0] to see if we are multi-threaded. */
>     if (l[0] < 0) {
>         if (a_fetch_add(l, -(INT_MIN + 1)) != (INT_MIN + 1)) {
>             __wake(l, 1, 1);
>         }
>     }
>
> In that case E->killlock will be assigned INT_MAX (i.e. "unlocked
> with INT_MAX waiters"). This is a bad state, because the next LOCK()
> will wrap it to the "unlocked" state instead of locking. Also, if
> more than two threads attempt to use it, a deadlock will occur if
> two supposed owners execute a_fetch_add() concurrently in UNLOCK()
> after a third thread has registered itself as a waiter, because the
> lock will wrap to INT_MIN.
>
> Reordering the "libc.need_locks = -1" assignment and
> UNLOCK(E->killlock), and providing a store barrier between them,
> should fix the issue.

I think this all sounds correct. I'm not sure what you mean by a store
barrier between them, since all lock and unlock operations are already
full barriers.

I'm still a little concerned, though, that there's no real model for
synchronization of the access to modify need_locks. Conceptually, the
store of -1 from the exiting thread and the store of 0 in the surviving
thread are not ordered with respect to each other except by an
unsynchronized causality relationship.
I suspect what "should be" happening is that, if we observe
need_locks==-1 (as a relaxed atomic load in our memory model), we take
the thread list lock (as the lock controlling write access to these
values), which ensures that the exiting thread has exited, then confirm
that we are the only thread left (are there async-signal ways, without
UB, that another thread could be created in the meantime?) before
setting it to 0. But this is a rather heavy operation.

If there's an assurance that no other threads can be created in the
middle of LOCK() (which I think there should be), we could instead use
__tl_sync(), which is much lighter, to conclude that we've synchronized
with being the only remaining thread and that unsynchronized access is
okay.

Rich