Date: Tue, 04 Oct 2022 19:01:04 +0300
From: Alexey Izbyshev
To: musl@lists.openwall.com
Message-ID: <11b02cf9c4a16cc512c8c38db2e39211@ispras.ru>
In-Reply-To: <2e77a700561a059e85daad5311306cfb@ispras.ru>
Subject: Re: [musl] Illegal killlock skipping when transitioning to single-threaded state

On 2022-10-04 16:50, Alexey Izbyshev wrote:
> On 2022-10-04 02:05, Rich Felker wrote:
>> On Mon, Oct 03, 2022 at 06:54:17PM -0400, Rich Felker wrote:
>>> On Mon, Oct 03, 2022 at 11:27:05PM +0200, Szabolcs Nagy wrote:
>>> > * Szabolcs Nagy [2022-10-03 15:26:15 +0200]:
>>> > >
>>> > > * Alexey Izbyshev [2022-10-03 09:16:03 +0300]:
>>> > > > On 2022-09-19 18:29, Rich Felker wrote:
>>> > > > > On Wed, Sep 07, 2022 at 03:46:53AM +0300, Alexey Izbyshev wrote:
>>> > > ...
>>> > > > > > Reordering the "libc.need_locks = -1" assignment and
>>> > > > > > UNLOCK(E->killlock) and providing a store barrier between them
>>> > > > > > should fix the issue.
>>> > > > >
>>> > > > > I think this all sounds correct. I'm not sure what you mean by a store
>>> > > > > barrier between them, since all lock and unlock operations are already
>>> > > > > full barriers.
>>> > > > >
>>> > > >
>>> > > > Before sending the report I tried to infer the intended ordering semantics
>>> > > > of LOCK/UNLOCK by looking at their implementations. For AArch64, I didn't
>>> > > > see why they would provide a full barrier (my reasoning is below), so I
>>> > > > concluded that probably acquire/release semantics was intended in general
>>> > > > and suggested an extra store barrier to prevent hoisting of
>>> > > > "libc.need_locks = -1" store spelled after UNLOCK(E->killlock) back into
>>> > > > the critical section.
>>> > > >
>>> > > > UNLOCK is implemented via a_fetch_add(). On AArch64, it is a simple
>>> > > > a_ll()/a_sc() loop without extra barriers, and a_ll()/a_sc() are implemented
>>> > > > via load-acquire/store-release instructions. Therefore, if we consider a
>>> > > > LOCK/UNLOCK critical section containing only plain loads and stores, (a) any
>>> > > > such memory access can be reordered with the initial ldaxr in UNLOCK, and
>>> > > > (b) any plain load following UNLOCK can be reordered with stlxr (assuming
>>> > > > the processor predicts that stlxr succeeds), and further, due to (a), with
>>> > > > any memory access inside the critical section. Therefore, UNLOCK is not a
>>> > > > full barrier. Is this right?
>>> > >
>>> > > i dont think this is right.
>>> >
>>> >
>>> > i think i was wrong and you are right.
>>> >
>>> > so with your suggested swap of UNLOCK(killlock) and need_locks=-1 and
>>> > starting with 'something == 0' the exiting E and remaining R threads:
>>> >
>>> > E:something=1 // protected by killlock
>>> > E:UNLOCK(killlock)
>>> > E:need_locks=-1
>>> >
>>> > R:LOCK(unrelated) // reads need_locks == -1
>>> > R:need_locks=0
>>> > R:UNLOCK(unrelated)
>>> > R:LOCK(killlock) // does not lock
>>> > R:read something // can it be 0 ?
>>> >
>>> > and here something can be 0 (ie. not protected by killlock) on aarch64
>>> > because
>>> >
>>> > T1
>>> > something=1
>>> > ldaxr ... killlock
>>> > stlxr ... killlock
>>> > need_locks=-1
>>> >
>>> > T2
>>> > x=need_locks
>>> > ldaxr ... unrelated
>>> > stlxr ... unrelated
>>> > y=something
>>> >
>>> > can end with x==-1 and y==0.
>>> >
>>> > and to fix it, both a_fetch_add and a_cas need an a_barrier.
>>> >
>>> > i need to think how to support such lock usage on aarch64
>>> > without adding too many dmb.
>>>
>>> I don't really understand this, but FWIW gcc emits
>>>
>>> ldxr
>>> ...
>>> stlxr
>>> ...
>>> dmb ish
>>>
>>> for __sync_val_compare_and_swap. So this is probably the right thing
>>> we should have. And it seems to match what the kernel folks discussed
>>> here:
>>>
>>> http://lists.infradead.org/pipermail/linux-arm-kernel/2014-February/229588.html
>>>
>>> I wondered if there are similar issues for any other archs which need
>>> review, but it looks like all the other llsc archs have explicit
>>> pre/post barriers defined.
>>
>> Actually I don't understand what's going on with cmpxchg there. The
>> patch I linked has it using ldxr/stxr (not stlxr) for cmpxchg. There's
>> some follow-up in the thread I don't understand, about the case where
>> the cas fails, but we already handle that by doing an explicit barrier
>> in that case.
>>
> I think in that follow-up [1] they mean the following case (in musl
> terms):
>
> volatile int x, flag;
>
> T1:
> x = 1;
> a_store(&flag, 1);
>
> T2:
> while (!flag);
> a_cas(&x, 0, 1); // can this fail?
>

I made a mistake here: this should be a_cas(&x, 1, 2). Everything else
stands.

> They want it to never fail. But if a_cas() is implemented as
> ldxr/stlxr/dmb, this is not guaranteed because the ldxr can be reordered
> with the load of flag.
>
> Note that musl does *not* handle this now, because a_barrier() in the
> failure path is after a_ll().
>
> [1] https://lists.infradead.org/pipermail/linux-arm-kernel/2014-February/229693.html
>
> Alexey
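
P.S. In case a standalone version of the T1/T2 example above is useful,
here is a rough sketch using the __sync builtins discussed earlier
instead of musl's a_store()/a_cas(). It is only an approximation of
musl's primitives (a_store() is modeled here as barrier/store/barrier),
so treat it as an illustration of the pattern, not as a reproducer:

#include <pthread.h>
#include <stdio.h>

static volatile int x, flag;

static void *t1(void *arg)
{
    /* The store to x is supposed to be visible to anyone who sees
     * flag == 1. */
    x = 1;
    __sync_synchronize();   /* stand-in for musl's a_store(&flag, 1) */
    flag = 1;
    __sync_synchronize();
    return 0;
}

static void *t2(void *arg)
{
    while (!flag);          /* plain load, as in the example above */
    /* The CAS is expected to see x == 1 and succeed. If it is lowered
     * to an ldxr/stlxr loop with only a trailing dmb (the sequence
     * quoted earlier), nothing orders the ldxr of x after the plain
     * load of flag, which is the reordering discussed in [1]. */
    int old = __sync_val_compare_and_swap(&x, 1, 2);
    if (old != 1) printf("CAS observed stale x == %d\n", old);
    return 0;
}

int main(void)
{
    pthread_t a, b;
    pthread_create(&a, 0, t1, 0);
    pthread_create(&b, 0, t2, 0);
    pthread_join(a, 0);
    pthread_join(b, 0);
    return 0;
}

Whether the stale read is actually observable on a given aarch64
implementation is a separate question; the point is only that the
architecture does not forbid it for that instruction sequence.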