Date: Mon, 3 Oct 2022 23:27:05 +0200
From: Szabolcs Nagy
To: musl@lists.openwall.com, Alexey Izbyshev
Message-ID: <20221003212705.GG2158779@port70.net>
References: <20220919152937.GQ9709@brightrain.aerifal.cx> <20221003132615.GF2158779@port70.net>
In-Reply-To: <20221003132615.GF2158779@port70.net>
Subject: Re: [musl] Illegal killlock skipping when transitioning to single-threaded state

* Szabolcs Nagy [2022-10-03 15:26:15 +0200]:
> * Alexey Izbyshev [2022-10-03 09:16:03 +0300]:
> > On 2022-09-19 18:29, Rich Felker wrote:
> > > On Wed, Sep 07, 2022 at 03:46:53AM +0300, Alexey Izbyshev wrote:
> ...
> > > > Reordering the "libc.need_locks = -1" assignment and
> > > > UNLOCK(E->killlock) and providing a store barrier between them
> > > > should fix the issue.
> > >
> > > I think this all sounds correct. I'm not sure what you mean by a store
> > > barrier between them, since all lock and unlock operations are already
> > > full barriers.
> > >
> >
> > Before sending the report I tried to infer the intended ordering semantics
> > of LOCK/UNLOCK by looking at their implementations. For AArch64, I didn't
> > see why they would provide a full barrier (my reasoning is below), so I
> > concluded that probably acquire/release semantics was intended in general
> > and suggested an extra store barrier to prevent hoisting of the "libc.need_locks
> > = -1" store spelled after UNLOCK(E->killlock) back into the critical
> > section.
> >
> > UNLOCK is implemented via a_fetch_add(). On AArch64, it is a simple
> > a_ll()/a_sc() loop without extra barriers, and a_ll()/a_sc() are implemented
> > via load-acquire/store-release instructions. Therefore, if we consider a
> > LOCK/UNLOCK critical section containing only plain loads and stores, (a) any
> > such memory access can be reordered with the initial ldaxr in UNLOCK, and
> > (b) any plain load following UNLOCK can be reordered with stlxr (assuming
> > the processor predicts that stlxr succeeds), and further, due to (a), with
> > any memory access inside the critical section. Therefore, UNLOCK is not a full
> > barrier. Is this right?
>
> i dont think this is right.

i think i was wrong and you are right.

so with your suggested swap of UNLOCK(killlock) and need_locks=-1, and
starting with 'something == 0', the exiting thread E and a remaining
thread R can do:

E:something=1       // protected by killlock
E:UNLOCK(killlock)
E:need_locks=-1

R:LOCK(unrelated)   // reads need_locks == -1
R:need_locks=0
R:UNLOCK(unrelated)
R:LOCK(killlock)    // does not lock
R:read something    // can it be 0 ?

and here something can be 0 (i.e. not protected by killlock) on aarch64,
because

T1
    something=1
    ldaxr ... killlock
    stlxr ... killlock
    need_locks=-1

T2
    x=need_locks
    ldaxr ... unrelated
    stlxr ... unrelated
    y=something

can end with x==-1 and y==0.
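the same outcome can be written as a small C11 litmus-style test. this is
only an illustration of the reordering argument, not musl code: the lock
ops are modelled as plain acquire/release RMWs (roughly how the current
aarch64 ll/sc loop behaves), and all names below are made up for the
example:

#include <stdatomic.h>

static atomic_int killlock, unrelated;    /* stand-ins for the two locks */
static atomic_int something, need_locks;  /* both start at 0 */

/* exiting thread E, with UNLOCK(killlock) and need_locks=-1 swapped */
void thread_E(void)
{
	atomic_store_explicit(&something, 1, memory_order_relaxed);  /* protected by killlock */
	/* UNLOCK(killlock): ldaxr/stlxr loop, i.e. acquire/release RMW, no dmb */
	atomic_fetch_add_explicit(&killlock, -1, memory_order_acq_rel);
	atomic_store_explicit(&need_locks, -1, memory_order_relaxed);
}

/* remaining thread R; the need_locks read done by LOCK is written out
   as a separate plain load here */
void thread_R(int *x, int *y)
{
	*x = atomic_load_explicit(&need_locks, memory_order_relaxed);
	atomic_fetch_add_explicit(&unrelated, 1, memory_order_acq_rel);  /* LOCK(unrelated) */
	atomic_fetch_sub_explicit(&unrelated, 1, memory_order_acq_rel);  /* UNLOCK(unrelated) */
	/* LOCK(killlock) is skipped because *x was -1 */
	*y = atomic_load_explicit(&something, memory_order_relaxed);
}

x==-1 && y==0 is allowed here because the only release/acquire pairs are
on two different lock objects, so nothing orders E's something store
before R's something load.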
and to fix it, both a_fetch_add and a_cas need an a_barrier.

i need to think how to support such lock usage on aarch64
without adding too many dmb.

>
> memory operations in the critical section cannot visibly happen after the
> final stlxr.
>
> memory operations after the critical section cannot visibly happen before
> the ldaxr of UNLOCK.
>
> the only issue can be inside the ll/sc loop in UNLOCK if there are independent
> memory operations there, but there arent.
>
> >
> > As for a store following UNLOCK, I'm not sure. A plain store following stlxr
> > can be reordered with it, but here that store is conditional on stlxr
> > success. So even if the processor predicts stlxr success, it can't make the
> > following store visible before it's sure that stlxr succeeded. But I don't
> > know whether the events "the processor knows that stlxr succeeded" and "the
> > result of stlxr is globally visible" are separate or not, and if they are
> > separate, what comes first. Depending on the answer, UNLOCK acts as a store
> > barrier or not.
> >
>
> UNLOCK on aarch64 acts as a full seqcst barrier as far as i can see.
>
> but looking into the arch implementations needs a detailed understanding
> of the arch memory model (eg aarch64 stlxr is RCsc not RCpc like iso c
> release store), but there is no need for that: the musl model is
> simple seqcst synchronization everywhere.
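for reference, roughly what an ll/sc based fetch-add with a trailing
a_barrier (dmb ish) could look like, i.e. one way to get that seqcst
behaviour. this is only a sketch of the idea, not musl's actual
atomic_arch.h code, and the inline asm operand constraints here are
approximations:

static inline int sketch_fetch_add(volatile int *p, int v)
{
	int old, fail;
	do {
		/* ldaxr: load-acquire exclusive */
		__asm__ __volatile__ ("ldaxr %w0, %1" : "=r"(old) : "Q"(*p));
		/* stlxr: store-release exclusive, fail!=0 means the sc lost and we retry */
		__asm__ __volatile__ ("stlxr %w0, %w2, %1"
			: "=&r"(fail), "=Q"(*p) : "r"(old+v) : "memory");
	} while (fail);
	/* a_barrier: full barrier so the RMW also orders the plain accesses around it */
	__asm__ __volatile__ ("dmb ish" : : : "memory");
	return old;
}

the dmb is what makes the op behave as a full barrier for surrounding
plain accesses; the cost concern above is that every lock/unlock then
pays for it even when the stronger ordering is not needed.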