From mboxrd@z Thu Jan 1 00:00:00 1970 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on inbox.vuxu.org X-Spam-Level: X-Spam-Status: No, score=-1.0 required=5.0 tests=MAILING_LIST_MULTI, RCVD_IN_MSPIKE_H2 autolearn=ham autolearn_force=no version=3.4.4 Received: (qmail 10252 invoked from network); 4 Oct 2022 05:00:10 -0000 Received: from second.openwall.net (193.110.157.125) by inbox.vuxu.org with ESMTPUTF8; 4 Oct 2022 05:00:10 -0000 Received: (qmail 4081 invoked by uid 550); 4 Oct 2022 05:00:06 -0000 Mailing-List: contact musl-help@lists.openwall.com; run by ezmlm Precedence: bulk List-Post: List-Help: List-Unsubscribe: List-Subscribe: List-ID: Reply-To: musl@lists.openwall.com Received: (qmail 4046 invoked from network); 4 Oct 2022 05:00:06 -0000 Date: Tue, 4 Oct 2022 00:59:51 -0400 From: Rich Felker To: musl@lists.openwall.com, Alexey Izbyshev Message-ID: <20221004045951.GK29905@brightrain.aerifal.cx> References: <20220919152937.GQ9709@brightrain.aerifal.cx> <20221003132615.GF2158779@port70.net> <20221003212705.GG2158779@port70.net> <20221004025831.GI29905@brightrain.aerifal.cx> <20221004030020.GJ29905@brightrain.aerifal.cx> MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="nFreZHaLTZJo0R7j" Content-Disposition: inline In-Reply-To: <20221004030020.GJ29905@brightrain.aerifal.cx> User-Agent: Mutt/1.5.21 (2010-09-15) Subject: Re: [musl] Illegal killlock skipping when transitioning to single-threaded state --nFreZHaLTZJo0R7j Content-Type: text/plain; charset=us-ascii Content-Disposition: inline On Mon, Oct 03, 2022 at 11:00:21PM -0400, Rich Felker wrote: > On Mon, Oct 03, 2022 at 10:58:32PM -0400, Rich Felker wrote: > > On Mon, Oct 03, 2022 at 11:27:05PM +0200, Szabolcs Nagy wrote: > > > * Szabolcs Nagy [2022-10-03 15:26:15 +0200]: > > > > > > > * Alexey Izbyshev [2022-10-03 09:16:03 +0300]: > > > > > On 2022-09-19 18:29, Rich Felker wrote: > > > > > > On Wed, Sep 07, 2022 at 03:46:53AM +0300, Alexey Izbyshev wrote: > > > > ... > > > > > > > Reordering the "libc.need_locks = -1" assignment and > > > > > > > UNLOCK(E->killlock) and providing a store barrier between them > > > > > > > should fix the issue. > > > > > > > > > > > > I think this all sounds correct. I'm not sure what you mean by a store > > > > > > barrier between them, since all lock and unlock operations are already > > > > > > full barriers. > > > > > > > > > > > > > > > > Before sending the report I tried to infer the intended ordering semantics > > > > > of LOCK/UNLOCK by looking at their implementations. For AArch64, I didn't > > > > > see why they would provide a full barrier (my reasoning is below), so I > > > > > concluded that probably acquire/release semantics was intended in general > > > > > and suggested an extra store barrier to prevent hoisting of "libc.need_locks > > > > > = -1" store spelled after UNLOCK(E->killlock) back into the critical > > > > > section. > > > > > > > > > > UNLOCK is implemented via a_fetch_add(). On AArch64, it is a simple > > > > > a_ll()/a_sc() loop without extra barriers, and a_ll()/a_sc() are implemented > > > > > via load-acquire/store-release instructions. Therefore, if we consider a > > > > > LOCK/UNLOCK critical section containing only plain loads and stores, (a) any > > > > > such memory access can be reordered with the initial ldaxr in UNLOCK, and > > > > > (b) any plain load following UNLOCK can be reordered with stlxr (assuming > > > > > the processor predicts that stlxr succeeds), and further, due to (a), with > > > > > any memory access inside the critical section. Therefore, UNLOCK is not full > > > > > barrier. Is this right? > > > > > > > > i dont think this is right. > > > > > > > > > i think i was wrong and you are right. > > > > > > so with your suggested swap of UNLOCK(killlock) and need_locks=-1 and > > > starting with 'something == 0' the exiting E and remaining R threads: > > > > > > E:something=1 // protected by killlock > > > E:UNLOCK(killlock) > > > E:need_locks=-1 > > > > > > R:LOCK(unrelated) // reads need_locks == -1 > > > R:need_locks=0 > > > R:UNLOCK(unrelated) > > > R:LOCK(killlock) // does not lock > > > R:read something // can it be 0 ? > > > > > > and here something can be 0 (ie. not protected by killlock) on aarch64 > > > because > > > > > > T1 > > > something=1 > > > ldaxr ... killlock > > > stlxr ... killlock > > > need_locks=-1 > > > > > > T2 > > > x=need_locks > > > ldaxr ... unrelated > > > stlxr ... unrelated > > > y=something > > > > > > can end with x==-1 and y==0. > > > > > > and to fix it, both a_fetch_add and a_cas need an a_barrier. > > > > > > i need to think how to support such lock usage on aarch64 > > > without adding too many dmb. > > > > OK, after reading a lot more, I think I'm starting to get what you're > > saying. Am I correct in my understanding that the problem is that the > > "R:LOCK(unrelated)" as implemented does not synchronize with the > > "E:UNLOCK(killlock)" because they're different objects? > > > > If so, I think this would be fully solved by using __tl_sync in the > > code path that resets need_locks to 0 after observing -1, by virtue of > > providing a common object (the thread list lock) to synchronize on. > > This is the "weaker memory model friendly" approach we should probably > > strive to achieve some day. > > > > However, all existing code in musl is written assuming what I call the > > "POSIX memory model" where the only operation is "synchronizes memory" > > and that underspecified phrase has to be interpreted as "is a full > > barrier" to admit any consistent model. Said differently, the code was > > written assuming every a_* synchronizes with every other a_*, without > > any regard for whether they act on the same objects. This likely even > > matters for how our waiter accounting works (which is probably a good > > argument for removing it and switching to waiter flags). So I think, > > if the issue as I understand it now exists, we do need to fix it. Then > > we can revisit this at some later time as part of a big project. > > Forgot, I should include links to the material I've been reading: > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697 > https://gcc.gnu.org/git?p=gcc.git;a=commitdiff;h=f70fb3b635f9618c6d2ee3848ba836914f7951c2 > https://gcc.gnu.org/git?p=gcc.git;a=commitdiff;h=ab876106eb689947cdd8203f8ecc6e8ac38bf5ba > > which is where the GCC folks seem to have encountered and fixed their > corresponding issue. Proposed patch attached. Rich --nFreZHaLTZJo0R7j Content-Type: text/plain; charset=us-ascii Content-Disposition: attachment; filename="aarch64-barrier.diff" diff --git a/arch/aarch64/atomic_arch.h b/arch/aarch64/atomic_arch.h index 40fefc25..c01a3ab3 100644 --- a/arch/aarch64/atomic_arch.h +++ b/arch/aarch64/atomic_arch.h @@ -2,7 +2,7 @@ static inline int a_ll(volatile int *p) { int v; - __asm__ __volatile__ ("ldaxr %w0,%1" : "=r"(v) : "Q"(*p)); + __asm__ __volatile__ ("ldxr %w0,%1" : "=r"(v) : "Q"(*p)); return v; } @@ -20,25 +20,13 @@ static inline void a_barrier() __asm__ __volatile__ ("dmb ish" : : : "memory"); } -#define a_cas a_cas -static inline int a_cas(volatile int *p, int t, int s) -{ - int old; - do { - old = a_ll(p); - if (old != t) { - a_barrier(); - break; - } - } while (!a_sc(p, s)); - return old; -} +#define a_post_llsc a_barrier #define a_ll_p a_ll_p static inline void *a_ll_p(volatile void *p) { void *v; - __asm__ __volatile__ ("ldaxr %0, %1" : "=r"(v) : "Q"(*(void *volatile *)p)); + __asm__ __volatile__ ("ldxr %0, %1" : "=r"(v) : "Q"(*(void *volatile *)p)); return v; } @@ -50,20 +38,6 @@ static inline int a_sc_p(volatile int *p, void *v) return !r; } -#define a_cas_p a_cas_p -static inline void *a_cas_p(volatile void *p, void *t, void *s) -{ - void *old; - do { - old = a_ll_p(p); - if (old != t) { - a_barrier(); - break; - } - } while (!a_sc_p(p, s)); - return old; -} - #define a_ctz_64 a_ctz_64 static inline int a_ctz_64(uint64_t x) { --nFreZHaLTZJo0R7j--