mailing list of musl libc
From: Rich Felker <dalias@libc.org>
To: musl@lists.openwall.com, Alexey Izbyshev <izbyshev@ispras.ru>
Subject: Re: [musl] Illegal killlock skipping when transitioning to single-threaded state
Date: Tue, 4 Oct 2022 00:59:51 -0400
Message-ID: <20221004045951.GK29905@brightrain.aerifal.cx>
In-Reply-To: <20221004030020.GJ29905@brightrain.aerifal.cx>

[-- Attachment #1: Type: text/plain, Size: 5018 bytes --]

On Mon, Oct 03, 2022 at 11:00:21PM -0400, Rich Felker wrote:
> On Mon, Oct 03, 2022 at 10:58:32PM -0400, Rich Felker wrote:
> > On Mon, Oct 03, 2022 at 11:27:05PM +0200, Szabolcs Nagy wrote:
> > > * Szabolcs Nagy <nsz@port70.net> [2022-10-03 15:26:15 +0200]:
> > > 
> > > > * Alexey Izbyshev <izbyshev@ispras.ru> [2022-10-03 09:16:03 +0300]:
> > > > > On 2022-09-19 18:29, Rich Felker wrote:
> > > > > > On Wed, Sep 07, 2022 at 03:46:53AM +0300, Alexey Izbyshev wrote:
> > > > ...
> > > > > > > Reordering the "libc.need_locks = -1" assignment and
> > > > > > > UNLOCK(E->killlock) and providing a store barrier between them
> > > > > > > should fix the issue.
> > > > > > 
> > > > > > I think this all sounds correct. I'm not sure what you mean by a store
> > > > > > barrier between them, since all lock and unlock operations are already
> > > > > > full barriers.
> > > > > > 
> > > > > 
> > > > > Before sending the report I tried to infer the intended ordering semantics
> > > > > of LOCK/UNLOCK by looking at their implementations. For AArch64, I didn't
> > > > > see why they would provide a full barrier (my reasoning is below), so I
> > > > > concluded that acquire/release semantics were probably intended in general,
> > > > > and suggested an extra store barrier to keep the "libc.need_locks = -1"
> > > > > store, which appears after UNLOCK(E->killlock), from being hoisted back
> > > > > into the critical section.
> > > > > 
> > > > > UNLOCK is implemented via a_fetch_add(). On AArch64, it is a simple
> > > > > a_ll()/a_sc() loop without extra barriers, and a_ll()/a_sc() are implemented
> > > > > via load-acquire/store-release instructions. Therefore, if we consider a
> > > > > LOCK/UNLOCK critical section containing only plain loads and stores, (a) any
> > > > > such memory access can be reordered with the initial ldaxr in UNLOCK, and
> > > > > (b) any plain load following UNLOCK can be reordered with stlxr (assuming
> > > > > the processor predicts that stlxr succeeds), and further, due to (a), with
> > > > > any memory access inside the critical section. Therefore, UNLOCK is not a
> > > > > full barrier. Is this right?
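> > > > > 
> > > > > For reference, a simplified sketch of what I mean (not verbatim
> > > > > musl source):
> > > > > 
> > > > > 	/* UNLOCK boils down to a_fetch_add(), i.e. on aarch64 an
> > > > > 	   a_ll/a_sc retry loop with no trailing barrier: */
> > > > > 	int old;
> > > > > 	do old = a_ll(p);        /* ldaxr: acquire-only, so (a)
> > > > > 	                            accesses inside the critical
> > > > > 	                            section may sink below it */
> > > > > 	while (!a_sc(p, old+v)); /* stlxr: release-only, so (b) later
> > > > > 	                            plain loads may hoist above it if
> > > > > 	                            it is predicted to succeed */
> > > > > 	/* nothing after the loop orders the critical section against
> > > > > 	   subsequent accesses such as the "libc.need_locks = -1"
> > > > > 	   store */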
> > > > 
> > > > i don't think this is right.
> > > 
> > > 
> > > i think i was wrong and you are right.
> > > 
> > > so with your suggested swap of UNLOCK(killlock) and need_locks=-1, and
> > > starting with 'something == 0', the exiting thread E and a remaining
> > > thread R run:
> > > 
> > > E:something=1      // protected by killlock
> > > E:UNLOCK(killlock)
> > > E:need_locks=-1
> > > 
> > > R:LOCK(unrelated)  // reads need_locks == -1
> > > R:need_locks=0
> > > R:UNLOCK(unrelated)
> > > R:LOCK(killlock)   // does not lock
> > > R:read something   // can it be 0 ?
> > > 
> > > and here something can be 0 (i.e. not protected by killlock) on aarch64
> > > because
> > > 
> > > T1
> > > 	something=1
> > > 	ldaxr ... killlock
> > > 	stlxr ... killlock
> > > 	need_locks=-1
> > > 
> > > T2
> > > 	x=need_locks
> > > 	ldaxr ... unrelated
> > > 	stlxr ... unrelated
> > > 	y=something
> > > 
> > > can end with x==-1 and y==0.
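> > > 
> > > for illustration, the same thing in c11 atomics (made-up names;
> > > assuming LOCK/UNLOCK are acquire/release rmw ops only; all objects
> > > _Atomic int, initially 0):
> > > 
> > > 	/* T1 (E) */
> > > 	atomic_store_explicit(&something, 1, memory_order_relaxed);
> > > 	atomic_fetch_add_explicit(&killlock, 1, memory_order_acq_rel);
> > > 	atomic_store_explicit(&need_locks, -1, memory_order_relaxed);
> > > 
> > > 	/* T2 (R) */
> > > 	x = atomic_load_explicit(&need_locks, memory_order_relaxed);
> > > 	atomic_fetch_add_explicit(&unrelated, 1, memory_order_acq_rel);
> > > 	y = atomic_load_explicit(&something, memory_order_relaxed);
> > > 
> > > 	/* x == -1 && y == 0 is allowed: T2 never touches killlock, so
> > > 	   nothing synchronizes-with T1's release of it; full barriers
> > > 	   (dmb ish / seq_cst fences) on both sides would forbid it */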
> > > 
> > > and to fix it, both a_fetch_add and a_cas need an a_barrier.
> > > 
> > > i need to think about how to support such lock usage on aarch64
> > > without adding too many dmb instructions.
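> > > 
> > > one possible shape (sketch only): drop the acquire from the ll, keep
> > > the release on the sc, and add a single dmb after the sc, so the
> > > whole rmw acts as a full barrier:
> > > 
> > > 	ldxr ... killlock   // plain exclusive load
> > > 	stlxr ... killlock  // release keeps the cs before the unlock
> > > 	dmb ish             // full barrier: orders everything above
> > > 	                    // against everything below
> > > 	need_locks=-1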
> > 
> > OK, after reading a lot more, I think I'm starting to get what you're
> > saying. Am I correct in my understanding that the problem is that the
> > "R:LOCK(unrelated)" as implemented does not synchronize with the
> > "E:UNLOCK(killlock)" because they're different objects?
> > 
> > If so, I think this would be fully solved by using __tl_sync in the
> > code path that resets need_locks to 0 after observing -1, by virtue of
> > providing a common object (the thread list lock) to synchronize on.
> > This is the "weaker memory model friendly" approach we should probably
> > strive to achieve some day.
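> > 
> > Roughly like the following (a sketch of the idea only, not a tested
> > patch, with __tl_lock/__tl_unlock standing in for __tl_sync):
> > 
> > 	if (need_locks < 0) {
> > 		/* the exiting thread held the thread list lock, so
> > 		   taking and releasing it here synchronizes with that
> > 		   thread, ordering its critical sections before our
> > 		   subsequent unsynchronized accesses */
> > 		__tl_lock();
> > 		__tl_unlock();
> > 		libc.need_locks = 0;
> > 	}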
> > 
> > However, all existing code in musl is written assuming what I call the
> > "POSIX memory model", where the only ordering operation is "synchronizes
> > memory", and that underspecified phrase has to be interpreted as "is a
> > full barrier" to admit any consistent model. Said differently, the code was
> > written assuming every a_* synchronizes with every other a_*, without
> > any regard for whether they act on the same objects. This likely even
> > matters for how our waiter accounting works (which is probably a good
> > argument for removing it and switching to waiter flags). So I think,
> > if the issue as I understand it now exists, we do need to fix it. Then
> > we can revisit this at some later time as part of a big project.
> 
> Forgot, I should include links to the material I've been reading:
> 
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697
> https://gcc.gnu.org/git?p=gcc.git;a=commitdiff;h=f70fb3b635f9618c6d2ee3848ba836914f7951c2
> https://gcc.gnu.org/git?p=gcc.git;a=commitdiff;h=ab876106eb689947cdd8203f8ecc6e8ac38bf5ba
> 
> which is where the GCC folks seem to have encountered and fixed their
> corresponding issue.

Proposed patch attached.
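
How it works, roughly (sketch from memory, not verbatim): with the
arch-specific a_cas/a_cas_p removed and a_post_llsc defined as
a_barrier, the generic ll/sc loops in src/internal/atomic.h pick up a
trailing dmb, along the lines of:

	static inline int a_cas(volatile int *p, int t, int s)
	{
		int old;
		a_pre_llsc();             /* no-op on aarch64 */
		do old = a_ll(p);         /* now a plain ldxr */
		while (old==t && !a_sc(p, s));
		a_post_llsc();            /* now "dmb ish": full barrier */
		return old;
	}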

Rich

[-- Attachment #2: aarch64-barrier.diff --]
[-- Type: text/plain, Size: 1370 bytes --]

diff --git a/arch/aarch64/atomic_arch.h b/arch/aarch64/atomic_arch.h
index 40fefc25..c01a3ab3 100644
--- a/arch/aarch64/atomic_arch.h
+++ b/arch/aarch64/atomic_arch.h
@@ -2,7 +2,7 @@
 static inline int a_ll(volatile int *p)
 {
 	int v;
-	__asm__ __volatile__ ("ldaxr %w0,%1" : "=r"(v) : "Q"(*p));
+	__asm__ __volatile__ ("ldxr %w0,%1" : "=r"(v) : "Q"(*p));
 	return v;
 }
 
@@ -20,25 +20,13 @@ static inline void a_barrier()
 	__asm__ __volatile__ ("dmb ish" : : : "memory");
 }
 
-#define a_cas a_cas
-static inline int a_cas(volatile int *p, int t, int s)
-{
-	int old;
-	do {
-		old = a_ll(p);
-		if (old != t) {
-			a_barrier();
-			break;
-		}
-	} while (!a_sc(p, s));
-	return old;
-}
+#define a_post_llsc a_barrier
 
 #define a_ll_p a_ll_p
 static inline void *a_ll_p(volatile void *p)
 {
 	void *v;
-	__asm__ __volatile__ ("ldaxr %0, %1" : "=r"(v) : "Q"(*(void *volatile *)p));
+	__asm__ __volatile__ ("ldxr %0, %1" : "=r"(v) : "Q"(*(void *volatile *)p));
 	return v;
 }
 
@@ -50,20 +38,6 @@ static inline int a_sc_p(volatile int *p, void *v)
 	return !r;
 }
 
-#define a_cas_p a_cas_p
-static inline void *a_cas_p(volatile void *p, void *t, void *s)
-{
-	void *old;
-	do {
-		old = a_ll_p(p);
-		if (old != t) {
-			a_barrier();
-			break;
-		}
-	} while (!a_sc_p(p, s));
-	return old;
-}
-
 #define a_ctz_64 a_ctz_64
 static inline int a_ctz_64(uint64_t x)
 {


Thread overview: 32+ messages
2022-09-07  0:46 Alexey Izbyshev
2022-09-19 15:29 ` Rich Felker
2022-10-03  6:16   ` Alexey Izbyshev
2022-10-03 12:33     ` Rich Felker
2022-10-03 13:26     ` Szabolcs Nagy
2022-10-03 21:27       ` Szabolcs Nagy
2022-10-03 22:54         ` Rich Felker
2022-10-03 23:05           ` Rich Felker
2022-10-04 13:50             ` Alexey Izbyshev
2022-10-04 14:12               ` Rich Felker
2022-10-04 14:19                 ` Rich Felker
2022-10-04 15:43                   ` Alexey Izbyshev
2022-10-04 15:57                     ` Rich Felker
2022-10-04 18:15                       ` Alexey Izbyshev
2022-10-04 23:21                         ` Rich Felker
2022-10-04 16:24                 ` James Y Knight
2022-10-04 16:45                   ` Rich Felker
2022-10-05 13:52                     ` James Y Knight
2022-10-04 16:01               ` Alexey Izbyshev
2022-10-04  2:58         ` Rich Felker
2022-10-04  3:00           ` Rich Felker
2022-10-04  4:59             ` Rich Felker [this message]
2022-10-04  8:16               ` Szabolcs Nagy
2022-10-04 10:18               ` Alexey Izbyshev
2022-10-04  5:16         ` Alexey Izbyshev
2022-10-04  8:31           ` Szabolcs Nagy
2022-10-04 10:28             ` Alexey Izbyshev
2022-10-05  1:00 ` Rich Felker
2022-10-05 12:10   ` Alexey Izbyshev
2022-10-05 14:03     ` Rich Felker
2022-10-05 14:37       ` Rich Felker
2022-10-05 16:23         ` Alexey Izbyshev
