Date: Tue, 4 Oct 2022 10:12:42 -0400
From: Rich Felker
To: musl@lists.openwall.com
Subject: Re: [musl] Illegal killlock skipping when transitioning to single-threaded state

On Tue, Oct 04, 2022 at 04:50:00PM +0300, Alexey Izbyshev wrote:
> On 2022-10-04 02:05, Rich Felker wrote:
> >On Mon, Oct 03, 2022 at 06:54:17PM -0400, Rich Felker wrote:
> >>On Mon, Oct 03, 2022 at 11:27:05PM +0200, Szabolcs Nagy wrote:
> >>> * Szabolcs Nagy [2022-10-03 15:26:15 +0200]:
> >>> 
> >>> > * Alexey Izbyshev [2022-10-03 09:16:03 +0300]:
> >>> > > On 2022-09-19 18:29, Rich Felker wrote:
> >>> > > > On Wed, Sep 07, 2022 at 03:46:53AM +0300, Alexey Izbyshev wrote:
> >>> > ...
> >>> > > > > Reordering the "libc.need_locks = -1" assignment and
> >>> > > > > UNLOCK(E->killlock) and providing a store barrier between them
> >>> > > > > should fix the issue.
> >>> > > > 
> >>> > > > I think this all sounds correct. I'm not sure what you mean by a
> >>> > > > store barrier between them, since all lock and unlock operations
> >>> > > > are already full barriers.
> >>> > > 
> >>> > > Before sending the report I tried to infer the intended ordering
> >>> > > semantics of LOCK/UNLOCK by looking at their implementations. For
> >>> > > AArch64, I didn't see why they would provide a full barrier (my
> >>> > > reasoning is below), so I concluded that probably acquire/release
> >>> > > semantics were intended in general, and suggested an extra store
> >>> > > barrier to prevent hoisting of the "libc.need_locks = -1" store,
> >>> > > which is spelled after UNLOCK(E->killlock), back into the critical
> >>> > > section.
> >>> > > 
> >>> > > UNLOCK is implemented via a_fetch_add(). On AArch64, it is a simple
> >>> > > a_ll()/a_sc() loop without extra barriers, and a_ll()/a_sc() are
> >>> > > implemented via load-acquire/store-release instructions. Therefore,
> >>> > > if we consider a LOCK/UNLOCK critical section containing only plain
> >>> > > loads and stores, (a) any such memory access can be reordered with
> >>> > > the initial ldaxr in UNLOCK, and (b) any plain load following UNLOCK
> >>> > > can be reordered with the stlxr (assuming the processor predicts
> >>> > > that the stlxr succeeds), and further, due to (a), with any memory
> >>> > > access inside the critical section. Therefore, UNLOCK is not a full
> >>> > > barrier. Is this right?
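For reference, the primitives being described look roughly like this
(paraphrased from musl's arch/aarch64/atomic_arch.h and the generic
LL/SC loop in src/internal/atomic.h; comments are mine and this is a
sketch, not a verbatim copy):

static inline int a_ll(volatile int *p)
{
	int v;
	/* load-acquire exclusive: later accesses cannot move above it,
	 * but earlier accesses may sink below it -- point (a) above */
	__asm__ __volatile__ ("ldaxr %w0,%1" : "=r"(v) : "Q"(*p));
	return v;
}

static inline int a_sc(volatile int *p, int v)
{
	int r;
	/* store-release exclusive: earlier accesses cannot move below it,
	 * but later plain loads may hoist above it -- point (b) above;
	 * stlxr writes 0 to the status register on success */
	__asm__ __volatile__ ("stlxr %w0,%w2,%1"
		: "=&r"(r), "=Q"(*p) : "r"(v) : "memory");
	return !r;
}

static inline int a_fetch_add(volatile int *p, int v)
{
	int old;
	/* UNLOCK reduces to this loop: no dmb anywhere, so the only
	 * ordering is the acquire/release pair described above */
	do old = a_ll(p);
	while (!a_sc(p, (unsigned)old + v));
	return old;
}

With no dmb in the path, the acquire on ldaxr and the release on stlxr
are the only ordering constraints, which is exactly what permits the
reorderings (a) and (b) described above.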
> >>> > 
> >>> > i dont think this is right.
> >>> 
> >>> i think i was wrong and you are right.
> >>> 
> >>> so with your suggested swap of UNLOCK(killlock) and need_locks=-1 and
> >>> starting with 'something == 0' the exiting E and remaining R threads:
> >>> 
> >>> E:something=1      // protected by killlock
> >>> E:UNLOCK(killlock)
> >>> E:need_locks=-1
> >>> 
> >>> R:LOCK(unrelated)  // reads need_locks == -1
> >>> R:need_locks=0
> >>> R:UNLOCK(unrelated)
> >>> R:LOCK(killlock)   // does not lock
> >>> R:read something   // can it be 0 ?
> >>> 
> >>> and here something can be 0 (ie. not protected by killlock) on aarch64
> >>> because
> >>> 
> >>> T1
> >>> 	something=1
> >>> 	ldaxr ... killlock
> >>> 	stlxr ... killlock
> >>> 	need_locks=-1
> >>> 
> >>> T2
> >>> 	x=need_locks
> >>> 	ldaxr ... unrelated
> >>> 	stlxr ... unrelated
> >>> 	y=something
> >>> 
> >>> can end with x==-1 and y==0.
> >>> 
> >>> and to fix it, both a_fetch_add and a_cas need an a_barrier.
> >>> 
> >>> i need to think how to support such lock usage on aarch64
> >>> without adding too many dmb.
> >>
> >>I don't really understand this, but FWIW gcc emits
> >>
> >>    ldxr
> >>    ...
> >>    stlxr
> >>    ...
> >>    dmb ish
> >>
> >>for __sync_val_compare_and_swap. So this is probably the right thing
> >>we should have. And it seems to match what the kernel folks discussed
> >>here:
> >>
> >>http://lists.infradead.org/pipermail/linux-arm-kernel/2014-February/229588.html
> >>
> >>I wondered if there are similar issues for any other archs which need
> >>review, but it looks like all the other LL/SC archs have explicit
> >>pre/post barriers defined.
> >
> >Actually I don't understand what's going on with cmpxchg there. The
> >patch I linked has it using ldxr/stxr (not stlxr) for cmpxchg. There's
> >some follow-up in the thread I don't understand, about the case where
> >the cas fails, but we already handle that by doing an explicit barrier
> >in that case.
> 
> I think in that follow-up[1] they mean the following case (in musl
> terms):
> 
> volatile int x, flag;
> 
> T1:
>     x = 1;
>     a_store(&flag, 1);
> 
> T2:
>     while (!flag);
>     a_cas(&x, 0, 1); // can this fail?
> 
> They want it to never fail. But if a_cas() is implemented as
> ldxr/stlxr/dmb, this is not guaranteed, because the ldxr can be
> reordered with the load of flag.
> 
> Note that musl does *not* handle this now, because the a_barrier() in
> the failure path is after a_ll().
> 
> [1] https://lists.infradead.org/pipermail/linux-arm-kernel/2014-February/229693.html

OK, then indeed this too needs to be fixed -- the a_cas is failing to
synchronize with the a_store. How do we do that? Based on my
understanding, my proposed patch doesn't fix it. Do we need
ldaxr/stlxr/dmb? Or perhaps relegate the extra synchronization to a
retry in the case where the comparison fails?

If this is actually the case, it's disturbing that GCC does not seem
to be getting it right either...

Rich
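P.S. A rough stand-alone C11 model of the a_cas/a_store case above, for
anyone who wants to experiment. The assumptions are mine: the plain
"while (!flag);" spin is modeled as a relaxed load, a_cas() as a CAS
with only acquire/release ordering on the CAS itself, and the
writer/observer names are invented for the sketch. Under these
assumptions the model permits the CAS to read a stale x == 0, mirroring
the ldxr-reordering argument; whether a given machine actually exhibits
it on any particular run is timing-dependent.

#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

static atomic_int x, flag;

static void *writer(void *arg)
{
	atomic_store_explicit(&x, 1, memory_order_relaxed);    /* x = 1 */
	atomic_store_explicit(&flag, 1, memory_order_release); /* a_store(&flag, 1) */
	return 0;
}

static void *observer(void *arg)
{
	/* plain volatile spin in musl: no acquire ordering, hence relaxed */
	while (!atomic_load_explicit(&flag, memory_order_relaxed));

	int expected = 0;
	/* stands in for a_cas(&x, 0, 1): succeeding here means it read a
	 * stale x == 0, i.e. the read of x was not ordered after the read
	 * of flag */
	if (atomic_compare_exchange_strong_explicit(&x, &expected, 1,
	    memory_order_acq_rel, memory_order_acquire))
		puts("CAS read stale x == 0: missed the x = 1 store");
	else
		puts("CAS saw x == 1: synchronized as desired");
	return 0;
}

int main(void)
{
	pthread_t t1, t2;
	pthread_create(&t1, 0, writer, 0);
	pthread_create(&t2, 0, observer, 0);
	pthread_join(t1, 0);
	pthread_join(t2, 0);
	return 0;
}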