From: Alexey Izbyshev <izbyshev@ispras.ru>
To: musl@lists.openwall.com
Date: Tue, 04 Oct 2022 13:18:31 +0300
Subject: Re: [musl] Illegal killlock skipping when transitioning to single-threaded state
In-Reply-To: <20221004045951.GK29905@brightrain.aerifal.cx>

On 2022-10-04 07:59, Rich
Felker wrote:
> On Mon, Oct 03, 2022 at 11:00:21PM -0400, Rich Felker wrote:
>> On Mon, Oct 03, 2022 at 10:58:32PM -0400, Rich Felker wrote:
>> > On Mon, Oct 03, 2022 at 11:27:05PM +0200, Szabolcs Nagy wrote:
>> > > * Szabolcs Nagy [2022-10-03 15:26:15 +0200]:
>> > > >
>> > > > * Alexey Izbyshev [2022-10-03 09:16:03 +0300]:
>> > > > > On 2022-09-19 18:29, Rich Felker wrote:
>> > > > > > On Wed, Sep 07, 2022 at 03:46:53AM +0300, Alexey Izbyshev wrote:
>> > > > ...
>> > > > > > > Reordering the "libc.need_locks = -1" assignment and
>> > > > > > > UNLOCK(E->killlock) and providing a store barrier between them
>> > > > > > > should fix the issue.
>> > > > > >
>> > > > > > I think this all sounds correct. I'm not sure what you mean by a
>> > > > > > store barrier between them, since all lock and unlock operations
>> > > > > > are already full barriers.
>> > > > > >
>> > > > >
>> > > > > Before sending the report I tried to infer the intended ordering
>> > > > > semantics of LOCK/UNLOCK by looking at their implementations. For
>> > > > > AArch64, I didn't see why they would provide a full barrier (my
>> > > > > reasoning is below), so I concluded that probably acquire/release
>> > > > > semantics was intended in general and suggested an extra store
>> > > > > barrier to prevent hoisting of the "libc.need_locks = -1" store
>> > > > > spelled after UNLOCK(E->killlock) back into the critical section.
>> > > > >
>> > > > > UNLOCK is implemented via a_fetch_add(). On AArch64, it is a
>> > > > > simple a_ll()/a_sc() loop without extra barriers, and
>> > > > > a_ll()/a_sc() are implemented via load-acquire/store-release
>> > > > > instructions.
>> > > > > Therefore, if we consider a LOCK/UNLOCK critical section
>> > > > > containing only plain loads and stores, (a) any such memory
>> > > > > access can be reordered with the initial ldaxr in UNLOCK, and
>> > > > > (b) any plain load following UNLOCK can be reordered with stlxr
>> > > > > (assuming the processor predicts that stlxr succeeds), and
>> > > > > further, due to (a), with any memory access inside the critical
>> > > > > section. Therefore, UNLOCK is not a full barrier. Is this right?
>> > > >
>> > > > i dont think this is right.
>> > >
>> > >
>> > > i think i was wrong and you are right.
>> > >
>> > > so with your suggested swap of UNLOCK(killlock) and need_locks=-1
>> > > and starting with 'something == 0' the exiting E and remaining R
>> > > threads:
>> > >
>> > > E:something=1      // protected by killlock
>> > > E:UNLOCK(killlock)
>> > > E:need_locks=-1
>> > >
>> > > R:LOCK(unrelated)  // reads need_locks == -1
>> > > R:need_locks=0
>> > > R:UNLOCK(unrelated)
>> > > R:LOCK(killlock)   // does not lock
>> > > R:read something   // can it be 0 ?
>> > >
>> > > and here something can be 0 (ie. not protected by killlock) on
>> > > aarch64 because
>> > >
>> > > T1
>> > >   something=1
>> > >   ldaxr ... killlock
>> > >   stlxr ... killlock
>> > >   need_locks=-1
>> > >
>> > > T2
>> > >   x=need_locks
>> > >   ldaxr ... unrelated
>> > >   stlxr ... unrelated
>> > >   y=something
>> > >
>> > > can end with x==-1 and y==0.
>> > >
>> > > and to fix it, both a_fetch_add and a_cas need an a_barrier.
>> > >
>> > > i need to think how to support such lock usage on aarch64
>> > > without adding too many dmb.
>> >
>> > OK, after reading a lot more, I think I'm starting to get what you're
>> > saying. Am I correct in my understanding that the problem is that the
>> > "R:LOCK(unrelated)" as implemented does not synchronize with the
>> > "E:UNLOCK(killlock)" because they're different objects?
>> >
>> > If so, I think this would be fully solved by using __tl_sync in the
>> > code path that resets need_locks to 0 after observing -1, by virtue
>> > of providing a common object (the thread list lock) to synchronize
>> > on. This is the "weaker memory model friendly" approach we should
>> > probably strive to achieve some day.
>> >
>> > However, all existing code in musl is written assuming what I call
>> > the "POSIX memory model", where the only operation is "synchronizes
>> > memory" and that underspecified phrase has to be interpreted as "is
>> > a full barrier" to admit any consistent model. Said differently, the
>> > code was written assuming every a_* synchronizes with every other
>> > a_*, without any regard for whether they act on the same objects.
>> > This likely even matters for how our waiter accounting works (which
>> > is probably a good argument for removing it and switching to waiter
>> > flags). So I think, if the issue as I understand it now exists, we
>> > do need to fix it. Then we can revisit this at some later time as
>> > part of a big project.
>>
>> Forgot, I should include links to the material I've been reading:
>>
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697
>> https://gcc.gnu.org/git?p=gcc.git;a=commitdiff;h=f70fb3b635f9618c6d2ee3848ba836914f7951c2
>> https://gcc.gnu.org/git?p=gcc.git;a=commitdiff;h=ab876106eb689947cdd8203f8ecc6e8ac38bf5ba
>>
>> which is where the GCC folks seem to have encountered and fixed their
>> corresponding issue.

Thank you for the links. This comment[1] addresses my doubts on whether
a store following (the current implementation of) UNLOCK can become
visible before stlxr despite the conditional branch instruction in
between (yes, it can).

[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697#c23

> Proposed patch attached.
>
The patch looks good to me.

Alexey