mailing list of musl libc
 help / color / mirror / code / Atom feed
From: Rich Felker <dalias@libc.org>
To: musl@lists.openwall.com
Subject: Re: [musl] Illegal killlock skipping when transitioning to single-threaded state
Date: Tue, 4 Oct 2022 10:12:42 -0400	[thread overview]
Message-ID: <20221004141242.GL29905@brightrain.aerifal.cx> (raw)
In-Reply-To: <2e77a700561a059e85daad5311306cfb@ispras.ru>

On Tue, Oct 04, 2022 at 04:50:00PM +0300, Alexey Izbyshev wrote:
> On 2022-10-04 02:05, Rich Felker wrote:
> >On Mon, Oct 03, 2022 at 06:54:17PM -0400, Rich Felker wrote:
> >>On Mon, Oct 03, 2022 at 11:27:05PM +0200, Szabolcs Nagy wrote:
> >>> * Szabolcs Nagy <nsz@port70.net> [2022-10-03 15:26:15 +0200]:
> >>>
> >>> > * Alexey Izbyshev <izbyshev@ispras.ru> [2022-10-03 09:16:03 +0300]:
> >>> > > On 2022-09-19 18:29, Rich Felker wrote:
> >>> > > > On Wed, Sep 07, 2022 at 03:46:53AM +0300, Alexey Izbyshev wrote:
> >>> > ...
> >>> > > > > Reordering the "libc.need_locks = -1" assignment and
> >>> > > > > UNLOCK(E->killlock) and providing a store barrier between them
> >>> > > > > should fix the issue.
> >>> > > >
> >>> > > > I think this all sounds correct. I'm not sure what you mean by a store
> >>> > > > barrier between them, since all lock and unlock operations are already
> >>> > > > full barriers.
> >>> > > >
> >>> > >
> >>> > > Before sending the report I tried to infer the intended ordering semantics
> >>> > > of LOCK/UNLOCK by looking at their implementations. For AArch64, I didn't
> >>> > > see why they would provide a full barrier (my reasoning is below), so I
> >>> > > concluded that probably acquire/release semantics was intended in general
> >>> > > and suggested an extra store barrier to prevent hoisting of "libc.need_locks
> >>> > > = -1" store spelled after UNLOCK(E->killlock) back into the critical
> >>> > > section.
> >>> > >
> >>> > > UNLOCK is implemented via a_fetch_add(). On AArch64, it is a simple
> >>> > > a_ll()/a_sc() loop without extra barriers, and a_ll()/a_sc() are implemented
> >>> > > via load-acquire/store-release instructions. Therefore, if we consider a
> >>> > > LOCK/UNLOCK critical section containing only plain loads and stores, (a) any
> >>> > > such memory access can be reordered with the initial ldaxr in UNLOCK, and
> >>> > > (b) any plain load following UNLOCK can be reordered with stlxr (assuming
> >>> > > the processor predicts that stlxr succeeds), and further, due to (a), with
> >>> > > any memory access inside the critical section. Therefore, UNLOCK is not full
> >>> > > barrier. Is this right?
> >>> >
> >>> > i dont think this is right.
> >>>
> >>>
> >>> i think i was wrong and you are right.
> >>>
> >>> so with your suggested swap of UNLOCK(killlock) and need_locks=-1 and
> >>> starting with 'something == 0' the exiting E and remaining R threads:
> >>>
> >>> E:something=1      // protected by killlock
> >>> E:UNLOCK(killlock)
> >>> E:need_locks=-1
> >>>
> >>> R:LOCK(unrelated)  // reads need_locks == -1
> >>> R:need_locks=0
> >>> R:UNLOCK(unrelated)
> >>> R:LOCK(killlock)   // does not lock
> >>> R:read something   // can it be 0 ?
> >>>
> >>> and here something can be 0 (ie. not protected by killlock) on aarch64
> >>> because
> >>>
> >>> T1
> >>> 	something=1
> >>> 	ldaxr ... killlock
> >>> 	stlxr ... killlock
> >>> 	need_locks=-1
> >>>
> >>> T2
> >>> 	x=need_locks
> >>> 	ldaxr ... unrelated
> >>> 	stlxr ... unrelated
> >>> 	y=something
> >>>
> >>> can end with x==-1 and y==0.
> >>>
> >>> and to fix it, both a_fetch_add and a_cas need an a_barrier.
> >>>
> >>> i need to think how to support such lock usage on aarch64
> >>> without adding too many dmb.
> >>
> >>I don't really understand this, but FWIW gcc emits
> >>
> >>    ldxr
> >>    ...
> >>    stlxr
> >>    ...
> >>    dmb ish
> >>
> >>for __sync_val_compare_and_swap. So this is probably the right thing
> >>we should have. And it seems to match what the kernel folks discussed
> >>here:
> >>
> >>http://lists.infradead.org/pipermail/linux-arm-kernel/2014-February/229588.html
> >>
> >>I wondered if there are similar issues for any others archs which need
> >>review, but it looks like all the other llsc archs have explicit
> >>pre/post barriers defined.
> >
> >Actually I don't understand what's going on with cmpxchg there. The
> >patch I linked has it using ldxr/stxr (not stlxr) for cmpxchg. There's
> >some follow-up in the thread I don't understand, about the case where
> >the cas fails, but we already handle that by doing an explicit barrier
> >in that case.
> >
> I think in that follow-up[1] they mean the following case (in musl
> terms):
> 
> volatile int x, flag;
> 
> T1:
>     x = 1;
>     a_store(&flag, 1);
> 
> T2:
>     while (!flag);
>     a_cas(&x, 0, 1); // can this fail?
> 
> They want it to never fail. But if a_cas() is implemented as
> ldrx/stlrx/dmb, this is not guaranteed because ldxr can be reordered
> with the load of flag.
> 
> Note that musl does *not* handle this now, because a_barrier() in
> the failure path is after a_ll().
> 
> [1] https://lists.infradead.org/pipermail/linux-arm-kernel/2014-February/229693.html

OK, then indeed this too needs to be fixed -- the a_cas is failing to
synchronize with the a_store. How do we do that? Based on my
understanding, my proposed patch doesn't fix it.

Do we need ldarx/stlrx/dmb? Or perhaps relegate the extra
synchronization to a retry in the case where the comparison fails?

If this is actually the case, it's disturbing that GCC does not seem
to be getting it right either...

Rich

  reply	other threads:[~2022-10-04 14:13 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-09-07  0:46 Alexey Izbyshev
2022-09-19 15:29 ` Rich Felker
2022-10-03  6:16   ` Alexey Izbyshev
2022-10-03 12:33     ` Rich Felker
2022-10-03 13:26     ` Szabolcs Nagy
2022-10-03 21:27       ` Szabolcs Nagy
2022-10-03 22:54         ` Rich Felker
2022-10-03 23:05           ` Rich Felker
2022-10-04 13:50             ` Alexey Izbyshev
2022-10-04 14:12               ` Rich Felker [this message]
2022-10-04 14:19                 ` Rich Felker
2022-10-04 15:43                   ` Alexey Izbyshev
2022-10-04 15:57                     ` Rich Felker
2022-10-04 18:15                       ` Alexey Izbyshev
2022-10-04 23:21                         ` Rich Felker
2022-10-04 16:24                 ` James Y Knight
2022-10-04 16:45                   ` Rich Felker
2022-10-05 13:52                     ` James Y Knight
2022-10-04 16:01               ` Alexey Izbyshev
2022-10-04  2:58         ` Rich Felker
2022-10-04  3:00           ` Rich Felker
2022-10-04  4:59             ` Rich Felker
2022-10-04  8:16               ` Szabolcs Nagy
2022-10-04 10:18               ` Alexey Izbyshev
2022-10-04  5:16         ` Alexey Izbyshev
2022-10-04  8:31           ` Szabolcs Nagy
2022-10-04 10:28             ` Alexey Izbyshev
2022-10-05  1:00 ` Rich Felker
2022-10-05 12:10   ` Alexey Izbyshev
2022-10-05 14:03     ` Rich Felker
2022-10-05 14:37       ` Rich Felker
2022-10-05 16:23         ` Alexey Izbyshev

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20221004141242.GL29905@brightrain.aerifal.cx \
    --to=dalias@libc.org \
    --cc=musl@lists.openwall.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/musl/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).