From mboxrd@z Thu Jan  1 00:00:00 1970
MIME-Version: 1.0
In-Reply-To: <a593342e60fac111d7ca7146cdca7ff8@ladd.quanstro.net>
References: <ac0a74a8fd1972cd1c72c7b5983fdc4e@ladd.quanstro.net>
	<CAFgOgC-fuknRtNM1mVK+S2g5LYTL+z3wCR0iZOdUSoqi3Fo51Q@mail.gmail.com>
	<a593342e60fac111d7ca7146cdca7ff8@ladd.quanstro.net>
Date: Mon, 19 May 2014 17:01:37 -0400
Message-ID: <CAFgOgC91+3NNYAfp0FYS6uEaktWhLfNfr=xcVWFGJBS_-_-i1Q@mail.gmail.com>
From: "Devon H. O'Dell" <devon.odell@gmail.com>
To: Fans of the OS Plan 9 from Bell Labs <9fans@9fans.net>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
Subject: Re: [9fans] waitfree
Topicbox-Message-UUID: e8dbad0c-ead8-11e9-9d60-3106f5b1d025

So you seem to be worried that N processors in a tight loop of LOCK
XADD could have a single processor. This isn't a problem because
locked instructions have total order. Section 8.2.3.8:

"The memory-ordering model ensures that all processors agree on a
single execution order of all locked instructions, including those
that are larger than 8 bytes or are not naturally aligned."



2014-05-19 16:21 GMT-04:00 erik quanstrom <quanstro@quanstro.net>:
>
> On Mon May 19 15:51:27 EDT 2014, devon.odell@gmail.com wrote:
> > The LOCK prefix is effectively a tiny, tiny mutex, but for all intents
> > and purposes, this is wait-free. The LOCK prefix forces N processors
> > to synchronize on bus and cache operations and this is how there is a
> > guarantee of an atomic read or an atomic write. For instructions like
> > cmpxchg and xadd where reads and writes are implied, the bus is locked
> > for the duration of the instruction.
> >
> > Wait-freedom provides a guarantee on an upper-bound for completion.
> > The pipeline will not be reordered around a LOCK instruction, so your
> > instruction will cause a pipeline stall. But you are guaranteed to be
> > bounded on the time to complete the instruction, and the instruction
> > decomposed into =CE=BCops will not be preempted as the =CE=BCops are ex=
ecuted.
>
> there is no bus.

What you mean is that the processor might not LOCK#, but it can. LOCK
will signal LOCK# on the bus if it thinks it has to. (And if whatever
is being ainc()'ed is ainc()'ed frequently, or whatever is being
ainc()'ed is just never in cache for whatever reason, that will
happen.)

> what LOCK really does is invoke part of the MSEI protocol.  the state
> diagrams i've seen do not specifiy how this is arbitrated if there are > =
1
> processor trying to gain exclusive access to the same cacheline.
>
> > Wait-freedom is defined by every operation having a bound on the
> > number of steps required before the operation completes. In this case,
> > you are bound by the number of =CE=BCops of XADDL + latency to memory. =
This
> > is a finite number, so this is wait-freedom.
>
> i'm worried about the bound on the number of MSEI rounds.  i don't see
> where the memory coherency protocol states that if there are n processors
> a cacheline will be acquired in at most f(n) rounds.

In Section 8.2.3.8 of the manual:

"8.2.3.8 Locked Instructions Have a Total Order

The memory-ordering model ensures that all processors agree on a
single execution order of all locked instructions, including those
that are larger than 8 bytes or are not naturally aligned."

Thus, LOCK-prefixed instructions will never cause processor
starvation, which seems to be your worry. They're wait-free.

--dho

> - erik
>