From mboxrd@z Thu Jan  1 00:00:00 1970
From: erik quanstrom <quanstro@quanstro.net>
Date: Mon, 19 May 2014 16:21:55 -0400
To: 9fans@9fans.net
Message-ID: <a593342e60fac111d7ca7146cdca7ff8@ladd.quanstro.net>
In-Reply-To: <CAFgOgC-fuknRtNM1mVK+S2g5LYTL+z3wCR0iZOdUSoqi3Fo51Q@mail.gmail.com>
References: <ac0a74a8fd1972cd1c72c7b5983fdc4e@ladd.quanstro.net>
	<CAFgOgC-fuknRtNM1mVK+S2g5LYTL+z3wCR0iZOdUSoqi3Fo51Q@mail.gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Subject: Re: [9fans] waitfree
Topicbox-Message-UUID: e8c7f4b0-ead8-11e9-9d60-3106f5b1d025

On Mon May 19 15:51:27 EDT 2014, devon.odell@gmail.com wrote:
> The LOCK prefix is effectively a tiny, tiny mutex, but for all intents
> and purposes, this is wait-free. The LOCK prefix forces N processors
> to synchronize on bus and cache operations and this is how there is a
> guarantee of an atomic read or an atomic write. For instructions like
> cmpxchg and xadd where reads and writes are implied, the bus is locked
> for the duration of the instruction.
>
> Wait-freedom provides a guarantee on an upper-bound for completion.
> The pipeline will not be reordered around a LOCK instruction, so your
> instruction will cause a pipeline stall. But you are guaranteed to be
> bounded on the time to complete the instruction, and the instruction
> decomposed into μops will not be preempted as the μops are executed.

there is no bus.

what LOCK really does is invoke part of the MESI protocol.  the state
diagrams i've seen do not specify how this is arbitrated if there is
more than 1 processor trying to gain exclusive access to the same
cacheline.

> Wait-freedom is defined by every operation having a bound on the
> number of steps required before the operation completes. In this case,
> you are bound by the number of μops of XADDL + latency to memory. This
> is a finite number, so this is wait-freedom.

i'm worried about the bound on the number of MESI rounds.  i don't see
where the memory coherency protocol states that if there are n processors
a cacheline will be acquired in at most f(n) rounds.
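
to make the distinction concrete, here is a minimal sketch, assuming
c11 <stdatomic.h>.  inc_xadd and inc_cas are made-up names for
illustration.  on x86, compilers typically turn atomic_fetch_add into
LOCK XADD, but that is an assumption about the compiler, not something
the language promises.  the sketch only shows the software side: the
retries in a compare-and-swap loop are visible and unbounded, while any
coherency rounds behind a single XADD are hidden in hardware.  it does
not answer the arbitration question.

```c
#include <stdatomic.h>

/*
 * one atomic read-modify-write.  returns the value before the add.
 * from software this is a single step; whatever cacheline
 * arbitration happens underneath is not observable here.
 */
long
inc_xadd(atomic_long *p)
{
	return atomic_fetch_add(p, 1);
}

/*
 * the same increment built from compare-and-swap.  returns the
 * value before the add and stores the number of attempts in *tries.
 * under contention another processor can win every round, so there
 * is no fixed bound on *tries: this loop is lock-free, not wait-free.
 */
long
inc_cas(atomic_long *p, long *tries)
{
	long old;

	old = atomic_load(p);
	*tries = 1;
	/* on failure, old is reloaded with the current value of *p */
	while(!atomic_compare_exchange_weak(p, &old, old + 1))
		(*tries)++;
	return old;
}
```

with no contention inc_cas normally succeeds on its first attempt
(the weak form is allowed to fail spuriously on some machines).
whether the hardware gives inc_xadd a bound f(n) on acquiring the
cacheline is exactly the open question.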

- erik