From mboxrd@z Thu Jan 1 00:00:00 1970
From: erik quanstrom
Date: Tue, 20 May 2014 15:30:32 -0400
To: 9fans@9fans.net
Message-ID:
In-Reply-To:
References: <1203fd06e0a13df9c23d4a5e1ac1aad0@ladd.quanstro.net> <3d8d9b061e102356bda61520e1682072@ivey>
MIME-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Subject: Re: [9fans] waitfree
Topicbox-Message-UUID: ea594b1c-ead8-11e9-9d60-3106f5b1d025

> I can't think of any reason it should be implemented in that way as
> long as the cache protocol has a total order (which it must, given that
> the μops that generate the cache coherency protocol traffic have a
> total order), a state transition from X to E can be done in a bounded
> number of cycles.

my understanding is that in this context a total order only means that
different processors observe the same order of transitions.  it says
nothing about fairness.

> The read function will try to find a value for addr in cache, then
> from memory.  If the LOCK-prefixed instruction's decomposed read μop
> results in this behavior, an RFO miss can and will happen multiple
> times.  This will stall the pipeline for multiple memory lookups.  You
> can detect this with pipeline stall performance counters that will be
> measurably (with significance) higher on the starved threads.
> Otherwise, the pipeline stall counter should closely match the RFO
> miss and cache miss counters.

yes.

> For ainc() specifically, unless it was inlined (which ISTR the Plan 9
> C compilers don't do, but you'd know that way better than me), I can't
> imagine that screwing things up.  The MOVs can't be LOCK-prefixed
> anyway (nor do they deal with memory), and this gives other processors
> time to do cache coherency traffic.

it doesn't matter if this is hard to do.  if starvation is possible under
any circumstances, with any protocol-adhering implementation, then the
assertion that amd64 LOCK is wait-free is false.

- erik