From mboxrd@z Thu Jan 1 00:00:00 1970
MIME-Version: 1.0
In-Reply-To:
References: <1203fd06e0a13df9c23d4a5e1ac1aad0@ladd.quanstro.net> <3d8d9b061e102356bda61520e1682072@ivey>
Date: Tue, 20 May 2014 16:32:43 -0400
Message-ID:
From: "Devon H. O'Dell"
To: Fans of the OS Plan 9 from Bell Labs <9fans@9fans.net>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
Subject: Re: [9fans] waitfree
Topicbox-Message-UUID: ea941a12-ead8-11e9-9d60-3106f5b1d025

2014-05-20 15:30 GMT-04:00 erik quanstrom :
>> I can't think of any reason it should be implemented in that way. As
>> long as the cache protocol has a total order (which it must, given that
>> the μops that generate the cache coherency protocol traffic have a
>> total order), a state transition from X to E can be done in a bounded
>> number of cycles.
>
> my understanding is that in this context this only means that different
> processors see the same order. it doesn't say anything about fairness.

You're right, it doesn't. [1] explores fairness of the GQ on Nehalem
and Westmere and concludes:

"[...] if the GQ is contended, the Nehalem microarchitecture is unfair
towards local cores (vs. the QPI), as the performance degradation
local cores experience is larger than that of the QPI. Still, this
behavior is reasonable as the GQ does not allow remote cores to
starve, and thus it avoids further aggravating the penalty of remote
memory accesses. Nevertheless, this property of the Nehalem is
undocumented and can be discovered only with experimental evaluation."

It goes on to conclude that Westmere behaves in the same fashion.
However, although the arbitration is unfair, the research shows that
performance is bounded by the available bandwidth of the GQ and QPI.
(It also shows these bounds can favor remote accesses, which makes one
wonder about cache-locality arguments under contention.) That bounds
exist at all is enough to show that the cache coherency protocol is
wait-free.
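To make the distinction concrete, here is a minimal sketch in C11
atomics (not the Plan 9 libc; function names are mine for
illustration). A single fetch-and-add, which compilers lower to LOCK
XADD on amd64, completes in a bounded number of steps per caller and
is wait-free if the coherency protocol is bounded, as [1] measures. A
CAS retry loop is only lock-free: some thread always makes progress,
but any particular caller can lose the race indefinitely.

```c
#include <stdatomic.h>

/* Wait-free flavor: one fetch-and-add, no retry loop. Each caller
 * finishes in a bounded number of its own steps, assuming the
 * underlying coherency traffic is itself bounded. */
int
ainc_waitfree(atomic_int *p)
{
	return atomic_fetch_add(p, 1) + 1;
}

/* Lock-free flavor: a CAS retry loop. System-wide progress is
 * guaranteed (some CAS always succeeds), but a single caller has no
 * bound on how many times it may retry under contention. */
int
ainc_lockfree(atomic_int *p)
{
	int old = atomic_load(p);

	while (!atomic_compare_exchange_weak(p, &old, old + 1))
		;	/* old is refreshed on failure; retry */
	return old + 1;
}
```

Both return the post-increment value, like ainc(); the difference is
purely in the per-caller progress guarantee.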
Wait-freedom does not imply fairness. See also [2] for more pretty
graphs and some additional information, if you can view PPT slides.

>> For ainc() specifically, unless it was inlined (which ISTR the Plan 9
>> C compilers don't do, but you'd know that way better than me), I can't
>> imagine that screwing things up. The MOVs can't be LOCK-prefixed
>> anyway (nor do they deal with memory), and this gives other processors
>> time to do cache coherency traffic.
>
> it doesn't matter if this is hard to do. if it is possible under any circumstances,
> with any protocol-adhering implementation, then the assertion that amd64
> lock is wait-free is false.

I disagree, if you are arguing that implementations in general are
only lock-free merely because the protocol permits, but does not
guarantee, wait-free behavior. I will agree a specific implementation
can fail to be wait-free, provided you can prove that implementation
can starve local or remote requests. But at least for Westmere and
Nehalem, [1] demonstrates wait-freedom.

> - erik

--dho

[1] http://people.inf.ethz.ch/zmajo/publications/11-systor.pdf
[2] http://people.inf.ethz.ch/zmajo/public_talks/2011-05-31-systor_2011.ppt