From mboxrd@z Thu Jan 1 00:00:00 1970
MIME-Version: 1.0
In-Reply-To:
References: <1203fd06e0a13df9c23d4a5e1ac1aad0@ladd.quanstro.net> <3d8d9b061e102356bda61520e1682072@ivey>
Date: Tue, 20 May 2014 16:32:43 -0400
Message-ID:
From: "Devon H. O'Dell"
To: Fans of the OS Plan 9 from Bell Labs <9fans@9fans.net>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
Subject: Re: [9fans] waitfree
Topicbox-Message-UUID: ea941a12-ead8-11e9-9d60-3106f5b1d025

2014-05-20 15:30 GMT-04:00 erik quanstrom :
>> I can't think of any reason it should be implemented in that way. As
>> long as the cache protocol has a total order (which it must, given that
>> the μops that generate the cache coherency protocol traffic have a
>> total order), a state transition from X to E can be done in a bounded
>> number of cycles.
>
> my understanding is that in this context this only means that different
> processors see the same order. it doesn't say anything about fairness.

You're right, it doesn't. [1] explores fairness of the GQ on Nehalem
and Westmere and concludes:

"[...] if the GQ is contended, the Nehalem microarchitecture is unfair
towards local cores (vs. the QPI), as the performance degradation
local cores experience is larger than that of the QPI. Still, this
behavior is reasonable as the GQ does not allow remote cores to
starve, and thus it avoids further aggravating the penalty of remote
memory accesses. Nevertheless, this property of the Nehalem is
undocumented and can be discovered only with experimental evaluation."

It goes on to conclude that Westmere behaves in the same fashion.
However, although the arbitration is unfair, the research shows that
performance is bounded by the available bandwidth of the GQ and QPI.
(It also shows these bounds can favor remote accesses, which makes one
wonder about cache-locality arguments under contention.) That bounds
exist at all is enough to show that the cache coherency protocol is
wait-free.
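To make the distinction concrete, here is a minimal sketch in C11
atomics (not the Plan 9 libc; function names are mine for
illustration). A single fetch-and-add, which compilers lower to LOCK
XADD on amd64, completes in a bounded number of steps per caller and
is wait-free if the coherency protocol is bounded, as [1] measures. A
CAS retry loop is only lock-free: some thread always makes progress,
but any particular caller can lose the race indefinitely.

```c
#include <stdatomic.h>

/* Wait-free flavor: one fetch-and-add, no retry loop. Each caller
 * finishes in a bounded number of its own steps, assuming the
 * underlying coherency traffic is itself bounded. */
int
ainc_waitfree(atomic_int *p)
{
	return atomic_fetch_add(p, 1) + 1;
}

/* Lock-free flavor: a CAS retry loop. System-wide progress is
 * guaranteed (some CAS always succeeds), but a single caller has no
 * bound on how many times it may retry under contention. */
int
ainc_lockfree(atomic_int *p)
{
	int old = atomic_load(p);

	while (!atomic_compare_exchange_weak(p, &old, old + 1))
		;	/* old is refreshed on failure; retry */
	return old + 1;
}
```

Both return the post-increment value, like ainc(); the difference is
purely in the per-caller progress guarantee.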
Wait-freedom does not imply fairness. See also [2] for more pretty
graphs and some additional information, if you can view PPT slides.

>> For ainc() specifically, unless it was inlined (which ISTR the Plan 9
>> C compilers don't do, but you'd know that way better than me), I can't
>> imagine that screwing things up. The MOVs can't be LOCK-prefixed
>> anyway (nor do they deal with memory), and this gives other processors
>> time to do cache coherency traffic.
>
> it doesn't matter if this is hard to do. if it is possible under any circumstances,
> with any protocol-adhering implementation, then the assertion that amd64
> lock is wait-free is false.

I disagree, if you are arguing that implementations in general are
only lock-free merely because the protocol permits, but does not
guarantee, wait-free behavior. I will agree a specific implementation
can fail to be wait-free, provided you can prove that implementation
can starve local or remote requests. But at least for Westmere and
Nehalem, [1] demonstrates wait-freedom.

> - erik

--dho

[1] http://people.inf.ethz.ch/zmajo/publications/11-systor.pdf
[2] http://people.inf.ethz.ch/zmajo/public_talks/2011-05-31-systor_2011.ppt