From mboxrd@z Thu Jan 1 00:00:00 1970
From: erik quanstrom
Date: Tue, 20 May 2014 15:30:32 -0400
To: 9fans@9fans.net
Message-ID:
In-Reply-To:
References: <1203fd06e0a13df9c23d4a5e1ac1aad0@ladd.quanstro.net> <3d8d9b061e102356bda61520e1682072@ivey>
MIME-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Subject: Re: [9fans] waitfree
Topicbox-Message-UUID: ea594b1c-ead8-11e9-9d60-3106f5b1d025

> I can't think of any reason it should be implemented in that way as
> long as the cache protocol has a total order (which it must, given that
> the μops that generate the cache coherency protocol traffic have a
> total order), a state transition from X to E can be done in a bounded
> number of cycles.

my understanding is that in this context a total order only means that
different processors observe the same order of transitions.  it says
nothing about fairness.

> The read function will try to find a value for addr in cache, then
> from memory.  If the LOCK-prefixed instruction's decomposed read μop
> results in this behavior, an RFO miss can and will happen multiple
> times.  This will stall the pipeline for multiple memory lookups.  You
> can detect this with pipeline stall performance counters that will be
> measurably (with significance) higher on the starved threads.
> Otherwise, the pipeline stall counter should closely match the RFO
> miss and cache miss counters.

yes.

> For ainc() specifically, unless it was inlined (which ISTR the Plan 9
> C compilers don't do, but you'd know that way better than me), I can't
> imagine that screwing things up.  The MOVs can't be LOCK-prefixed
> anyway (nor do they deal with memory), and this gives other processors
> time to do cache coherency traffic.

it doesn't matter if this is hard to do.  if starvation is possible under
any circumstances, with any protocol-adhering implementation, then the
assertion that amd64 LOCK is wait-free is false.

- erik