To: Fans of the OS Plan 9 from Bell Labs <9fans@9fans.net>
In-reply-to: Your message of "Sat, 07 May 2011 18:47:54 EDT."
References: <30A0D4B5-1AAB-4D95-9B9F-FD09CB796E6D@bitblocks.com>
Date: Sat, 7 May 2011 16:10:19 -0700
From: Bakul Shah
Message-Id: <20110507231019.298C0B827@mail.bitblocks.com>
Subject: Re: [9fans] _xinc vs ainc

On Sat, 07 May 2011 18:47:54 EDT erik quanstrom wrote:
> > Just guessing. Maybe the new code allows more concurrency? If the
> > value is not in the processor cache, will the old code block other
> > processors for much longer? The new code forces caching with the first
> > read, so maybe there is a higher likelihood that cmpxchg will finish
> > faster. I haven't studied x86 cache behavior, so this guess could be
> > completely wrong. Suggest asking on comp.arch, where people like
> > Andy Glew can give you a definitive answer.
>
> according to intel, this is a myth. search for "myth" on this page.
>
> http://software.intel.com/en-us/articles/implementing-scalable-atomic-locks-for-multi-core-intel-em64t-and-ia32-architectures/
>
> and this stands to reason, since both techniques revolve around a
> LOCK'd instruction, thus invoking the x86 architectural MESI(f)
> protocol.
>
> the difference, and my main point, is that the loop in ainc means
> that it is not a wait-free algorithm. this is not only suboptimal
> but could also lead to incorrect behavior.

I think a more likely reason for the change is to get a *copy* of the
incremented value: lock incl 0(ax) won't tell you what the value was
once it was incremented. But I don't see how the change could lead to
incorrect behavior.
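To make the contrast concrete, here is a minimal sketch of the two
styles, written as plain GNU C rather than Plan 9 assembly (this is
not the Plan 9 source; the function names are mine, and I'm assuming
gcc's inline asm and __sync builtins):

	#include <stdio.h>

	/* _xinc style: one LOCK INCL. Wait-free (a single instruction
	 * that always completes), but the caller never learns the
	 * resulting value. */
	static void
	xinc_style(int *p)
	{
		__asm__ volatile("lock incl %0" : "+m"(*p) : : "memory");
	}

	/* ainc style: read, compute, then CMPXCHG; retry if another
	 * processor changed the word in between. Returns the incremented
	 * value, but the retry loop makes it lock-free rather than
	 * wait-free. */
	static int
	ainc_style(int *p)
	{
		int old;

		do
			old = *p;
		while(!__sync_bool_compare_and_swap(p, old, old+1));
		return old+1;
	}

	int
	main(void)
	{
		int n = 41;

		xinc_style(&n);		/* n is incremented, but we must re-read it */
		printf("after xinc: %d\n", n);
		printf("ainc returns: %d\n", ainc_style(&n));
		return 0;
	}

For what it's worth, on x86 you can get both properties at once with
LOCK XADD (gcc's __sync_add_and_fetch), which is a single wait-free
instruction that also hands back the result; I can't say whether that
was considered for this change.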