9fans - fans of the OS Plan 9 from Bell Labs
* [9fans] _xinc vs ainc
@ 2011-05-07 13:05 erik quanstrom
  2011-05-07 19:33 ` Bakul Shah
  2011-05-08  6:00 ` ron minnich
  0 siblings, 2 replies; 9+ messages in thread
From: erik quanstrom @ 2011-05-07 13:05 UTC (permalink / raw)
  To: 9fans

i'm confused by the recent change to the thread library.
the old code was simply to do a locked incl.  the new code
does a locked compare-and-exchange /within a loop/ until it's seen that
nobody else has updated the value at the same time, thus
ensuring that the value has indeed been updated.

since the expensive operation is the MESI(F) negotiation
behind the scenes to get exclusive access to the cacheline,
i don't understand what the motivation is for replacing _xinc
with ainc, since ainc can loop on an expensive LOCK'd instruction.

that is, i think the old version was wait free, and the new version
is not.

can someone explain what i'm missing here?

thanks!

- erik

----

TEXT	_xinc(SB),$0	/* void _xinc(long *); */

	MOVL	l+0(FP),AX
	LOCK
	INCL	0(AX)
	RET

----

TEXT ainc(SB), $0	/* long ainc(long *); */
	MOVL	addr+0(FP), BX
ainclp:
	MOVL	(BX), AX
	MOVL	AX, CX
	INCL	CX
	LOCK
	BYTE	$0x0F; BYTE $0xB1; BYTE $0x0B	/* CMPXCHGL CX, (BX) */
	JNZ	ainclp
	MOVL	CX, AX
	RET




* Re: [9fans] _xinc vs ainc
  2011-05-07 13:05 [9fans] _xinc vs ainc erik quanstrom
@ 2011-05-07 19:33 ` Bakul Shah
  2011-05-07 22:47   ` erik quanstrom
  2011-05-08  6:00 ` ron minnich
  1 sibling, 1 reply; 9+ messages in thread
From: Bakul Shah @ 2011-05-07 19:33 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On May 7, 2011, at 6:05 AM, erik quanstrom <quanstro@quanstro.net>
wrote:

> i'm confused by the recent change to the thread library.
> the old code was simply to do a locked incl.  the new code
> does a locked compare-and-exchange /within a loop/ until it's seen that
> nobody else has updated the value at the same time, thus
> ensuring that the value has indeed been updated.
>
> since the expensive operation is the MESI(F) negotiation
> behind the scenes to get exclusive access to the cacheline,
> i don't understand what the motivation is for replacing _xinc
> with ainc, since ainc can loop on an expensive LOCK'd instruction.
>
> that is, i think the old version was wait free, and the new version
> is not.
>
> can someone explain what i'm missing here?

> thanks!
>
> - erik
>
> ----
>
> TEXT    _xinc(SB),$0    /* void _xinc(long *); */
>
>    MOVL    l+0(FP),AX
>    LOCK
>    INCL    0(AX)
>    RET
>
> ----
>
> TEXT ainc(SB), $0    /* long ainc(long *); */
>    MOVL    addr+0(FP), BX
> ainclp:
>    MOVL    (BX), AX
>    MOVL    AX, CX
>    INCL    CX
>    LOCK
>    BYTE    $0x0F; BYTE $0xB1; BYTE $0x0B    /* CMPXCHGL CX, (BX) */
>    JNZ    ainclp
>    MOVL    CX, AX
>    RET
>

Just guessing: maybe the new code allows more concurrency? If the
value is not in the processor cache, will the old code block other
processors for much longer? The new code forces caching with the first
read, so maybe there's a high likelihood cmpxchg will finish faster. I haven't
studied x86 cache behavior, so this guess could be completely wrong.
Suggest asking on comp.arch, where people like Andy Glew can give you a
definitive answer.




* Re: [9fans] _xinc vs ainc
  2011-05-07 19:33 ` Bakul Shah
@ 2011-05-07 22:47   ` erik quanstrom
  2011-05-07 23:10     ` Bakul Shah
  0 siblings, 1 reply; 9+ messages in thread
From: erik quanstrom @ 2011-05-07 22:47 UTC (permalink / raw)
  To: 9fans

> Just guessing: maybe the new code allows more concurrency? If the
> value is not in the processor cache, will the old code block other
> processors for much longer? The new code forces caching with the first
> read, so maybe there's a high likelihood cmpxchg will finish faster. I haven't
> studied x86 cache behavior, so this guess could be completely wrong.
> Suggest asking on comp.arch, where people like Andy Glew can give you a
> definitive answer.

according to intel, this is a myth.  search for "myth" in this page.

http://software.intel.com/en-us/articles/implementing-scalable-atomic-locks-for-multi-core-intel-em64t-and-ia32-architectures/

and this stands to reason, since both techniques revolve around a
LOCK'd instruction, thus invoking the x86 architectural MESI(F)
protocol.

the difference, and my main point, is that the loop in ainc means
that it is not a wait-free algorithm.  this is not only suboptimal,
but could also lead to incorrect behavior.

- erik




* Re: [9fans] _xinc vs ainc
  2011-05-07 22:47   ` erik quanstrom
@ 2011-05-07 23:10     ` Bakul Shah
  2011-05-08  0:25       ` erik quanstrom
  0 siblings, 1 reply; 9+ messages in thread
From: Bakul Shah @ 2011-05-07 23:10 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On Sat, 07 May 2011 18:47:54 EDT erik quanstrom <quanstro@quanstro.net>  wrote:
> > Just guessing: maybe the new code allows more concurrency? If the
> > value is not in the processor cache, will the old code block other
> > processors for much longer? The new code forces caching with the first
> > read, so maybe there's a high likelihood cmpxchg will finish faster. I haven't
> > studied x86 cache behavior, so this guess could be completely wrong.
> > Suggest asking on comp.arch, where people like Andy Glew can give you a
> > definitive answer.
>
> according to intel, this is a myth.  search for "myth" in this page.
>
> http://software.intel.com/en-us/articles/implementing-scalable-atomic-locks-for-multi-core-intel-em64t-and-ia32-architectures/
>
> and this stands to reason, since both techniques revolve around a
> LOCK'd instruction, thus invoking the x86 architectural MESI(F)
> protocol.
>
> the difference, and my main point, is that the loop in ainc means
> that it is not a wait-free algorithm.  this is not only suboptimal,
> but could also lead to incorrect behavior.

I think a more likely possibility for the change is to have a
*copy* of what was incremented. lock incl 0(ax) won't tell you
what the value was when it was incremented.

But I don't see how the change will lead to an incorrect behavior.




* Re: [9fans] _xinc vs ainc
  2011-05-07 23:10     ` Bakul Shah
@ 2011-05-08  0:25       ` erik quanstrom
  2011-05-08  1:24         ` Bakul Shah
  0 siblings, 1 reply; 9+ messages in thread
From: erik quanstrom @ 2011-05-08  0:25 UTC (permalink / raw)
  To: 9fans

> > the difference, and my main point, is that the loop in ainc means
> > that it is not a wait-free algorithm.  this is not only suboptimal,
> > but could also lead to incorrect behavior.
>
> I think a more likely possibility for the change is to have a
> *copy* of what was incremented. lock incl 0(ax) won't tell you
> what the value was when it was incremented.

you can read the code.  that value is not used by the thread library.

> But I don't see how the change will lead to an incorrect behavior.

could.

imagine you have two threads entering ainc.  the loser will
loop.  imagine that before the loser completes his loop, a
third thread enters ainc and becomes a two-time loser.  by
induction, it's possible that the loser never completes in n
loops for any given n.

this of course is basically the definition of a waiting algorithm.

if your program depends on time-bounded behavior from
the thread library, you could have trouble with a non-wait-free
algorithm like this.

perhaps my concern is unfounded.  i'd like to hear the argument.

- erik




* Re: [9fans] _xinc vs ainc
  2011-05-08  0:25       ` erik quanstrom
@ 2011-05-08  1:24         ` Bakul Shah
  2011-05-08  2:44           ` Venkatesh Srinivas
  0 siblings, 1 reply; 9+ messages in thread
From: Bakul Shah @ 2011-05-08  1:24 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On Sat, 07 May 2011 20:25:25 EDT erik quanstrom <quanstro@quanstro.net>  wrote:
> > > the difference, and my main point, is that the loop in ainc means
> > > that it is not a wait-free algorithm.  this is not only suboptimal,
> > > but could also lead to incorrect behavior.
> >
> > I think a more likely possibility for the change is to have a
> > *copy* of what was incremented. lock incl 0(ax) won't tell you
> > what the value was when it was incremented.
>
> you can read the code.  that value is not used by the thread library.

If you want to use the value being atomically incremented,
there is no more efficient way on x86. It may not be used now, but
maybe it can be used to make some synchronization primitive
more efficient?

> if your program depends on time-bounded behavior from
> the thread library, you could have trouble with a non-wait-free
> algorithm like this.

Yes, but I think associating time-bounded behavior with any
shared memory access is iffy.  You always have this
possibility on processors that provide nothing stronger than
LL/SC (load-linked/store-conditional).




* Re: [9fans] _xinc vs ainc
  2011-05-08  1:24         ` Bakul Shah
@ 2011-05-08  2:44           ` Venkatesh Srinivas
  0 siblings, 0 replies; 9+ messages in thread
From: Venkatesh Srinivas @ 2011-05-08  2:44 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On Sat, May 7, 2011 at 9:24 PM, Bakul Shah <bakul@bitblocks.com> wrote:
> On Sat, 07 May 2011 20:25:25 EDT erik quanstrom <quanstro@quanstro.net>  wrote:
>> > > the difference, and my main point, is that the loop in ainc means
>> > > that it is not a wait-free algorithm.  this is not only suboptimal,
>> > > but could also lead to incorrect behavior.
>> >
>> > I think a more likely possibility for the change is to have a
>> > *copy* of what was incremented. lock incl 0(ax) won't tell you
>> > what the value was when it was incremented.
>>
>> you can read the code.  that value is not used by the thread library.
>
> If you want to use the value being atomically incremented,
> there is no more efficient way on x86. It may not be used now, but
> maybe it can be used to make some synchronization primitive
> more efficient?

XADD

-- vs




* Re: [9fans] _xinc vs ainc
  2011-05-07 13:05 [9fans] _xinc vs ainc erik quanstrom
  2011-05-07 19:33 ` Bakul Shah
@ 2011-05-08  6:00 ` ron minnich
  2011-05-08 13:14   ` erik quanstrom
  1 sibling, 1 reply; 9+ messages in thread
From: ron minnich @ 2011-05-08  6:00 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

The type signature reveals all: ainc returns a long, and xinc is void.

You really can't test the value of the long * after you call xinc
because somebody else might have done an xinc after your xinc but
before you test the value.

I think, among others, floren and I needed ainc for devtrace years
ago. I am sure that there are even better reasons than that.

ainc is correct as written, as far as I can tell. I don't think the
definition of ainc really requires fairness, since life doesn't either
:-)

ron




* Re: [9fans] _xinc vs ainc
  2011-05-08  6:00 ` ron minnich
@ 2011-05-08 13:14   ` erik quanstrom
  0 siblings, 0 replies; 9+ messages in thread
From: erik quanstrom @ 2011-05-08 13:14 UTC (permalink / raw)
  To: 9fans

On Sun May  8 02:01:57 EDT 2011, rminnich@gmail.com wrote:
> The type signature reveals all: ainc returns a long, and xinc is void.
>
> You really can't test the value of the long * after you call xinc
> because somebody else might have done an xinc after your xinc but
> before you test the value.

sorry ron, i don't see the motivation.  the thread library
does not use the return value.  if it did then clearly it would
have been wrong before.  i would think that to change the
code we would need a good reason.  and i see no reason
at all yet.

- erik


