caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed
* STM support in OCaml
@ 2006-03-07 16:18 Asfand Yar Qazi
  2006-03-07 16:50 ` [Caml-list] " Sebastian Egner
  2006-03-07 17:15 ` [Caml-list] STM support in OCaml skaller
  0 siblings, 2 replies; 29+ messages in thread
From: Asfand Yar Qazi @ 2006-03-07 16:18 UTC (permalink / raw)
  To: caml-list

Hi,

I've temporarily decided to leave off learning OCaml (although I still intend 
to learn it at some point) and start learning Haskell due to its support for 
Software Transactional Memory and lock-free concurrent programming.

Are there any plans to add STM to OCaml?  And I'm talking highly integrated 
into the language, not just some bolt-on library.

Thanks


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Caml-list] STM support in OCaml
  2006-03-07 16:18 STM support in OCaml Asfand Yar Qazi
@ 2006-03-07 16:50 ` Sebastian Egner
  2006-03-07 17:44   ` Michael Hicks
  2006-03-07 17:15 ` [Caml-list] STM support in OCaml skaller
  1 sibling, 1 reply; 29+ messages in thread
From: Sebastian Egner @ 2006-03-07 16:50 UTC (permalink / raw)
  To: caml-list; +Cc: Asfand Yar Qazi

[-- Attachment #1: Type: text/plain, Size: 1389 bytes --]

> Are there any plans to add STM to OCaml?  And I'm talking highly 
integrated 
> into the language, not just some bolt-on library.

I implemented 'some bolt-on library' as you call it for
myself (for understanding the details of STM) but did
not release it. The main lesseons I learned, however, are

a) A library will be nearly as good as something 'highly
integrated into the language'. The main difference is
that you cannot annotate fields to be transactional,
and define transactional record types; you need to
work with explicit transactional reference boxes and arrays.
This appears tolerable, and it is a lot simpler, both in
terms of implementation and in terms of using it.

b) The version of STM as designed for Haskell makes essential 
use of laziness for composing transactions in a very clean
way. You can construct more and more complex transactions
by not yet enclosing them into 'atomic'. This is probably
the most important benefit of STMs at all. Unfortunately,
this property is essentially impossible in OCaml: No matter
how highly integrated STMs are in the language, it will 
always be easy to write a program that uses ordinary (non-STM)
state to keep a record of something it shouldn't---messing
up semantics of course.

Sorry, I know this is bad news, but that is what I found.
I invite everybody to verify this for him- or herself,
or prove me wrong.

Sebastian.

[-- Attachment #2: Type: text/html, Size: 2249 bytes --]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Caml-list] STM support in OCaml
  2006-03-07 16:18 STM support in OCaml Asfand Yar Qazi
  2006-03-07 16:50 ` [Caml-list] " Sebastian Egner
@ 2006-03-07 17:15 ` skaller
  2006-03-07 19:05   ` Asfand Yar Qazi
  1 sibling, 1 reply; 29+ messages in thread
From: skaller @ 2006-03-07 17:15 UTC (permalink / raw)
  To: Asfand Yar Qazi; +Cc: caml-list

On Tue, 2006-03-07 at 16:18 +0000, Asfand Yar Qazi wrote:
> Hi,
> 
> I've temporarily decided to leave off learning OCaml (although I still intend 
> to learn it at some point) and start learning Haskell due to its support for 
> Software Transactional Memory and lock-free concurrent programming.

AFAIK STM is not lock free. It simply limits the locking period
to a bounded time, at the expense of the whole transaction
taking unbounded time. The final compare/write/retry must
be atomic and is therefore protected by a mutex under the hood.

Sebastian Egner said in another post the main advantage
of STMs: they have a combinator form, that is, they
can be composed.

However, despite the lack of safety in a bolt on 
addition for Ocaml .. the real problem is that STM
isn't that useful unless you have 

(a) a lot of processors
(b) a lot of variables

so that the risk of contention is low and the cost
of long exclusions is high. Ocaml fails to satisfy
property (a) since it doesn't support multi-processing.

-- 
John Skaller <skaller at users dot sourceforge dot net>
Async PL, Realtime software consultants
Checkout Felix: http://felix.sourceforge.net


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Caml-list] STM support in OCaml
  2006-03-07 16:50 ` [Caml-list] " Sebastian Egner
@ 2006-03-07 17:44   ` Michael Hicks
  2006-03-08  0:37     ` Asfand Yar Qazi
  2006-03-11 19:43     ` Deadlock free locking scheme (was: Re: [Caml-list] STM support in OCaml) David MENTRE
  0 siblings, 2 replies; 29+ messages in thread
From: Michael Hicks @ 2006-03-07 17:44 UTC (permalink / raw)
  To: caml-list

There's a longer answer, but one short answer is: check out AtomCaml  
at http://www.cs.washington.edu/homes/miker/atomcaml/

-Mike

On Mar 7, 2006, at 11:50 AM, Sebastian Egner wrote:

>
> > Are there any plans to add STM to OCaml?  And I'm talking highly  
> integrated
> > into the language, not just some bolt-on library.
>
> I implemented 'some bolt-on library' as you call it for
> myself (for understanding the details of STM) but did
> not release it. The main lesseons I learned, however, are
>
> a) A library will be nearly as good as something 'highly
> integrated into the language'. The main difference is
> that you cannot annotate fields to be transactional,
> and define transactional record types; you need to
> work with explicit transactional reference boxes and arrays.
> This appears tolerable, and it is a lot simpler, both in
> terms of implementation and in terms of using it.
>
> b) The version of STM as designed for Haskell makes essential
> use of laziness for composing transactions in a very clean
> way. You can construct more and more complex transactions
> by not yet enclosing them into 'atomic'. This is probably
> the most important benefit of STMs at all. Unfortunately,
> this property is essentially impossible in OCaml: No matter
> how highly integrated STMs are in the language, it will
> always be easy to write a program that uses ordinary (non-STM)
> state to keep a record of something it shouldn't---messing
> up semantics of course.
>
> Sorry, I know this is bad news, but that is what I found.
> I invite everybody to verify this for him- or herself,
> or prove me wrong.
>
> Sebastian.
> _______________________________________________
> Caml-list mailing list. Subscription management:
> http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list
> Archives: http://caml.inria.fr
> Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
> Bug reports: http://caml.inria.fr/bin/caml-bugs


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Caml-list] STM support in OCaml
  2006-03-07 17:15 ` [Caml-list] STM support in OCaml skaller
@ 2006-03-07 19:05   ` Asfand Yar Qazi
  2006-03-08  0:52     ` skaller
  0 siblings, 1 reply; 29+ messages in thread
From: Asfand Yar Qazi @ 2006-03-07 19:05 UTC (permalink / raw)
  To: caml-list

skaller wrote:
> On Tue, 2006-03-07 at 16:18 +0000, Asfand Yar Qazi wrote:
> 
>>Hi,
>>
>>I've temporarily decided to leave off learning OCaml (although I still intend 
>>to learn it at some point) and start learning Haskell due to its support for 
>>Software Transactional Memory and lock-free concurrent programming.
> 
> 
> AFAIK STM is not lock free. It simply limits the locking period
> to a bounded time, at the expense of the whole transaction
> taking unbounded time. The final compare/write/retry must
> be atomic and is therefore protected by a mutex under the hood.
> 
> Sebastian Egner said in another post the main advantage
> of STMs: they have a combinator form, that is, they
> can be composed.
> 
> However, despite the lack of safety in a bolt on 
> addition for Ocaml .. the real problem is that STM
> isn't that useful unless you have 
> 
> (a) a lot of processors
> (b) a lot of variables
> 
> so that the risk of contention is low and the cost
> of long exclusions is high. Ocaml fails to satisfy
> property (a) since it doesn't support multi-processing.
> 

You make several claims:

STM is not lock free.
STM is not useful on a small number of processors

With all due respect, these papers refutes these claims:

http://research.microsoft.com/~simonpj/papers/stm/index.htm

http://research.microsoft.com/~simonpj/papers/stm/lock-free.htm

(That's right - the premier research on STM is being done by Micro$oft - yuck.)

As for claim 1.  "Lock-free" doesn't mean what you think it does.  Quote from 
first paper, page 11:

"The STM implementation guarantees that one transaction
can force another to abort only when the first one commits.
As a result, the STM implementation is lock-free in the sense
that it guarantees at any time that some running transaction
can successfully commit."

As for claim 2, note the two-processor performance graphs of their tests 
(second paper, "lock-free.htm", pages 14-15): STM blows the hell out of 
conventional lock-based parallel processing.

As for the lot of variables bit, I intend to have a lot of the buggers :-)


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Caml-list] STM support in OCaml
  2006-03-07 17:44   ` Michael Hicks
@ 2006-03-08  0:37     ` Asfand Yar Qazi
  2006-03-08  5:05       ` Erick Tryzelaar
  2006-03-11 19:43     ` Deadlock free locking scheme (was: Re: [Caml-list] STM support in OCaml) David MENTRE
  1 sibling, 1 reply; 29+ messages in thread
From: Asfand Yar Qazi @ 2006-03-08  0:37 UTC (permalink / raw)
  To: caml-list

Michael Hicks wrote:
> There's a longer answer, but one short answer is: check out AtomCaml  at 
> http://www.cs.washington.edu/homes/miker/atomcaml/
> 
> -Mike
> 

That is actually really really useful just on its own!

Now there are 2 things missing:

- Multi-threaded support
- Optional lazy evaluation

and we have a Haskell beater :-)


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Caml-list] STM support in OCaml
  2006-03-07 19:05   ` Asfand Yar Qazi
@ 2006-03-08  0:52     ` skaller
  2006-03-08  7:08       ` Bardur Arantsson
                         ` (2 more replies)
  0 siblings, 3 replies; 29+ messages in thread
From: skaller @ 2006-03-08  0:52 UTC (permalink / raw)
  To: Asfand Yar Qazi; +Cc: caml-list

On Tue, 2006-03-07 at 19:05 +0000, Asfand Yar Qazi wrote:

> You make several claims:
> 
> STM is not lock free.
> STM is not useful on a small number of processors
> 
> As for claim 1.  "Lock-free" doesn't mean what you think it does. 

I know what STM does, thank you: I intend to implement it
myself in my own programming language. Maybe you should
read more carefully.

I said "protected by a mutex under the hood." which means
sure, the programmer is not writing locks, but they're used
in the implementation and the associated costs are still paid.

I really hate it when people try to throw papers against
simple logic. I said what the tradeoffs where:

"It simply limits the locking period
 to a bounded time, at the expense of the whole transaction
taking unbounded time."

and then elaborated the conditions under which this
made sense.

Long locking period on a Uniprocessor not only do not
cause problems they can actually IMPROVE performance by preventing
expensive context switches. 

A paper is cached here on my website, probably one of the
ones you cited.

http://felix.sourceforge.net/papers/ea8-composablememory_stm.pdf

It's quite interesting and I've bought a dual core CPU specifically
to test it out. The only numbers I can give you are based on a simple
lock test on a dual core G5 incrementing an integer: 15x SLOWER
on a dual processor than a uniprocessor with two threads.

No doubt because of the weak support provided by Linux.
Windows may do better, haven't tried yet, but I doubt anything
older than Vista has suitable API support.

In the end, fast concurrency is going to depend on both CPU and 
board design and OS support. The point of the above paper is 
not performance: the point is as I said, Sebastian said, 
AND the paper emphasises: it provides a model which 
supports composition.

I point out that in fact, under the right conditions -- lots
of processors and lots of variables -- it will probably provide better
performance too. However this is hard to test -- not many
of us have access to >2 cores on the same board. There certainly
no way POSIX can deliver good performance: mutexes have to be
synchronisation points and that requires ALL the CPUs to 
flush their caches -- it doesn't scale. Message passing does,
since sender and receiver only need to sync the message.
Explicit coupling, and both the subset of processor and
memory are limited.

Oh, and Ocaml supports message passing between processes .. :)

-- 
John Skaller <skaller at users dot sourceforge dot net>
Async PL, Realtime software consultants
Checkout Felix: http://felix.sourceforge.net


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Caml-list] STM support in OCaml
  2006-03-08  0:37     ` Asfand Yar Qazi
@ 2006-03-08  5:05       ` Erick Tryzelaar
  0 siblings, 0 replies; 29+ messages in thread
From: Erick Tryzelaar @ 2006-03-08  5:05 UTC (permalink / raw)
  To: Asfand Yar Qazi; +Cc: caml-list

Asfand Yar Qazi wrote:
> Now there are 2 things missing:
>
> - Multi-threaded support
> - Optional lazy evaluation
>
> and we have a Haskell beater :-)
>
only one thing left, as ocaml has said optional lazy evaluation:

http://caml.inria.fr/pub/docs/manual-ocaml/manual021.html#htoc97

:)

-e


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: STM support in OCaml
  2006-03-08  0:52     ` skaller
@ 2006-03-08  7:08       ` Bardur Arantsson
  2006-03-08 10:38       ` [Caml-list] " Asfand Yar Qazi
  2006-03-08 19:36       ` William Lovas
  2 siblings, 0 replies; 29+ messages in thread
From: Bardur Arantsson @ 2006-03-08  7:08 UTC (permalink / raw)
  To: caml-list

skaller wrote:
> On Tue, 2006-03-07 at 19:05 +0000, Asfand Yar Qazi wrote:
[--snip--]
> I point out that in fact, under the right conditions -- lots
> of processors and lots of variables -- it will probably provide better
> performance too. However this is hard to test -- not many
> of us have access to >2 cores on the same board. There certainly
> no way POSIX can deliver good performance: mutexes have to be
> synchronisation points and that requires ALL the CPUs to 
> flush their caches -- it doesn't scale. 

Interestingly, DragonflyBSD seems to be moving toward a slightly weaker
(relative to mutex) form of synchronisation which seems somewhat similar
to STMs:

   http://en.wikipedia.org/wiki/Serializing_tokens

I haven't look at it in detail, but it might be possible to use these to
implement STM in a mutex-free (cheap) way. (Though you might need some
level of hardware support unless you're content with page granularity
'exclusion').

Just thought I'd throw that in there. :)

Cheers,

-- 
Bardur Arantsson
<bardurREMOVE@THISimada.sdu.dk>
<bardurREMOVE@THISscientician.net>

- Am I paying for this abuse or is it extra?
                                  Edmund Blackadder, 'Blackadder'


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Caml-list] STM support in OCaml
  2006-03-08  0:52     ` skaller
  2006-03-08  7:08       ` Bardur Arantsson
@ 2006-03-08 10:38       ` Asfand Yar Qazi
  2006-03-08 19:36       ` William Lovas
  2 siblings, 0 replies; 29+ messages in thread
From: Asfand Yar Qazi @ 2006-03-08 10:38 UTC (permalink / raw)
  To: caml-list

skaller wrote:
> On Tue, 2006-03-07 at 19:05 +0000, Asfand Yar Qazi wrote:
> 
> 
>>You make several claims:
>>
>>STM is not lock free.
>>STM is not useful on a small number of processors
>>
>>As for claim 1.  "Lock-free" doesn't mean what you think it does. 
> 
> 
> I know what STM does, thank you: I intend to implement it
> myself in my own programming language. Maybe you should
> read more carefully.
> 
> I said "protected by a mutex under the hood." which means
> sure, the programmer is not writing locks, but they're used
> in the implementation and the associated costs are still paid.
> 
> I really hate it when people try to throw papers against
> simple logic. I said what the tradeoffs where:
> 
> "It simply limits the locking period
>  to a bounded time, at the expense of the whole transaction
> taking unbounded time."
> 
> and then elaborated the conditions under which this
> made sense.
> 
> Long locking period on a Uniprocessor not only do not
> cause problems they can actually IMPROVE performance by preventing
> expensive context switches. 
> 
> A paper is cached here on my website, probably one of the
> ones you cited.
> 
> http://felix.sourceforge.net/papers/ea8-composablememory_stm.pdf
> 
> It's quite interesting and I've bought a dual core CPU specifically
> to test it out. The only numbers I can give you are based on a simple
> lock test on a dual core G5 incrementing an integer: 15x SLOWER
> on a dual processor than a uniprocessor with two threads.
> 
> No doubt because of the weak support provided by Linux.
> Windows may do better, haven't tried yet, but I doubt anything
> older than Vista has suitable API support.
> 
> In the end, fast concurrency is going to depend on both CPU and 
> board design and OS support. The point of the above paper is 
> not performance: the point is as I said, Sebastian said, 
> AND the paper emphasises: it provides a model which 
> supports composition.
> 
> I point out that in fact, under the right conditions -- lots
> of processors and lots of variables -- it will probably provide better
> performance too. However this is hard to test -- not many
> of us have access to >2 cores on the same board. There certainly
> no way POSIX can deliver good performance: mutexes have to be
> synchronisation points and that requires ALL the CPUs to 
> flush their caches -- it doesn't scale. Message passing does,
> since sender and receiver only need to sync the message.
> Explicit coupling, and both the subset of processor and
> memory are limited.
> 
> Oh, and Ocaml supports message passing between processes .. :)
> 


Bad form on my part old chap - didn't realise your level of expertise.


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Caml-list] STM support in OCaml
  2006-03-08  0:52     ` skaller
  2006-03-08  7:08       ` Bardur Arantsson
  2006-03-08 10:38       ` [Caml-list] " Asfand Yar Qazi
@ 2006-03-08 19:36       ` William Lovas
  2006-03-08 20:45         ` Brian Hurt
  2 siblings, 1 reply; 29+ messages in thread
From: William Lovas @ 2006-03-08 19:36 UTC (permalink / raw)
  To: caml-list

On Wed, Mar 08, 2006 at 11:52:05AM +1100, skaller wrote:
> On Tue, 2006-03-07 at 19:05 +0000, Asfand Yar Qazi wrote:
> > As for claim 1.  "Lock-free" doesn't mean what you think it does. 
> 
> I know what STM does, thank you: I intend to implement it
> myself in my own programming language. Maybe you should
> read more carefully.
> 
> I said "protected by a mutex under the hood." which means
> sure, the programmer is not writing locks, but they're used
> in the implementation and the associated costs are still paid.

As the quoted paper said, though, some running transaction can always
commit -- so not all of the associated costs are still paid.  In
particular, the cost of potential non-termination due to deadlock or
livelock is not paid.  It doesn't matter that there's a mutex under the
hood, since it's used only in such a way as to preserve the property that
some transaction can always complete.

> I really hate it when people try to throw papers against
> simple logic. I said what the tradeoffs where:

In this case, i think the paper was right :)  Its statement of
"lock-free"ness was with regards to semantics, not performance.

You may be correct that there is some other property, regarding
performance, that the STM Haskell implementation does not have,
but it is not what their paper calls "lock-free".

William


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Caml-list] STM support in OCaml
  2006-03-08 19:36       ` William Lovas
@ 2006-03-08 20:45         ` Brian Hurt
  2006-03-08 21:14           ` Paul Snively
  2006-03-08 22:06           ` skaller
  0 siblings, 2 replies; 29+ messages in thread
From: Brian Hurt @ 2006-03-08 20:45 UTC (permalink / raw)
  To: William Lovas; +Cc: caml-list



On Wed, 8 Mar 2006, William Lovas wrote:

> As the quoted paper said, though, some running transaction can always
> commit -- so not all of the associated costs are still paid.  In
> particular, the cost of potential non-termination due to deadlock or
> livelock is not paid.  It doesn't matter that there's a mutex under the
> hood, since it's used only in such a way as to preserve the property that
> some transaction can always complete.

One comment I will make is that a mutex is expensive, but not *that* 
expensive.  I just wrote a quick program (available if anyone cares) in 
GNU C that measures the cost, in clocks, of locking and unlocking a posix 
mutex.  On my desktop box (AMD Athlon XP 2200+ 1.8GHz), I'm getting a cost 
of like 44 clock cycles.  Which makes it less expensive than an L2 cache 
miss.

At this point correctness is a much bigger concern of mine than absolute 
performance.  A fair bit of conservative or unnecessary locking is 
acceptable, given than I can write a complicated and highly scalable 
application and know that I don't have deadlocks or livelocks.  Not having 
race conditions would also be nice, but I don't think that's possible with 
mutable data.

Brian


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Caml-list] STM support in OCaml
  2006-03-08 20:45         ` Brian Hurt
@ 2006-03-08 21:14           ` Paul Snively
  2006-03-08 22:06           ` skaller
  1 sibling, 0 replies; 29+ messages in thread
From: Paul Snively @ 2006-03-08 21:14 UTC (permalink / raw)
  To: Brian Hurt; +Cc: William Lovas, caml-list

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

You folks might find Tim Sweeney's slides from POPL '06, which are  
linked to at <http://lambda-the-ultimate.org/node/1277>, interesting.  
Tim gives some statistics about Gears of War, a title with about  
250,000 lines of combined C++ and UnrealScript code for the title  
itself, and another 250,000 lines of C++ for the Unreal Engine 3, and  
goes on to discuss some issues that he has. For example, he has about  
10,000 objects active at any given time. Whenever one of these  
objects is modified, it typically touches 5-10 other objects. And all  
of this has to happen 30-60 times per second.

On slide 50, Tim talks about STM, and says:

"~2-4X STM performance overhead is acceptable: if it enables our  
state-intensive code to scale to many threads, it's still a win."

Food for thought.

Best regards,
Paul

On Mar 8, 2006, at 12:45 PM, Brian Hurt wrote:

>
>
> On Wed, 8 Mar 2006, William Lovas wrote:
>
>> As the quoted paper said, though, some running transaction can always
>> commit -- so not all of the associated costs are still paid.  In
>> particular, the cost of potential non-termination due to deadlock or
>> livelock is not paid.  It doesn't matter that there's a mutex  
>> under the
>> hood, since it's used only in such a way as to preserve the  
>> property that
>> some transaction can always complete.
>
> One comment I will make is that a mutex is expensive, but not  
> *that* expensive.  I just wrote a quick program (available if  
> anyone cares) in GNU C that measures the cost, in clocks, of  
> locking and unlocking a posix mutex.  On my desktop box (AMD Athlon  
> XP 2200+ 1.8GHz), I'm getting a cost of like 44 clock cycles.   
> Which makes it less expensive than an L2 cache miss.
>
> At this point correctness is a much bigger concern of mine than  
> absolute performance.  A fair bit of conservative or unnecessary  
> locking is acceptable, given than I can write a complicated and  
> highly scalable application and know that I don't have deadlocks or  
> livelocks.  Not having race conditions would also be nice, but I  
> don't think that's possible with mutable data.
>
> Brian
>
> _______________________________________________
> Caml-list mailing list. Subscription management:
> http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list
> Archives: http://caml.inria.fr
> Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
> Bug reports: http://caml.inria.fr/bin/caml-bugs

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (Darwin)

iEYEARECAAYFAkQPSSoACgkQS6yIxITC5OqqrACeK1Se7ZKbyKP9bFZSJXcoe6t/
P7IAoLr9Z/1IL2341P2ARcnMyEj+yZ1G
=Lse8
-----END PGP SIGNATURE-----


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Caml-list] STM support in OCaml
  2006-03-08 20:45         ` Brian Hurt
  2006-03-08 21:14           ` Paul Snively
@ 2006-03-08 22:06           ` skaller
  2006-03-08 22:10             ` Gerd Stolpmann
                               ` (2 more replies)
  1 sibling, 3 replies; 29+ messages in thread
From: skaller @ 2006-03-08 22:06 UTC (permalink / raw)
  To: Brian Hurt; +Cc: William Lovas, caml-list

On Wed, 2006-03-08 at 14:45 -0600, Brian Hurt wrote:

> One comment I will make is that a mutex is expensive, but not *that* 
> expensive.  I just wrote a quick program (available if anyone cares) in 
> GNU C that measures the cost, in clocks, of locking and unlocking a posix 
> mutex.  On my desktop box (AMD Athlon XP 2200+ 1.8GHz), I'm getting a cost 
> of like 44 clock cycles.  Which makes it less expensive than an L2 cache 
> miss.

Ahem. Now try that on an AMDx2 (dual core). The cost goes through
the roof if one process has a thread on each core. Because each
core has its own cache and both caches have to be flushed/
synchronised. And those caches are BIG!

I had hoped to compare dual CPU with dual core by buying a 
two CPU box (i.e. with two dual cores on it) but they're 2-3
times more expensive because it seems there's no board that
supports Athlons (you need Opterons which cost a lot more).

I have no idea if Linux, for example, running SMP kernel,
is smart enough to know if a mutex is shared between two
processing units or not: AFAIK Linux doesn't support
interprocess mutex. Windows does. Be interesting to
compare.

As mentioned before the only data I have at the moment
is a two thread counter increment experiment on a dual
CPU G5 box, where the speed up from 2 CPUs vs 1 was
a factor of 15 .. times SLOWER.

-- 
John Skaller <skaller at users dot sourceforge dot net>
Async PL, Realtime software consultants
Checkout Felix: http://felix.sourceforge.net


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Caml-list] STM support in OCaml
  2006-03-08 22:06           ` skaller
@ 2006-03-08 22:10             ` Gerd Stolpmann
  2006-03-08 23:48               ` skaller
  2006-03-09  7:45               ` Andrae Muys
  2006-03-08 22:11             ` Brian Hurt
  2006-03-11 15:26             ` [Caml-list] " Florian Weimer
  2 siblings, 2 replies; 29+ messages in thread
From: Gerd Stolpmann @ 2006-03-08 22:10 UTC (permalink / raw)
  To: skaller; +Cc: Brian Hurt, caml-list, William Lovas

Am Donnerstag, den 09.03.2006, 09:06 +1100 schrieb skaller:
> On Wed, 2006-03-08 at 14:45 -0600, Brian Hurt wrote:
> 
> > One comment I will make is that a mutex is expensive, but not *that* 
> > expensive.  I just wrote a quick program (available if anyone cares) in 
> > GNU C that measures the cost, in clocks, of locking and unlocking a posix 
> > mutex.  On my desktop box (AMD Athlon XP 2200+ 1.8GHz), I'm getting a cost 
> > of like 44 clock cycles.  Which makes it less expensive than an L2 cache 
> > miss.

> I have no idea if Linux, for example, running SMP kernel,
> is smart enough to know if a mutex is shared between two
> processing units or not: AFAIK Linux doesn't support
> interprocess mutex. Windows does. Be interesting to
> compare.

Of course POSIX supports interprocess mutexes: Mutexes are inherited
across fork(). And Linux even implements threads as processes, so you
can even place mutexes in shared memory (but do not ask me how).

Anyway, measuring the costs of a mutex is probably not simple. They are
highly optimized for the frequently occurring cases. And today "SMP"
behaves often more like NUMA, especially for multi-core CPUs.

> As mentioned before the only data I have at the moment
> is a two thread counter increment experiment on a dual
> CPU G5 box, where the speed up from 2 CPUs vs 1 was
> a factor of 15 .. times SLOWER.

You mean for a highly congested mutex you saw that slowdown. 

Gerd
-- 
------------------------------------------------------------
Gerd Stolpmann * Viktoriastr. 45 * 64293 Darmstadt * Germany 
gerd@gerd-stolpmann.de          http://www.gerd-stolpmann.de
Phone: +49-6151-153855                  Fax: +49-6151-997714
------------------------------------------------------------


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Caml-list] STM support in OCaml
  2006-03-08 22:06           ` skaller
  2006-03-08 22:10             ` Gerd Stolpmann
@ 2006-03-08 22:11             ` Brian Hurt
  2006-03-08 23:05               ` Lodewijk Vöge
                                 ` (2 more replies)
  2006-03-11 15:26             ` [Caml-list] " Florian Weimer
  2 siblings, 3 replies; 29+ messages in thread
From: Brian Hurt @ 2006-03-08 22:11 UTC (permalink / raw)
  To: skaller; +Cc: William Lovas, caml-list

[-- Attachment #1: Type: TEXT/PLAIN, Size: 1608 bytes --]



On Thu, 9 Mar 2006, skaller wrote:

> Ahem. Now try that on an AMDx2 (dual core). The cost goes through
> the roof if one process has a thread on each core. Because each
> core has its own cache and both caches have to be flushed/
> synchronised. And those caches are BIG!

Love to.  Wanna buy me the box?  :-}  Seriously- my code is attached, if 
someone wants to run it on other boxes and post the results, feel free. 
It's GNU-C/x86 specific, as I'm using GNU C's inline assembler and the 
rdtsc instruction to get accurate cycle counts.

As to the cache comment: the whole caches don't have to be flushed, just 
the line the mutex is on.  Which makes it approximately the cost of a 
cache miss- that's a good approximation of the cost of getting an 
uncontended lock.

>
> I have no idea if Linux, for example, running SMP kernel,
> is smart enough to know if a mutex is shared between two
> processing units or not: AFAIK Linux doesn't support
> interprocess mutex. Windows does. Be interesting to
> compare.

It doesn't look like the mutex software is even going into the kernel. 
I don't think the Linux kernel even knows the mutex *exists*, let alone 
what threads are competing for it.  On the x86, at least, lock 
instructions are not priveledged.

>
> As mentioned before the only data I have at the moment
> is a two thread counter increment experiment on a dual
> CPU G5 box, where the speed up from 2 CPUs vs 1 was
> a factor of 15 .. times SLOWER.

If you're ping-ponging a cache line between two CPUs (and the AMD dual 
cores count as two CPUs), then I can easily beleive that.

So?

Brian

[-- Attachment #2: Type: TEXT/X-CSRC, Size: 1847 bytes --]

#include <stdio.h>
#include <pthread.h>
#include <semaphore.h>

#if !defined(__GNUC__) && !defined(__i386__)
#error This code only works with GCC/i386.
#endif

/* The reason this only works under GNU C and the x86 is we're using the
 * rdtsc instruction.
 */
static inline unsigned long long rdtsc() {
	unsigned long long rval;

	asm volatile ("rdtsc" : "=A" (rval));
	return rval;
}

static sem_t waiting_thread_semaphore;
static pthread_mutex_t mutex;

void * waiting_thread_func(void * param __attribute__((unused))) {
	sem_wait(&waiting_thread_semaphore);
	return NULL;
}

int main(void) {
	int i;
	pthread_t waiting_thread;
	void * trash;
	unsigned long long start, stop, time, min;


	/* Create a thread to force us to actually do multi-threaded work */
	sem_init(&waiting_thread_semaphore, 1, 0);
	pthread_create(&waiting_thread, NULL, waiting_thread_func, NULL);

	pthread_mutex_init(&mutex, NULL);

	/* Time how long a rdtsc takes- we do this ten times and take the 
	 * cheapest run.
	 */
	min = ~0ull;
	for (i = 0; i < 10; ++i) {
		start = rdtsc();
		stop = rdtsc();
		time = stop - start;
		if (time < min) {
			min = time;
		}
	}
	printf("Minimum time for a rdtsc instruction (in clocks): %llu\n", min);

	/* Now time how long the pair of mutex lock + unlock take */
	min = ~0ull;
	for (i = 0; i < 10; ++i) {
		start = rdtsc();
		pthread_mutex_lock(&mutex);
		pthread_mutex_unlock(&mutex);
		stop = rdtsc();
		time = stop - start;
		if (time < min) {
			min = time;
		}
	}
	printf("Minimum time for a mutex lock+unlock + rdtsc (in clocks): %llu\n", min);

	/* Clean up the waiting thread we spawned just to be multithreaded. */
	sem_post(&waiting_thread_semaphore);
	pthread_join(waiting_thread, &trash);
	sem_destroy(&waiting_thread_semaphore);
	return 0;
}


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Caml-list] STM support in OCaml
  2006-03-08 22:11             ` Brian Hurt
@ 2006-03-08 23:05               ` Lodewijk Vöge
  2006-03-09  3:13                 ` Brian Hurt
  2006-03-08 23:45               ` Robert Roessler
  2006-03-09  0:23               ` skaller
  2 siblings, 1 reply; 29+ messages in thread
From: Lodewijk Vöge @ 2006-03-08 23:05 UTC (permalink / raw)
  To: caml-list

On 8-mrt-2006, at 23:11, Brian Hurt wrote:

> Love to.  Wanna buy me the box?  :-}  Seriously- my code is  
> attached, if someone wants to run it on other boxes and post the  
> results, feel free. It's GNU-C/x86 specific, as I'm using GNU C's  
> inline assembler and the rdtsc instruction to get accurate cycle  
> counts.

I don't know if rdtsc is accurate on 64bit, but this is what running  
it on a dual opteron 250 running 64bit linux gives:

$ ./lock
Minimum time for a rdtsc instruction (in clocks): 144099185
Minimum time for a mutex lock+unlock + rdtsc (in clocks): 24

multiple runs hover between this and 37 clocks for the mutex, and  
more than twice for the rdtsc.

Lodewijk


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Caml-list] STM support in OCaml
  2006-03-08 22:11             ` Brian Hurt
  2006-03-08 23:05               ` Lodewijk Vöge
@ 2006-03-08 23:45               ` Robert Roessler
  2006-03-09  0:23               ` skaller
  2 siblings, 0 replies; 29+ messages in thread
From: Robert Roessler @ 2006-03-08 23:45 UTC (permalink / raw)
  To: Caml-list

Brian Hurt wrote:
> On Thu, 9 Mar 2006, skaller wrote:
> 
>> Ahem. Now try that on an AMDx2 (dual core). The cost goes through
>> the roof if one process has a thread on each core. Because each
>> core has its own cache and both caches have to be flushed/
>> synchronised. And those caches are BIG!
> 
> Love to.  Wanna buy me the box?  :-}  Seriously- my code is attached, if 
> someone wants to run it on other boxes and post the results, feel free. 
> It's GNU-C/x86 specific, as I'm using GNU C's inline assembler and the 
> rdtsc instruction to get accurate cycle counts.
> 
> As to the cache comment: the whole caches don't have to be flushed, just 
> the line the mutex is on.  Which makes it approximately the cost of a 
> cache miss- that's a good approximation of the cost of getting an 
> uncontended lock.
> 
>>
>> I have no idea if Linux, for example, running SMP kernel,
>> is smart enough to know if a mutex is shared between two
>> processing units or not: AFAIK Linux doesn't support
>> interprocess mutex. Windows does. Be interesting to
>> compare.
> 
> It doesn't look like the mutex software is even going into the kernel. I 
> don't think the Linux kernel even knows the mutex *exists*, let alone 
> what threads are competing for it.  On the x86, at least, lock 
> instructions are not priveledged.

Brian [, this may be fairly off-topic, but], could you possibly
comment on how these results may or may not pertain to g_mutex_lock
(rather than pthread_mutex_lock)?

I have been going to some trouble to operate a glyph-string cache
without needing any locking for the *lookup* case - which is doable,
only needing locking for *adding*.  OTOH, for the flushing operation,
it looks a bit tougher... ;)

But if g_mutex_lock/g_mutex_unlock is not all that grim...

Robert Roessler
roessler@rftp.com
http://www.rftp.com


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Caml-list] STM support in OCaml
  2006-03-08 22:10             ` Gerd Stolpmann
@ 2006-03-08 23:48               ` skaller
  2006-03-09  7:45               ` Andrae Muys
  1 sibling, 0 replies; 29+ messages in thread
From: skaller @ 2006-03-08 23:48 UTC (permalink / raw)
  To: Gerd Stolpmann; +Cc: Brian Hurt, caml-list, William Lovas

On Wed, 2006-03-08 at 23:10 +0100, Gerd Stolpmann wrote:
> Am Donnerstag, den 09.03.2006, 09:06 +1100 schrieb skaller:

> > I have no idea if Linux, for example, running SMP kernel,
> > is smart enough to know if a mutex is shared between two
> > processing units or not: AFAIK Linux doesn't support
> > interprocess mutex. Windows does. Be interesting to
> > compare.
> 
> Of course POSIX supports interprocess mutexes: Mutexes are inherited
> across fork().

How does that work with the address space being copied?

On Windows, the mutex lives in kernel memory, the user
only gets a handle to it. But at least with Xavier's pthread
implementation (I assume this is Posix) you can do

       pthread_mutex_t fastmutex = PTHREAD_MUTEX_INITIALIZER;

which indicates the mutex lives in user address space.

Fork() copies the user address space (perhaps lazily).
So it seems to me mutexes cannot work across fork()?

>  And Linux even implements threads as processes, so you
> can even place mutexes in shared memory (but do not ask me how).

I was thinking of Windows, where you can actually
create inter-process global objects by naming them, and then find
them in another unrelated process using the string name.

[Placing a mutex is shared memory is the same as any other memory
I guess: you just call pthread_mutex_init]

> Anyway, measuring the costs of a mutex is probably not simple. 

Agreed!

> > is a two thread counter increment experiment on a dual
> > CPU G5 box, where the speed up from 2 CPUs vs 1 was
> > a factor of 15 .. times SLOWER.
> 
> You mean for a highly congested mutex you saw that slowdown. 

Yes, just one trivial experiment. Enough to suggest that
merely upgrading from one core to two won't necessarily
buy you a major performance increase. Probably better
can be done with the right software, low level stuff using
kernel calls, or even processor/board specific patches.
AMD64 for example seems to have reasonably sophisticated
ways of handling cache coherence which can't be provided
generically by an OS like Linux that has to run on
processors with different cache management architectures.

Although these are low level issues, the really interesting
ones are high level. Stuff like STM, control inversion,
garbage collection, etc may be more portable and yield more
benefits (especially if it is portable). But that is also
hard to know without some significant applications and
a lot of measurements.

Anyhow I personally won't be writing off Ocaml any time soon!

-- 
John Skaller <skaller at users dot sourceforge dot net>
Async PL, Realtime software consultants
Checkout Felix: http://felix.sourceforge.net


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Caml-list] STM support in OCaml
  2006-03-08 22:11             ` Brian Hurt
  2006-03-08 23:05               ` Lodewijk Vöge
  2006-03-08 23:45               ` Robert Roessler
@ 2006-03-09  0:23               ` skaller
  2006-03-09  3:19                 ` Brian Hurt
  2 siblings, 1 reply; 29+ messages in thread
From: skaller @ 2006-03-09  0:23 UTC (permalink / raw)
  To: Brian Hurt; +Cc: caml-list, William Lovas

On Wed, 2006-03-08 at 16:11 -0600, Brian Hurt wrote:

> As to the cache comment: the whole caches don't have to be flushed, just 
> the line the mutex is on. 

Unfortunately that isn't correct ;( Consider:

	lock mutex

	read some data
	write back that data

	unlock mutex

The problem with two threads is that 'some data' has to
be invalidated too. Otherwise there
is no way existing Posix software would work on a
multi-processor.

I couldn't believe it! That's why we tried the experiment:
did the OS really invalidate the 'whole' caches, or did our
trivial increment program fail, because the memory being
incremented wasn't coherent?

Either answer is BAD: its a lose/lose situation. The result
is -- we got the right answers and the 15x slowdown which indicates
that yes, really the whole** cache on both processors is
being invalidated every mutex lock/unlock.

Interestingly .. it should work much better with a pure FPL,
there's nothing to synchronise unless the compiler did some
nasty optimisations like reusing a closure for self-tail-rec
functions, in which case the compiler could tell the OS
which store is dirty. I wouldn't expect that to work
with a C program though :)

-- 
John Skaller <skaller at users dot sourceforge dot net>
Async PL, Realtime software consultants
Checkout Felix: http://felix.sourceforge.net


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Caml-list] STM support in OCaml
  2006-03-08 23:05               ` Lodewijk Vöge
@ 2006-03-09  3:13                 ` Brian Hurt
  0 siblings, 0 replies; 29+ messages in thread
From: Brian Hurt @ 2006-03-09  3:13 UTC (permalink / raw)
  To: Lodewijk Vöge; +Cc: caml-list

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: TEXT/PLAIN; charset=X-UNKNOWN; format=flowed, Size: 680 bytes --]



On Thu, 9 Mar 2006, Lodewijk Vöge wrote:

> On 8-mrt-2006, at 23:11, Brian Hurt wrote:
>
>> Love to.  Wanna buy me the box?  :-}  Seriously- my code is attached, if 
>> someone wants to run it on other boxes and post the results, feel free. 
>> It's GNU-C/x86 specific, as I'm using GNU C's inline assembler and the 
>> rdtsc instruction to get accurate cycle counts.
>
> I don't know if rdtsc is accurate on 64bit, but this is what running it on a 
> dual opteron 250 running 64bit linux gives:

On 64-bit, I think rdtsc only sets the RAX register.  At least that's what 
I'm assuming is going on- 144 million clocks for a rdtsc seems a bit 
extreme.

Brian

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Caml-list] STM support in OCaml
  2006-03-09  0:23               ` skaller
@ 2006-03-09  3:19                 ` Brian Hurt
  2006-03-09  4:32                   ` skaller
  0 siblings, 1 reply; 29+ messages in thread
From: Brian Hurt @ 2006-03-09  3:19 UTC (permalink / raw)
  To: skaller; +Cc: caml-list, William Lovas



On Thu, 9 Mar 2006, skaller wrote:

> The problem with two threads is that 'some data' has to
> be invalidated too. Otherwise there
> is no way existing Posix software would work on a
> multi-processor.
>
> I couldn't believe it! That's why we tried the experiment:
> did the OS really invalidate the 'whole' caches, or did our
> trivial increment program fail, because the memory being
> incremented wasn't coherent?

Generally both the lock get and lock remove need to do a memory barrier, 
and force all writes to complete to cache.  Once in cache, the cache's do 
cache coherency.  The caches snoop the memory traffic- and if someone is 
reading memory that the cache knows is more up to date in that cache than 
in memory, the read is satisified from that cache and not from memory- and 
the cache updates memory as well at the same time.  I'm simplifying 
enormously here, but the end result is that no, you don't need to flush 
the entire cache on lock release.  Google "cache coherency" or "MESI" for 
more information.

Brian


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Caml-list] STM support in OCaml
  2006-03-09  3:19                 ` Brian Hurt
@ 2006-03-09  4:32                   ` skaller
  2006-03-09 10:38                     ` John Chu
  2006-03-09 16:53                     ` Stefan Monnier
  0 siblings, 2 replies; 29+ messages in thread
From: skaller @ 2006-03-09  4:32 UTC (permalink / raw)
  To: Brian Hurt; +Cc: caml-list, William Lovas

On Wed, 2006-03-08 at 21:19 -0600, Brian Hurt wrote:

> > I couldn't believe it! That's why we tried the experiment:
> > did the OS really invalidate the 'whole' caches, or did our
> > trivial increment program fail, because the memory being
> > incremented wasn't coherent?

>  I'm simplifying 
> enormously here, 

I know, so was I -- I've read the AMD64 specs.

Suppose I have a thread that creates an array
which fits roughly in the cache. Then I hand it
over to another thread to fold. The first thread
has to load stuff from RAM to populate the array, 
which slows it down.

Then it has to write the whole array back to RAM,
and the second thread reloads the whole array from
RAM into the cache.

In that case the whole cache has to be not only 
flushed to RAM, it has to be reloaded into another
cache. Pretty nasty .. on a MP machine.  

On a uniprocessor, the same CPU will do the fold,
and the cache doesn't have to be either saved
or reloaded. Yes, that's an extreme case!

-- 
John Skaller <skaller at users dot sourceforge dot net>
Async PL, Realtime software consultants
Checkout Felix: http://felix.sourceforge.net


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Caml-list] STM support in OCaml
  2006-03-08 22:10             ` Gerd Stolpmann
  2006-03-08 23:48               ` skaller
@ 2006-03-09  7:45               ` Andrae Muys
  2006-03-09  9:18                 ` David Brown
  1 sibling, 1 reply; 29+ messages in thread
From: Andrae Muys @ 2006-03-09  7:45 UTC (permalink / raw)
  To: Gerd Stolpmann; +Cc: skaller, Brian Hurt, caml-list, William Lovas


On 09/03/2006, at 8:10 AM, Gerd Stolpmann wrote:

> Am Donnerstag, den 09.03.2006, 09:06 +1100 schrieb skaller:
>> On Wed, 2006-03-08 at 14:45 -0600, Brian Hurt wrote:
>>
>>> One comment I will make is that a mutex is expensive, but not *that*
>>> expensive.  I just wrote a quick program (available if anyone  
>>> cares) in
>>> GNU C that measures the cost, in clocks, of locking and unlocking  
>>> a posix
>>> mutex.  On my desktop box (AMD Athlon XP 2200+ 1.8GHz), I'm  
>>> getting a cost
>>> of like 44 clock cycles.  Which makes it less expensive than an  
>>> L2 cache
>>> miss.
>
>> I have no idea if Linux, for example, running SMP kernel,
>> is smart enough to know if a mutex is shared between two
>> processing units or not: AFAIK Linux doesn't support
>> interprocess mutex. Windows does. Be interesting to
>> compare.
>
> Of course POSIX supports interprocess mutexes: Mutexes are inherited
> across fork().

As skeller suggested, I can confirm this does not work.  Mutexes are  
not OS resources, they are initialised regions of memory (generally a  
word).  As fork() does copy-on-write, this leads to the mutex being  
duplicated NOT shared; and yes given that fork() only duplicates the  
calling thread, that does mean you can end-up with mutexes locked in  
the child with no  controlling thread available to unlock them.

fork() safety is an important issue to be aware of when working with  
posix-threads.  The best reference I am familiar with is Programming  
With Threads by Kleiman, Shah, and Smaalders, which includes a whole  
chapter on the this and related topics.

> And Linux even implements threads as processes, so you
> can even place mutexes in shared memory (but do not ask me how).

I've done this in an embedded app I wrote a few years back.  It's not  
hard, you create the shared-memory segment, cast some of it to be of  
type pthread_mutex_t and pass it to pthread_mutex_init()  (I'm  
working from memory here, so the function/type names may be off).

> Anyway, measuring the costs of a mutex is probably not simple. They  
> are
> highly optimized for the frequently occurring cases. And today "SMP"
> behaves often more like NUMA, especially for multi-core CPUs.

Agreed, however in my experience, if optimistic locking isn't going  
to work for you then neither are mutex-based shared-memory approaches.

Andrae Muys

-- 
Andrae Muys
andrae@netymon.com
Principal Kowari Consultant
Netymon Pty Ltd



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Caml-list] STM support in OCaml
  2006-03-09  7:45               ` Andrae Muys
@ 2006-03-09  9:18                 ` David Brown
  0 siblings, 0 replies; 29+ messages in thread
From: David Brown @ 2006-03-09  9:18 UTC (permalink / raw)
  To: Andrae Muys; +Cc: Gerd Stolpmann, Brian Hurt, caml-list, William Lovas, skaller

On Thu, Mar 09, 2006 at 05:45:10PM +1000, Andrae Muys wrote:

> >>I have no idea if Linux, for example, running SMP kernel,
> >>is smart enough to know if a mutex is shared between two
> >>processing units or not: AFAIK Linux doesn't support
> >>interprocess mutex. Windows does. Be interesting to
> >>compare.
> >
> >Of course POSIX supports interprocess mutexes: Mutexes are inherited
> >across fork().
> 
> As skeller suggested, I can confirm this does not work.  Mutexes are  
> not OS resources, they are initialised regions of memory (generally a  
> word).  As fork() does copy-on-write, this leads to the mutex being  
> duplicated NOT shared; and yes given that fork() only duplicates the  
> calling thread, that does mean you can end-up with mutexes locked in  
> the child with no  controlling thread available to unlock them.

The mutex gets duplicated, including the data structures that indicate the
process ids of who might have been waiting for it.  fork() is an
outstanding way to very thorougly hose an pthread application.

Windows has the "advantage" of just not having fork.  Well, not exactly,
since Cygwin demonstrates that it can be faked.

The Linux mutex code does use kernel scheduling resources (to go to sleep
and signal each other), but the initial tests are done through the shared
memory.

Dave


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Caml-list] STM support in OCaml
  2006-03-09  4:32                   ` skaller
@ 2006-03-09 10:38                     ` John Chu
  2006-03-09 16:53                     ` Stefan Monnier
  1 sibling, 0 replies; 29+ messages in thread
From: John Chu @ 2006-03-09 10:38 UTC (permalink / raw)
  To: caml-list

> I know, so was I -- I've read the AMD64 specs.
[snip]
> The first thread
> has to load stuff from RAM to populate the array,
> which slows it down.
>
> Then it has to write the whole array back to RAM,
> and the second thread reloads the whole array from
> RAM into the cache.

If you've read the AMD64 specs, then you know about the Owned state in 
the cache coherence protocol AMD chips use.  It allows one processor to 
update another processor without having to update main memory. i.e., 
one processor goes from M->O, the other goes from I->S. The O state is 
there to remind the first processor that on an eviction, it can't 
simply drop the cache line (as it could a line in the S state). It must 
write back to main memory as if it were in the M state.

You're obviously right that two processes on two cores use the same 
data, the data will obviously need to be copied from the cache 
hierarchy of one core to the other core. What I'm saying is that it 
does not need to write into main memory first.

						john


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: STM support in OCaml
  2006-03-09  4:32                   ` skaller
  2006-03-09 10:38                     ` John Chu
@ 2006-03-09 16:53                     ` Stefan Monnier
  1 sibling, 0 replies; 29+ messages in thread
From: Stefan Monnier @ 2006-03-09 16:53 UTC (permalink / raw)
  To: caml-list

> In that case the whole cache has to be not only
> flushed to RAM, it has to be reloaded into another
> cache. Pretty nasty .. on a MP machine.

Yes, of course, the data needs to copied between the CPUs.  But the cache
never needs to be flushed.


        Stefan


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Caml-list] STM support in OCaml
  2006-03-08 22:06           ` skaller
  2006-03-08 22:10             ` Gerd Stolpmann
  2006-03-08 22:11             ` Brian Hurt
@ 2006-03-11 15:26             ` Florian Weimer
  2 siblings, 0 replies; 29+ messages in thread
From: Florian Weimer @ 2006-03-11 15:26 UTC (permalink / raw)
  To: skaller; +Cc: Brian Hurt, caml-list, William Lovas

> I have no idea if Linux, for example, running SMP kernel,
> is smart enough to know if a mutex is shared between two
> processing units or not: AFAIK Linux doesn't support
> interprocess mutex.

Uhm, Linux and GNU libc do support them for quite some time (many
years if you used one of the enterprise variants which backported them
to Linux 2.4).


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Deadlock free locking scheme (was: Re: [Caml-list] STM support in OCaml)
  2006-03-07 17:44   ` Michael Hicks
  2006-03-08  0:37     ` Asfand Yar Qazi
@ 2006-03-11 19:43     ` David MENTRE
  1 sibling, 0 replies; 29+ messages in thread
From: David MENTRE @ 2006-03-11 19:43 UTC (permalink / raw)
  To: caml-list; +Cc: Michael Hicks

Hello,

Michael Hicks <mwh@cs.umd.edu> writes:

> There's a longer answer, but one short answer is: check out AtomCaml  at
> http://www.cs.washington.edu/homes/miker/atomcaml/

Thank you for the interesting paper, I did not know about that work. Was
there an announcement on caml-list?



By the way, as a reply to initial Asfand Yar Quazi's question, there is
a simple way to implement deadlock free locking scheme, both for byte
and native code: call a locking routine that always try to lock the
needed locks *in the same order*. By using the high-order properties of
OCaml, this is very easy for the programmer to use it.

Such a locking scheme can be used in the following way (imagine you have
four bases, named like "participant" or "position", each one protected
with a multiple-reader/single-writer lock):

let do_atomic_work () =
  let unlock =
     lock_bases { no_locks with participant = Read_lock; 
                                position = Read_lock; } in
  let result = do_processing () in
  unlock ();
  result

Compared to AtomCaml approach, it is like if the do_processing code is
executed in a Commit phase.

Please find below the code that imlements this scheme (using Reader and
Writer mutex[1]). It could probably be optimized for speed (using
precomputed locking and unlocking functions stored in a hash table) or
improved to support more locks (using an ordered set as input to lock
function), but the general scheme is there.




\chapter{Control of data bases access}

Module [[Basesctrl]] controls the locks to access the data bases. 

Each of the four data bases (Participants, Classification, Delegation,
Position) can be accessed for reading or for writing. A reader/writer
lock (see chapter \ref{module:rwmutex}) protects each one of them.

All locking and unlocking of those bases are done through this
module. We can thus control the locking and unlocking order and thus
guarantee the absence of deadlocks.

The locking of the bases is done through the [[lock]] function. It locks
bases as desired and then return an unlocking function that should be
called to cancel locking.

\section{Data structures}

Each lock can either be taken for reading ([[Read_lock]]), for writing
([[Write_lock]]) or not taken at all ([[No_lock]]).

<<basesctrl.ml>>=
type lock_type =
  | Read_lock
  | Write_lock
  | No_lock
@ 

We define a data structure [[t]] that indicates, for each action on the
database, the desired (un)locking for each base.

<<basesctrl.ml>>=
type t = {
    participant : lock_type;
    classification : lock_type;
    delegation : lock_type;
    position : lock_type;
  }
@ 

We define a default lock state [[no_locks]] where no locks are taken.

<<basesctrl.ml>>=
let no_locks = {
  participant = No_lock;
  classification = No_lock;
  delegation = No_lock;
  position = No_lock;
}
@ 

We also define the actual set of locks that protect bases. We chose a
[[Writer_preference]] to give priority to data base modifications that
should be less frequent that data base reading.

\nextchunklabel{code:basesctrl:locks}
<<basesctrl.ml>>=
let pref = Rwmutex.Writer_preference

let participant_lock = Rwmutex.create pref
let classification_lock = Rwmutex.create pref
let delegation_lock = Rwmutex.create pref
let position_lock = Rwmutex.create pref
@ 

\section{Base unlocking}


Helper function [[unlock_a_base]] is used to unlock [[lock]] with
[[desired]] unlocking.

<<basesctrl.ml>>=
let unlock_a_base desired lock =
  match desired with
  | Read_lock -> Rwmutex.read_unlock lock
  | Write_lock -> Rwmutex.write_unlock lock
  | No_lock -> ()
@ 

Function [[create_unlock]] returns a function that, when called,
unlocks the bases as specified in the [[what]] argument.

\nextchunklabel{code:create_unlock}
<<basesctrl.ml>>=
let create_unlock what =
  fun () -> 
    unlock_a_base what.position position_lock;
    unlock_a_base what.delegation delegation_lock;
    unlock_a_base what.classification classification_lock;
    unlock_a_base what.participant participant_lock
@ 

\section{Base locking}

Helper function [[lock_a_base]] is used to lock [[lock]] with
[[desired]] locking.

<<basesctrl.ml>>=
let lock_a_base desired lock =
  match desired with
  | Read_lock -> Rwmutex.read_lock lock
  | Write_lock -> Rwmutex.write_lock lock
  | No_lock -> ()
@ 

Function [[lock_bases]] is called to lock bases. The [[what]] argument
gives for each base the desired locking. This function returns a
function that should be called to unlock the bases.

To avoid deadlocks, the locking order is the reverse as used in the
unlocking procedure (see \codechunkref{code:create_unlock}).

<<basesctrl.ml>>=
let lock_bases what =
  lock_a_base what.participant participant_lock;
  lock_a_base what.classification classification_lock;
  lock_a_base what.delegation delegation_lock;
  lock_a_base what.position position_lock;
  create_unlock what
@ 



Best wishes,
david

Footnotes: 
[1]  http://www.linux-france.org/~dmentre/code/ocaml_thread_synchro.tar.gz

-- 
pub  1024D/A3AD7A2A 2004-10-03 David MENTRE <dmentre@linux-france.org>
 5996 CC46 4612 9CA4 3562  D7AC 6C67 9E96 A3AD 7A2A


^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2006-03-11 19:42 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2006-03-07 16:18 STM support in OCaml Asfand Yar Qazi
2006-03-07 16:50 ` [Caml-list] " Sebastian Egner
2006-03-07 17:44   ` Michael Hicks
2006-03-08  0:37     ` Asfand Yar Qazi
2006-03-08  5:05       ` Erick Tryzelaar
2006-03-11 19:43     ` Deadlock free locking scheme (was: Re: [Caml-list] STM support in OCaml) David MENTRE
2006-03-07 17:15 ` [Caml-list] STM support in OCaml skaller
2006-03-07 19:05   ` Asfand Yar Qazi
2006-03-08  0:52     ` skaller
2006-03-08  7:08       ` Bardur Arantsson
2006-03-08 10:38       ` [Caml-list] " Asfand Yar Qazi
2006-03-08 19:36       ` William Lovas
2006-03-08 20:45         ` Brian Hurt
2006-03-08 21:14           ` Paul Snively
2006-03-08 22:06           ` skaller
2006-03-08 22:10             ` Gerd Stolpmann
2006-03-08 23:48               ` skaller
2006-03-09  7:45               ` Andrae Muys
2006-03-09  9:18                 ` David Brown
2006-03-08 22:11             ` Brian Hurt
2006-03-08 23:05               ` Lodewijk Vöge
2006-03-09  3:13                 ` Brian Hurt
2006-03-08 23:45               ` Robert Roessler
2006-03-09  0:23               ` skaller
2006-03-09  3:19                 ` Brian Hurt
2006-03-09  4:32                   ` skaller
2006-03-09 10:38                     ` John Chu
2006-03-09 16:53                     ` Stefan Monnier
2006-03-11 15:26             ` [Caml-list] " Florian Weimer

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).