caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed
* Re: [Caml-list] STM support in OCaml
@ 2006-03-08 10:11 yoann padioleau
  2006-03-08 10:41 ` Asfand Yar Qazi
  2006-03-08 11:32 ` Gerd Stolpmann
  0 siblings, 2 replies; 34+ messages in thread
From: yoann padioleau @ 2006-03-08 10:11 UTC (permalink / raw)
  To: skaller, Asfand Yar Qazi; +Cc: caml-list


> > You make several claims:
> > 
> > STM is not lock free.
> > STM is not useful on a small number of processors
> > 
> > As for claim 1.  "Lock-free" doesn't mean what you think it does. 
> 
> I know what STM does, thank you: I intend to implement it
> myself in my own programming language. Maybe you should
> read more carefully.
> 
> I said "protected by a mutex under the hood." which means
> sure, the programmer is not writing locks, but they're used
> in the implementation and the associated costs are still paid.

I have read (very fastly) the atomcaml paper and I don't think they use "mutex under the hood".
It seems that they just log stuff and rollback. I guess that when you want to implement 
STM on multiprocessor you may need to mutex stuff (the internal data structures maintained by the STM runtime),
but as ocaml runtime does not use multiprocessor, they dont need it.




^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Caml-list] STM support in OCaml
  2006-03-08 10:11 [Caml-list] STM support in OCaml yoann padioleau
@ 2006-03-08 10:41 ` Asfand Yar Qazi
  2006-03-08 12:23   ` skaller
  2006-03-08 11:32 ` Gerd Stolpmann
  1 sibling, 1 reply; 34+ messages in thread
From: Asfand Yar Qazi @ 2006-03-08 10:41 UTC (permalink / raw)
  To: caml-list

yoann padioleau wrote:
>>>You make several claims:
>>>
>>>STM is not lock free.
>>>STM is not useful on a small number of processors
>>>
>>>As for claim 1.  "Lock-free" doesn't mean what you think it does. 
>>
>>I know what STM does, thank you: I intend to implement it
>>myself in my own programming language. Maybe you should
>>read more carefully.
>>
>>I said "protected by a mutex under the hood." which means
>>sure, the programmer is not writing locks, but they're used
>>in the implementation and the associated costs are still paid.
> 
> 
> I have read (very fastly) the atomcaml paper and I don't think they use "mutex under the hood".
> It seems that they just log stuff and rollback. I guess that when you want to implement 
> STM on multiprocessor you may need to mutex stuff (the internal data structures maintained by the STM runtime),
> but as ocaml runtime does not use multiprocessor, they dont need it.
> 

Also, from what I remember, STM is "optimistic", while conventional lock-based 
design is "pessimistic" - thereby allowing STM based code to spend less time 
checking for locks or something, which apparently makes it quicker.

But, I'll lets the experts explain it :-)


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Caml-list] STM support in OCaml
  2006-03-08 10:11 [Caml-list] STM support in OCaml yoann padioleau
  2006-03-08 10:41 ` Asfand Yar Qazi
@ 2006-03-08 11:32 ` Gerd Stolpmann
  2006-03-08 12:04   ` skaller
  1 sibling, 1 reply; 34+ messages in thread
From: Gerd Stolpmann @ 2006-03-08 11:32 UTC (permalink / raw)
  To: padator; +Cc: skaller, Asfand Yar Qazi, caml-list

Am Mittwoch, den 08.03.2006, 11:11 +0100 schrieb yoann padioleau:
> > > You make several claims:
> > > 
> > > STM is not lock free.
> > > STM is not useful on a small number of processors
> > > 
> > > As for claim 1.  "Lock-free" doesn't mean what you think it does. 
> > 
> > I know what STM does, thank you: I intend to implement it
> > myself in my own programming language. Maybe you should
> > read more carefully.
> > 
> > I said "protected by a mutex under the hood." which means
> > sure, the programmer is not writing locks, but they're used
> > in the implementation and the associated costs are still paid.
> 
> I have read (very fastly) the atomcaml paper and I don't think they use "mutex under the hood".
> It seems that they just log stuff and rollback. I guess that when you want to implement 
> STM on multiprocessor you may need to mutex stuff (the internal data structures maintained by the STM runtime),
> but as ocaml runtime does not use multiprocessor, they dont need it.

In Atomcaml, the question whether there is a mutex or not simply does
not matter. They have their own scheduler (for bytecode), so there is no
need to have additional mutexes to protect the data structures Atomcaml
needs for itself, e.g. the log buffer. Being the scheduler is better
than any mutual exclusion...

If you ported Atomcaml to a kernel-level scheduler, you would surely
need mutexes (even in the uniprocessor case), simply because you need to
synchronize the kernel scheduler with user-level atomicity management.
So yes, for any real-world implementation, there are "mutexes under the
hood".

Anyway, I think this question is misleading. The point is not to
establish locking schemes that use the hardware better (i.e. that are
faster), but to help programmers writing correct code.

Gerd
-- 
------------------------------------------------------------
Gerd Stolpmann * Viktoriastr. 45 * 64293 Darmstadt * Germany 
gerd@gerd-stolpmann.de          http://www.gerd-stolpmann.de
Phone: +49-6151-153855                  Fax: +49-6151-997714
------------------------------------------------------------


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Caml-list] STM support in OCaml
  2006-03-08 11:32 ` Gerd Stolpmann
@ 2006-03-08 12:04   ` skaller
  2006-03-08 19:22     ` Dan Grossman
  0 siblings, 1 reply; 34+ messages in thread
From: skaller @ 2006-03-08 12:04 UTC (permalink / raw)
  To: Gerd Stolpmann; +Cc: padator, Asfand Yar Qazi, caml-list

On Wed, 2006-03-08 at 12:32 +0100, Gerd Stolpmann wrote:

> Anyway, I think this question is misleading. The point is not to
> establish locking schemes that use the hardware better (i.e. that are
> faster), but to help programmers writing correct code.

My aim would be to do both -- they're not independent.
Who cares about concurrency if not to try to do some job faster?
And of course it's no good if you do the wrong thing faster :)

-- 
John Skaller <skaller at users dot sourceforge dot net>
Async PL, Realtime software consultants
Checkout Felix: http://felix.sourceforge.net


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Caml-list] STM support in OCaml
  2006-03-08 10:41 ` Asfand Yar Qazi
@ 2006-03-08 12:23   ` skaller
  2006-03-08 23:02     ` Asfand Yar Qazi
  0 siblings, 1 reply; 34+ messages in thread
From: skaller @ 2006-03-08 12:23 UTC (permalink / raw)
  To: Asfand Yar Qazi; +Cc: caml-list

On Wed, 2006-03-08 at 10:41 +0000, Asfand Yar Qazi wrote:

> Also, from what I remember, STM is "optimistic", while conventional lock-based 
> design is "pessimistic" - thereby allowing STM based code to spend less time 
> checking for locks or something, which apparently makes it quicker.

> But, I'll lets the experts explain it :-)

It isn't explanation that is needed but experience
actually using it. The performance tradeoffs -- including
developer time -- won't be evident until significant
applications are developed using it.

For example I wonder if the technique is worthwhile
*without* the Haskell type system to enforce correctness?

And given GHC is currently not generating very good code,
will it matter anyhow?

-- 
John Skaller <skaller at users dot sourceforge dot net>
Async PL, Realtime software consultants
Checkout Felix: http://felix.sourceforge.net


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Caml-list] STM support in OCaml
  2006-03-08 12:04   ` skaller
@ 2006-03-08 19:22     ` Dan Grossman
  2006-03-08 22:10       ` skaller
  0 siblings, 1 reply; 34+ messages in thread
From: Dan Grossman @ 2006-03-08 19:22 UTC (permalink / raw)
  To: caml-list


Just to chime in on this thread briefly as an AtomCaml author:

* The posts have described AtomCaml fairly accurately (exploits 
uniprocessor, works only for bytecode, is integrated with the bytecode 
thread scheduler).

* AtomCaml is not "officially" supported; it demonstrates that STM 
support in OCaml is "not that hard".  That said, constructive feedback 
is welcome.

* From my perspective, the things missing are:
  1. native-code support (and yes this involves communicating with the 
kernel scheduler)
  2. STM-Haskell style combinators (I see no major problems here).
  3?. We should be clear that OCaml does not provide "true parallelism" 
(i.e., multiprocessing).  The point of AtomCaml is to show how fast STM 
can be given that.

As for "why concurrency if not to run faster", there _are_ other reasons 
for certain applications:
  * Multiple threads are a natural match for computations that are 
naturally decomposed into multiple control contexts.
  * Multiple threads can provide isolation (if one task must abort it 
can throw an exception which terminates only one thread).

--Dan

skaller wrote:
> On Wed, 2006-03-08 at 12:32 +0100, Gerd Stolpmann wrote:
> 
> 
>>Anyway, I think this question is misleading. The point is not to
>>establish locking schemes that use the hardware better (i.e. that are
>>faster), but to help programmers writing correct code.
> 
> 
> My aim would be to do both -- they're not independent.
> Who cares about concurrency if not to try to do some job faster?
> And of course it's no good if you do the wrong thing faster :)
> 


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Caml-list] STM support in OCaml
  2006-03-08 19:22     ` Dan Grossman
@ 2006-03-08 22:10       ` skaller
  0 siblings, 0 replies; 34+ messages in thread
From: skaller @ 2006-03-08 22:10 UTC (permalink / raw)
  To: Dan Grossman; +Cc: caml-list

On Wed, 2006-03-08 at 11:22 -0800, Dan Grossman wrote:

> As for "why concurrency if not to run faster", there _are_ other reasons 
> for certain applications:
>   * Multiple threads are a natural match for computations that are 
> naturally decomposed into multiple control contexts.

Yes, but one only uses *pre-emptive* threads here if the language
fails to provide control inversion/synchronous threads.
With synchronous threads you have interleaving, but no
concurrency. Felix, MLton, and Haskell provide synchronous
threads. 

Pre-emptive threads are less useful by comparison
due to their exceptionally poor performance on most OS,
limitations to their numbers, and the need for use of
interlocking primitives (mutex etc) which are generally
not required with synchronous threads. The latter not
only reduces performance, it also makes programming
very much more difficult. In particular, STM isn't as
useful with synchronous threads.

In fact most uses of 'multiple control contexts' call
for control inversion NOT pre-emption or concurrency:
what's required for the 'natural match' is to turn
the program 'inside out'.

Actually the idea extends to asynchronous control as well:
asynchronous callbacks, signals, or interruptions, can also
be interleaved and represented in control inverted form.
Felix is now doing both.

The effect is that as in C you would write 'read channel data'
or 'write channel data' in multiple threads. But actually
by control inverting this is turned into non-concurrent
callback/event driven code. There are indeed 'logical threads
of control' but no concurrency.

In Felix for example, we do use one thread for sockets
to synchronise the OS with the program: the end user, however,
can ignore this and write a web server supporting a million
active connections and not use a single bit of 'concurrency'.
[We could use pollling .. and indeed within the thread we
are: with epoll, kqueue, or io completion ports]

I do agree with what you're saying -- yes threads are a
natural fit for many problems. But 'concurrency' is generally
NOT what is required, and is in fact a serious obstacle to
both performance and ease of programming.

But it all depends on what you mean by 'concurrent'. The word
of course really does imply parallelism: con-current means
literally 'at the same time' which is generally impossible
on a uniprocessor (except perhaps for DMA, H/W accelerated
graphics, etc .. but then really you do have more than one
processor :)

Anyhow all this is just to say: lets not confuse the
need for Concurrency with the need for Control Inversion.

-- 
John Skaller <skaller at users dot sourceforge dot net>
Async PL, Realtime software consultants
Checkout Felix: http://felix.sourceforge.net


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Caml-list] STM support in OCaml
  2006-03-08 12:23   ` skaller
@ 2006-03-08 23:02     ` Asfand Yar Qazi
  2006-03-09  0:36       ` skaller
  0 siblings, 1 reply; 34+ messages in thread
From: Asfand Yar Qazi @ 2006-03-08 23:02 UTC (permalink / raw)
  To: caml-list

skaller wrote:
> On Wed, 2006-03-08 at 10:41 +0000, Asfand Yar Qazi wrote:
> 
> 
>>Also, from what I remember, STM is "optimistic", while conventional lock-based 
>>design is "pessimistic" - thereby allowing STM based code to spend less time 
>>checking for locks or something, which apparently makes it quicker.
> 
> 
>>But, I'll lets the experts explain it :-)
> 
> 
> It isn't explanation that is needed but experience
> actually using it. The performance tradeoffs -- including
> developer time -- won't be evident until significant
> applications are developed using it.
> 
> For example I wonder if the technique is worthwhile
> *without* the Haskell type system to enforce correctness?
> 
> And given GHC is currently not generating very good code,
> will it matter anyhow?
> 


To tell the truth, I just want to be on the bleeding edge - hence my desire to 
get away from this horrid imperative programming I've been indoctrinated with, 
and to learn how to get the most out of tomorrows processors, which will all 
be multi-core.

The answer to these problems is obvious - functional programming and 
concurrent programming.  I intend to get experience in as many functional 
languages as possible so that whatever comes up in the future, I'll be able to 
get to grips with it.  And (according to Tim Sweeny anyway) STM allows writing 
multithreaded apps "almost as easily as single threaded ones", so would be the 
solution to the headaches one normally associates with concurrent programming.

But, we'll just have to wait and see.


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Caml-list] STM support in OCaml
  2006-03-08 23:02     ` Asfand Yar Qazi
@ 2006-03-09  0:36       ` skaller
  0 siblings, 0 replies; 34+ messages in thread
From: skaller @ 2006-03-09  0:36 UTC (permalink / raw)
  To: Asfand Yar Qazi; +Cc: caml-list

On Wed, 2006-03-08 at 23:02 +0000, Asfand Yar Qazi wrote:

> To tell the truth, I just want to be on the bleeding edge - hence my desire to 
> get away from this horrid imperative programming I've been indoctrinated with, 
> and to learn how to get the most out of tomorrows processors, which will all 
> be multi-core.

hehe .. but procedural programming IS the bleeding edge now.

> (according to Tim Sweeny anyway) STM allows writing 
> multithreaded apps "almost as easily as single threaded ones", so would be the 
> solution to the headaches one normally associates with concurrent programming.

Yes, I tend to feel this is so, because it provides composition.
Another way of doing this is to use channels. As a programmer
I'd like a choice of techniques .. so I'd like both (and locks
as well .. just in case. I still use 'goto' :)

You can try out channels in Ocaml, see the Event module.
This is also 'lock free' in the sense the programmer
writes no locks. You also have composition, but a different
kind to STM (in fact, almost looks like they're dual/control
inverse).

-- 
John Skaller <skaller at users dot sourceforge dot net>
Async PL, Realtime software consultants
Checkout Felix: http://felix.sourceforge.net


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Caml-list] STM support in OCaml
  2006-03-08 22:06           ` skaller
  2006-03-08 22:10             ` Gerd Stolpmann
  2006-03-08 22:11             ` Brian Hurt
@ 2006-03-11 15:26             ` Florian Weimer
  2 siblings, 0 replies; 34+ messages in thread
From: Florian Weimer @ 2006-03-11 15:26 UTC (permalink / raw)
  To: skaller; +Cc: Brian Hurt, caml-list, William Lovas

> I have no idea if Linux, for example, running SMP kernel,
> is smart enough to know if a mutex is shared between two
> processing units or not: AFAIK Linux doesn't support
> interprocess mutex.

Uhm, Linux and GNU libc do support them for quite some time (many
years if you used one of the enterprise variants which backported them
to Linux 2.4).


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Caml-list] STM support in OCaml
  2006-03-09  4:32                   ` skaller
@ 2006-03-09 10:38                     ` John Chu
  0 siblings, 0 replies; 34+ messages in thread
From: John Chu @ 2006-03-09 10:38 UTC (permalink / raw)
  To: caml-list

> I know, so was I -- I've read the AMD64 specs.
[snip]
> The first thread
> has to load stuff from RAM to populate the array,
> which slows it down.
>
> Then it has to write the whole array back to RAM,
> and the second thread reloads the whole array from
> RAM into the cache.

If you've read the AMD64 specs, then you know about the Owned state in 
the cache coherence protocol AMD chips use.  It allows one processor to 
update another processor without having to update main memory. i.e., 
one processor goes from M->O, the other goes from I->S. The O state is 
there to remind the first processor that on an eviction, it can't 
simply drop the cache line (as it could a line in the S state). It must 
write back to main memory as if it were in the M state.

You're obviously right that two processes on two cores use the same 
data, the data will obviously need to be copied from the cache 
hierarchy of one core to the other core. What I'm saying is that it 
does not need to write into main memory first.

						john


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Caml-list] STM support in OCaml
  2006-03-09  7:45               ` Andrae Muys
@ 2006-03-09  9:18                 ` David Brown
  0 siblings, 0 replies; 34+ messages in thread
From: David Brown @ 2006-03-09  9:18 UTC (permalink / raw)
  To: Andrae Muys; +Cc: Gerd Stolpmann, Brian Hurt, caml-list, William Lovas, skaller

On Thu, Mar 09, 2006 at 05:45:10PM +1000, Andrae Muys wrote:

> >>I have no idea if Linux, for example, running SMP kernel,
> >>is smart enough to know if a mutex is shared between two
> >>processing units or not: AFAIK Linux doesn't support
> >>interprocess mutex. Windows does. Be interesting to
> >>compare.
> >
> >Of course POSIX supports interprocess mutexes: Mutexes are inherited
> >across fork().
> 
> As skeller suggested, I can confirm this does not work.  Mutexes are  
> not OS resources, they are initialised regions of memory (generally a  
> word).  As fork() does copy-on-write, this leads to the mutex being  
> duplicated NOT shared; and yes given that fork() only duplicates the  
> calling thread, that does mean you can end-up with mutexes locked in  
> the child with no  controlling thread available to unlock them.

The mutex gets duplicated, including the data structures that indicate the
process ids of who might have been waiting for it.  fork() is an
outstanding way to very thorougly hose an pthread application.

Windows has the "advantage" of just not having fork.  Well, not exactly,
since Cygwin demonstrates that it can be faked.

The Linux mutex code does use kernel scheduling resources (to go to sleep
and signal each other), but the initial tests are done through the shared
memory.

Dave


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Caml-list] STM support in OCaml
  2006-03-08 22:10             ` Gerd Stolpmann
  2006-03-08 23:48               ` skaller
@ 2006-03-09  7:45               ` Andrae Muys
  2006-03-09  9:18                 ` David Brown
  1 sibling, 1 reply; 34+ messages in thread
From: Andrae Muys @ 2006-03-09  7:45 UTC (permalink / raw)
  To: Gerd Stolpmann; +Cc: skaller, Brian Hurt, caml-list, William Lovas


On 09/03/2006, at 8:10 AM, Gerd Stolpmann wrote:

> Am Donnerstag, den 09.03.2006, 09:06 +1100 schrieb skaller:
>> On Wed, 2006-03-08 at 14:45 -0600, Brian Hurt wrote:
>>
>>> One comment I will make is that a mutex is expensive, but not *that*
>>> expensive.  I just wrote a quick program (available if anyone  
>>> cares) in
>>> GNU C that measures the cost, in clocks, of locking and unlocking  
>>> a posix
>>> mutex.  On my desktop box (AMD Athlon XP 2200+ 1.8GHz), I'm  
>>> getting a cost
>>> of like 44 clock cycles.  Which makes it less expensive than an  
>>> L2 cache
>>> miss.
>
>> I have no idea if Linux, for example, running SMP kernel,
>> is smart enough to know if a mutex is shared between two
>> processing units or not: AFAIK Linux doesn't support
>> interprocess mutex. Windows does. Be interesting to
>> compare.
>
> Of course POSIX supports interprocess mutexes: Mutexes are inherited
> across fork().

As skeller suggested, I can confirm this does not work.  Mutexes are  
not OS resources, they are initialised regions of memory (generally a  
word).  As fork() does copy-on-write, this leads to the mutex being  
duplicated NOT shared; and yes given that fork() only duplicates the  
calling thread, that does mean you can end-up with mutexes locked in  
the child with no  controlling thread available to unlock them.

fork() safety is an important issue to be aware of when working with  
posix-threads.  The best reference I am familiar with is Programming  
With Threads by Kleiman, Shah, and Smaalders, which includes a whole  
chapter on the this and related topics.

> And Linux even implements threads as processes, so you
> can even place mutexes in shared memory (but do not ask me how).

I've done this in an embedded app I wrote a few years back.  It's not  
hard, you create the shared-memory segment, cast some of it to be of  
type pthread_mutex_t and pass it to pthread_mutex_init()  (I'm  
working from memory here, so the function/type names may be off).

> Anyway, measuring the costs of a mutex is probably not simple. They  
> are
> highly optimized for the frequently occurring cases. And today "SMP"
> behaves often more like NUMA, especially for multi-core CPUs.

Agreed, however in my experience, if optimistic locking isn't going  
to work for you then neither are mutex-based shared-memory approaches.

Andrae Muys

-- 
Andrae Muys
andrae@netymon.com
Principal Kowari Consultant
Netymon Pty Ltd



^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Caml-list] STM support in OCaml
  2006-03-09  3:19                 ` Brian Hurt
@ 2006-03-09  4:32                   ` skaller
  2006-03-09 10:38                     ` John Chu
  0 siblings, 1 reply; 34+ messages in thread
From: skaller @ 2006-03-09  4:32 UTC (permalink / raw)
  To: Brian Hurt; +Cc: caml-list, William Lovas

On Wed, 2006-03-08 at 21:19 -0600, Brian Hurt wrote:

> > I couldn't believe it! That's why we tried the experiment:
> > did the OS really invalidate the 'whole' caches, or did our
> > trivial increment program fail, because the memory being
> > incremented wasn't coherent?

>  I'm simplifying 
> enormously here, 

I know, so was I -- I've read the AMD64 specs.

Suppose I have a thread that creates an array
which fits roughly in the cache. Then I hand it
over to another thread to fold. The first thread
has to load stuff from RAM to populate the array, 
which slows it down.

Then it has to write the whole array back to RAM,
and the second thread reloads the whole array from
RAM into the cache.

In that case the whole cache has to be not only 
flushed to RAM, it has to be reloaded into another
cache. Pretty nasty .. on a MP machine.  

On a uniprocessor, the same CPU will do the fold,
and the cache doesn't have to be either saved
or reloaded. Yes, that's an extreme case!

-- 
John Skaller <skaller at users dot sourceforge dot net>
Async PL, Realtime software consultants
Checkout Felix: http://felix.sourceforge.net


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Caml-list] STM support in OCaml
  2006-03-09  0:23               ` skaller
@ 2006-03-09  3:19                 ` Brian Hurt
  2006-03-09  4:32                   ` skaller
  0 siblings, 1 reply; 34+ messages in thread
From: Brian Hurt @ 2006-03-09  3:19 UTC (permalink / raw)
  To: skaller; +Cc: caml-list, William Lovas



On Thu, 9 Mar 2006, skaller wrote:

> The problem with two threads is that 'some data' has to
> be invalidated too. Otherwise there
> is no way existing Posix software would work on a
> multi-processor.
>
> I couldn't believe it! That's why we tried the experiment:
> did the OS really invalidate the 'whole' caches, or did our
> trivial increment program fail, because the memory being
> incremented wasn't coherent?

Generally both the lock get and lock remove need to do a memory barrier, 
and force all writes to complete to cache.  Once in cache, the cache's do 
cache coherency.  The caches snoop the memory traffic- and if someone is 
reading memory that the cache knows is more up to date in that cache than 
in memory, the read is satisified from that cache and not from memory- and 
the cache updates memory as well at the same time.  I'm simplifying 
enormously here, but the end result is that no, you don't need to flush 
the entire cache on lock release.  Google "cache coherency" or "MESI" for 
more information.

Brian


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Caml-list] STM support in OCaml
  2006-03-08 23:05               ` Lodewijk Vöge
@ 2006-03-09  3:13                 ` Brian Hurt
  0 siblings, 0 replies; 34+ messages in thread
From: Brian Hurt @ 2006-03-09  3:13 UTC (permalink / raw)
  To: Lodewijk Vöge; +Cc: caml-list

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: TEXT/PLAIN; charset=X-UNKNOWN; format=flowed, Size: 680 bytes --]



On Thu, 9 Mar 2006, Lodewijk Vöge wrote:

> On 8-mrt-2006, at 23:11, Brian Hurt wrote:
>
>> Love to.  Wanna buy me the box?  :-}  Seriously- my code is attached, if 
>> someone wants to run it on other boxes and post the results, feel free. 
>> It's GNU-C/x86 specific, as I'm using GNU C's inline assembler and the 
>> rdtsc instruction to get accurate cycle counts.
>
> I don't know if rdtsc is accurate on 64bit, but this is what running it on a 
> dual opteron 250 running 64bit linux gives:

On 64-bit, I think rdtsc only sets the RAX register.  At least that's what 
I'm assuming is going on- 144 million clocks for a rdtsc seems a bit 
extreme.

Brian

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Caml-list] STM support in OCaml
  2006-03-08 22:11             ` Brian Hurt
  2006-03-08 23:05               ` Lodewijk Vöge
  2006-03-08 23:45               ` Robert Roessler
@ 2006-03-09  0:23               ` skaller
  2006-03-09  3:19                 ` Brian Hurt
  2 siblings, 1 reply; 34+ messages in thread
From: skaller @ 2006-03-09  0:23 UTC (permalink / raw)
  To: Brian Hurt; +Cc: caml-list, William Lovas

On Wed, 2006-03-08 at 16:11 -0600, Brian Hurt wrote:

> As to the cache comment: the whole caches don't have to be flushed, just 
> the line the mutex is on. 

Unfortunately that isn't correct ;( Consider:

	lock mutex

	read some data
	write back that data

	unlock mutex

The problem with two threads is that 'some data' has to
be invalidated too. Otherwise there
is no way existing Posix software would work on a
multi-processor.

I couldn't believe it! That's why we tried the experiment:
did the OS really invalidate the 'whole' caches, or did our
trivial increment program fail, because the memory being
incremented wasn't coherent?

Either answer is BAD: its a lose/lose situation. The result
is -- we got the right answers and the 15x slowdown which indicates
that yes, really the whole** cache on both processors is
being invalidated every mutex lock/unlock.

Interestingly .. it should work much better with a pure FPL,
there's nothing to synchronise unless the compiler did some
nasty optimisations like reusing a closure for self-tail-rec
functions, in which case the compiler could tell the OS
which store is dirty. I wouldn't expect that to work
with a C program though :)

-- 
John Skaller <skaller at users dot sourceforge dot net>
Async PL, Realtime software consultants
Checkout Felix: http://felix.sourceforge.net


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Caml-list] STM support in OCaml
  2006-03-08 22:10             ` Gerd Stolpmann
@ 2006-03-08 23:48               ` skaller
  2006-03-09  7:45               ` Andrae Muys
  1 sibling, 0 replies; 34+ messages in thread
From: skaller @ 2006-03-08 23:48 UTC (permalink / raw)
  To: Gerd Stolpmann; +Cc: Brian Hurt, caml-list, William Lovas

On Wed, 2006-03-08 at 23:10 +0100, Gerd Stolpmann wrote:
> Am Donnerstag, den 09.03.2006, 09:06 +1100 schrieb skaller:

> > I have no idea if Linux, for example, running SMP kernel,
> > is smart enough to know if a mutex is shared between two
> > processing units or not: AFAIK Linux doesn't support
> > interprocess mutex. Windows does. Be interesting to
> > compare.
> 
> Of course POSIX supports interprocess mutexes: Mutexes are inherited
> across fork().

How does that work with the address space being copied?

On Windows, the mutex lives in kernel memory, the user
only gets a handle to it. But at least with Xavier's pthread
implementation (I assume this is Posix) you can do

       pthread_mutex_t fastmutex = PTHREAD_MUTEX_INITIALIZER;

which indicates the mutex lives in user address space.

Fork() copies the user address space (perhaps lazily).
So it seems to me mutexes cannot work across fork()?

>  And Linux even implements threads as processes, so you
> can even place mutexes in shared memory (but do not ask me how).

I was thinking of Windows, where you can actually
create inter-process global objects by naming them, and then find
them in another unrelated process using the string name.

[Placing a mutex is shared memory is the same as any other memory
I guess: you just call pthread_mutex_init]

> Anyway, measuring the costs of a mutex is probably not simple. 

Agreed!

> > is a two thread counter increment experiment on a dual
> > CPU G5 box, where the speed up from 2 CPUs vs 1 was
> > a factor of 15 .. times SLOWER.
> 
> You mean for a highly congested mutex you saw that slowdown. 

Yes, just one trivial experiment. Enough to suggest that
merely upgrading from one core to two won't necessarily
buy you a major performance increase. Probably better
can be done with the right software, low level stuff using
kernel calls, or even processor/board specific patches.
AMD64 for example seems to have reasonably sophisticated
ways of handling cache coherence which can't be provided
generically by an OS like Linux that has to run on
processors with different cache management architectures.

Although these are low level issues, the really interesting
ones are high level. Stuff like STM, control inversion,
garbage collection, etc may be more portable and yield more
benefits (especially if it is portable). But that is also
hard to know without some significant applications and
a lot of measurements.

Anyhow I personally won't be writing off Ocaml any time soon!

-- 
John Skaller <skaller at users dot sourceforge dot net>
Async PL, Realtime software consultants
Checkout Felix: http://felix.sourceforge.net


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Caml-list] STM support in OCaml
  2006-03-08 22:11             ` Brian Hurt
  2006-03-08 23:05               ` Lodewijk Vöge
@ 2006-03-08 23:45               ` Robert Roessler
  2006-03-09  0:23               ` skaller
  2 siblings, 0 replies; 34+ messages in thread
From: Robert Roessler @ 2006-03-08 23:45 UTC (permalink / raw)
  To: Caml-list

Brian Hurt wrote:
> On Thu, 9 Mar 2006, skaller wrote:
> 
>> Ahem. Now try that on an AMDx2 (dual core). The cost goes through
>> the roof if one process has a thread on each core. Because each
>> core has its own cache and both caches have to be flushed/
>> synchronised. And those caches are BIG!
> 
> Love to.  Wanna buy me the box?  :-}  Seriously- my code is attached, if 
> someone wants to run it on other boxes and post the results, feel free. 
> It's GNU-C/x86 specific, as I'm using GNU C's inline assembler and the 
> rdtsc instruction to get accurate cycle counts.
> 
> As to the cache comment: the whole caches don't have to be flushed, just 
> the line the mutex is on.  Which makes it approximately the cost of a 
> cache miss- that's a good approximation of the cost of getting an 
> uncontended lock.
> 
>>
>> I have no idea if Linux, for example, running SMP kernel,
>> is smart enough to know if a mutex is shared between two
>> processing units or not: AFAIK Linux doesn't support
>> interprocess mutex. Windows does. Be interesting to
>> compare.
> 
> It doesn't look like the mutex software is even going into the kernel. I 
> don't think the Linux kernel even knows the mutex *exists*, let alone 
> what threads are competing for it.  On the x86, at least, lock 
> instructions are not priveledged.

Brian [, this may be fairly off-topic, but], could you possibly
comment on how these results may or may not pertain to g_mutex_lock
(rather than pthread_mutex_lock)?

I have been going to some trouble to operate a glyph-string cache
without needing any locking for the *lookup* case - which is doable,
only needing locking for *adding*.  OTOH, for the flushing operation,
it looks a bit tougher... ;)

But if g_mutex_lock/g_mutex_unlock is not all that grim...

Robert Roessler
roessler@rftp.com
http://www.rftp.com


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Caml-list] STM support in OCaml
  2006-03-08 22:11             ` Brian Hurt
@ 2006-03-08 23:05               ` Lodewijk Vöge
  2006-03-09  3:13                 ` Brian Hurt
  2006-03-08 23:45               ` Robert Roessler
  2006-03-09  0:23               ` skaller
  2 siblings, 1 reply; 34+ messages in thread
From: Lodewijk Vöge @ 2006-03-08 23:05 UTC (permalink / raw)
  To: caml-list

On 8-mrt-2006, at 23:11, Brian Hurt wrote:

> Love to.  Wanna buy me the box?  :-}  Seriously- my code is  
> attached, if someone wants to run it on other boxes and post the  
> results, feel free. It's GNU-C/x86 specific, as I'm using GNU C's  
> inline assembler and the rdtsc instruction to get accurate cycle  
> counts.

I don't know if rdtsc is accurate on 64bit, but this is what running  
it on a dual opteron 250 running 64bit linux gives:

$ ./lock
Minimum time for a rdtsc instruction (in clocks): 144099185
Minimum time for a mutex lock+unlock + rdtsc (in clocks): 24

multiple runs hover between this and 37 clocks for the mutex, and  
more than twice for the rdtsc.

Lodewijk


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Caml-list] STM support in OCaml
  2006-03-08 22:06           ` skaller
  2006-03-08 22:10             ` Gerd Stolpmann
@ 2006-03-08 22:11             ` Brian Hurt
  2006-03-08 23:05               ` Lodewijk Vöge
                                 ` (2 more replies)
  2006-03-11 15:26             ` Florian Weimer
  2 siblings, 3 replies; 34+ messages in thread
From: Brian Hurt @ 2006-03-08 22:11 UTC (permalink / raw)
  To: skaller; +Cc: William Lovas, caml-list

[-- Attachment #1: Type: TEXT/PLAIN, Size: 1608 bytes --]



On Thu, 9 Mar 2006, skaller wrote:

> Ahem. Now try that on an AMDx2 (dual core). The cost goes through
> the roof if one process has a thread on each core. Because each
> core has its own cache and both caches have to be flushed/
> synchronised. And those caches are BIG!

Love to.  Wanna buy me the box?  :-}  Seriously- my code is attached, if 
someone wants to run it on other boxes and post the results, feel free. 
It's GNU-C/x86 specific, as I'm using GNU C's inline assembler and the 
rdtsc instruction to get accurate cycle counts.

As to the cache comment: the whole caches don't have to be flushed, just 
the line the mutex is on.  Which makes it approximately the cost of a 
cache miss- that's a good approximation of the cost of getting an 
uncontended lock.

>
> I have no idea if Linux, for example, running SMP kernel,
> is smart enough to know if a mutex is shared between two
> processing units or not: AFAIK Linux doesn't support
> interprocess mutex. Windows does. Be interesting to
> compare.

It doesn't look like the mutex software is even going into the kernel. 
I don't think the Linux kernel even knows the mutex *exists*, let alone 
what threads are competing for it.  On the x86, at least, lock 
instructions are not priveledged.

>
> As mentioned before the only data I have at the moment
> is a two thread counter increment experiment on a dual
> CPU G5 box, where the speed up from 2 CPUs vs 1 was
> a factor of 15 .. times SLOWER.

If you're ping-ponging a cache line between two CPUs (and the AMD dual 
cores count as two CPUs), then I can easily beleive that.

So?

Brian

[-- Attachment #2: Type: TEXT/X-CSRC, Size: 1847 bytes --]

#include <stdio.h>
#include <pthread.h>
#include <semaphore.h>

#if !defined(__GNUC__) && !defined(__i386__)
#error This code only works with GCC/i386.
#endif

/* The reason this only works under GNU C and the x86 is we're using the
 * rdtsc instruction.
 */
static inline unsigned long long rdtsc() {
	unsigned long long rval;

	asm volatile ("rdtsc" : "=A" (rval));
	return rval;
}

static sem_t waiting_thread_semaphore;
static pthread_mutex_t mutex;

void * waiting_thread_func(void * param __attribute__((unused))) {
	sem_wait(&waiting_thread_semaphore);
	return NULL;
}

int main(void) {
	int i;
	pthread_t waiting_thread;
	void * trash;
	unsigned long long start, stop, time, min;


	/* Create a thread to force us to actually do multi-threaded work */
	sem_init(&waiting_thread_semaphore, 1, 0);
	pthread_create(&waiting_thread, NULL, waiting_thread_func, NULL);

	pthread_mutex_init(&mutex, NULL);

	/* Time how long a rdtsc takes- we do this ten times and take the 
	 * cheapest run.
	 */
	min = ~0ull;
	for (i = 0; i < 10; ++i) {
		start = rdtsc();
		stop = rdtsc();
		time = stop - start;
		if (time < min) {
			min = time;
		}
	}
	printf("Minimum time for a rdtsc instruction (in clocks): %llu\n", min);

	/* Now time how long the pair of mutex lock + unlock take */
	min = ~0ull;
	for (i = 0; i < 10; ++i) {
		start = rdtsc();
		pthread_mutex_lock(&mutex);
		pthread_mutex_unlock(&mutex);
		stop = rdtsc();
		time = stop - start;
		if (time < min) {
			min = time;
		}
	}
	printf("Minimum time for a mutex lock+unlock + rdtsc (in clocks): %llu\n", min);

	/* Clean up the waiting thread we spawned just to be multithreaded. */
	sem_post(&waiting_thread_semaphore);
	pthread_join(waiting_thread, &trash);
	sem_destroy(&waiting_thread_semaphore);
	return 0;
}


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Caml-list] STM support in OCaml
  2006-03-08 22:06           ` skaller
@ 2006-03-08 22:10             ` Gerd Stolpmann
  2006-03-08 23:48               ` skaller
  2006-03-09  7:45               ` Andrae Muys
  2006-03-08 22:11             ` Brian Hurt
  2006-03-11 15:26             ` Florian Weimer
  2 siblings, 2 replies; 34+ messages in thread
From: Gerd Stolpmann @ 2006-03-08 22:10 UTC (permalink / raw)
  To: skaller; +Cc: Brian Hurt, caml-list, William Lovas

Am Donnerstag, den 09.03.2006, 09:06 +1100 schrieb skaller:
> On Wed, 2006-03-08 at 14:45 -0600, Brian Hurt wrote:
> 
> > One comment I will make is that a mutex is expensive, but not *that* 
> > expensive.  I just wrote a quick program (available if anyone cares) in 
> > GNU C that measures the cost, in clocks, of locking and unlocking a posix 
> > mutex.  On my desktop box (AMD Athlon XP 2200+ 1.8GHz), I'm getting a cost 
> > of like 44 clock cycles.  Which makes it less expensive than an L2 cache 
> > miss.

> I have no idea if Linux, for example, running SMP kernel,
> is smart enough to know if a mutex is shared between two
> processing units or not: AFAIK Linux doesn't support
> interprocess mutex. Windows does. Be interesting to
> compare.

Of course POSIX supports interprocess mutexes: Mutexes are inherited
across fork(). And Linux even implements threads as processes, so you
can even place mutexes in shared memory (but do not ask me how).

Anyway, measuring the costs of a mutex is probably not simple. They are
highly optimized for the frequently occurring cases. And today "SMP"
behaves often more like NUMA, especially for multi-core CPUs.

> As mentioned before the only data I have at the moment
> is a two thread counter increment experiment on a dual
> CPU G5 box, where the speed up from 2 CPUs vs 1 was
> a factor of 15 .. times SLOWER.

You mean for a highly congested mutex you saw that slowdown. 

Gerd
-- 
------------------------------------------------------------
Gerd Stolpmann * Viktoriastr. 45 * 64293 Darmstadt * Germany 
gerd@gerd-stolpmann.de          http://www.gerd-stolpmann.de
Phone: +49-6151-153855                  Fax: +49-6151-997714
------------------------------------------------------------


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Caml-list] STM support in OCaml
  2006-03-08 20:45         ` Brian Hurt
  2006-03-08 21:14           ` Paul Snively
@ 2006-03-08 22:06           ` skaller
  2006-03-08 22:10             ` Gerd Stolpmann
                               ` (2 more replies)
  1 sibling, 3 replies; 34+ messages in thread
From: skaller @ 2006-03-08 22:06 UTC (permalink / raw)
  To: Brian Hurt; +Cc: William Lovas, caml-list

On Wed, 2006-03-08 at 14:45 -0600, Brian Hurt wrote:

> One comment I will make is that a mutex is expensive, but not *that* 
> expensive.  I just wrote a quick program (available if anyone cares) in 
> GNU C that measures the cost, in clocks, of locking and unlocking a posix 
> mutex.  On my desktop box (AMD Athlon XP 2200+ 1.8GHz), I'm getting a cost 
> of like 44 clock cycles.  Which makes it less expensive than an L2 cache 
> miss.

Ahem. Now try that on an AMDx2 (dual core). The cost goes through
the roof if one process has a thread on each core. Because each
core has its own cache and both caches have to be flushed/
synchronised. And those caches are BIG!

I had hoped to compare dual CPU with dual core by buying a 
two CPU box (i.e. with two dual cores on it) but they're 2-3
times more expensive because it seems there's no board that
supports Athlons (you need Opterons which cost a lot more).

I have no idea if Linux, for example, running SMP kernel,
is smart enough to know if a mutex is shared between two
processing units or not: AFAIK Linux doesn't support
interprocess mutex. Windows does. Be interesting to
compare.

As mentioned before the only data I have at the moment
is a two thread counter increment experiment on a dual
CPU G5 box, where the speed up from 2 CPUs vs 1 was
a factor of 15 .. times SLOWER.

-- 
John Skaller <skaller at users dot sourceforge dot net>
Async PL, Realtime software consultants
Checkout Felix: http://felix.sourceforge.net


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Caml-list] STM support in OCaml
  2006-03-08 20:45         ` Brian Hurt
@ 2006-03-08 21:14           ` Paul Snively
  2006-03-08 22:06           ` skaller
  1 sibling, 0 replies; 34+ messages in thread
From: Paul Snively @ 2006-03-08 21:14 UTC (permalink / raw)
  To: Brian Hurt; +Cc: William Lovas, caml-list

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

You folks might find Tim Sweeney's slides from POPL '06, which are  
linked to at <http://lambda-the-ultimate.org/node/1277>, interesting.  
Tim gives some statistics about Gears of War, a title with about  
250,000 lines of combined C++ and UnrealScript code for the title  
itself, and another 250,000 lines of C++ for the Unreal Engine 3, and  
goes on to discuss some issues that he has. For example, he has about  
10,000 objects active at any given time. Whenever one of these  
objects is modified, it typically touches 5-10 other objects. And all  
of this has to happen 30-60 times per second.

On slide 50, Tim talks about STM, and says:

"~2-4X STM performance overhead is acceptable: if it enables our  
state-intensive code to scale to many threads, it's still a win."

Food for thought.

Best regards,
Paul

On Mar 8, 2006, at 12:45 PM, Brian Hurt wrote:

>
>
> On Wed, 8 Mar 2006, William Lovas wrote:
>
>> As the quoted paper said, though, some running transaction can always
>> commit -- so not all of the associated costs are still paid.  In
>> particular, the cost of potential non-termination due to deadlock or
>> livelock is not paid.  It doesn't matter that there's a mutex  
>> under the
>> hood, since it's used only in such a way as to preserve the  
>> property that
>> some transaction can always complete.
>
> One comment I will make is that a mutex is expensive, but not  
> *that* expensive.  I just wrote a quick program (available if  
> anyone cares) in GNU C that measures the cost, in clocks, of  
> locking and unlocking a posix mutex.  On my desktop box (AMD Athlon  
> XP 2200+ 1.8GHz), I'm getting a cost of like 44 clock cycles.   
> Which makes it less expensive than an L2 cache miss.
>
> At this point correctness is a much bigger concern of mine than  
> absolute performance.  A fair bit of conservative or unnecessary  
> locking is acceptable, given than I can write a complicated and  
> highly scalable application and know that I don't have deadlocks or  
> livelocks.  Not having race conditions would also be nice, but I  
> don't think that's possible with mutable data.
>
> Brian
>
> _______________________________________________
> Caml-list mailing list. Subscription management:
> http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list
> Archives: http://caml.inria.fr
> Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
> Bug reports: http://caml.inria.fr/bin/caml-bugs

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (Darwin)

iEYEARECAAYFAkQPSSoACgkQS6yIxITC5OqqrACeK1Se7ZKbyKP9bFZSJXcoe6t/
P7IAoLr9Z/1IL2341P2ARcnMyEj+yZ1G
=Lse8
-----END PGP SIGNATURE-----


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Caml-list] STM support in OCaml
  2006-03-08 19:36       ` William Lovas
@ 2006-03-08 20:45         ` Brian Hurt
  2006-03-08 21:14           ` Paul Snively
  2006-03-08 22:06           ` skaller
  0 siblings, 2 replies; 34+ messages in thread
From: Brian Hurt @ 2006-03-08 20:45 UTC (permalink / raw)
  To: William Lovas; +Cc: caml-list



On Wed, 8 Mar 2006, William Lovas wrote:

> As the quoted paper said, though, some running transaction can always
> commit -- so not all of the associated costs are still paid.  In
> particular, the cost of potential non-termination due to deadlock or
> livelock is not paid.  It doesn't matter that there's a mutex under the
> hood, since it's used only in such a way as to preserve the property that
> some transaction can always complete.

One comment I will make is that a mutex is expensive, but not *that* 
expensive.  I just wrote a quick program (available if anyone cares) in 
GNU C that measures the cost, in clocks, of locking and unlocking a posix 
mutex.  On my desktop box (AMD Athlon XP 2200+ 1.8GHz), I'm getting a cost 
of like 44 clock cycles.  Which makes it less expensive than an L2 cache 
miss.

At this point correctness is a much bigger concern of mine than absolute 
performance.  A fair bit of conservative or unnecessary locking is 
acceptable, given than I can write a complicated and highly scalable 
application and know that I don't have deadlocks or livelocks.  Not having 
race conditions would also be nice, but I don't think that's possible with 
mutable data.

Brian


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Caml-list] STM support in OCaml
  2006-03-08  0:52     ` skaller
  2006-03-08 10:38       ` Asfand Yar Qazi
@ 2006-03-08 19:36       ` William Lovas
  2006-03-08 20:45         ` Brian Hurt
  1 sibling, 1 reply; 34+ messages in thread
From: William Lovas @ 2006-03-08 19:36 UTC (permalink / raw)
  To: caml-list

On Wed, Mar 08, 2006 at 11:52:05AM +1100, skaller wrote:
> On Tue, 2006-03-07 at 19:05 +0000, Asfand Yar Qazi wrote:
> > As for claim 1.  "Lock-free" doesn't mean what you think it does. 
> 
> I know what STM does, thank you: I intend to implement it
> myself in my own programming language. Maybe you should
> read more carefully.
> 
> I said "protected by a mutex under the hood." which means
> sure, the programmer is not writing locks, but they're used
> in the implementation and the associated costs are still paid.

As the quoted paper said, though, some running transaction can always
commit -- so not all of the associated costs are still paid.  In
particular, the cost of potential non-termination due to deadlock or
livelock is not paid.  It doesn't matter that there's a mutex under the
hood, since it's used only in such a way as to preserve the property that
some transaction can always complete.

> I really hate it when people try to throw papers against
> simple logic. I said what the tradeoffs where:

In this case, i think the paper was right :)  Its statement of
"lock-free"ness was with regards to semantics, not performance.

You may be correct that there is some other property, regarding
performance, that the STM Haskell implementation does not have,
but it is not what their paper calls "lock-free".

William


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Caml-list] STM support in OCaml
  2006-03-08  0:52     ` skaller
@ 2006-03-08 10:38       ` Asfand Yar Qazi
  2006-03-08 19:36       ` William Lovas
  1 sibling, 0 replies; 34+ messages in thread
From: Asfand Yar Qazi @ 2006-03-08 10:38 UTC (permalink / raw)
  To: caml-list

skaller wrote:
> On Tue, 2006-03-07 at 19:05 +0000, Asfand Yar Qazi wrote:
> 
> 
>>You make several claims:
>>
>>STM is not lock free.
>>STM is not useful on a small number of processors
>>
>>As for claim 1.  "Lock-free" doesn't mean what you think it does. 
> 
> 
> I know what STM does, thank you: I intend to implement it
> myself in my own programming language. Maybe you should
> read more carefully.
> 
> I said "protected by a mutex under the hood." which means
> sure, the programmer is not writing locks, but they're used
> in the implementation and the associated costs are still paid.
> 
> I really hate it when people try to throw papers against
> simple logic. I said what the tradeoffs where:
> 
> "It simply limits the locking period
>  to a bounded time, at the expense of the whole transaction
> taking unbounded time."
> 
> and then elaborated the conditions under which this
> made sense.
> 
> Long locking period on a Uniprocessor not only do not
> cause problems they can actually IMPROVE performance by preventing
> expensive context switches. 
> 
> A paper is cached here on my website, probably one of the
> ones you cited.
> 
> http://felix.sourceforge.net/papers/ea8-composablememory_stm.pdf
> 
> It's quite interesting and I've bought a dual core CPU specifically
> to test it out. The only numbers I can give you are based on a simple
> lock test on a dual core G5 incrementing an integer: 15x SLOWER
> on a dual processor than a uniprocessor with two threads.
> 
> No doubt because of the weak support provided by Linux.
> Windows may do better, haven't tried yet, but I doubt anything
> older than Vista has suitable API support.
> 
> In the end, fast concurrency is going to depend on both CPU and 
> board design and OS support. The point of the above paper is 
> not performance: the point is as I said, Sebastian said, 
> AND the paper emphasises: it provides a model which 
> supports composition.
> 
> I point out that in fact, under the right conditions -- lots
> of processors and lots of variables -- it will probably provide better
> performance too. However this is hard to test -- not many
> of us have access to >2 cores on the same board. There certainly
> no way POSIX can deliver good performance: mutexes have to be
> synchronisation points and that requires ALL the CPUs to 
> flush their caches -- it doesn't scale. Message passing does,
> since sender and receiver only need to sync the message.
> Explicit coupling, and both the subset of processor and
> memory are limited.
> 
> Oh, and Ocaml supports message passing between processes .. :)
> 


Bad form on my part old chap - didn't realise your level of expertise.


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Caml-list] STM support in OCaml
  2006-03-08  0:37     ` Asfand Yar Qazi
@ 2006-03-08  5:05       ` Erick Tryzelaar
  0 siblings, 0 replies; 34+ messages in thread
From: Erick Tryzelaar @ 2006-03-08  5:05 UTC (permalink / raw)
  To: Asfand Yar Qazi; +Cc: caml-list

Asfand Yar Qazi wrote:
> Now there are 2 things missing:
>
> - Multi-threaded support
> - Optional lazy evaluation
>
> and we have a Haskell beater :-)
>
only one thing left, as ocaml has said optional lazy evaluation:

http://caml.inria.fr/pub/docs/manual-ocaml/manual021.html#htoc97

:)

-e


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Caml-list] STM support in OCaml
  2006-03-07 19:05   ` Asfand Yar Qazi
@ 2006-03-08  0:52     ` skaller
  2006-03-08 10:38       ` Asfand Yar Qazi
  2006-03-08 19:36       ` William Lovas
  0 siblings, 2 replies; 34+ messages in thread
From: skaller @ 2006-03-08  0:52 UTC (permalink / raw)
  To: Asfand Yar Qazi; +Cc: caml-list

On Tue, 2006-03-07 at 19:05 +0000, Asfand Yar Qazi wrote:

> You make several claims:
> 
> STM is not lock free.
> STM is not useful on a small number of processors
> 
> As for claim 1.  "Lock-free" doesn't mean what you think it does. 

I know what STM does, thank you: I intend to implement it
myself in my own programming language. Maybe you should
read more carefully.

I said "protected by a mutex under the hood." which means
sure, the programmer is not writing locks, but they're used
in the implementation and the associated costs are still paid.

I really hate it when people try to throw papers against
simple logic. I said what the tradeoffs where:

"It simply limits the locking period
 to a bounded time, at the expense of the whole transaction
taking unbounded time."

and then elaborated the conditions under which this
made sense.

Long locking period on a Uniprocessor not only do not
cause problems they can actually IMPROVE performance by preventing
expensive context switches. 

A paper is cached here on my website, probably one of the
ones you cited.

http://felix.sourceforge.net/papers/ea8-composablememory_stm.pdf

It's quite interesting and I've bought a dual core CPU specifically
to test it out. The only numbers I can give you are based on a simple
lock test on a dual core G5 incrementing an integer: 15x SLOWER
on a dual processor than a uniprocessor with two threads.

No doubt because of the weak support provided by Linux.
Windows may do better, haven't tried yet, but I doubt anything
older than Vista has suitable API support.

In the end, fast concurrency is going to depend on both CPU and 
board design and OS support. The point of the above paper is 
not performance: the point is as I said, Sebastian said, 
AND the paper emphasises: it provides a model which 
supports composition.

I point out that in fact, under the right conditions -- lots
of processors and lots of variables -- it will probably provide better
performance too. However this is hard to test -- not many
of us have access to >2 cores on the same board. There certainly
no way POSIX can deliver good performance: mutexes have to be
synchronisation points and that requires ALL the CPUs to 
flush their caches -- it doesn't scale. Message passing does,
since sender and receiver only need to sync the message.
Explicit coupling, and both the subset of processor and
memory are limited.

Oh, and Ocaml supports message passing between processes .. :)

-- 
John Skaller <skaller at users dot sourceforge dot net>
Async PL, Realtime software consultants
Checkout Felix: http://felix.sourceforge.net


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Caml-list] STM support in OCaml
  2006-03-07 17:44   ` Michael Hicks
@ 2006-03-08  0:37     ` Asfand Yar Qazi
  2006-03-08  5:05       ` Erick Tryzelaar
  0 siblings, 1 reply; 34+ messages in thread
From: Asfand Yar Qazi @ 2006-03-08  0:37 UTC (permalink / raw)
  To: caml-list

Michael Hicks wrote:
> There's a longer answer, but one short answer is: check out AtomCaml  at 
> http://www.cs.washington.edu/homes/miker/atomcaml/
> 
> -Mike
> 

That is actually really really useful just on its own!

Now there are 2 things missing:

- Multi-threaded support
- Optional lazy evaluation

and we have a Haskell beater :-)


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Caml-list] STM support in OCaml
  2006-03-07 17:15 ` skaller
@ 2006-03-07 19:05   ` Asfand Yar Qazi
  2006-03-08  0:52     ` skaller
  0 siblings, 1 reply; 34+ messages in thread
From: Asfand Yar Qazi @ 2006-03-07 19:05 UTC (permalink / raw)
  To: caml-list

skaller wrote:
> On Tue, 2006-03-07 at 16:18 +0000, Asfand Yar Qazi wrote:
> 
>>Hi,
>>
>>I've temporarily decided to leave off learning OCaml (although I still intend 
>>to learn it at some point) and start learning Haskell due to its support for 
>>Software Transactional Memory and lock-free concurrent programming.
> 
> 
> AFAIK STM is not lock free. It simply limits the locking period
> to a bounded time, at the expense of the whole transaction
> taking unbounded time. The final compare/write/retry must
> be atomic and is therefore protected by a mutex under the hood.
> 
> Sebastian Egner said in another post the main advantage
> of STMs: they have a combinator form, that is, they
> can be composed.
> 
> However, despite the lack of safety in a bolt on 
> addition for Ocaml .. the real problem is that STM
> isn't that useful unless you have 
> 
> (a) a lot of processors
> (b) a lot of variables
> 
> so that the risk of contention is low and the cost
> of long exclusions is high. Ocaml fails to satisfy
> property (a) since it doesn't support multi-processing.
> 

You make several claims:

STM is not lock free.
STM is not useful on a small number of processors

With all due respect, these papers refutes these claims:

http://research.microsoft.com/~simonpj/papers/stm/index.htm

http://research.microsoft.com/~simonpj/papers/stm/lock-free.htm

(That's right - the premier research on STM is being done by Micro$oft - yuck.)

As for claim 1.  "Lock-free" doesn't mean what you think it does.  Quote from 
first paper, page 11:

"The STM implementation guarantees that one transaction
can force another to abort only when the first one commits.
As a result, the STM implementation is lock-free in the sense
that it guarantees at any time that some running transaction
can successfully commit."

As for claim 2, note the two-processor performance graphs of their tests 
(second paper, "lock-free.htm", pages 14-15): STM blows the hell out of 
conventional lock-based parallel processing.

As for the lot of variables bit, I intend to have a lot of the buggers :-)


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Caml-list] STM support in OCaml
  2006-03-07 16:50 ` [Caml-list] " Sebastian Egner
@ 2006-03-07 17:44   ` Michael Hicks
  2006-03-08  0:37     ` Asfand Yar Qazi
  0 siblings, 1 reply; 34+ messages in thread
From: Michael Hicks @ 2006-03-07 17:44 UTC (permalink / raw)
  To: caml-list

There's a longer answer, but one short answer is: check out AtomCaml  
at http://www.cs.washington.edu/homes/miker/atomcaml/

-Mike

On Mar 7, 2006, at 11:50 AM, Sebastian Egner wrote:

>
> > Are there any plans to add STM to OCaml?  And I'm talking highly  
> integrated
> > into the language, not just some bolt-on library.
>
> I implemented 'some bolt-on library' as you call it for
> myself (for understanding the details of STM) but did
> not release it. The main lesseons I learned, however, are
>
> a) A library will be nearly as good as something 'highly
> integrated into the language'. The main difference is
> that you cannot annotate fields to be transactional,
> and define transactional record types; you need to
> work with explicit transactional reference boxes and arrays.
> This appears tolerable, and it is a lot simpler, both in
> terms of implementation and in terms of using it.
>
> b) The version of STM as designed for Haskell makes essential
> use of laziness for composing transactions in a very clean
> way. You can construct more and more complex transactions
> by not yet enclosing them into 'atomic'. This is probably
> the most important benefit of STMs at all. Unfortunately,
> this property is essentially impossible in OCaml: No matter
> how highly integrated STMs are in the language, it will
> always be easy to write a program that uses ordinary (non-STM)
> state to keep a record of something it shouldn't---messing
> up semantics of course.
>
> Sorry, I know this is bad news, but that is what I found.
> I invite everybody to verify this for him- or herself,
> or prove me wrong.
>
> Sebastian.
> _______________________________________________
> Caml-list mailing list. Subscription management:
> http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list
> Archives: http://caml.inria.fr
> Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
> Bug reports: http://caml.inria.fr/bin/caml-bugs


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Caml-list] STM support in OCaml
  2006-03-07 16:18 Asfand Yar Qazi
  2006-03-07 16:50 ` [Caml-list] " Sebastian Egner
@ 2006-03-07 17:15 ` skaller
  2006-03-07 19:05   ` Asfand Yar Qazi
  1 sibling, 1 reply; 34+ messages in thread
From: skaller @ 2006-03-07 17:15 UTC (permalink / raw)
  To: Asfand Yar Qazi; +Cc: caml-list

On Tue, 2006-03-07 at 16:18 +0000, Asfand Yar Qazi wrote:
> Hi,
> 
> I've temporarily decided to leave off learning OCaml (although I still intend 
> to learn it at some point) and start learning Haskell due to its support for 
> Software Transactional Memory and lock-free concurrent programming.

AFAIK STM is not lock free. It simply limits the locking period
to a bounded time, at the expense of the whole transaction
taking unbounded time. The final compare/write/retry must
be atomic and is therefore protected by a mutex under the hood.

Sebastian Egner said in another post the main advantage
of STMs: they have a combinator form, that is, they
can be composed.

However, despite the lack of safety in a bolt on 
addition for Ocaml .. the real problem is that STM
isn't that useful unless you have 

(a) a lot of processors
(b) a lot of variables

so that the risk of contention is low and the cost
of long exclusions is high. Ocaml fails to satisfy
property (a) since it doesn't support multi-processing.

-- 
John Skaller <skaller at users dot sourceforge dot net>
Async PL, Realtime software consultants
Checkout Felix: http://felix.sourceforge.net


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Caml-list] STM support in OCaml
  2006-03-07 16:18 Asfand Yar Qazi
@ 2006-03-07 16:50 ` Sebastian Egner
  2006-03-07 17:44   ` Michael Hicks
  2006-03-07 17:15 ` skaller
  1 sibling, 1 reply; 34+ messages in thread
From: Sebastian Egner @ 2006-03-07 16:50 UTC (permalink / raw)
  To: caml-list; +Cc: Asfand Yar Qazi

[-- Attachment #1: Type: text/plain, Size: 1389 bytes --]

> Are there any plans to add STM to OCaml?  And I'm talking highly 
integrated 
> into the language, not just some bolt-on library.

I implemented 'some bolt-on library' as you call it for
myself (for understanding the details of STM) but did
not release it. The main lesseons I learned, however, are

a) A library will be nearly as good as something 'highly
integrated into the language'. The main difference is
that you cannot annotate fields to be transactional,
and define transactional record types; you need to
work with explicit transactional reference boxes and arrays.
This appears tolerable, and it is a lot simpler, both in
terms of implementation and in terms of using it.

b) The version of STM as designed for Haskell makes essential 
use of laziness for composing transactions in a very clean
way. You can construct more and more complex transactions
by not yet enclosing them into 'atomic'. This is probably
the most important benefit of STMs at all. Unfortunately,
this property is essentially impossible in OCaml: No matter
how highly integrated STMs are in the language, it will 
always be easy to write a program that uses ordinary (non-STM)
state to keep a record of something it shouldn't---messing
up semantics of course.

Sorry, I know this is bad news, but that is what I found.
I invite everybody to verify this for him- or herself,
or prove me wrong.

Sebastian.

[-- Attachment #2: Type: text/html, Size: 2249 bytes --]

^ permalink raw reply	[flat|nested] 34+ messages in thread

end of thread, other threads:[~2006-03-11 15:26 UTC | newest]

Thread overview: 34+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2006-03-08 10:11 [Caml-list] STM support in OCaml yoann padioleau
2006-03-08 10:41 ` Asfand Yar Qazi
2006-03-08 12:23   ` skaller
2006-03-08 23:02     ` Asfand Yar Qazi
2006-03-09  0:36       ` skaller
2006-03-08 11:32 ` Gerd Stolpmann
2006-03-08 12:04   ` skaller
2006-03-08 19:22     ` Dan Grossman
2006-03-08 22:10       ` skaller
  -- strict thread matches above, loose matches on Subject: below --
2006-03-07 16:18 Asfand Yar Qazi
2006-03-07 16:50 ` [Caml-list] " Sebastian Egner
2006-03-07 17:44   ` Michael Hicks
2006-03-08  0:37     ` Asfand Yar Qazi
2006-03-08  5:05       ` Erick Tryzelaar
2006-03-07 17:15 ` skaller
2006-03-07 19:05   ` Asfand Yar Qazi
2006-03-08  0:52     ` skaller
2006-03-08 10:38       ` Asfand Yar Qazi
2006-03-08 19:36       ` William Lovas
2006-03-08 20:45         ` Brian Hurt
2006-03-08 21:14           ` Paul Snively
2006-03-08 22:06           ` skaller
2006-03-08 22:10             ` Gerd Stolpmann
2006-03-08 23:48               ` skaller
2006-03-09  7:45               ` Andrae Muys
2006-03-09  9:18                 ` David Brown
2006-03-08 22:11             ` Brian Hurt
2006-03-08 23:05               ` Lodewijk Vöge
2006-03-09  3:13                 ` Brian Hurt
2006-03-08 23:45               ` Robert Roessler
2006-03-09  0:23               ` skaller
2006-03-09  3:19                 ` Brian Hurt
2006-03-09  4:32                   ` skaller
2006-03-09 10:38                     ` John Chu
2006-03-11 15:26             ` Florian Weimer

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).