From: Gerd Stolpmann
To: Hugo Ferreira
Cc: Martin Jambon, caml-list@inria.fr
Date: Fri, 25 Mar 2011 21:26:58 +0100
Subject: Re: [Caml-list] Efficient OCaml multicore -- roadmap?

On Friday, 25.03.2011, 19:19 +0000, Hugo Ferreira wrote:
> On 03/25/2011 06:24 PM, Martin Jambon wrote:
> >> On 03/25/2011 01:10 PM, Fabrice Le Fessant wrote:
> >>> Of course, sharing structured mutable data between threads will not be
> >>> possible, but actually, it is a good thing if you want to write correct
> >>> programs ;-)
> >
> > On 03/25/11 08:44, Hugo Ferreira replied:
> >> I'll stick to my guns here. It simply makes solving certain problems
> >> infeasible. Case in point: I work on machine learning algorithms. I
> >> use large data structures that must be processed (altered) in order to
> >> learn. Because these data structures are large, it becomes impractical
> >> to copy them to a process every time I start off a new "thread".
> >
> > The solution would be to use get/set via a message-passing interface.
> >
>
> Cannot see how this works. Say I want to share a balanced binary tree.
> Several processes/threads each take this tree and alter it by adding and
> deleting elements. Each (new) tree is then further processed by other
> processes/threads.
>
> How can get/set be used in this scenario?
>
> > From my purely speculative perspective, it seems unavoidable that
> > message-passing happens at some level in order to keep a shared data
> > structure in sync between a large number of processors. In other words,
> > any access to a shared data structure requires some physical copy no
> > matter what the programming language makes it look like.
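To make the get/set idea a bit more concrete for the balanced-tree example, here is a rough sketch (the request type and all names are only illustrative, and today's OCaml threads give no real parallelism, so this only shows the shape of the interface): one thread owns the tree, and every other thread talks to it over channels from the Event module of the threads library.

(* Rough sketch of "get/set via message passing": a single owner thread
   holds the tree, and all other threads send it requests over a channel
   instead of touching shared memory. *)
module IntMap = Map.Make (struct type t = int let compare = compare end)

type request =
  | Add of int * string
  | Remove of int
  | Find of int * string option Event.channel   (* carries a reply channel *)

let requests : request Event.channel = Event.new_channel ()

(* The owner thread: the only code that ever sees the tree. *)
let owner () =
  let rec loop tree =
    match Event.sync (Event.receive requests) with
    | Add (k, v) -> loop (IntMap.add k v tree)
    | Remove k -> loop (IntMap.remove k tree)
    | Find (k, reply) ->
        let r = try Some (IntMap.find k tree) with Not_found -> None in
        Event.sync (Event.send reply r);
        loop tree
  in
  loop IntMap.empty

(* A client-side "get". *)
let find k =
  let reply = Event.new_channel () in
  Event.sync (Event.send requests (Find (k, reply)));
  Event.sync (Event.receive reply)

let () =
  ignore (Thread.create owner ());
  Event.sync (Event.send requests (Add (1, "one")));
  match find 1 with
  | Some v -> print_endline v
  | None -> print_endline "not found"

(Link against the threads library.) Whether such a channel is backed by threads in one process or by separate processes with message buffers is then an implementation detail hidden behind the same get/set interface.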
> >
>
> I assume you are referring to multi-processing where memory is not shared
> amongst CPUs, correct?

This is quite normal nowadays when you have several CPUs in a (server)
system. Each CPU gets its own cache, or even its own bank of RAM (e.g. all
Opterons have this). And there is indeed message passing on the hardware
level (HyperTransport, QuickPath, or Infiniband). For the software, it
still looks as if memory were uniform (cache coherency), but under the hood
messages are exchanged to get this effect (or RAM accesses are even routed
over the CPU interconnect). The messages are only noticeable from software
as a speed degradation when you access RAM in the wrong way. E.g. you can
see this when you read/modify/write the same memory cell in a loop from two
threads running on two cores. This is a lot slower than if only a single
core did it, because the two cores constantly exchange messages and have to
wait until each message has been delivered (this is also known as cache
line bouncing).

When you implement a message-passing API for a high-level language, the way
to go is to provide message buffers in memory. When a thread delivers a
message, it just writes to the buffer. The real message, however, is sent
by the hardware under the hood, and only needs to arrive by the time the
reading thread synchronizes.

The point here, IMHO, is that you exploit the hardware best when you use it
in the way it handles best. Thus a program written against the
message-passing API will in the end be faster than one that uncritically
assumes uniform memory, modifies memory in place, and then suffers from
cache line bouncing.

For current standard server hardware there are usually not enough CPUs to
get a big effect. But in a few years it will be very important (as it is
for current supercomputers already), maybe even for standard 128-core
laptops in 2015 (just guessing).

Gerd

>
> Hugo F.
>
>
> >
> > Martin
> >

-- 
------------------------------------------------------------
Gerd Stolpmann, Bad Nauheimer Str.3, 64289 Darmstadt,Germany
gerd@gerd-stolpmann.de          http://www.gerd-stolpmann.de
Phone: +49-6151-153855                  Fax: +49-6151-997714
------------------------------------------------------------
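P.S. If you want to observe the cache line bouncing effect yourself, here is a rough, Linux-only sketch (it assumes a recent OCaml where Unix.map_file exists, and the array size and iteration count are arbitrary). It forks two processes that share one mapping of /dev/zero; run it once with both processes hammering the same cell ("same" as first argument) and once with cells on different cache lines, and compare the timings:

(* Two forked processes read/modify/write a shared mapping.  When both hit
   the same cell (and thus the same cache line) the line ping-pongs between
   the cores' caches, and the loops run much slower than when each process
   uses a cell on its own cache line.  Lost increments don't matter here;
   only the timing does. *)
let () =
  let same_line = Array.length Sys.argv > 1 && Sys.argv.(1) = "same" in
  let fd = Unix.openfile "/dev/zero" [ Unix.O_RDWR ] 0 in
  let a =
    Bigarray.array1_of_genarray
      (Unix.map_file fd Bigarray.int Bigarray.c_layout true [| 1024 |])
  in
  Unix.close fd;
  let hammer idx =
    for _ = 1 to 20_000_000 do
      a.{idx} <- a.{idx} + 1                     (* read/modify/write *)
    done
  in
  (* index 512 is 4096 bytes away from index 0, i.e. a different cache line *)
  let child_idx = if same_line then 0 else 512 in
  match Unix.fork () with
  | 0 -> hammer child_idx; exit 0
  | pid ->
      let t0 = Unix.gettimeofday () in
      hammer 0;
      ignore (Unix.waitpid [] pid);
      Printf.printf "%s: %.2f s\n"
        (if same_line then "same cell" else "separate cells")
        (Unix.gettimeofday () -. t0)

(Compile against the unix library.) On a multi-core machine the "same cell" run should come out noticeably slower than the "separate cells" run, which is exactly the bouncing described above.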