From: Gerd Stolpmann
To: Hugo Ferreira
Cc: Martin Jambon, caml-list@inria.fr
Date: Fri, 25 Mar 2011 21:26:58 +0100
Subject: Re: [Caml-list] Efficient OCaml multicore -- roadmap?

On Friday, 25.03.2011, 19:19 +0000, Hugo Ferreira wrote:
> On 03/25/2011 06:24 PM, Martin Jambon wrote:
> >> On 03/25/2011 01:10 PM, Fabrice Le Fessant wrote:
> >>> Of course, sharing structured mutable data between threads will not be
> >>> possible, but actually, it is a good thing if you want to write correct
> >>> programs ;-)
> >
> > On 03/25/11 08:44, Hugo Ferreira replied:
> >> I'll stick to my guns here. It simply makes solving certain problems
> >> infeasible. Case in point: I work on machine learning algorithms. I
> >> use large data structures that must be processed (altered) in order to
> >> learn. Because these data structures are large, it becomes impractical
> >> to copy them to a process every time I start off a new "thread".
> >
> > The solution would be to use get/set via a message-passing interface.
> >
>
> Cannot see how this works. Say I want to share a balanced binary tree.
> Several processes/threads each take this tree and alter it by adding and
> deleting elements. Each (new) tree is then further processed by other
> processes/threads.
>
> How can get/set be used in this scenario?
>
> > From my purely speculative perspective, it seems unavoidable that
> > message-passing happens at some level in order to keep a shared data
> > structure in sync between a large number of processors. In other words,
> > any access to a shared data structure requires some physical copy no
> > matter what the programming language makes it look like.
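To make the get/set idea a bit more concrete for the balanced-tree example, here is a rough sketch (the request type and all names are only illustrative, and today's OCaml threads give no real parallelism, so this only shows the shape of the interface): one thread owns the tree, and every other thread talks to it over channels from the Event module of the threads library.

(* Rough sketch of "get/set via message passing": a single owner thread
   holds the tree, and all other threads send it requests over a channel
   instead of touching shared memory. *)
module IntMap = Map.Make (struct type t = int let compare = compare end)

type request =
  | Add of int * string
  | Remove of int
  | Find of int * string option Event.channel   (* carries a reply channel *)

let requests : request Event.channel = Event.new_channel ()

(* The owner thread: the only code that ever sees the tree. *)
let owner () =
  let rec loop tree =
    match Event.sync (Event.receive requests) with
    | Add (k, v) -> loop (IntMap.add k v tree)
    | Remove k -> loop (IntMap.remove k tree)
    | Find (k, reply) ->
        let r = try Some (IntMap.find k tree) with Not_found -> None in
        Event.sync (Event.send reply r);
        loop tree
  in
  loop IntMap.empty

(* A client-side "get". *)
let find k =
  let reply = Event.new_channel () in
  Event.sync (Event.send requests (Find (k, reply)));
  Event.sync (Event.receive reply)

let () =
  ignore (Thread.create owner ());
  Event.sync (Event.send requests (Add (1, "one")));
  match find 1 with
  | Some v -> print_endline v
  | None -> print_endline "not found"

(Link against the threads library.) Whether such a channel is backed by threads in one process or by separate processes with message buffers is then an implementation detail hidden behind the same get/set interface.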
> >
>
> I assume you are referring to multi-processing where memory is not shared
> amongst CPUs, correct?

This is quite normal nowadays when you have several CPUs in a (server)
system. Each CPU gets its own cache, or even its own bank of RAM (e.g. all
Opterons have this). And there is indeed message passing on the hardware
level (HyperTransport, QuickPath, or Infiniband). For the software, it
still looks as if memory were uniform (cache coherency), but under the hood
messages are exchanged to get this effect (or RAM accesses are even routed
over the CPU interconnect). The messages are only noticeable from software
as a speed degradation when you access RAM in the wrong way. E.g. you can
see this when you read/modify/write the same memory cell in a loop from two
threads running on two cores. This is a lot slower than if only a single
core did it, because the two cores constantly exchange messages and have to
wait until each message has been delivered (this is also known as cache
line bouncing).

When you implement a message-passing API for a high-level language, the way
to go is to provide message buffers in memory. When a thread delivers a
message, it just writes to the buffer. The real message, however, is sent
by the hardware under the hood, and only needs to arrive by the time the
reading thread synchronizes.

The point here, IMHO, is that you exploit the hardware best when you use it
in the way it handles best. Thus a program written against the
message-passing API will in the end be faster than one that uncritically
assumes uniform memory, modifies memory in place, and then suffers from
cache line bouncing.

For current standard server hardware there are usually not enough CPUs to
get a big effect. But in a few years it will be very important (as it is
for current supercomputers already), maybe even for standard 128-core
laptops in 2015 (just guessing).

Gerd

>
> Hugo F.
>
>
> >
> > Martin
> >

-- 
------------------------------------------------------------
Gerd Stolpmann, Bad Nauheimer Str.3, 64289 Darmstadt,Germany
gerd@gerd-stolpmann.de          http://www.gerd-stolpmann.de
Phone: +49-6151-153855                  Fax: +49-6151-997714
------------------------------------------------------------
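P.S. If you want to observe the cache line bouncing effect yourself, here is a rough, Linux-only sketch (it assumes a recent OCaml where Unix.map_file exists, and the array size and iteration count are arbitrary). It forks two processes that share one mapping of /dev/zero; run it once with both processes hammering the same cell ("same" as first argument) and once with cells on different cache lines, and compare the timings:

(* Two forked processes read/modify/write a shared mapping.  When both hit
   the same cell (and thus the same cache line) the line ping-pongs between
   the cores' caches, and the loops run much slower than when each process
   uses a cell on its own cache line.  Lost increments don't matter here;
   only the timing does. *)
let () =
  let same_line = Array.length Sys.argv > 1 && Sys.argv.(1) = "same" in
  let fd = Unix.openfile "/dev/zero" [ Unix.O_RDWR ] 0 in
  let a =
    Bigarray.array1_of_genarray
      (Unix.map_file fd Bigarray.int Bigarray.c_layout true [| 1024 |])
  in
  Unix.close fd;
  let hammer idx =
    for _ = 1 to 20_000_000 do
      a.{idx} <- a.{idx} + 1                     (* read/modify/write *)
    done
  in
  (* index 512 is 4096 bytes away from index 0, i.e. a different cache line *)
  let child_idx = if same_line then 0 else 512 in
  match Unix.fork () with
  | 0 -> hammer child_idx; exit 0
  | pid ->
      let t0 = Unix.gettimeofday () in
      hammer 0;
      ignore (Unix.waitpid [] pid);
      Printf.printf "%s: %.2f s\n"
        (if same_line then "same cell" else "separate cells")
        (Unix.gettimeofday () -. t0)

(Compile against the unix library.) On a multi-core machine the "same cell" run should come out noticeably slower than the "separate cells" run, which is exactly the bouncing described above.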