From: Hugo Ferreira
Date: Sat, 26 Mar 2011 09:11:37 +0000
To: Gerd Stolpmann
CC: Martin Jambon, caml-list@inria.fr
Subject: Re: [Caml-list] Efficient OCaml multicore -- roadmap?

On 03/25/2011 08:26 PM, Gerd Stolpmann wrote:
> On Friday, 25.03.2011, at 19:19 +0000, Hugo Ferreira wrote:
>> On 03/25/2011 06:24 PM, Martin Jambon wrote:
>>>> On 03/25/2011 01:10 PM, Fabrice Le Fessant wrote:
>>>>> Of course, sharing structured mutable data between threads will not be
>>>>> possible, but actually, it is a good thing if you want to write correct
>>>>> programs ;-)
>>>
>>> On 03/25/11 08:44, Hugo Ferreira replied:
>>>> I'll stick to my guns here. It simply makes solving certain problems
>>>> infeasible. Case in point: I work on machine learning algorithms. I
>>>> use large data structures that must be processed (altered) in order
>>>> to learn. Because these data structures are large, it becomes
>>>> impractical to copy them to a process every time I start off a new
>>>> "thread".
>>>
>>> The solution would be to use get/set via a message-passing interface.
>>
>> I cannot see how this works. Say I want to share a balanced binary tree.
>> Several processes/threads each take this tree and alter it by adding and
>> deleting elements. Each (new) tree is then further processed by other
>> processes/threads.
>>
>> How can get/set be used in this scenario?
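For concreteness, here is one shape such a get/set interface could take: a
minimal, speculative sketch (not a proposed API; all names are invented for
the example) in which a single "server" thread owns the current tree and
clients reach it only through messages. It uses the stdlib threads library
(Thread, Event) and Set, whose implementation happens to be a balanced
binary tree. Within one process the tree is immutable, so a Get hands the
client a pointer, not a copy; across processes the same protocol would
marshal the value.

  (* build: ocamlfind ocamlopt -package threads.posix -linkpkg tree_server.ml *)

  module IntSet = Set.Make (struct type t = int let compare = compare end)

  type request =
    | Get of IntSet.t Event.channel        (* carries a reply channel *)
    | Update of (IntSet.t -> IntSet.t)     (* function applied to the tree *)

  let requests : request Event.channel = Event.new_channel ()

  (* Only the server ever holds the master tree, so the tree itself
     needs no locking. *)
  let rec server tree =
    match Event.sync (Event.receive requests) with
    | Get reply ->
        Event.sync (Event.send reply tree);   (* shares a pointer, no copy *)
        server tree
    | Update f ->
        server (f tree)                       (* old version stays valid *)

  let get () =
    let reply = Event.new_channel () in
    Event.sync (Event.send requests (Get reply));
    Event.sync (Event.receive reply)

  let update f = Event.sync (Event.send requests (Update f))

  let () =
    ignore (Thread.create server IntSet.empty);
    update (IntSet.add 42);
    update (IntSet.remove 17);
    Printf.printf "size = %d\n" (IntSet.cardinal (get ()))

Workers that "take this tree and alter it" would call get once, build their
modified version locally, and publish it with update; versions never clash,
because only the server installs them.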
>>> From my purely speculative perspective, it seems unavoidable that
>>> message passing happens at some level in order to keep a shared data
>>> structure in sync between a large number of processors. In other
>>> words, any access to a shared data structure requires some physical
>>> copy, no matter what the programming language makes it look like.
>>
>> I assume you are referring to multi-processing, where memory is not
>> shared amongst CPUs, correct?
>
> This is quite normal nowadays when you have several CPUs in a (server)
> system. Each CPU gets its own cache, or even its own bank of RAM (e.g.
> all Opterons have this). And there is indeed message passing at the
> hardware level (HyperTransport, QuickPath, or InfiniBand). To the
> software, memory still looks uniform (cache coherency), but under the
> hood messages are exchanged to produce this effect (or RAM accesses
> are even routed over the CPU interconnect).
>
> From software, the messages are noticeable only as a speed degradation
> when you access RAM in the wrong way. E.g. you can see this when you
> read/modify/write the same memory cell in a loop from two threads
> running on two cores. This is a lot slower than if only a single core
> did it, because the two cores constantly exchange messages and have to
> wait until each message is delivered (this is also known as cache line
> bouncing).
>
> When you implement a message-passing API for a high-level language,
> the way to go is to provide message buffers in memory. When a thread
> delivers a message, it just writes to the buffer. The real message,
> however, is sent by the hardware under the hood, and only has to be
> delivered by the time the reading thread synchronizes. The point here,
> IMHO, is that you exploit the hardware most when you do things the way
> it deals with best. Thus a program written against the message-passing
> API will in the end be faster than one that uncritically assumes
> uniform memory, modifies it in place, and suffers from cache line
> bouncing.
>
> For current standard server hardware there are usually not enough CPUs
> for the effect to be big. But in a few years it will be very important
> (as it already is for current supercomputers), maybe even for standard
> 128-core laptops in 2015 (just guessing).

Thanks for the info.

Hugo

> Gerd
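Gerd's read/modify/write experiment can be written down, though not in the
OCaml of 2011, which had no parallel threads: the sketch below assumes the
Domain and Atomic modules that a shared-memory OCaml would need (OCaml 5
eventually shipped exactly these). Both runs perform the same total number
of increments; the two-domain run on a shared cell is typically far slower,
and the difference is the cache line bouncing.

  (* build: ocamlfind ocamlopt -package unix -linkpkg bounce.ml
     (requires OCaml 5 for Domain) *)

  let iterations = 50_000_000

  let hammer cell =
    for _ = 1 to iterations do
      Atomic.incr cell          (* read/modify/write of one memory cell *)
    done

  let time label f =
    let t0 = Unix.gettimeofday () in
    f ();
    Printf.printf "%-20s %.2fs\n" label (Unix.gettimeofday () -. t0)

  let () =
    (* One core does all 2*N increments: the cache line stays put. *)
    let solo = Atomic.make 0 in
    time "one core, 2N incrs" (fun () -> hammer solo; hammer solo);
    (* Two cores share the cell: N increments each, but the line
       bounces between their caches on nearly every access. *)
    let shared = Atomic.make 0 in
    time "two cores, N each" (fun () ->
      let d = Domain.spawn (fun () -> hammer shared) in
      hammer shared;
      Domain.join d)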
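The "message buffers in memory" Gerd recommends need nothing exotic. Here
is a sketch of one such buffer, built only from the stdlib's Mutex,
Condition and Queue (the type and function names are invented for the
example). send merely writes into the buffer and returns; whatever traffic
the hardware generates to move the bytes happens under the hood, and only
has to be complete by the time recv returns on the other side.

  type 'a mailbox = {
    buf : 'a Queue.t;
    lock : Mutex.t;
    nonempty : Condition.t;
  }

  let create () =
    { buf = Queue.create ();
      lock = Mutex.create ();
      nonempty = Condition.create () }

  let send mbox msg =
    Mutex.lock mbox.lock;
    Queue.push msg mbox.buf;         (* delivery = a write into the buffer *)
    Condition.signal mbox.nonempty;  (* wake one waiting reader, if any *)
    Mutex.unlock mbox.lock

  let recv mbox =
    Mutex.lock mbox.lock;
    while Queue.is_empty mbox.buf do
      Condition.wait mbox.nonempty mbox.lock  (* sleeps with the lock released *)
    done;
    let msg = Queue.pop mbox.buf in
    Mutex.unlock mbox.lock;
    msg

A sender touches only the queue and its own message, so contention is
confined to the short critical section rather than the application's data
structures, which is the behaviour Gerd argues the hardware rewards.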