From: David Allsopp
To: Edgar Friendly, caml-list@yquem.inria.fr
Subject: RE: [Caml-list] SMP multithreading
Date: Wed, 17 Nov 2010 16:34:24 +0000

Edgar Friendly wrote:
> It looks like high-performance computing of the near future will be built
> out of many machines (message passing), each with many cores (SMP). One
> could use message passing for all communication in such a system, but a
> hybrid approach might be best for this architecture, with use of shared
> memory within each box and message passing between. Of course the best
> choice depends strongly on the particular task.

Absolutely - and the problem in OCaml seems to be that shared memory parallelism is just branded as evil and ignored...
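
To illustrate the message-passing side of that hybrid, here is a minimal sketch of what is already possible in today's OCaml with separate processes - Unix.fork, a pipe and Marshal (the message and the worker/master framing are purely illustrative; build with the unix library, e.g. ocamlfind ocamlopt -package unix -linkpkg):

    (* Minimal sketch: message passing between two OCaml processes
       over a pipe, using Marshal for serialisation. *)
    let () =
      let rd, wr = Unix.pipe () in
      match Unix.fork () with
      | 0 ->
          (* Child ("worker"): read one message and print it. *)
          Unix.close wr;
          let ic = Unix.in_channel_of_descr rd in
          let (msg : string) = Marshal.from_channel ic in
          Printf.printf "worker received: %s\n%!" msg;
          exit 0
      | pid ->
          (* Parent ("master"): send a message, then wait for the worker. *)
          Unix.close rd;
          let oc = Unix.out_channel_of_descr wr in
          Marshal.to_channel oc "hello from the master process" [];
          close_out oc;
          ignore (Unix.waitpid [] pid)

Scaling this up (sockets between boxes instead of pipes) follows the same pattern, and none of it requires the runtime to support shared-memory threads.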
> In the long run, it'll likely be a combination of a few large, powerful
> cores (Intel-CPU style w/ the capability to run a single thread as fast as
> possible) with many many smaller compute engines (GPGPUs or the like,
> optimized for power and area, closely coupled with memory) that provides
> the highest performance density.

I think the central thing that we can be utterly sure about is that desktops will always have *> 1* general-purpose CPU. Maybe not an ever-increasing number of cores, but definitely more than one.

> The question of how to program such an architecture seems as if it's being
> answered without the functional community's input. What can we contribute?

It has often seemed to me when SMP has been discussed in the past on this list that it almost gets dismissed out of hand because it doesn't look future-proof or because we're worried about what's round the corner in technology terms.

To me the principal question is not whether a parallel/thread-safe GC will scale to 12, 16 or even the 2048 cores of something like http://www.hpc.cam.ac.uk/services/darwin.html, but whether it will hurt a single-threaded application - i.e. whether you will still be able to implement message-passing libraries and other scalable techniques without the parallel GC getting in the way of what you're doing. A parallel/thread-safe GC should aim to provide the same sort of contract as the present one: it "just works" for most things, and in a few borderline cases (like HPC - yes, it's a borderline case) you'll need to tune your code or tweak GC parameters because it's causing problems or because squeezing every cycle out of the CPU is important in your particular application. As long as the GC isn't (hugely) slower than the present one in OCaml, we can continue to use libraries, frameworks and technologies-still-to-come on top of a parallel/thread-safe GC, simply ignoring shared-memory thread-level parallelism by not instantiating threads.

The argument always seems to focus on utterly maxing out all available resources (CPU time, memory bandwidth, etc.) rather than on whether it's simply faster than what we're able to do at the moment on the same system. Of course, it may be that the only way to do that is to have different garbage collectors - one invoked when threads.cmxa is linked and the normal one otherwise (that's so easy to type out as a sentence, summarising a vast amount of potential work!!)

Multithreading in OCaml seems to be focused on jumping the entire width of the river of concurrency in one go, rather than coming up with stepping stones to cross it in bits...

David
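
For reference, the GC-parameter tweaking mentioned above amounts to something like the following minimal sketch using the standard Gc module (the values here are purely illustrative, not recommendations):

    (* Illustrative only: enlarge the minor heap and make major
       collection less eager for an allocation-heavy program. *)
    let () =
      let c = Gc.get () in
      Gc.set { c with
               Gc.minor_heap_size = 4 * 1024 * 1024;  (* in words, not bytes *)
               Gc.space_overhead  = 200 };
      Printf.printf "minor heap now %d words\n"
        (Gc.get ()).Gc.minor_heap_size

The same knobs can also be adjusted without recompiling via the OCAMLRUNPARAM environment variable.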