From: David Allsopp
To: Edgar Friendly, caml-list@yquem.inria.fr
Subject: RE: [Caml-list] SMP multithreading
Date: Wed, 17 Nov 2010 16:34:24 +0000

Edgar Friendly wrote:
> It looks like high-performance computing of the near future will be built
> out of many machines (message passing), each with many cores (SMP). One
> could use message passing for all communication in such a system, but a
> hybrid approach might be best for this architecture, with use of shared
> memory within each box and message passing between. Of course the best
> choice depends strongly on the particular task.

Absolutely - and the problem in OCaml seems to be that shared memory parallelism is just branded as evil and ignored...
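
To illustrate the message-passing side of that hybrid, here is a minimal sketch of what is already possible in today's OCaml with separate processes - Unix.fork, a pipe and Marshal (the message and the worker/master framing are purely illustrative; build with the unix library, e.g. ocamlfind ocamlopt -package unix -linkpkg):

    (* Minimal sketch: message passing between two OCaml processes
       over a pipe, using Marshal for serialisation. *)
    let () =
      let rd, wr = Unix.pipe () in
      match Unix.fork () with
      | 0 ->
          (* Child ("worker"): read one message and print it. *)
          Unix.close wr;
          let ic = Unix.in_channel_of_descr rd in
          let (msg : string) = Marshal.from_channel ic in
          Printf.printf "worker received: %s\n%!" msg;
          exit 0
      | pid ->
          (* Parent ("master"): send a message, then wait for the worker. *)
          Unix.close rd;
          let oc = Unix.out_channel_of_descr wr in
          Marshal.to_channel oc "hello from the master process" [];
          close_out oc;
          ignore (Unix.waitpid [] pid)

Scaling this up (sockets between boxes instead of pipes) follows the same pattern, and none of it requires the runtime to support shared-memory threads.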
> In the long run, it'll likely be a combination of a few large, powerful
> cores (Intel-CPU style w/ the capability to run a single thread as fast as
> possible) with many many smaller compute engines (GPGPUs or the like,
> optimized for power and area, closely coupled with memory) that provides
> the highest performance density.

I think the central thing that we can be utterly sure about is that desktops will always have *> 1* general-purpose CPU. Maybe not an ever-increasing number of cores, but definitely more than one.

> The question of how to program such an architecture seems as if it's being
> answered without the functional community's input. What can we contribute?

It has often seemed to me when SMP has been discussed in the past on this list that it almost gets dismissed out of hand because it doesn't look future-proof or because we're worried about what's round the corner in technology terms.

To me the principal question is not whether a parallel/thread-safe GC will scale to 12, 16 or even the 2048 cores of something like http://www.hpc.cam.ac.uk/services/darwin.html, but whether it will hurt a single-threaded application - i.e. whether you will still be able to implement message-passing libraries and other scalable techniques without the parallel GC getting in the way of what you're doing. A parallel/thread-safe GC should aim to provide the same sort of contract as the present one: it "just works" for most things, and in a few borderline cases (like HPC - yes, it's a borderline case) you'll need to tune your code or tweak GC parameters because it's causing problems or because squeezing every cycle out of the CPU is important in your particular application. As long as the GC isn't (hugely) slower than the present one in OCaml, we can continue to use libraries, frameworks and technologies-still-to-come on top of a parallel/thread-safe GC, simply ignoring shared-memory thread-level parallelism by not instantiating threads.

The argument always seems to focus on utterly maxing out all available resources (CPU time, memory bandwidth, etc.) rather than on whether it's simply faster than what we're able to do at the moment on the same system. Of course, it may be that the only way to do that is to have different garbage collectors - one invoked when threads.cmxa is linked and the normal one otherwise (that's so easy to type out as a sentence, summarising a vast amount of potential work!!)

Multithreading in OCaml seems to be focused on jumping the entire width of the river of concurrency in one go, rather than coming up with stepping stones to cross it in bits...

David
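
For reference, the GC-parameter tweaking mentioned above amounts to something like the following minimal sketch using the standard Gc module (the values here are purely illustrative, not recommendations):

    (* Illustrative only: enlarge the minor heap and make major
       collection less eager for an allocation-heavy program. *)
    let () =
      let c = Gc.get () in
      Gc.set { c with
               Gc.minor_heap_size = 4 * 1024 * 1024;  (* in words, not bytes *)
               Gc.space_overhead  = 200 };
      Printf.printf "minor heap now %d words\n"
        (Gc.get ()).Gc.minor_heap_size

The same knobs can also be adjusted without recompiling via the OCAMLRUNPARAM environment variable.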