From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <examachine@gmail.com>
X-Original-To: caml-list@yquem.inria.fr
Delivered-To: caml-list@yquem.inria.fr
Received: from mail2-relais-roc.national.inria.fr (mail2-relais-roc.national.inria.fr [192.134.164.83])
	by yquem.inria.fr (Postfix) with ESMTP id 2B658BC57
	for <caml-list@yquem.inria.fr>; Fri, 19 Nov 2010 14:57:02 +0100 (CET)
X-IronPort-Anti-Spam-Filtered: true
X-IronPort-Anti-Spam-Result: AoMDAHMR5kxKfVM0k2dsb2JhbACEb485hikBhzZLCBUBAQIJCQoJEQMfon6JYoIYhSsuiFkBAQMFgweCPwSBXIh/
X-IronPort-AV: E=Sophos;i="4.59,223,1288566000"; 
   d="scan'208";a="80326180"
Received: from mail-gw0-f52.google.com ([74.125.83.52])
  by mail2-smtp-roc.national.inria.fr with ESMTP; 19 Nov 2010 14:57:01 +0100
Received: by gwb20 with SMTP id 20so314013gwb.39
        for <caml-list@yquem.inria.fr>; Fri, 19 Nov 2010 05:57:00 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=gmail.com; s=gamma;
        h=domainkey-signature:mime-version:received:received:in-reply-to
         :references:date:message-id:subject:from:to:cc:content-type;
        bh=DjZVhhWhnA+JOIsNk+ZgXWH0NSwZ2Y0RTTKkD84zvLk=;
        b=NpyZ+Zdt114dH8bXSGORNCus38kjvKEiiGPzM8Nm0ugJphBKV9fE7180sN/ET4nDgB
         CHCjnGU3m7XtXkHwkz51N/4kdboKLfHh/t1rhibgFlTiA66JYR1FligZWOTPMSP457xH
         TDUksNfBa8bAZqsLEw/67NVD02c9oteAgiG78=
DomainKey-Signature: a=rsa-sha1; c=nofws;
        d=gmail.com; s=gamma;
        h=mime-version:in-reply-to:references:date:message-id:subject:from:to
         :cc:content-type;
        b=dwdXSo8ihWoZ7dKM/JrIgS+LiYrAS7ahzJ6w71yBa3wTDSpzbr6C9qCULeJNP22mtl
         RckdS65bgyz2Y8lLIb+Keg0rXXJjzaeGN3r+lbB39OkfHjcFGCbG4eTZ5di8yCA6a6xZ
         ezV1oELgus+Lsv9lYdQ8s1rKR+sHsQueoQfW0=
MIME-Version: 1.0
Received: by 10.90.227.2 with SMTP id z2mr1193971agg.32.1290175020137; Fri, 19
 Nov 2010 05:57:00 -0800 (PST)
Received: by 10.91.154.3 with HTTP; Fri, 19 Nov 2010 05:57:00 -0800 (PST)
In-Reply-To: <E51C5B015DBD1348A1D85763337FB6D92C589A1C@Remus.metastack.local>
References: <20101115182737.42b8dcae@loki.yggdrasil.draxit.de>
	<4CE228CA.3030503@gmail.com>
	<E51C5B015DBD1348A1D85763337FB6D92C589A1C@Remus.metastack.local>
Date: Fri, 19 Nov 2010 15:57:00 +0200
Message-ID: <AANLkTi=eoXgkidBKZHgzVYP-VtkAbeXcagvb5-4uU-Dw@mail.gmail.com>
Subject: Re: [Caml-list] SMP multithreading
From: Eray Ozkural <examachine@gmail.com>
To: David Allsopp <dra-news@metastack.com>
Cc: Edgar Friendly <thelema314@gmail.com>,
	"caml-list@yquem.inria.fr" <caml-list@yquem.inria.fr>
Content-Type: multipart/alternative; boundary=00163630fd87c5fd8704956847bb
X-Spam: no; 0.00; eray:01 ozkural:01 runtime:01 runtime:01 cheers:01 eray:01 ocaml:01 parallelism:01 2048:01 hpc:01 scalable:01 hpc:01 ocaml:01 parallelism:01 cmxa:01 

--00163630fd87c5fd8704956847bb
Content-Type: text/plain; charset=ISO-8859-1

There seem to be solutions in theory. I think a colleague had pointed out
one of the papers below, so there is indeed something like a "lock-free
garbage collector". Then, why do we worry about much synchronization
overhead? I don't quite understand.

Maurice Herlihy<http://www.informatik.uni-trier.de/~ley/db/indices/a-tree/h/Herlihy:Maurice.html>,
J. Eliot B. Moss: Lock-Free Garbage Collection for Multiprocessors. IEEE
Trans. Parallel Distrib. Syst.
3<http://www.informatik.uni-trier.de/~ley/db/journals/tpds/tpds3.html#HerlihyM92>(3):
304-311 (1992)
http://doi.ieeecomputersociety.org/10.1109/71.139204

Hui Gao, Jan Friso
Groote<http://www.informatik.uni-trier.de/~ley/db/indices/a-tree/g/Groote:Jan_Friso.html>
, Wim H. Hesselink<http://www.informatik.uni-trier.de/~ley/db/indices/a-tree/h/Hesselink:Wim_H=.html>:
Lock-free parallel and concurrent garbage collection by mark&sweep. Sci.
Comput. Program.
64<http://www.informatik.uni-trier.de/~ley/db/journals/scp/scp64.html#GaoGH07>(3):
341-374 (2007)
http://portal.acm.org/citation.cfm?id=1223239

Java's new garbage collector is lock-free? At any rate, we really needn't
fall behind a mega-lame language like Java :) The first paper is from 1992,
enough time for the knowledge to diffuse. The second 2007 paper is probably
what Jon was referring to earlier.

In my mind, you can use one of these, and use special pool allocation
algorithms for small objects, and also use static lifetime analysis to
bypass the garbage collection in many cases. Since there are many runtime
designers here, I wonder, is there a language runtime that does all three of
these?

Cheers,

Eray
  <http://doi.ieeecomputersociety.org/10.1109/71.139204>

On Wed, Nov 17, 2010 at 6:34 PM, David Allsopp <dra-news@metastack.com>wrote:

> Edgar Friendly wrote:
> > It looks like high-performance computing of the near future will be built
> > out of many machines (message passing), each with many cores (SMP).  One
> > could use message passing for all communication in such a system, but a
> > hybrid approach might be best for this architecture, with use of shared
> > memory within each box and message passing between.  Of course the best
> > choice depends strongly on the particular task.
>
> Absolutely - and the problem in OCaml seems to be that shared memory
> parallelism is just branded as evil and ignored...
>
> > In the long run, it'll likely be a combination of a few large, powerful
> > cores (Intel-CPU style w/ the capability to run a single thread as fast
> as
> > possible) with many many smaller compute engines (GPGPUs or the like,
> > optimized for power and area, closely coupled with memory) that provides
> > the highest performance density.
>
> I think the central thing that we can be utterly sure about is that
> desktops will always have *> 1* general purpose CPU. Maybe not be an
> ever-increasing number of cores but definitely more than one.
>
> > The question of how to program such an architecture seems as if it's
> being
> > answered without the functional community's input. What can we
> contribute?
>
> It has often seemed to me when SMP has been discussed in the past on this
> list that it almost gets dismissed out of hand because it doesn't look
> future-proof or because we're worried about what's round the corner in
> technology terms.
>
> To me the principal question is not about whether a parallel/thread-safe GC
> will scale to 12, 16 or even the 2048 cores on something like
> http://www.hpc.cam.ac.uk/services/darwin.html but whether it will hurt a
> single-threaded application - i.e. whether you will still be able to
> implement message passing libraries and other scalable techniques without
> the parallel GC getting in the way of what you're doing. A
> parallel/thread-safe GC should be aiming to provide the same sort of
> contract as the present one - it "just works" for most things and in a few
> borderline cases (like HPC - yes, it's a borderline case) you'll need to
> tune your code or tweak GC parameters because it's causing some problems or
> because in your particular application squeezing every cycle out of the CPU
> is important. As long as the GC isn't (hugely) slower than the present one
> in OCaml then we can continue to use libraries, frameworks and
> technologies-still-to-come on top of a parallel/thread-safe GC which simply
> ignores shared memory thread-level parallelism just by not instantiating
> threads.
>
> The argument always seems to focus on utterly maxing out all possible
> available resources (CPU time, memory bandwidth, etc.) rather than on
> whether it's simply faster than what we're doing able to do at the moment on
> the same system. Of course, it may be that the only way to do that is to
> have different garbage collectors - one invoked when threads.cmxa is linked
> and then the normal one otherwise (that's so easy to type out as a sentence,
> summarising a vast amount of potential work!!)
>
> Multithreading in OCaml seems to be focused on jumping the entire width of
> the river of concurrency in one go, rather than coming up with stepping
> stones to cross it in bits...
>
>
> David
>
> _______________________________________________
> Caml-list mailing list. Subscription management:
> http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list
> Archives: http://caml.inria.fr
> Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
> Bug reports: http://caml.inria.fr/bin/caml-bugs
>


-- 
Eray Ozkural, PhD candidate.  Comp. Sci. Dept., Bilkent University, Ankara
http://groups.yahoo.com/group/ai-philosophy
http://myspace.com/arizanesil http://myspace.com/malfunct

--00163630fd87c5fd8704956847bb
Content-Type: text/html; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable

There seem to be solutions in theory. I think a colleague had pointed out o=
ne of the papers below, so there is indeed something like a &quot;lock-free=
 garbage collector&quot;. Then, why do we worry about much synchronization =
overhead? I don&#39;t quite understand.<div>
<br></div><div><span class=3D"Apple-style-span" style=3D"font-family: Times=
; font-size: medium; -webkit-border-horizontal-spacing: 2px; -webkit-border=
-vertical-spacing: 2px; "><a href=3D"http://www.informatik.uni-trier.de/~le=
y/db/indices/a-tree/h/Herlihy:Maurice.html" style=3D"color: rgb(0, 0, 0); "=
>Maurice Herlihy</a>, J. Eliot B. Moss: Lock-Free Garbage Collection for Mu=
ltiprocessors.=A0<a href=3D"http://www.informatik.uni-trier.de/~ley/db/jour=
nals/tpds/tpds3.html#HerlihyM92" style=3D"color: rgb(0, 0, 0); ">IEEE Trans=
. Parallel Distrib. Syst. 3</a>(3): 304-311 (1992)</span></div>
<div><div><a href=3D"http://doi.ieeecomputersociety.org/10.1109/71.139204">=
http://doi.ieeecomputersociety.org/10.1109/71.139204</a></div><div><br></di=
v><div><span class=3D"Apple-style-span" style=3D"font-family: Times; font-s=
ize: medium; -webkit-border-horizontal-spacing: 2px; -webkit-border-vertica=
l-spacing: 2px; ">Hui Gao,=A0<a href=3D"http://www.informatik.uni-trier.de/=
~ley/db/indices/a-tree/g/Groote:Jan_Friso.html" style=3D"color: rgb(0, 0, 0=
); ">Jan Friso Groote</a>,=A0<a href=3D"http://www.informatik.uni-trier.de/=
~ley/db/indices/a-tree/h/Hesselink:Wim_H=3D.html" style=3D"color: rgb(0, 0,=
 0); ">Wim H. Hesselink</a>: Lock-free parallel and concurrent garbage coll=
ection by mark&amp;sweep.=A0<a href=3D"http://www.informatik.uni-trier.de/~=
ley/db/journals/scp/scp64.html#GaoGH07" style=3D"color: rgb(0, 0, 0); ">Sci=
. Comput. Program. 64</a>(3): 341-374 (2007)=A0</span></div>
<div><a href=3D"http://portal.acm.org/citation.cfm?id=3D1223239">http://por=
tal.acm.org/citation.cfm?id=3D1223239</a></div><div><br></div><div>Java&#39=
;s new garbage collector is lock-free? At any rate, we really needn&#39;t f=
all behind a mega-lame language like Java :) The first paper is from 1992, =
enough time for the knowledge to diffuse. The second 2007 paper is probably=
 what Jon was referring to earlier.</div>
<div><br></div><div>In my mind, you can use one of these, and use special p=
ool allocation algorithms for small objects, and also use static lifetime a=
nalysis to bypass the garbage collection in many cases. Since there are man=
y runtime designers here, I wonder, is there a language runtime that does a=
ll three of these?</div>
<div><br></div><div>Cheers,</div><div><br></div><div>Eray</div><div><span c=
lass=3D"Apple-style-span" style=3D"font-family: Verdana, Arial, Helvetica, =
sans-serif; ">=A0</span><a href=3D"http://doi.ieeecomputersociety.org/10.11=
09/71.139204"></a></div>
<div><br><div class=3D"gmail_quote">On Wed, Nov 17, 2010 at 6:34 PM, David =
Allsopp <span dir=3D"ltr">&lt;<a href=3D"mailto:dra-news@metastack.com">dra=
-news@metastack.com</a>&gt;</span> wrote:<br><blockquote class=3D"gmail_quo=
te" style=3D"margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;=
">
<div class=3D"im">Edgar Friendly wrote:<br>
&gt; It looks like high-performance computing of the near future will be bu=
ilt<br>
&gt; out of many machines (message passing), each with many cores (SMP). =
=A0One<br>
&gt; could use message passing for all communication in such a system, but =
a<br>
&gt; hybrid approach might be best for this architecture, with use of share=
d<br>
&gt; memory within each box and message passing between. =A0Of course the b=
est<br>
&gt; choice depends strongly on the particular task.<br>
<br>
</div>Absolutely - and the problem in OCaml seems to be that shared memory =
parallelism is just branded as evil and ignored...<br>
<div class=3D"im"><br>
&gt; In the long run, it&#39;ll likely be a combination of a few large, pow=
erful<br>
&gt; cores (Intel-CPU style w/ the capability to run a single thread as fas=
t as<br>
&gt; possible) with many many smaller compute engines (GPGPUs or the like,<=
br>
&gt; optimized for power and area, closely coupled with memory) that provid=
es<br>
&gt; the highest performance density.<br>
<br>
</div>I think the central thing that we can be utterly sure about is that d=
esktops will always have *&gt; 1* general purpose CPU. Maybe not be an ever=
-increasing number of cores but definitely more than one.<br>
<div class=3D"im"><br>
&gt; The question of how to program such an architecture seems as if it&#39=
;s being<br>
&gt; answered without the functional community&#39;s input. What can we con=
tribute?<br>
<br>
</div>It has often seemed to me when SMP has been discussed in the past on =
this list that it almost gets dismissed out of hand because it doesn&#39;t =
look future-proof or because we&#39;re worried about what&#39;s round the c=
orner in technology terms.<br>

<br>
To me the principal question is not about whether a parallel/thread-safe GC=
 will scale to 12, 16 or even the 2048 cores on something like <a href=3D"h=
ttp://www.hpc.cam.ac.uk/services/darwin.html" target=3D"_blank">http://www.=
hpc.cam.ac.uk/services/darwin.html</a> but whether it will hurt a single-th=
readed application - i.e. whether you will still be able to implement messa=
ge passing libraries and other scalable techniques without the parallel GC =
getting in the way of what you&#39;re doing. A parallel/thread-safe GC shou=
ld be aiming to provide the same sort of contract as the present one - it &=
quot;just works&quot; for most things and in a few borderline cases (like H=
PC - yes, it&#39;s a borderline case) you&#39;ll need to tune your code or =
tweak GC parameters because it&#39;s causing some problems or because in yo=
ur particular application squeezing every cycle out of the CPU is important=
. As long as the GC isn&#39;t (hugely) slower than the present one in OCaml=
 then we can continue to use libraries, frameworks and technologies-still-t=
o-come on top of a parallel/thread-safe GC which simply ignores shared memo=
ry thread-level parallelism just by not instantiating threads.<br>

<br>
The argument always seems to focus on utterly maxing out all possible avail=
able resources (CPU time, memory bandwidth, etc.) rather than on whether it=
&#39;s simply faster than what we&#39;re doing able to do at the moment on =
the same system. Of course, it may be that the only way to do that is to ha=
ve different garbage collectors - one invoked when threads.cmxa is linked a=
nd then the normal one otherwise (that&#39;s so easy to type out as a sente=
nce, summarising a vast amount of potential work!!)<br>

<br>
Multithreading in OCaml seems to be focused on jumping the entire width of =
the river of concurrency in one go, rather than coming up with stepping sto=
nes to cross it in bits...<br>
<font color=3D"#888888"><br>
<br>
David<br>
</font><div><div></div><div class=3D"h5"><br>
_______________________________________________<br>
Caml-list mailing list. Subscription management:<br>
<a href=3D"http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list" target=
=3D"_blank">http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list</a><br=
>
Archives: <a href=3D"http://caml.inria.fr" target=3D"_blank">http://caml.in=
ria.fr</a><br>
Beginner&#39;s list: <a href=3D"http://groups.yahoo.com/group/ocaml_beginne=
rs" target=3D"_blank">http://groups.yahoo.com/group/ocaml_beginners</a><br>
Bug reports: <a href=3D"http://caml.inria.fr/bin/caml-bugs" target=3D"_blan=
k">http://caml.inria.fr/bin/caml-bugs</a><br>
</div></div></blockquote></div><br><br clear=3D"all"><br>-- <br>Eray Ozkura=
l, PhD candidate.=A0 Comp. Sci. Dept., Bilkent University, Ankara<br><a hre=
f=3D"http://groups.yahoo.com/group/ai-philosophy">http://groups.yahoo.com/g=
roup/ai-philosophy</a><br>
<a href=3D"http://myspace.com/arizanesil">http://myspace.com/arizanesil</a>=
 <a href=3D"http://myspace.com/malfunct">http://myspace.com/malfunct</a><br=
><br>
</div></div>

--00163630fd87c5fd8704956847bb--