From: Mike Lin
Date: Mon, 14 Jul 2008 13:04:01 -0400
To: caml-list@yquem.inria.fr
Subject: Re: [Caml-list] thousands of CPU cores

On Mon, Jul 14, 2008 at 8:08 AM, Jon Harrop <jon@ffconsultancy.com> wrote:

> Perhaps the parallel GC could enable support for things like OpenMP but I
> personally would rather see a shift to similar functionality to that of
> Microsoft's TPL because (I assume) it is better for parallel tree operations
> that are themselves more common in languages like OCaml.

OpenMP is really great for parallelizing tight loops in numerical code, which is one scenario in which I'd agree shared memory is much better than message passing, at least as far as it scales. I wish I had this for my OCaml CRF and M^3 network code!

But for higher-level, map/reduce types of stuff, I really think message passing tends to get you there. In such applications I am usually interested in distributing across a compute farm anyway, for both CPU and memory requirements. I started with a lame home-rolled fork+Marshal library, then moved on to Gerd's RPC stuff, and now I'm finally playing with ocamlp3l...
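
To give a flavour of what I mean by a home-rolled fork+Marshal approach, here is a minimal sketch (not the actual library I use; the chunking and the parallel_map name are just for illustration). Each chunk of inputs goes to a forked child process, which marshals its results back to the parent over a pipe:

(* Poor man's parallel map: one forked child per chunk of inputs.
   Results come back to the parent marshalled over a pipe.
   Needs the unix library, e.g. ocamlopt unix.cmxa pmap.ml *)
let parallel_map (f : 'a -> 'b) (chunks : 'a list list) : 'b list list =
  let spawn chunk =
    let rd, wr = Unix.pipe () in
    match Unix.fork () with
    | 0 ->
        (* child: compute, send the marshalled result, and exit *)
        Unix.close rd;
        let oc = Unix.out_channel_of_descr wr in
        Marshal.to_channel oc (List.map f chunk) [];
        close_out oc;
        exit 0
    | pid ->
        (* parent: remember the child's pid and the read end of its pipe *)
        Unix.close wr;
        (pid, Unix.in_channel_of_descr rd)
  in
  let workers = List.map spawn chunks in
  List.map
    (fun (pid, ic) ->
       let result : 'b list = Marshal.from_channel ic in
       close_in ic;
       ignore (Unix.waitpid [] pid);
       result)
    workers

(* e.g. square 1..8 using two worker processes *)
let () =
  let results = parallel_map (fun x -> x * x) [ [1; 2; 3; 4]; [5; 6; 7; 8] ] in
  List.iter (Printf.printf "%d ") (List.concat results);
  print_newline ()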

Incidentally, it occurs to me that when one is optimizing the kind of tight numerical loops that can really benefit from shared memory, the FIRST step, before parallelizing, is to do away with any heap allocations in the loop. The following is not a serious proposal, but just to kick the idea around: what is the feasibility of removing the global interpreter lock for segments of code which perform no heap allocations? That is, what besides the GC is stopping us?
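
For concreteness, here is the sort of allocation-free loop I have in mind (a toy example; the no-allocation claim assumes ocamlopt and float arrays, which are stored unboxed):

(* y <- a*x + y over float arrays.  With ocamlopt, float array reads and
   writes are unboxed and the intermediate a *. x.(i) stays unboxed, so
   the loop body never touches the heap. *)
let saxpy a (x : float array) (y : float array) =
  for i = 0 to Array.length x - 1 do
    y.(i) <- a *. x.(i) +. y.(i)
  done

(* Compare the "functional" version: it allocates a fresh result array,
   and the generic Array.mapi boxes each float it passes to the closure. *)
let saxpy_alloc a (x : float array) (y : float array) =
  Array.mapi (fun i xi -> a *. xi +. y.(i)) x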

Mike
