From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <jonathandeanharrop@googlemail.com>
X-Original-To: caml-list@yquem.inria.fr
Delivered-To: caml-list@yquem.inria.fr
Received: from mail2-relais-roc.national.inria.fr (mail2-relais-roc.national.inria.fr [192.134.164.83])
	by yquem.inria.fr (Postfix) with ESMTP id 34D37BC57
	for <caml-list@yquem.inria.fr>; Wed, 17 Nov 2010 04:42:11 +0100 (CET)
X-IronPort-Anti-Spam-Filtered: true
X-IronPort-Anti-Spam-Result: AqMBANbd4kxKfVIukGdsb2JhbACVZIxWCBUBAQEBCQkMBxEDH6Y8i3oBBY4FAQSFSw
X-IronPort-AV: E=Sophos;i="4.59,209,1288566000"; 
   d="scan'208";a="79266827"
Received: from mail-ww0-f46.google.com ([74.125.82.46])
  by mail2-smtp-roc.national.inria.fr with ESMTP; 17 Nov 2010 04:42:10 +0100
Received: by wwb29 with SMTP id 29so993674wwb.3
        for <caml-list@yquem.inria.fr>; Tue, 16 Nov 2010 19:42:10 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=googlemail.com; s=gamma;
        h=domainkey-signature:received:received:from:to:references
         :in-reply-to:subject:date:organization:message-id:mime-version
         :content-type:content-transfer-encoding:x-mailer:thread-index
         :content-language;
        bh=DgGyUoSLI9JwEKQKRlTb52K5J2Dqb+SjD7IppbpDu6M=;
        b=hbAyiRztnGxDMQlZHrU7v8/EkmXfuzQz7Glc/ibIUS/wcHYWF9KGY6zyG2d2Iwh6la
         7uDQWs01YZbwIV2atTmChuh19B1W1sPjeggqzg/oN4WS3td+35e1AfzQuquH4JQVfgnH
         BBmbhxRhqwlOEM+Ucejol3bB2GEFuxQ75p0RA=
DomainKey-Signature: a=rsa-sha1; c=nofws;
        d=googlemail.com; s=gamma;
        h=from:to:references:in-reply-to:subject:date:organization:message-id
         :mime-version:content-type:content-transfer-encoding:x-mailer
         :thread-index:content-language;
        b=mQLD5tyHJnvzy+rrbGyUcbDib6Sk+V901w6vCqFidR/fDA1r18HxrAfI/zyPPEUuxS
         nZcTgJ1GyaMiYDhyrlSbRT/bf8ZwoMTcyjwU581O1XruN+m6N87NcVcXNL1nZ+vmXd56
         nHfNHU3XaRjYgOsi/n/Flzidmpn1VHRLoMYOY=
Received: by 10.227.128.211 with SMTP id l19mr8598301wbs.73.1289965329413;
        Tue, 16 Nov 2010 19:42:09 -0800 (PST)
Received: from WinEight ([87.112.168.151])
        by mx.google.com with ESMTPS id h29sm1268099wbc.9.2010.11.16.19.42.06
        (version=SSLv3 cipher=RC4-MD5);
        Tue, 16 Nov 2010 19:42:07 -0800 (PST)
From: Jon Harrop <jonathandeanharrop@googlemail.com>
To: "'Wolfgang Draxinger'" <wdraxinger.maillist@draxit.de>,
	<caml-list@yquem.inria.fr>
References: <20101115182737.42b8dcae@loki.yggdrasil.draxit.de>	<4CE228CA.3030503@gmail.com> <1289927042.16005.176.camel@thinkpad>	<AANLkTim2r0S7ZSmXFK87-u2PKRqvyuiBkH5rQfp7ju_C@mail.gmail.com>	<1289945605.16005.205.camel@thinkpad>	<AANLkTimgyL2xzUnuSzf-upSyjicffC-6-rgSK=EfxLTj@mail.gmail.com> <20101117005215.5dc5ebcd@narfi.yggdrasil.draxit.de>
In-Reply-To: <20101117005215.5dc5ebcd@narfi.yggdrasil.draxit.de>
Subject: RE: [Caml-list] SMP multithreading
Date: Wed, 17 Nov 2010 03:41:46 -0000
Organization: Flying Frog Consultancy
Message-ID: <02bd01cb8609$5d67ea30$1837be90$@com>
MIME-Version: 1.0
Content-Type: text/plain;
	charset="US-ASCII"
Content-Transfer-Encoding: 7bit
X-Mailer: Microsoft Office Outlook 12.0
Thread-Index: AcuF6WipUtdUCtaHRlaEYSB9dGWEyQAFZnbg
Content-Language: en-gb
X-Spam: no; 0.00; ocaml:01 haskell:01 haskell:01 haskell's:01 granularity:01 computations:01 allocations:01 speedup:01 granularity:01 arrays:01 speedups:01 speedups:01 intel's:01 afaik:01 tocs:01 

Wolfgang wrote:
> I'd like to point out how the big competitor to OCaml deals with it.
> The GHC Haskell system has SMP parallization built in for some time,
> and it does it quite well.

I beg to differ. Upon trying to reproduce many of the Haskell community's
results, I found that even their own parallel Haskell programs often exhibit
huge slowdowns. This is because Haskell's unpredictable performance leads to
unpredictable granularity and, consequently, more time can be spent
administering the tiny parallel computations than is gained by doing so.

The results I found here are typical:

 
http://flyingfrogblog.blogspot.com/2010/01/naive-parallelism-with-hlvm.html

Note that the absolute performance peaks at an unpredictable number of cores
only in the case of Haskell. This is because the GC does not scale beyond
about 4 cores for any Haskell programs doing significant amounts of
allocation, which is basically all Haskell programs because allocations are
everywhere in Haskell.

Ultimately, running on all cores attains no speedup at all with Haskell in
that case. This was branded "the last core slowdown" but the slowdown
clearly started well before all 8 cores. There was a significant development
towards improving this situation but it won't fix the granularity problem:

  http://hackage.haskell.org/trac/ghc/blog/new-gc-preview

The paper "Regular, shape-polymorphic, parallel arrays in Haskell" cites
2.5x speedups when existing techniques were not only already getting 7x
speedups but better absolute performance as well. Cache complexity is the
problem, as I explained here:

 
http://flyingfrogblog.blogspot.com/2010/06/regular-shape-polymorphic-paralle
l.html

Probably the best solution for multicore programming is Cilk. This technique
has already been adopted both in Intel's TBB and Microsoft's .NET 4 but,
AFAIK, the only functional language with access to it is F#. There are some
great papers on multicore-friendly cache oblivious algorithms written in
Cilk:

  http://www.fftw.org/~athena/papers/tocs08.pdf

Note, in particular, that Cilk is not only much faster but also much easier
to use than explicit message passing.

To do something like this, threads need to be able to run in parallel and
mutate the same shared heap. Although that is objectively easy (I did it in
HLVM), OCaml's reliance upon very high allocation rates, efficient
collection of young garbage and a ridiculous density of pointers in the heap
make it a *lot* harder.

Cheers,
Jon.