From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Original-To: caml-list@yquem.inria.fr Delivered-To: caml-list@yquem.inria.fr Received: from mail2-relais-roc.national.inria.fr (mail2-relais-roc.national.inria.fr [192.134.164.83]) by yquem.inria.fr (Postfix) with ESMTP id 34D37BC57 for ; Wed, 17 Nov 2010 04:42:11 +0100 (CET) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AqMBANbd4kxKfVIukGdsb2JhbACVZIxWCBUBAQEBCQkMBxEDH6Y8i3oBBY4FAQSFSw X-IronPort-AV: E=Sophos;i="4.59,209,1288566000"; d="scan'208";a="79266827" Received: from mail-ww0-f46.google.com ([74.125.82.46]) by mail2-smtp-roc.national.inria.fr with ESMTP; 17 Nov 2010 04:42:10 +0100 Received: by wwb29 with SMTP id 29so993674wwb.3 for ; Tue, 16 Nov 2010 19:42:10 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=googlemail.com; s=gamma; h=domainkey-signature:received:received:from:to:references :in-reply-to:subject:date:organization:message-id:mime-version :content-type:content-transfer-encoding:x-mailer:thread-index :content-language; bh=DgGyUoSLI9JwEKQKRlTb52K5J2Dqb+SjD7IppbpDu6M=; b=hbAyiRztnGxDMQlZHrU7v8/EkmXfuzQz7Glc/ibIUS/wcHYWF9KGY6zyG2d2Iwh6la 7uDQWs01YZbwIV2atTmChuh19B1W1sPjeggqzg/oN4WS3td+35e1AfzQuquH4JQVfgnH BBmbhxRhqwlOEM+Ucejol3bB2GEFuxQ75p0RA= DomainKey-Signature: a=rsa-sha1; c=nofws; d=googlemail.com; s=gamma; h=from:to:references:in-reply-to:subject:date:organization:message-id :mime-version:content-type:content-transfer-encoding:x-mailer :thread-index:content-language; b=mQLD5tyHJnvzy+rrbGyUcbDib6Sk+V901w6vCqFidR/fDA1r18HxrAfI/zyPPEUuxS nZcTgJ1GyaMiYDhyrlSbRT/bf8ZwoMTcyjwU581O1XruN+m6N87NcVcXNL1nZ+vmXd56 nHfNHU3XaRjYgOsi/n/Flzidmpn1VHRLoMYOY= Received: by 10.227.128.211 with SMTP id l19mr8598301wbs.73.1289965329413; Tue, 16 Nov 2010 19:42:09 -0800 (PST) Received: from WinEight ([87.112.168.151]) by mx.google.com with ESMTPS id h29sm1268099wbc.9.2010.11.16.19.42.06 (version=SSLv3 cipher=RC4-MD5); Tue, 16 Nov 2010 19:42:07 -0800 (PST) From: Jon Harrop To: "'Wolfgang Draxinger'" , References: <20101115182737.42b8dcae@loki.yggdrasil.draxit.de> <4CE228CA.3030503@gmail.com> <1289927042.16005.176.camel@thinkpad> <1289945605.16005.205.camel@thinkpad> <20101117005215.5dc5ebcd@narfi.yggdrasil.draxit.de> In-Reply-To: <20101117005215.5dc5ebcd@narfi.yggdrasil.draxit.de> Subject: RE: [Caml-list] SMP multithreading Date: Wed, 17 Nov 2010 03:41:46 -0000 Organization: Flying Frog Consultancy Message-ID: <02bd01cb8609$5d67ea30$1837be90$@com> MIME-Version: 1.0 Content-Type: text/plain; charset="US-ASCII" Content-Transfer-Encoding: 7bit X-Mailer: Microsoft Office Outlook 12.0 Thread-Index: AcuF6WipUtdUCtaHRlaEYSB9dGWEyQAFZnbg Content-Language: en-gb X-Spam: no; 0.00; ocaml:01 haskell:01 haskell:01 haskell's:01 granularity:01 computations:01 allocations:01 speedup:01 granularity:01 arrays:01 speedups:01 speedups:01 intel's:01 afaik:01 tocs:01 Wolfgang wrote: > I'd like to point out how the big competitor to OCaml deals with it. > The GHC Haskell system has SMP parallization built in for some time, > and it does it quite well. I beg to differ. Upon trying to reproduce many of the Haskell community's results, I found that even their own parallel Haskell programs often exhibit huge slowdowns. This is because Haskell's unpredictable performance leads to unpredictable granularity and, consequently, more time can be spent administering the tiny parallel computations than is gained by doing so. The results I found here are typical: http://flyingfrogblog.blogspot.com/2010/01/naive-parallelism-with-hlvm.html Note that the absolute performance peaks at an unpredictable number of cores only in the case of Haskell. This is because the GC does not scale beyond about 4 cores for any Haskell programs doing significant amounts of allocation, which is basically all Haskell programs because allocations are everywhere in Haskell. Ultimately, running on all cores attains no speedup at all with Haskell in that case. This was branded "the last core slowdown" but the slowdown clearly started well before all 8 cores. There was a significant development towards improving this situation but it won't fix the granularity problem: http://hackage.haskell.org/trac/ghc/blog/new-gc-preview The paper "Regular, shape-polymorphic, parallel arrays in Haskell" cites 2.5x speedups when existing techniques were not only already getting 7x speedups but better absolute performance as well. Cache complexity is the problem, as I explained here: http://flyingfrogblog.blogspot.com/2010/06/regular-shape-polymorphic-paralle l.html Probably the best solution for multicore programming is Cilk. This technique has already been adopted both in Intel's TBB and Microsoft's .NET 4 but, AFAIK, the only functional language with access to it is F#. There are some great papers on multicore-friendly cache oblivious algorithms written in Cilk: http://www.fftw.org/~athena/papers/tocs08.pdf Note, in particular, that Cilk is not only much faster but also much easier to use than explicit message passing. To do something like this, threads need to be able to run in parallel and mutate the same shared heap. Although that is objectively easy (I did it in HLVM), OCaml's reliance upon very high allocation rates, efficient collection of young garbage and a ridiculous density of pointers in the heap make it a *lot* harder. Cheers, Jon.