From: Wolfgang Draxinger
Cc: caml-list@yquem.inria.fr
Date: Tue, 16 Nov 2010 15:11:11 +0100
Subject: Re: [Caml-list] SMP multithreading
Message-ID: <20101116151111.570e5a32@narfi.yggdrasil.draxit.de>
References: <20101115182737.42b8dcae@loki.yggdrasil.draxit.de>

On Mon, 15 Nov 2010 22:05:52 +0100, Philippe Wang wrote:

> Take the current Apple Mac Pro for instance (I take this reference
> because it's easy to find and it doesn't evolve very often), with
> 12-core configuration.
> - Two 2.93GHz 6-Core Intel Xeon “Westmere” (12 cores)
> - 1333MHz DDR3 ECC SDRAM (whatever the capacity)
> => with HT, there are 24 logical units, which all share a tiny
> bandwidth for CPU<->RAM communications.
> Let's say bandwidth is about 2400MHz : 2400MHz/24Thread =
> 100MHz/Thread. It's kind of "ridiculous..."

You're assuming that there would be a lot of communication between cores
and RAM, which is not (or should not be) the case in well-written
multithreaded programs.

> OCaml is not (at least not yet) a language for HPC (high performance
> computing), it is very efficient (compared to so many other languages)
> and yet doesn't not take advantage of SMP. Well, sooner or later it
> will actually probably need to support SMP. (But somehow it already
> does, via C code boxed in "blocking sections").

Which is a pity, since functional languages in particular could
parallelize tasks implicitly much better.

> Well, if you take casual OCaml programs, and put them on SMP
> architectures (on which indeed they often already are) while giving
> them capacity to take advantage of SMP (via POSIX-C threads in
> blocking sections, message-passing style, or OCaml-for-multicore, or
> whatever else), they quickly become less efficient because there is a
> bottleneck on the CPU<->RAM bus.

Suppose you were to implement a convolution in n dimensions on a large
data set. This is a prime example of where multithreading can help and
where main-memory bandwidth is not the limiting factor. One can split up
the whole task into small tasklets and dispatch them to individual cores.
As long as each tasklet's working set (the payload data, i.e. input,
convolution kernel, and output buffer if not in-place, plus code) fits
into the L1 cache, everything will be executed from cache. On current
Intel CPUs that is 32 kB, on AMD even 64 kB, and that is per core! And
the cores on the same die share a last-level cache (L2 or L3, depending
on the microarchitecture), which has far more bandwidth, about an order
of magnitude, than system memory.
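
To make the tasklet idea concrete, here is a minimal sketch in OCaml for
the one-dimensional case. It is only an illustration under assumptions:
the 32 kB budget is the Intel figure mentioned above, code size is ignored
in the budget, and all names (l1_bytes, tile_len, convolve_tile, tiles,
convolve) are hypothetical. The tiles are run one after another here; the
point is that each tile reads a small input window and writes a disjoint
output slice, so each one could be handed to a different core.

```ocaml
(* Assumed per-core L1 budget in bytes (32 kB, the Intel figure above). *)
let l1_bytes = 32 * 1024

(* Output elements per tile such that input window + kernel + output
   slice stay within the budget. Float arrays hold unboxed 8-byte floats.
   Window: t + k - 1, kernel: k, output: t  =>  2t + 2k - 1 <= floats. *)
let tile_len ~kernel_len =
  let floats = l1_bytes / 8 in
  max 1 ((floats - 2 * kernel_len + 1) / 2)

(* Compute output elements lo .. hi-1 of a "valid" convolution.
   Reads only input.(lo) .. input.(hi + k - 2); writes only its own
   slice of output, so tiles are independent of each other. *)
let convolve_tile input kernel output lo hi =
  let k = Array.length kernel in
  for i = lo to hi - 1 do
    let acc = ref 0.0 in
    for j = 0 to k - 1 do
      acc := !acc +. input.(i + j) *. kernel.(k - 1 - j)
    done;
    output.(i) <- !acc
  done

(* The list of (lo, hi) tasklets covering the index range 0 .. n-1. *)
let tiles ~n ~tile =
  let rec go lo acc =
    if lo >= n then List.rev acc
    else go (lo + tile) ((lo, min n (lo + tile)) :: acc)
  in
  go 0 []

(* Sequential driver: a parallel version would dispatch each tile to a
   worker instead of iterating over the list. *)
let convolve input kernel =
  let k = Array.length kernel in
  let n = Array.length input - k + 1 in
  if k = 0 || n <= 0 then [||]
  else begin
    let output = Array.make n 0.0 in
    let tile = tile_len ~kernel_len:k in
    List.iter
      (fun (lo, hi) -> convolve_tile input kernel output lo hi)
      (tiles ~n ~tile);
    output
  end
```

For example, convolve [|1.; 2.; 3.; 4.|] [|1.; 0.; (-1.)|] processes a
single tile, while an input of several thousand elements is cut into
multiple tiles of about 2000 outputs each.
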
Modern OS schedulers therefore try to keep threads of the same process
together on the same CPU die, and further group them by NUMA node.

> I want to believe you're right to ask for SMP support, even if now I'm
> pretty convinced that current state of OCaml is not compatible with "I
> want to write HPC programs in pure OCaml". (One should implement a
> brand new compiler maybe??)

This is not just about HPC but about resource utilization. A single core
running at full speed consumes far more power than 4 cores clocked down
to minimal frequency. Even worse, only the most recent CPU generations
can clock cores individually, so on most systems a single core running at
full speed will significantly increase power consumption (and thermal
output).

> There are people studying how to have HPC with OCaml, but it has quite
> a little to do with SMP matters. Instead, it's more about (static or
> dynamic) specialized-code generation for GPUs etc. We'll see in some
> time what it produces...

For the time being I'm more interested in what is actually preventing
proper SMP in OCaml right now. I've read something about issues with the
garbage collector, which surprises me: I switched to Boehm-GC in my C
programs to resolve memory deallocation problems in multithreaded
programs. This was of course possible only after Boehm-GC became
thread-safe.

Wolfgang