From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.1.3 (2006-06-01) on yquem.inria.fr X-Spam-Level: X-Spam-Status: No, score=0.0 required=5.0 tests=none autolearn=disabled version=3.1.3 X-Original-To: caml-list@yquem.inria.fr Delivered-To: caml-list@yquem.inria.fr Received: from mail2-relais-roc.national.inria.fr (mail2-relais-roc.national.inria.fr [192.134.164.83]) by yquem.inria.fr (Postfix) with ESMTP id ECE4CBBAF for ; Fri, 25 Sep 2009 23:34:33 +0200 (CEST) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AhgCAOzTvErU4xEKkWdsb2JhbACBUZkwAQEBAQkLCgcTA7t6AoQcBQ X-IronPort-AV: E=Sophos;i="4.44,453,1249250400"; d="scan'208";a="33553856" Received: from moutng.kundenserver.de ([212.227.17.10]) by mail2-smtp-roc.national.inria.fr with ESMTP; 25 Sep 2009 23:34:33 +0200 Received: from office1.lan.sumadev.de (dslb-094-219-214-239.pools.arcor-ip.net [94.219.214.239]) by mrelayeu.kundenserver.de (node=mrbap1) with ESMTP (Nemesis) id 0MLhzf-1MrZHX2ocN-000dBJ; Fri, 25 Sep 2009 23:34:24 +0200 Received: from [192.168.0.32] (dslb-084-058-025-090.pools.arcor-ip.net [84.58.25.90]) by office1.lan.sumadev.de (Postfix) with ESMTPSA id 50CACC0EAA; Fri, 25 Sep 2009 23:34:24 +0200 (CEST) Subject: Re: [Caml-list] OC4MC : OCaml for Multicore architectures From: Gerd Stolpmann To: Hugo Ferreira Cc: caml-list@yquem.inria.fr In-Reply-To: <4ABC720A.7060706@inescporto.pt> References: <200909241409.56894.jon@ffconsultancy.com> <20090924164933.GA5637@annexia.org> <200909242209.50565.jon@ffconsultancy.com> <20090925.130721.70227045.garrigue@math.nagoya-u.ac.jp> <4ABC720A.7060706@inescporto.pt> Content-Type: text/plain Date: Fri, 25 Sep 2009 23:39:03 +0200 Message-Id: <1253914743.10967.50.camel@flake.lan.gerd-stolpmann.de> Mime-Version: 1.0 X-Mailer: Evolution 2.12.1 Content-Transfer-Encoding: 7bit X-Provags-ID: V01U2FsdGVkX19v5g56oRWh+PWogrtjunPOdlqOgeldcc718cz b8YueSywyf18v5pyZ+VfOTNOU1io6y1GNzTirRHaHCtYj6PSHB 68BGuLlLHf749r9/MOeAQ== X-Spam: no; 0.00; ocaml:01 gerd:01 stolpmann:01 gerd:01 runtime:01 runtime:01 ocaml:01 parallelism:01 stolpmann:01 darmstadt:01 6151:01 6151:01 heap:01 symbolic:01 caml-list:01 > Rethinking our application/algorithmic structure may not be a real > deterrent. An application does not require parallel/concurrent > processing everywhere. It is really a question of identifying where > and when this is useful. Much like selecting the most "appropriate" > data-structure for any application. It's not an all or nothing > proposition. Well, if you get many cores for free it sounds logical to get the most out of it. If you have to pay for extra cores, it becomes quickly a bad deal. Imagine you can parallelize 50% of the runtime of the application. Even if you have as many cores as you want, and the runtime of the sped-up part drops to almost 0, the other still-sequential 50% limit the overall improvement to only 50%. (That's known as Amdahl's law, Xavier also mentioned it.) So, especially when you have many cores, it is not the number of cores that limit the speed-up in practice, but the fraction of the algorithm that can be parallelized at all. I'm working for a company that uses Ocaml in a highly parallelized world. We are running it on grid-style compute clusters to process text and symbolic data. We are using multi-processing, which is easy to do with current Ocaml. Programs we write often run on more than 100 cores. Guess what our biggest problem is? Getting all the cores busy. Because there is always also some sequential part, or buggy parallel part that limits the overall throughput. We are constantly searching for these "bottlenecks" as our managers call this phenomenon (and we get a lot of pressure because the company pays a lot for these many cores, and they want to see them utilized). We have the big advantage that our data sets are already organized in an easy-to-parallelize way, i.e. you can usually split it up into independent portions, and process them independently (but not always). If you cannot do this (like in a multi-core-capable GC where always some part of the heap is shared by all cores), things become quickly very complicated. So I generally do not expect much from such a GC. We are also using Java with its multi-core GC. However, we are sometimes seeing better performance when we don't scale it to the full number of cores the system has, but also combine it with multi-processing (i.e. start several Javas). I simply guess the GC runs at some time into lock contention, and has to do many things sequentially. So, I'm a professional and massive user of multi-core programming. Nevertheless, my first wish is not to get a multi-core GC for shared-memory parallelism, because I doubt we ever get a satisfactory solution. My first wish is to make single-threaded execution as fast as possible. The second one is to make RPC's cheaper, especially between processes on the same system (or put it this way: I'd like to see that the processes normally have their private heaps and are fully separated, but also that they can use a shared memory segment by explicitly moving values there - in the direction of Richard's Ancient module - so that it is possible to make an RPC call by moving data to this special segment). Of course, I appreciate any work on multi-core improvements, so applause to Philippe and team. Gerd -- ------------------------------------------------------------ Gerd Stolpmann, Bad Nauheimer Str.3, 64289 Darmstadt,Germany gerd@gerd-stolpmann.de http://www.gerd-stolpmann.de Phone: +49-6151-153855 Fax: +49-6151-997714 ------------------------------------------------------------