From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Delivered-To: caml-list@yquem.inria.fr Received: from nez-perce.inria.fr (nez-perce.inria.fr [192.93.2.78]) by yquem.inria.fr (Postfix) with ESMTP id CBAA1BC48 for ; Thu, 7 Apr 2005 23:47:34 +0200 (CEST) Received: from smtp12.wanadoo.fr (smtp12.wanadoo.fr [193.252.22.20]) by nez-perce.inria.fr (8.13.0/8.13.0) with ESMTP id j37LlYrE012252 for ; Thu, 7 Apr 2005 23:47:34 +0200 Received: from me-wanadoo.net (unknown [127.0.0.1]) by mwinf1201.wanadoo.fr (SMTP Server) with ESMTP id 71A991C00094 for ; Thu, 7 Apr 2005 23:47:34 +0200 (CEST) Received: from RIVENDELL (AFontenayssB-151-1-53-162.w82-121.abo.wanadoo.fr [82.121.88.162]) by mwinf1201.wanadoo.fr (SMTP Server) with ESMTP id E31541C00085 for ; Thu, 7 Apr 2005 23:47:31 +0200 (CEST) X-ME-UUID: 20050407214731930.E31541C00085@mwinf1201.wanadoo.fr From: "Yoann Fabre" To: Subject: Raising an old issue : true concurrency in OCaml [Xavier, Damien, any] Date: Thu, 7 Apr 2005 23:47:46 +0200 MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit X-Mailer: Microsoft Office Outlook, Build 11.0.6353 Thread-Index: AcU7u291GAmmI4G9Syi0orJc9Cjt8g== X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.2180 Message-Id: <20050407214731.E31541C00085@mwinf1201.wanadoo.fr> X-Miltered: at nez-perce with ID 4255AA76.000 by Joe's j-chkmail (http://j-chkmail.ensmp.fr)! X-Spam: no; 0.00; ocaml:01 damien:01 threads:01 sig:01 xavier's:01 leaking:01 lazy:01 powerpc:01 imho:01 ocaml:01 ocamlopt:01 gcc:01 threads:01 stdlib:01 xavier's:01 X-Spam-Checker-Version: SpamAssassin 3.0.2 (2004-11-16) on yquem.inria.fr X-Spam-Status: No, score=0.0 required=5.0 tests=none autolearn=disabled version=3.0.2 X-Spam-Level: Well... shame on me! I'm afraid I'm going to restart the "annual discussion on threads". I'm very sorry about that Xavier, but I do believe that the recent trend in generic CPU design calls for a new discussion. I can only hope all of you will forgive my too quickly written English... and of course the fact that I'm definitely "just a nameless student" :-) Since it seems to be fashionable here nowadays, I introduce myself quickly in the .sig. I also invite you to read the last (full) Xavier's reply about this topic (link and quotes included below). My point is the following (let's be blatant): I'm afraid any pure-OCaml program will be limited to at most 50% of the CPU power on about 50% of the machines in the next five years ; and probably 25/30% on the remaining 40% of medium to high-end machines. Why? Because current CPU design is going toward true parallelism with multi-core and hyper-threading like techs. This is /not/ yet another marketing hype... there're profound physical reason being such a move. It seems to be impossible to scale up in frequency any more due to leaking current etc. See articles on http://www.anandtech.com/cpuchipsets/ for a short introduction and more links. (I'm ashamed of such a lake of true reference, but I'm too lazy to find them among my papers right now...) So Intel, AMD, IBM/Motorola and Sun are all releasing multi-core/HT CPU right now. SMP systems will not be the exception any more, but the rule! See: Intel Pentium 4C (1 HT core ~ 1.7 cores) Intel Pentium 4D (2 cores) Intel Pentium EE 840 (2 HT cores ~ 3.4 cores) New Sun micro-arch, a kind of Intel HT New (incoming) multi-core AMD64 New multi-core PowerPC IBM Cell (at least 1+6 cores) etc... And of course, we cannot execute more than one Caml thread at a time... So the questions are: - Do we need to do something about it? - If yes, what can be done in the short term? - And in the long term? IMHO, I think the Caml team need to do something... I do /love/ Ocaml but I also do run hungry code (Non-ODE solving, temporal series discovery...) and it seems that a lot of that code can be parallelized quite easily. I'm already tired not to be able to go above about 65% CPU usage on my P4C 3.2 (Intel VTune, not Task-Manager figures). I've run the FPU part of some constraint solving based on interval arithmetic in a C thread while the high level constraint management was still in OCaml (tricky business). The FPU ASM was nearly the same between ocamlopt and GCC 3.2 but I've nearly doubled the perf! A simple and efficient usage of hyper-threading between strongly FPU and ALU oriented threads (and maybe a better cache management provided by the use of the two context entries). For some project I've seriously considered using some language with a concurrent GC (IBM's java VM ?, C# ?). What an awful thought, isn't it? Their god, help me! Avoid me C++ in any way... (See also my short reply to Xavier at the end of this mail.) So what can we do? Here's my two cents proposal for the short/long term: Phase 1 - Maybe provide some module in the stdlib to: - allow easy management of multiple Caml /processes/ - allow easy communication with message passing (MPI like + Marshal) - allow easy synchronisations between these pseudo-threads Phase 2 - Add another module to: - provide a standard interface to memory sharing - allow some awful dirty hack to GC-allocate some special blocks into that region? Some "custom2" with reference counting (ouch!) between the GCs Phase 3 - write a concurrent GC (CGC) (re-ouch!) Well, I've studied a bit the CGC field and read Damien's Phd... It's frightening! But, to recast the issue in a somewhat broader context, can we really still pretend that a "modern" generic language can live without true concurrency in 2005? All in all Xavier's recognized the importance of threading for OCaml, since he's written the first linux pthread library for that very purpose... Granted it was not aimed at performances, but it's now an open question. The old dream of "never two times slower than C" is (was?) a reality. I use OCaml partly for that kind of perf... Is "never height times slower than C" (4 cores) still an acceptable tradeoffs? I don't think so and I'm worried. Only hoping to trigger a constructive discussion, Cordially, Yoann.Fabre@lip6.fr I code in Caml since 1995 (grateful victim of the ENS push on Prepa program). I've read this list nearly daily for six years. I've written more the 200.000 lines of OCaml code (wc -l) ranging from 3D system to ML like byte code compiler and type-system, going through C binding of things like CrystalSpace... I've also done dirty unsatisfactory hack in the runtime system to implements a ultra-lightweight concurrent language (a la Peyton-Jones feather-weight continuations) and some true interval arithmetic etc... Some poor fellow were also victim of my hard Ocaml evangelism -- Hi Nadji, got that you're now in PhD with Francois, how is it going? Hi SebC ! Well I do not want to be conceited... I've only tried to establish that I've got some XP in OCaml, since I know from practice that it does help in a debate with some of the world-class coders who are wandering here... ----------------------------------------------------------------------- Xavier Leroy Re: [Caml-list] Why systhreads? 2002-11-25 (10:01) http://caml.inria.fr/pub/ml-archives/caml-list/2002/11/64c14acb90cb14bedb2ca cb73338fb15.en.html What about parallelism on SMP machines? The main issue here is that the runtime system, and in particular the garbage collector and memory manager, must be MP-safe. [...] a concurrent GC avoids this problem, but adds tremendous complexity. (Of course, all this SMP support stuff slows down the runtime system even if there is only one processor, which is the case for almost all our users...) [...] Why was Concurrent Caml Light abandoned? - Too complex; too hard to debug - Dubious practical interest. Shared-memory multiprocessors have never really "taken off", at least in the general public [...] >>> True, until now. [...] Even if you have a 4-processor SMP machine, it isn't clear whether you should write your program using shared memory or using message passing [...] >>> OK if you only need to copy large linear block of memory. But seriously, do you really want to marshal ~50Mo of complex Caml data structure? This /has/ a non negligible cost and I don't want to think about maintaining sharing (which may be mandatory). It's, what, 1.5 million accesses to the hash table in marshal.c (with a mean of 8*4 bytes by block). [...] What about hyperthreading? Well, I believe it's the last convulsive movement of SMP's corpse :-) We'll see how it goes market-wise. [...] >>> All current CPU designs seem to disagree. [...] At any rate, the speedups announced for hyperthreading in the Pentium 4 are below a factor of 1.5; probably not enough to offset the overhead of making the OCaml runtime system thread-safe. [...] >>> The ratio is more about 1.7. I've used HT for more than a year, and I think it's a truly effective tech. Maybe the most effective tech. (from a complexity/perf ratio viewpoint) introduced by intel since the P2. [...] In summary: there is no SMP support in OCaml, and it is very very unlikely that there will ever be. >>> Ouch!