From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Original-To: caml-list@yquem.inria.fr Delivered-To: caml-list@yquem.inria.fr Received: from mail4-relais-sop.national.inria.fr (mail4-relais-sop.national.inria.fr [192.134.164.105]) by yquem.inria.fr (Postfix) with ESMTP id AE578BC57 for ; Tue, 16 Nov 2010 18:04:07 +0100 (CET) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AncCANdH4kzU4xEJkGdsb2JhbACDSJBmjjEVAQEBAQkJDAcRAx+ueZB1AoEggzZzBIcIhlw X-IronPort-AV: E=Sophos;i="4.59,206,1288566000"; d="scan'208";a="78492107" Received: from moutng.kundenserver.de ([212.227.17.9]) by mail4-smtp-sop.national.inria.fr with ESMTP; 16 Nov 2010 18:04:06 +0100 Received: from office1.lan.sumadev.de (dslb-084-059-065-230.pools.arcor-ip.net [84.59.65.230]) by mrelayeu.kundenserver.de (node=mrbap2) with ESMTP (Nemesis) id 0LbgPT-1Oc9K32ZTf-00kdrK; Tue, 16 Nov 2010 18:04:04 +0100 Received: from [192.168.0.29] (dslb-084-058-005-030.pools.arcor-ip.net [84.58.5.30]) by office1.lan.sumadev.de (Postfix) with ESMTPA id 074DA5F702; Tue, 16 Nov 2010 18:04:03 +0100 (CET) Subject: Re: [Caml-list] SMP multithreading From: Gerd Stolpmann To: Edgar Friendly Cc: caml-list@yquem.inria.fr In-Reply-To: <4CE228CA.3030503@gmail.com> References: <20101115182737.42b8dcae@loki.yggdrasil.draxit.de> <4CE228CA.3030503@gmail.com> Content-Type: text/plain; charset="UTF-8" Date: Tue, 16 Nov 2010 18:04:02 +0100 Message-ID: <1289927042.16005.176.camel@thinkpad> Mime-Version: 1.0 X-Mailer: Evolution 2.28.1 Content-Transfer-Encoding: 7bit X-Provags-ID: V02:K0:IvyImfDQ5yBAYpgdLGoZwi5zN9BXbRNA0BcRs+po9gc c8mPdO41oTyw+p1UEiGozLCbIc/AUcLm1C1boIbYw9n9ldauTJ mmORpdgXTUYO7rEu/fSIcc4Twb1EsO86UlHViyvrWRF9bD96Wv DQxNjhtGZtqUCMaC1ymh6eG5QW31pVD+xsHT/na4jylpyj6c74 lvOTSDI2Bn29aYjQKLzEw== X-Spam: no; 0.00; gerd:01 stolpmann:01 speedups:01 ocaml:01 runtime:01 ocaml:01 cheers:01 non-uniform:01 ocamlnet-:01 runtime:01 model:01 compiler:01 compiler:01 speedups:01 gerd:01 Am Montag, den 15.11.2010, 22:46 -0800 schrieb Edgar Friendly: > On 11/15/2010 09:27 AM, Wolfgang Draxinger wrote: > > Hi, > > > > I've just read > > http://caml.inria.fr/pub/ml-archives/caml-list/2002/11/64c14acb90cb14bedb2cacb73338fb15.en.html > > in particular this paragraph: > > | What about hyperthreading? Well, I believe it's the last convulsive > > | movement of SMP's corpse :-) We'll see how it goes market-wise. At > > | any rate, the speedups announced for hyperthreading in the Pentium 4 > > | are below a factor of 1.5; probably not enough to offset the overhead > > | of making the OCaml runtime system thread-safe. > > > > This reads just like the "640k ought be enough for everyone". Multicore > > systems are the standard today. Even the cheapest consumer machines > > come with at least two cores. Once can easily get 6 core machines today. > > > > Still thinking SMP was a niche and was dying? > > > > So, what're the developments regarding SMP multithreading OCaml? > > > > > > Cheers > > > > Wolfgang > > > At the risk of feeding a (possibly unintentional) troll, I'd like to > share some possibly new thoughts on this ever-living topic. > > It looks like high-performance computing of the near future will be > built out of many machines (message passing), each with many cores > (SMP). One could use message passing for all communication in such a > system, but a hybrid approach might be best for this architecture, with > use of shared memory within each box and message passing between. Of > course the best choice depends strongly on the particular task. > > In the long run, it'll likely be a combination of a few large, powerful > cores (Intel-CPU style w/ the capability to run a single thread as fast > as possible) with many many smaller compute engines (GPGPUs or the like, > optimized for power and area, closely coupled with memory) that provides > the highest performance density. > > The question of how to program such an architecture seems as if it's > being answered without the functional community's input. What can we > contribute? Yes, that's generally the right question. Current hardware is a kind of experiment - vendors have only taken the multicore path because it is right now the easiest way of improving the performance potential, although it is questionable whether (non-server) applications can really benefit from it (excluding here server apps because for these parallelization is relatively easy to get). Future hardware will probably be even more different - however, it is still unclear which design paths will be taken. Could be manycores (many CPUs with non-uniform RAM), could be specialized compute units. Maybe we'll see again a separation of consumer and datacenter markets - the former optimizing for numeric simulation applications (i.e. games), the latter for high-throughput data paths and parallel CPU power. The problem here is that this is all speculation. There are some things we can do to improve the situation (and some ideas are not realistic): * A probably not-so-difficult improvement would be better message passing between independent but local processes. I've started an experiment for such a mechanism (http://projects.camlcity.org/projects/dl/ocamlnet-3.0.3/doc/html-main/Netcamlbox.html), which tries to exploit that GC-managed memory has a well-known structure. With more help from the GC this could be made even better (safer, fewer corner cases). * We need more frameworks for parallel programming. I'm currently developing Plasma, a Map/Reduce framework. Using a framework has the big advantage that the whole program is structured so it profits from parallelization, and that it is possible to train developers for it that have no idea about parallelization. There are probably more algorithm schemes where this is possible. * I have a lot of doubts whether FP languages ever run well on SMP with a bigger number of cores. The problem is the relatively high memory allocation rate - the GC has to work a lot harder than in imperative languages. The OC4MC project uses thread-local minor heaps because of this. Probably this is not enough, and one even needs thread-local major heaps (plus a third generation for values accessed by several threads). All in all you could get the same effect by instantiating the ocaml runtime several times (if this were possible), let each runtime run in its own thread, and provide some extra functionality for passing values between threads and for sharing values. This would not be exactly the SMP model, but would allow a number of parallelization techniques, and is probably future-proof as it encourages message passing over sharing. This is certainly worth experimentation. * One can also tackle the problem from the multi-processing side: Provide better mechanisms for message passing (see above) and value sharing. (That's probably the path I'll follow for my own experiments - no modifications of the runtime, but play tricks with the OS.) * As somebody mentioned "implicit parallelization": Don't expect anything from this. Even if a good compiler finds ways to parallelize 20% of the code (which would be a lot), the runtime effect would be marginal. 80% of the code is run at normal speed (hopefully) and dominates the runtime behavior. The point is that such compiler-driven code improvements are only local optimizations. For getting good parallelization results you need to restructure the design of the program - well, maybe compiler2.0 can do this at some time, but this is not in sight. * Looking for more "automatic" speedups: I would more look for parallelizing parts of the GC (e.g. parallelized sweep), but this is probably running quickly against the memory bandwidth limit. Maybe using 2 cores for the GC would result in an improvement, and more cores get you nothing extra. At least worth an experiment. Gerd > > E. > > _______________________________________________ > Caml-list mailing list. Subscription management: > http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list > Archives: http://caml.inria.fr > Beginner's list: http://groups.yahoo.com/group/ocaml_beginners > Bug reports: http://caml.inria.fr/bin/caml-bugs > -- ------------------------------------------------------------ Gerd Stolpmann, Bad Nauheimer Str.3, 64289 Darmstadt,Germany gerd@gerd-stolpmann.de http://www.gerd-stolpmann.de Phone: +49-6151-153855 Fax: +49-6151-997714 ------------------------------------------------------------