From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Original-To: caml-list@sympa.inria.fr Delivered-To: caml-list@sympa.inria.fr Received: from mail3-relais-sop.national.inria.fr (mail3-relais-sop.national.inria.fr [192.134.164.104]) by sympa.inria.fr (Postfix) with ESMTPS id 50E2B7EE99 for ; Wed, 8 Jan 2014 21:29:57 +0100 (CET) Received-SPF: None (mail3-smtp-sop.national.inria.fr: no sender authenticity information available from domain of rdicosmo@gmail.com) identity=pra; client-ip=74.125.82.49; receiver=mail3-smtp-sop.national.inria.fr; envelope-from="rdicosmo@gmail.com"; x-sender="rdicosmo@gmail.com"; x-conformance=sidf_compatible Received-SPF: Pass (mail3-smtp-sop.national.inria.fr: domain of rdicosmo@gmail.com designates 74.125.82.49 as permitted sender) identity=mailfrom; client-ip=74.125.82.49; receiver=mail3-smtp-sop.national.inria.fr; envelope-from="rdicosmo@gmail.com"; x-sender="rdicosmo@gmail.com"; x-conformance=sidf_compatible; x-record-type="v=spf1" Received-SPF: None (mail3-smtp-sop.national.inria.fr: no sender authenticity information available from domain of postmaster@mail-wg0-f49.google.com) identity=helo; client-ip=74.125.82.49; receiver=mail3-smtp-sop.national.inria.fr; envelope-from="rdicosmo@gmail.com"; x-sender="postmaster@mail-wg0-f49.google.com"; x-conformance=sidf_compatible X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AoIDAKS0zVJKfVIxlGdsb2JhbABZg0NQs2+GZxYOAQEBAQcLCwkSKoIlAQEBBDoGARQYDAEDDAEFBRgJJQ8FIAEFAQEhEwkSh1UDEQQBAwWdW49lkT8nDYUAEQEFDAqSGoETBJgWgTGLLYFpgWJBgWWCdQ X-IPAS-Result: AoIDAKS0zVJKfVIxlGdsb2JhbABZg0NQs2+GZxYOAQEBAQcLCwkSKoIlAQEBBDoGARQYDAEDDAEFBRgJJQ8FIAEFAQEhEwkSh1UDEQQBAwWdW49lkT8nDYUAEQEFDAqSGoETBJgWgTGLLYFpgWJBgWWCdQ X-IronPort-AV: E=Sophos;i="4.95,626,1384297200"; d="scan'208";a="44032331" Received: from mail-wg0-f49.google.com ([74.125.82.49]) by mail3-smtp-sop.national.inria.fr with ESMTP/TLS/RC4-SHA; 08 Jan 2014 21:29:56 +0100 Received: by mail-wg0-f49.google.com with SMTP id x12so1920746wgg.28 for ; Wed, 08 Jan 2014 12:29:56 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=sender:date:from:to:cc:subject:message-id:references:mime-version :content-type:content-disposition:in-reply-to:user-agent; bh=OktY8Q0w7y5e3IUUCX6qaal7gTjoWymwvcbcwLDaW9c=; b=ro0D+qUmhp8pldWs1VpDYgGb3wbevvrB41Ohh7dRbf+hrpZfuyALaQ6BX4ZOSFGQP2 s5RgAeuZFMppBKQ8EsYPLBdzx28pxnxwWB4fUTrWz8A5p/uWKaWZCp8lsHGnP6WUV6/4 1cVGaMm8K0KIGy7eXBNGRxRcNsrdocXgB27U/KBtNGd44qmF2RkVF9DzR64pGDk/GbWt YO5OGUqiO2X6vIeGZHlUficzmhWqdswvTxiL9j7Z8p6mLXzzgdVMfaCZR2cu0//CwM18 uYTdTCveHPci9CmLZ4srpYW03z/4DoaWj6osGcm2RDQ6rRcUaGlkSpJncYtgJ817oait fCQw== X-Received: by 10.194.186.140 with SMTP id fk12mr12702153wjc.77.1389212996044; Wed, 08 Jan 2014 12:29:56 -0800 (PST) Received: from voyager (bny92-3-81-56-44-163.fbx.proxad.net. [81.56.44.163]) by mx.google.com with ESMTPSA id e1sm4157726wij.7.2014.01.08.12.29.54 for (version=TLSv1.2 cipher=RC4-SHA bits=128/128); Wed, 08 Jan 2014 12:29:55 -0800 (PST) Sender: Roberto Di Cosmo Received: from dicosmo by voyager with local (Exim 4.80) (envelope-from ) id 1W0zlA-000166-SI; Wed, 08 Jan 2014 21:29:52 +0100 Date: Wed, 8 Jan 2014 21:29:52 +0100 From: Roberto Di Cosmo To: Yotam Barnoy Cc: Ocaml Mailing List Message-ID: <20140108202952.GA3669@voyager> References: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.20 (2009-06-14) Subject: Re: [Caml-list] Concurrent/parallel programming Dear Yotam, there are regularly discussions on how to perform computations involving parallelism on this list; a relatively recent and detailed one is https://sympa.inria.fr/sympa/arc/caml-list/2012-06/msg00043.html And yes, one might be puzzled because there are so many different approaches, but this is unavoidable, because there are so many different needs which are not easily covered by a single approach. If one wants to distribute a computation that has very little data exchange compared to the local computation on each node, the best approach is quite different to the one needed when large data sets need to be shared or exchanged among the workers. And if you can do all on a single machine, you can obtain significant speedup on a multicore machines by exploiting OS level mechanisms that are not available if you need a cluster. Not to mention the fact that in many cases one is looking for concurrency, and not parallelism: even if at first sight they may look similar, deep down they really are not. Since you mention machine learning, it's quite probable that you want to perform several iterations of relatively inexpensive computations on very large arrays of integers or floats: if this is the case, Parmap (and not ParMap, that lends to confusion with a different project in the Haskell world) was specially designed to provide a highly efficient solution, and avoid boxing/unboxing overhead of float arrays, *if* you use it properly (in particular, see the notes at the end of the README file, or look at http://www.dicosmo.org/code/parmap/ and the research article pointed from there, where a precise discussion of the steps involved in performing parallel computation on float arrays is given). Actually, feedback from happy users, and synthetic benchmarks indicate that Parmap should provide one of the best possible performances for this use case on a multicore machine, short of resorting to using libraries like ancient, that requires carefulness to avoid core dumps, or external libraries that already have taken care of parallelism for you (like LaPack, etc.). But if one performs a map over an array of floats *without* using the special functions for floats, then all sort of boxing/unboxing and copying will take place, and the "parallel" version might very well be even slower than the sequential one, *if* the computation on each float is fast. Finally, if the float computations are the bottleneck, then a very interesting project to keep an eye on is SPOC, that may significantly outperform everything else on earth: taking advantage of the GPUs in your machine, it can perform float computations on large arrays in a fraction of the time that your CPU, even multicore, requires... but of course you need to learn to program a GPU kernel for that. Learn more about this here http://www.algo-prog.info/spoc The bottonline is, parallelism is easier than concurrency, but when one looks for speed, every detail counts, and getting a real speedup does not come for free. We would really need a single place where to share ideas, tips and serious analysis of the various available approaches: multicore machines and GPUs are a reality, and this issue is bound to come up again and again. -- Roberto On Tue, Jan 07, 2014 at 02:54:33PM -0500, Yotam Barnoy wrote: > Hi List > > So far, I've been programming in ocaml using only sequential programs. In my > last project, which was an implementation of a large machine learning > algorithm, I tried to speed up computation using a little bit of parallelism > with ParMap, and it was a complete failure. It's possible that more time would > have yielded better results, but I just didn't have the time to invest in it > given how bad the initial results were. > > My question is, what are the options right now as far as parallelism is > concerned? I'm not talking about cooperative multitasking, but about really > taking advantage of multiple cores. I'm well aware of the runtime lock and I'm > ok with message passing between processes or a shared area in memory, but I'd > rather have something more high level than starting up several processes, > creating a named pipe or a socket, and trying to pass messages through that. > Also, I assume that using a shared area in memory involves some C code? Am I > wrong about that? > > I was expecting Core's Async to fill this role, but realworldocaml is fuzzy on > this topic, apparently preferring to dwell on cooperative multitasking (which > is fine but not what I'm looking for), and I couldn't find any other > documentation that was clearer. > > Thanks > Yotam -- Roberto Di Cosmo ------------------------------------------------------------------ Professeur En delegation a l'INRIA PPS E-mail: roberto@dicosmo.org Universite Paris Diderot WWW : http://www.dicosmo.org Case 7014 Tel : ++33-(0)1-57 27 92 20 5, Rue Thomas Mann F-75205 Paris Cedex 13 Identica: http://identi.ca/rdicosmo FRANCE. Twitter: http://twitter.com/rdicosmo ------------------------------------------------------------------ Attachments: MIME accepted, Word deprecated http://www.gnu.org/philosophy/no-word-attachments.html ------------------------------------------------------------------ Office location: Bureau 3020 (3rd floor) Batiment Sophie Germain Avenue de France Metro Bibliotheque Francois Mitterrand, ligne 14/RER C ----------------------------------------------------------------- GPG fingerprint 2931 20CE 3A5A 5390 98EC 8BFC FCCA C3BE 39CB 12D3