From mboxrd@z Thu Jan  1 00:00:00 1970
Sender: Roberto Di Cosmo <rdicosmo@gmail.com>
Date: Wed, 8 Jan 2014 21:29:52 +0100
From: Roberto Di Cosmo <roberto@dicosmo.org>
To: Yotam Barnoy <yotambarnoy@gmail.com>
Cc: Ocaml Mailing List <caml-list@inria.fr>
Message-ID: <20140108202952.GA3669@voyager>
References: <CAN6ygOnW9bqcB3SeZiqgxFtPuqt2PXJ0-EBRS7Na9M0S6fT3KQ@mail.gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <CAN6ygOnW9bqcB3SeZiqgxFtPuqt2PXJ0-EBRS7Na9M0S6fT3KQ@mail.gmail.com>
User-Agent: Mutt/1.5.20 (2009-06-14)
Subject: Re: [Caml-list] Concurrent/parallel programming

Dear Yotam,
    discussions on how to parallelise computations come up regularly on this
list; a relatively recent and detailed one is

    https://sympa.inria.fr/sympa/arc/caml-list/2012-06/msg00043.html

And yes, one might be puzzled by the number of different approaches, but this
is unavoidable: there are many different needs, and no single approach covers
them all easily.  If one wants to distribute a computation that has very little
data exchange compared to the local computation on each node, the best approach
is quite different from the one needed when large data sets need to be shared
or exchanged among the workers. And if everything fits on a single machine, you
can obtain a significant speedup on a multicore machine by exploiting OS-level
mechanisms that are not available on a cluster. Not to mention the fact that in
many cases one is looking for concurrency, and not parallelism: even if at
first sight they may look similar, deep down they really are not.

Since you mention machine learning, it's quite probable that you want to perform
several iterations of relatively inexpensive computations on very large arrays
of integers or floats: if this is the case, Parmap (and not ParMap, which
invites confusion with a different project in the Haskell world) was specially
designed to provide a highly efficient solution, and to avoid the
boxing/unboxing overhead of float arrays, *if* you use it properly (in
particular, see the notes at the end of the README file, or look at
http://www.dicosmo.org/code/parmap/ and the research article linked from there,
which gives a precise discussion of the steps involved in performing parallel
computation on float arrays).
Actually, feedback from happy users and synthetic benchmarks indicate that
Parmap should provide some of the best performance achievable for this use case
on a multicore machine, short of resorting to libraries like Ancient, which
requires care to avoid core dumps, or to external libraries that have already
taken care of parallelism for you (like LAPACK, etc.).

But if one performs a map over an array of floats *without* using
the special functions for floats, then all sorts of boxing/unboxing and
copying will take place, and the "parallel" version might very well be
even slower than the sequential one, *if* the computation on each float
is fast.
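
To make the difference concrete, here is a minimal sketch, assuming the parmap
library is installed and linked (for instance through ocamlfind with
-package parmap). The function f, the array size and the choice of 4 cores are
hypothetical placeholders chosen for illustration, not taken from any benchmark:

```ocaml
(* Sketch only: assumes the parmap library is installed and linked,
   e.g. ocamlfind ocamlopt -package parmap -linkpkg example.ml *)

(* Hypothetical cheap per-element computation: the case where the
   overhead of moving results around can dominate the actual work. *)
let f x = sqrt (x *. x +. 1.0)

let input = Array.init 100_000 float_of_int

(* Sequential reference result. *)
let seq = Array.map f input

(* Generic parallel map: each worker's results are marshalled back to
   the parent process, and the final result is returned as a list. *)
let generic = Parmap.parmap ~ncores:4 f (Parmap.A input)

(* Float-specialised parallel map: per the Parmap README, workers write
   their floats into a shared buffer, avoiding the marshalling step. *)
let fast = Parmap.array_float_parmap ~ncores:4 f input

let () =
  (* All three versions must compute the same values, in the same order. *)
  assert (Array.of_list generic = seq);
  assert (fast = seq);
  print_endline "all three versions agree"
```

All three versions return the same values; only the running time differs, and
with an f this cheap the generic version may well be slower than the plain
Array.map, so it is worth timing them on your own data.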

Finally, if the float computations are the bottleneck, then a very interesting
project to keep an eye on is SPOC, which may significantly outperform all the
CPU-based approaches above: taking advantage of the GPUs in your machine, it
can perform float computations on large arrays in a fraction of the time that
your CPU, even a multicore one, requires... but of course you need to learn to
program a GPU kernel for that. Learn more here: http://www.algo-prog.info/spoc

The bottom line is: parallelism is easier than concurrency, but when
one looks for speed, every detail counts, and getting a real speedup
does not come for free.

We really need a single place to share ideas, tips and serious analyses of the
various available approaches: multicore machines and GPUs are a reality, and
this issue is bound to come up again and again.

--
Roberto



On Tue, Jan 07, 2014 at 02:54:33PM -0500, Yotam Barnoy wrote:
> Hi List
> 
> So far, I've been programming in ocaml using only sequential programs. In my
> last project, which was an implementation of a large machine learning
> algorithm, I tried to speed up computation using a little bit of parallelism
> with ParMap, and it was a complete failure. It's possible that more time would
> have yielded better results, but I just didn't have the time to invest in it
> given how bad the initial results were.
> 
> My question is, what are the options right now as far as parallelism is
> concerned? I'm not talking about cooperative multitasking, but about really
> taking advantage of multiple cores. I'm well aware of the runtime lock and I'm
> ok with message passing between processes or a shared area in memory, but I'd
> rather have something more high level than starting up several processes,
> creating a named pipe or a socket, and trying to pass messages through that.
> Also, I assume that using a shared area in memory involves some C code? Am I
> wrong about that?
> 
> I was expecting Core's Async to fill this role, but realworldocaml is fuzzy on
> this topic, apparently preferring to dwell on cooperative multitasking (which
> is fine but not what I'm looking for), and I couldn't find any other
> documentation that was clearer.
> 
> Thanks
> Yotam

-- 
Roberto Di Cosmo
 
------------------------------------------------------------------
Professeur               En delegation a l'INRIA
PPS                      E-mail: roberto@dicosmo.org
Universite Paris Diderot WWW  : http://www.dicosmo.org
Case 7014                Tel  : ++33-(0)1-57 27 92 20
5, Rue Thomas Mann       
F-75205 Paris Cedex 13   Identica: http://identi.ca/rdicosmo
FRANCE.                  Twitter: http://twitter.com/rdicosmo
------------------------------------------------------------------
Attachments:
MIME accepted, Word deprecated
      http://www.gnu.org/philosophy/no-word-attachments.html
------------------------------------------------------------------
Office location:
 
Bureau 3020 (3rd floor)
Batiment Sophie Germain
Avenue de France
Metro Bibliotheque Francois Mitterrand, ligne 14/RER C
-----------------------------------------------------------------
GPG fingerprint 2931 20CE 3A5A 5390 98EC 8BFC FCCA C3BE 39CB 12D3