Re: [Caml-list] Concurrent/parallel programming

caml-list - the Caml user's mailing list

From: Roberto Di Cosmo <roberto@dicosmo.org>
To: Yotam Barnoy <yotambarnoy@gmail.com>
Cc: Ocaml Mailing List <caml-list@inria.fr>
Subject: Re: [Caml-list] Concurrent/parallel programming
Date: Wed, 8 Jan 2014 21:29:52 +0100
Message-ID: <20140108202952.GA3669@voyager>
In-Reply-To: <CAN6ygOnW9bqcB3SeZiqgxFtPuqt2PXJ0-EBRS7Na9M0S6fT3KQ@mail.gmail.com>

Dear Yotam,
    there are regular discussions on this list about how to perform computations
involving parallelism; a relatively recent and detailed one is

    https://sympa.inria.fr/sympa/arc/caml-list/2012-06/msg00043.html

And yes, one might be puzzled by the sheer number of different approaches,
but this is unavoidable, because there are many different needs that are not
easily covered by a single approach.  If one wants to distribute a computation
that involves very little data exchange compared to the local computation on each
node, the best approach is quite different from the one needed when large data
sets must be shared or exchanged among the workers. And if you can do everything
on a single machine, you can obtain a significant speedup on a multicore machine
by exploiting OS-level mechanisms (such as shared memory) that are not available
if you need a cluster. Not to mention that in many cases one is looking for
concurrency, and not parallelism: even if at first sight they may look similar,
deep down they really are not.
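To make the computation-versus-communication trade-off concrete, here is a
back-of-the-envelope model. It is an illustrative assumption, not something
taken from Parmap or any other library: with p workers the local computation
shrinks to t_compute / p, while the time spent moving data to and from the
workers (t_comm) is assumed not to shrink at all.

```ocaml
(* Back-of-the-envelope speedup model (illustrative assumption only):
   the local computation splits evenly across [workers], while the cost
   of shipping data to and from the workers stays serial. *)
let estimated_speedup ~t_compute ~t_comm ~workers =
  if workers <= 0 then invalid_arg "estimated_speedup: workers must be > 0";
  let t_parallel = (t_compute /. float_of_int workers) +. t_comm in
  t_compute /. t_parallel

let () =
  (* Little data exchange compared to local work: near-linear speedup. *)
  Printf.printf "cheap exchange:  %.2fx\n"
    (estimated_speedup ~t_compute:100.0 ~t_comm:1.0 ~workers:4);
  (* Exchange cost comparable to the work: the gain vanishes. *)
  Printf.printf "costly exchange: %.2fx\n"
    (estimated_speedup ~t_compute:100.0 ~t_comm:75.0 ~workers:4)
```

With these made-up numbers it prints 3.85x and 1.00x: the same four cores
are either nearly fully exploited or completely wasted, depending only on
how much data has to move.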

Since you mention machine learning, it's quite probable that you want to perform
several iterations of relatively inexpensive computations on very large arrays
of integers or floats: if this is the case, Parmap (not ParMap, which leads
to confusion with a different project in the Haskell world) was specially
designed to provide a highly efficient solution and to avoid the boxing/unboxing
overhead on float arrays, *if* you use it properly (in particular, see the notes
at the end of the README file, or look at http://www.dicosmo.org/code/parmap/
and the research article linked from there, which gives a precise discussion of
the steps involved in performing parallel computation on float arrays).
In fact, feedback from happy users and synthetic benchmarks indicate that
Parmap should be among the best-performing options for this use case on
a multicore machine, short of resorting to libraries like ancient, which
require care to avoid core dumps, or to external libraries that already
take care of parallelism for you (like LAPACK, etc.).
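Float arrays deserve special treatment because of OCaml's value
representation: a float array stores its elements unboxed in one flat block,
whereas a float held in a generic container (a list cell, a tuple) is a
pointer to its own heap-allocated box. The stdlib-only sketch below peeks at
this with Obj, for illustration only; it assumes the default runtime
configuration, where float arrays are flat (i.e. the compiler was not
configured with -no-flat-float-array).

```ocaml
(* Peek at OCaml's runtime representation of floats (illustration only:
   Obj should not be used in real code). Assumes the default
   configuration, where float arrays are flat. *)

(* A float array is a single block tagged Double_array_tag that holds
   the raw doubles directly. *)
let is_flat_float_array (a : float array) =
  Obj.tag (Obj.repr a) = Obj.double_array_tag

(* A float stored in a list cell is a separately allocated box tagged
   Double_tag. *)
let is_boxed_float (x : float) =
  Obj.tag (Obj.repr x) = Obj.double_tag

let () =
  let arr = Array.init 4 (fun i -> float_of_int i *. 0.5) in
  let lst = Array.to_list arr in
  Printf.printf "array is flat: %b\n" (is_flat_float_array arr);
  Printf.printf "list element is boxed: %b\n" (is_boxed_float (List.hd lst))
```

Converting between the two forms (e.g. turning the array into a list, or
passing each element through polymorphic code that stores it) allocates one
box per element, which is exactly the overhead a float-specialized parallel
map tries to avoid.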

But if one performs a map over an array of floats *without* using
the special functions for floats (such as Parmap.array_float_parmap), then
all sorts of boxing/unboxing and copying will take place, and the "parallel"
version might very well be even slower than the sequential one, *if* the
computation on each float is fast.
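A rough, sequential sketch of where the extra cost comes from: a fork-based
parallel map must send each worker's results back to the parent, which for
arbitrary values means serializing and deserializing them (simulated here with
Marshal), whereas a float-specialized path can let workers write raw doubles
directly into a preallocated buffer (here a Bigarray, the kind of buffer that
can be memory-mapped and shared between processes). The function names and
the chunking scheme are hypothetical; this is not Parmap's actual code, and no
real parallelism happens here.

```ocaml
(* Sequential simulation of two ways a worker could hand back the results
   for its chunk of the input. Hypothetical names; no real parallelism. *)

let f x = (x *. x) +. 1.0 (* a cheap per-element computation *)

(* Generic path: the worker computes its chunk and serializes it (as it
   would before writing it to a pipe or socket); the parent deserializes
   a fresh copy. Returns the copy and the number of serialized bytes. *)
let chunk_via_marshal (input : float array) ~pos ~len : float array * int =
  let result = Array.map f (Array.sub input pos len) in
  let bytes = Marshal.to_string result [] in
  ((Marshal.from_string bytes 0 : float array), String.length bytes)

(* Float-specialized path: the worker writes unboxed doubles straight
   into a preallocated buffer; nothing is serialized or copied back. *)
let chunk_into_buffer (input : float array) buf ~pos ~len =
  for i = pos to pos + len - 1 do
    Bigarray.Array1.set buf i (f input.(i))
  done

let () =
  let n = 1_000 and workers = 4 in
  let input = Array.init n float_of_int in
  let chunk = n / workers in (* n is divisible by workers here *)
  (* Generic path: collect per-worker copies, then concatenate them
     (yet another copy). *)
  let pieces, bytes_moved =
    List.split
      (List.init workers (fun w ->
           chunk_via_marshal input ~pos:(w * chunk) ~len:chunk))
  in
  let generic = Array.concat pieces in
  (* Specialized path: one output buffer, filled in place. *)
  let buf = Bigarray.Array1.create Bigarray.float64 Bigarray.c_layout n in
  for w = 0 to workers - 1 do
    chunk_into_buffer input buf ~pos:(w * chunk) ~len:chunk
  done;
  let same = ref true in
  Array.iteri
    (fun i x -> if x <> Bigarray.Array1.get buf i then same := false)
    generic;
  Printf.printf "results agree: %b, bytes serialized: %d\n" !same
    (List.fold_left ( + ) 0 bytes_moved)
```

Both paths compute the same values; the difference is that the first one
pays for serialization, deserialization and concatenation on top of the
(cheap) computation, which is why it can end up slower than a plain
sequential Array.map.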

Finally, if the float computations are the bottleneck, then a very interesting
project to keep an eye on is SPOC, which may significantly outperform everything
else on earth: by taking advantage of the GPUs in your machine, it can perform
float computations on large arrays in a fraction of the time that your CPU,
even a multicore one, requires... but of course you need to learn to program a
GPU kernel for that. Learn more about it here: http://www.algo-prog.info/spoc

The bottom line is: parallelism is easier than concurrency, but when
one looks for speed, every detail counts, and getting a real speedup
does not come for free.

We really need a single place to share ideas, tips and serious analyses
of the various available approaches: multicore machines and GPUs are a
reality, and this issue is bound to come up again and again.

--
Roberto

On Tue, Jan 07, 2014 at 02:54:33PM -0500, Yotam Barnoy wrote:
> Hi List
> 
> So far, I've been programming in ocaml using only sequential programs. In my
> last project, which was an implementation of a large machine learning
> algorithm, I tried to speed up computation using a little bit of parallelism
> with ParMap, and it was a complete failure. It's possible that more time would
> have yielded better results, but I just didn't have the time to invest in it
> given how bad the initial results were.
> 
> My question is, what are the options right now as far as parallelism is
> concerned? I'm not talking about cooperative multitasking, but about really
> taking advantage of multiple cores. I'm well aware of the runtime lock and I'm
> ok with message passing between processes or a shared area in memory, but I'd
> rather have something more high level than starting up several processes,
> creating a named pipe or a socket, and trying to pass messages through that.
> Also, I assume that using a shared area in memory involves some C code? Am I
> wrong about that?
> 
> I was expecting Core's Async to fill this role, but realworldocaml is fuzzy on
> this topic, apparently preferring to dwell on cooperative multitasking (which
> is fine but not what I'm looking for), and I couldn't find any other
> documentation that was clearer.
> 
> Thanks
> Yotam

-- 
Roberto Di Cosmo

------------------------------------------------------------------
Professor                On secondment at INRIA
PPS                      E-mail: roberto@dicosmo.org
Universite Paris Diderot WWW  : http://www.dicosmo.org
Case 7014                Tel  : ++33-(0)1-57 27 92 20
5, Rue Thomas Mann       
F-75205 Paris Cedex 13   Identica: http://identi.ca/rdicosmo
FRANCE.                  Twitter: http://twitter.com/rdicosmo
------------------------------------------------------------------
