From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from mail4-relais-sop.national.inria.fr (mail4-relais-sop.national.inria.fr [192.134.164.105])
	by walapai.inria.fr (8.13.6/8.13.6) with ESMTP id pAUEwEDw029412
	for <caml-list@sympa-roc.inria.fr>; Wed, 30 Nov 2011 15:58:15 +0100
X-IronPort-Anti-Spam-Filtered: true
X-IronPort-Anti-Spam-Result: AqwBACND1k6Bu/5qkWdsb2JhbABEmn6QESIBAQEBCQsLBxQEIYFyAQEEAQwbCwE5AggDBQsLEhMPCwdFAQMOBhMSCYdsCLgag22HMwSmaA
X-IronPort-AV: E=Sophos;i="4.69,597,1315173600"; 
   d="scan'208";a="121524433"
Received: from mailrelay1.lrz-muenchen.de ([129.187.254.106])
  by mail4-smtp-sop.national.inria.fr with ESMTP; 30 Nov 2011 15:58:09 +0100
Received: from webmail.mwn.de ([129.187.254.85] [129.187.254.85]) by mailout.lrz-muenchen.de with ESMTP; Wed, 30 Nov 2011 15:57:49 +0100
Received: from webprx2.audi.de ([143.164.102.14])
        (SquirrelMail authenticated user weissmam)
        by webmail.mwn.de with HTTP;
        Wed, 30 Nov 2011 15:57:51 +0100 (CET)
Message-Id: <63143.143.164.102.14.1322665071.squirrel@webmail.mwn.de>
In-Reply-To: 
     <CAJBwKuU3HtmhEdxKFTEoqFtAUyNnGPFjdiz86j7RjG+6Y4r7VQ@mail.gmail.com>
References: 
    <CAJBwKuU3HtmhEdxKFTEoqFtAUyNnGPFjdiz86j7RjG+6Y4r7VQ@mail.gmail.com>
Date: Wed, 30 Nov 2011 15:57:51 +0100 (CET)
From: Markus =?iso-8859-1?Q?Wei=DFmann?= <markus.weissmann@in.tum.de>
To: "Roberto Di Cosmo" <roberto@dicosmo.org>
Cc: caml-list@inria.fr, roberto@dicosmo.org, marcod@di.unipi.it
User-Agent: SquirrelMail/1.4.13
MIME-Version: 1.0
Content-Type: text/plain;charset=iso-8859-1
Content-Transfer-Encoding: 8bit
Subject: Re: [Caml-list] [ANN]: Parmap 0.9.8

Hi Roberto,

imho it would be beneficial to have a global properties that specifies how
many workers there should be; like

val set_number_of_workers : int -> unit

If you're on a octa-core (..) machine you probably always want to use 8
(..) workers so it would be nice to have this as a global property
instead.


cheers

-Markus

> Dear all,
>    a few lines to announce the (much improved) version 0.9.8 of Parmap,
> the
> minimalistic library introduced last August, and that can be useful to
> exploit
> your multicore processor with minimal modifications to your OCaml
> programs.
>
> You will find a full description in the README file, as well as
> several examples.
>
> The main changes are:
>
>  - all functions (parmap, parfold, parmapfold) accept an optional
> parameter
>    chunksize that allows to control granularity of the parallelism
>
>  - highly specialised functions are now present to work on arrays and
>    especially on (unboxed) float arrays
>
>  - configure and ocamlbuild harness
>
>  - documentation (just make doc)
>
>  - on Linux kernels, set_affinity is used to attach a worker to a given
> core
>
> Special thanks to: Jérôme Vouillon for highly efficient code for the float
> arrays; Paul Vernaza for clever suggestions on pre-allocation of
> memory buffers for float arrays; Pietro Abate for autoconf/ocamlbuild
> help;
> Francois Berenger for tests.
>
> Project home: https://gitorious.org/parmap
>
> Version 0.9.8 is cc8915bceb7d05c2c645587f5eaaf0f6cb6080c6
>
> To compile and install:
>
> git clone git://gitorious.org/parmap/parmap.git
> git checkout pipes
> aclocal -I m4
> autoconf
> ./configure
> make
> make install
>
> Enjoy
>
> -- Marco Danelutto and Roberto Di Cosmo
>
>
> =====================Copy of the README file======================
> Parmap in a nutshell
> --------------------
>
> Parmap is a minimalistic library allowing  to exploit multicore
> architecture for
> OCaml programs with minimal modifications: if you want to use your
> many cores to
> accelerate an   operation  which  happens   to be   a  map,  fold
> or map/fold
> (map-reduce),  just use  Parmap's  parmap, parfold  and parmapfold
> primitives in
> place  of the standard   List.map   and friends,  and  specify  the
> number   of
> subprocesses to use by the optional parameter ~ncores.
>
> See the example directory for a couple of running programs.
>
> DO'S and DONT'S
> ---------------
>
> Parmap is *not*  meant to be a replacement  for a full fledged
> implementation of
> parallelism skeletons  (map, reduce, pipe, and the  many others
> described in the
> scientific literature   since the end   of the  1980's,   much earlier
> than the
> specific   implementation by Google   engineers  that popularised
> them).  It  is
> meant,  instead, to allow you to  quickly leverage the  idle
> processing power of
> your extra cores, when handling some heavy computational load.
>
> The principle of parmap is very simple: when you call one of the three
> available
> primitives, map, fold, and  mapfold , your OCaml  sequential program forks
>  in n
> subprocesses (you choose the n), and each subprocess performs the
> computation on
> the 1/n of the data, in chunks  of a size you  can choose, returning the
> results
> through a shared memory area to the  parent process, that resumes
> execution once
> all the children have terminated, and the data has been recollected.
>
> You need  to run your  program on a single multicore  machine;  repeat
> after me:
> Parmap  is   not meant to   run  on a cluster,  see   one of the  many
> available
> (re)implementations of the map-reduce schema for that.
>
> By forking the parent process  on a sigle  machine, the children get
> access, for
> free, to all the data structures already built, even the imperative ones,
> and as
> far as your computation  inside the map/fold  does not produce side
> effects that
> need  to be  preserved, the  final result will   be the same  as
> performing the
> sequential operation, the only difference is that you might get it faster.
>
> The OCaml code is reasonably simple and only marginally relies on
> external C libraries:
> most of the magic is done by your operating system's fork and memory
> mapping mechanisms.
> One could gain some speed by implementing a marshal/unmarshal operation
> directly
> on bigarrays, but we did not do this yet.
>
> Of course, if you happen  to have open  channels, or files, or other
> connections
> that should only be  used by the parent  process, your program  may behave
> in  a
> very wierd way: as an example, *do  not* open a  graphic window before
> calling a
> Parmap primitive, and   *do   not*  use  this  library   if  your  program
>    is
> multi-threaded!
>
> Pinning processes to physical CPUs
> ----------------------------------
>
> To obtain maximum  speed,  Parmap tries to  pin  the worker processes to
> a CPU,
> using  the scheduler  affinity  interface  that is  available   in recent
> Linux
> kernels.   Similar functionality may be  obtained  on different platforms
> using
> slightly different API. Contributions  are welcome to  support those other
> APIs,
> just make sure that you use autoconf properly.
>
> Using Parmap with Ocamlnat
> --------------------------
>
> You can use Parmap in a native toplevel  (it may be quite useful  if you
> use the
> native toplevel to perform fast interactive computations), but remember
> that you
> need to load the .cmxs modules in it; an example is given in
> example/topnat.ml
>
> Preservation of output order in Parmap
> --------------------------------------
>
> If the number of chunks is equal to the number of cores, it is easy to
> preserve
> the order of the elements of the sequence passed to the map/fold
> operations, so
> the result will be a list with the same order as if the sequential
> function would
> be applied to the input. This is what the parmap, parmafold and
> parfold functions
> do when the chunksize argument is not used.
>
> If the user specifies a chunksize that is different from the number of
> cores,
> there is no general way to preserve the ordering, so the result of calling
> Parmap.parmap f l are not necessarily in the same order as List.map f l.
>
> In general, using little chunksize helps in balancing the load among
> the workers,
> and provides better speed, at the price of losing the ordering: there is a
> tradeoff, and it is up to the user to choose the solution that better
> suits him/her.
>
> Fast map on arrays and on float arrays
> --------------------------------------
>
> Visiting an array is much faster than visiting a list, and conversion
> of an array
> to and from a list is expensive, on large data structures, so we
> provide a specialised
> version of map on arrays, that beaves exactly like parmap.
>
> We also provide a highly optimised specialised parmap version that is
> targeted
> to float arrays, array_float_parmap, that allows you to perform parallel
> computation on very large float arrays efficiently, without the
> boxing/unboxing
> overhead introduced by the other primitives, including array_parmap.
>
> To understand the efficiency issues involved in the case of large
> arrays of float,
> here is a short summary of the steps that any implementation of a parallel
> map
> function must perform.
>
>   1) create a float array to hold the result of the computation.
>      This operation is expensive: on an Intel i7, creating a 10M float
> array
>      takes 50 milliseconds
>
>      ocamlnat
>           Objective Caml version 3.12.0 - native toplevel
>
>      # #load "unix.cmxs";;
>      # let d = Unix.gettimeofday() in ignore(Array.create 10000000
> 0.); Unix.gettimeofday() -. d;;
>      - : float = 0.0501301288604736328
>
>   2) create a shared memory area
>
>   3) possibly copy the result array to the shared memory area
>
>   4) perform the computation in the children writing the result in the
> shared memory area
>
>   5) possibly copy the result back to the OCaml array
>
> All implementations need to do 1, 2 and 4; steps 3 and/or 5 may be
> omitted depending on
> what the user wants to do with the result.
>
> The array_float_parmap performs steps 1,2,4 and 5. It is possible to share
> steps
> 1 and 2 among subsequent calls to the parallel function by
> preallocating the result
> array and the shared memory buffer, and passing them as optional
> parameters to the
> array_float_parmap function: this may save a significant amount of time if
> the
> array is very large.
>
>
> --
> Caml-list mailing list.  Subscription management and archives:
> https://sympa-roc.inria.fr/wws/info/caml-list
> Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
> Bug reports: http://caml.inria.fr/bin/caml-bugs
>
>


-- 
Markus Weißmann, M.Sc.
Institut für Informatik
Technische Universität München
Raum 03.07.054, Boltzmannstr. 3, 85748 Garching

Tel. +49 (89) 2 89-1 81 05
Fax +49 (89) 2 89-1 81 07

mailto:markus.weissmann@in.tum.de
http://www6.in.tum.de/