On Apr 23, 2011, at 1:39 PM, Eray Ozkural wrote:

> On Sat, Apr 23, 2011 at 4:47 PM, Alexy Khrabrov <deliverable@gmail.com> wrote:
> 
> On Apr 23, 2011, at 6:17 AM, Eray Ozkural wrote:
> 
> > I don't really care what others say, but to prove that this has any performance value you should do the following:
> >
> >
> > Compare your most "parallel" algorithm with the performance of a corresponding well-written MPI application using openmpi's shared memory transport. If there is a difference, then your system has some value.
> >
> > Of course openmpi's shared memory transport is terribly buggy, but it should give a baseline acceptable performance.
> >
> > If there is no comparison, we have no idea.
> 
> The problem with "implement in MPI and compare" is that you have to rearchitect a sequential program for a totally different model.  By contrast, using shared memory parallelism, it's often a question of using pmap.
> 
> Incorrect. We always compare to sequential code in parallel computing. It's called "speedup".
> 
> And doubly incorrect because we are not comparing to sequential code but a claimed shared memory parallelism. It's only logical to compare two approaches on the same hardware.

I'm not claiming that anything is correct or incorrect -- I'm just saying that replacing map with pmap is easy, while rewriting in MPI style is complex.  Making a shared-memory program out of a sequential one can be this trivial, while with MPI it never will be: you have to program in message-passing style from the get-go, preferably in Erlang, or with Scala actors (with or without the Akka kernel), or with 0MQ, etc.
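
To make the map-versus-pmap point concrete, here is a minimal sketch of the same idea in Java rather than Clojure (a substitution for illustration, since both run on the JVM).  The class name PmapSketch and the work function are hypothetical stand-ins, not code from my benchmark.  The only difference between the two versions is stream() versus parallelStream(), which plays roughly the role that map versus pmap plays in Clojure.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class PmapSketch {
    // Hypothetical stand-in for an expensive, side-effect-free per-element function.
    static long work(int n) {
        long acc = 0;
        for (int i = 1; i <= 1_000_000; i++) {
            acc += (long) n * i % 7;
        }
        return acc;
    }

    // Sequential version: the analogue of (map work xs).
    static List<Long> sequential(List<Integer> xs) {
        return xs.stream().map(PmapSketch::work).collect(Collectors.toList());
    }

    // Parallel version: the analogue of (pmap work xs).
    // Only the stream source changes; the result keeps the input order.
    static List<Long> parallel(List<Integer> xs) {
        return xs.parallelStream().map(PmapSketch::work).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> xs = IntStream.rangeClosed(1, 64).boxed().collect(Collectors.toList());

        long t0 = System.nanoTime();
        List<Long> a = sequential(xs);
        long t1 = System.nanoTime();
        List<Long> b = parallel(xs);
        long t2 = System.nanoTime();

        // Timings depend on the machine and on JIT warm-up; treat them as rough.
        System.out.println("same results: " + a.equals(b));
        System.out.printf("sequential %d ms, parallel %d ms%n",
                (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000);
    }
}
```

This only works this easily because work has no shared mutable state; whether the parallel version is actually faster depends on how expensive each element is relative to the scheduling overhead.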

> 
> I really recommend everybody interested in parallelism to learn and try Clojure on a small problem.  You can replace a single map by pmap in a suitable setting and observe a not-quite-linear, but proportional speedup.
> 
> Of course functional programming fits such parallelism very well. It's a shame that ocaml does not have parallel functional primitives.
>  
> 
> I'd be really happy if OCaml gets the mechanisms from Clojure. 
> 
> 
> It'd be even better if such explicit parallelism had good compiler support, too :) I don't know much about Clojure, but I wouldn't use anything that runs on JVM for a parallel program. That might be like first turning your computer to Commodore 64 and then getting some speedup.

This is an obsolete urban legend.  The JVM has the most mature GC out there, and its computational performance is often on par with C once it is loaded and running.  In my social networking benchmark, the largest data-churning test ever for functional programming languages (http://functional.tv/), Clojure was only 2-3 times slower than OCaml and Haskell, and that was mostly due to slow Java serialization and deserialization.  My experience with Scala and Clojure tells me these are now the best ways to do shared memory parallelism for performance gains in a real-world manner (using many libraries).

BTW, Haskell beat OCaml by a small margin in that benchmark, even though it used purely functional maps against OCaml's hash tables.  The Haskell folks keep improving their performance; their GC originally crashed under such an unexpected volume as a Twitter graph of 5 million users, but it was quickly fixed.  Still, we had to strictify Haskell's core data structures, an exercise which made me go back to OCaml.  I finished my Twitter data mining Ph.D. in OCaml as the most practical way to handle the graph, filling up a 64 GB RAM server, yet only one core out of eight was running, which is a pity.

Clojure's performance is improving by leaps and bounds, e.g. it now uses primitives as efficiently as Java does, and I think OCaml would benefit from a similar set of primitives -- then it would be the most practical ML-style FP language, a prize now in fact held by Scala.

-- Alexy