On Tue, 19 May 2015 10:27:25 +0200 in <555AF3ED.9040205@gmail.com>, Leonardo Laguna Ruiz wrote:
> Hi Erkki,
> Targeting OpenCL would be very interesting. It is not yet on the roadmap,
> but I will keep an eye on it. I have access to a Parallella board where I
> can do testing, but I haven't thought too much about how Vult will express
> parallelization.

I've spent months hacking on the Parallella, and parallel programming on this kind of network-on-chip coprocessor is still in its infancy.

For Vult, I think you should build a wrapper around the OpenCL primitives to expose the low-level API to your higher-level language. Keep in mind that the OpenCL implementation for the Parallella's Epiphany is not 100% compatible with the standard, for good reasons: the mesh network between the cores, and the special instructions that allow switching between them at specific execution points.

Parallel programming on such hardware is really quite new, and the main work so far has been to take a CPU-intensive task or algorithm and port it *smartly* onto the Epiphany topology with hardware-specific optimisations. For example:

http://www.adapteva.com/wp-content/uploads/2012/10/Scalable-Parallel-Multiplication-of-Big-Matrices.pdf
http://www.hh.se/download/18.4cc60a491424e61ad932fed/1385997469308/MCC134.3.pdf

So far, nobody has found a programming model that distributes the load across the network topology and the cores in a way that is always optimal (and I doubt this will happen in the near future), and that's why it's fun to play with :)

I'm playing with a geometrically shaped load distribution of common execution patterns found in distributed grid algorithms, just because it's now possible to do for real in 2D :) It sounds promising in terms of performance with low overhead, but who cares? :)

-- 
Jérôme Benoit aka fraggle
Piment Noir - http://piment-noir.org
OpenPGP Key ID : 9FE9161D
Key fingerprint : 9CA4 0249 AF57 A35B 34B3 AC15 FAA0 CB50 9FE9 161D
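P.S. To make the 2D-mesh distribution idea concrete, here is a minimal sketch of Cannon's algorithm, the classic block-shifting scheme behind matrix-multiplication schemes like the one in the Adapteva paper linked above. It's purely illustrative and in plain Python (the real Epiphany cores would run C kernels); `cannon_matmul` and its block layout are my own toy naming, not anything from the Epiphany SDK. Each simulated "core" (i, j) in a p x p mesh holds one block of A and one of B, and blocks circulate left/up between local multiply steps, so only neighbour-to-neighbour traffic is needed on the mesh:

```python
# Toy sequential model of Cannon's algorithm on a p x p mesh of cores.
# Each "core" (i, j) holds one bs x bs block of A and B; blocks shift
# left (A) and up (B) around the mesh between local multiply steps.

def cannon_matmul(A, B, p):
    """Multiply two n x n matrices via a p x p block decomposition,
    simulating Cannon's shift pattern (n must be divisible by p)."""
    n = len(A)
    bs = n // p  # block size held by each core

    def block(M, i, j):
        return [row[j * bs:(j + 1) * bs] for row in M[i * bs:(i + 1) * bs]]

    # Initial skew: core (i, j) starts with A-block (i, (i+j) % p)
    # and B-block ((i+j) % p, j).
    a = [[block(A, i, (i + j) % p) for j in range(p)] for i in range(p)]
    b = [[block(B, (i + j) % p, j) for j in range(p)] for i in range(p)]
    c = [[[[0] * bs for _ in range(bs)] for _ in range(p)] for _ in range(p)]

    for _ in range(p):
        # Local block multiply-accumulate on every core.
        for i in range(p):
            for j in range(p):
                for x in range(bs):
                    for y in range(bs):
                        c[i][j][x][y] += sum(
                            a[i][j][x][k] * b[i][j][k][y] for k in range(bs))
        # Rotate A-blocks one hop left and B-blocks one hop up.
        a = [row[1:] + row[:1] for row in a]
        b = b[1:] + b[:1]

    # Reassemble the full result matrix from the per-core blocks.
    return [sum((c[i][j][x] for j in range(p)), [])
            for i in range(p) for x in range(bs)]
```

The point is the communication pattern, not the arithmetic: after the initial skew, every step is a local multiply plus a single-hop shift, which maps naturally onto the Epiphany's 2D mesh where neighbour links are cheap and global traffic is not.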