caml-list - the Caml user's mailing list
From: Eray Ozkural <examachine@gmail.com>
To: caml-list@yquem.inria.fr
Subject: Re: [Caml-list] How to write a CUDA kernel in ocaml?
Date: Thu, 17 Dec 2009 08:45:28 +0200
Message-ID: <320e992a0912162245u5ce780f2ke838bff2ff378321@mail.gmail.com>
In-Reply-To: <4d1b2df20912161634w682bb804meb6c4090fa28284b@mail.gmail.com>

On Thu, Dec 17, 2009 at 2:34 AM, Philippe Wang
<philippe.wang.lists@gmail.com> wrote:
> On Wed, Dec 16, 2009 at 2:47 PM, Eray Ozkural <examachine@gmail.com> wrote:
>
>> One trivial, low-performance solution that comes to mind: turn the
>> OCaml bytecode interpreter into a CUDA kernel, pass the bytecode to
>> it, and voila, at least we get some 512-way parallelism on the
>> GT300. How does that sound? We'd lose some performance, but the
>> massive parallelism would make up for some of that.
>
>
> With parallel processors, you very quickly move the performance
> bottleneck from the processor(s) to memory bandwidth, so that
> - it's hell to program, because you have to manage concurrency, and
> that has a real cost;
> - it's only useful for very specific programs that make very few
> memory accesses compared to the computation they do (some compression
> algorithms, for instance; a more specific and very easy-to-write
> example is matrix multiplication).
>
> Imagine you have 3000 MHz of memory bandwidth, which is extremely
> good today (I think), and 100 processors sharing it. If they all want
> to access memory at the same time, then even ignoring the cost of
> managing the concurrency, you get 3000/100 = 30 MHz per processor,
> which is very, very low. So take 10 processors instead of 100 to be
> more realistic: that's still only 300 MHz per processor, which looks
> like what we had about a decade ago...
>
> (IMHO) A not-too-too-bad-but-still-realistic way to benefit from
> GPUs today with OCaml (or any high-level language) is to write the
> computation functions in C (possibly with some assembly) and the
> composition functions in OCaml. Or (less realistic in a short amount
> of time) to write a compiler that does the job for you, but that's
> not easy...
>
> Good luck,
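
That C-computation / OCaml-composition split would look roughly like
this minimal sketch. Plain OCaml functions stand in here for what
would really be "external" declarations bound to C stubs that launch
the kernels; every name below is made up, and the stand-ins just keep
the sketch runnable:

    (* Sketch of "computation in C, composition in OCaml".  In a real
       binding, each gpu_* function would be declared as, e.g.,
         external gpu_scale : float array -> float -> float array
           = "gpu_scale_stub"
       with a C stub launching the kernel.  Hypothetical names. *)
    let gpu_scale (v : float array) (k : float) : float array =
      Array.map (fun x -> k *. x) v

    let gpu_add (a : float array) (b : float array) : float array =
      Array.init (Array.length a) (fun i -> a.(i) +. b.(i))

    (* The composition logic stays in OCaml: r = k*a + b. *)
    let saxpy k a b = gpu_add (gpu_scale a k) b

    let () =
      let r = saxpy 2.0 [| 1.0; 2.0; 3.0 |] [| 10.0; 20.0; 30.0 |] in
      Array.iter (Printf.printf "%g ") r;
      print_newline ()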

First, the GT300 should have great memory bandwidth, probably around
256 GB/s. That's half a gigabyte per second per core, which is not
bad, I think. With a smart OCaml bytecode interpreter, we could
extract some performance from this (still hypothetical!) baby. The
GT300 is great; it makes the reconfigurable-computing project I worked
on mostly obsolete =)

Of course, you are right that the "memory wall" is a serious obstacle
to *any* parallel architecture, not just this one or that one. I
haven't read about it very thoroughly, but in NVIDIA's Fermi
architecture the caches and local memory naturally have severe
limitations: with 512 cores, you can't give each one a huge cache.

In the context of the following comments, assume that a PRAM algorithm
is given.

Obviously, we should expect a memory-bound algorithm to suffer on
*both* multicore architectures and GPUs (that's where reconfigurable
computing might take over...).

However, if the algorithm is compute-bound and can make do with a
moderate memory bandwidth per processor, then I think this becomes an
ideal architecture. Not only "embarrassingly parallel" algorithms,
either, though as NVIDIA's CUDA pages show, those will work great!

My application is a perfect match for NVIDIA's hardware. It needs just
1-2 MB of storage per processor, and it spends more time computing
than accessing memory, so I think it will do well.

The OCaml bytecode interpreter is written in C. For a baseline
implementation, we could try to port it to OpenCL:
http://caml.inria.fr/cgi-bin/viewcvs.cgi/ocaml/trunk/byterun/
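
To make the porting target concrete, here is a toy fetch-decode-execute
loop in OCaml over a made-up three-opcode set. The real loop in
byterun/interp.c is the C analogue of this match, with something like
150 instructions and threaded-code dispatch where the compiler
supports it; this is only a shape sketch:

    (* Toy interpreter: 0 = PUSH k, 1 = ADD, 2 = HALT.
       The opcode set is invented for illustration. *)
    let rec run (code : int array) (pc : int) (stack : int list) : int =
      match code.(pc) with
      | 0 -> run code (pc + 2) (code.(pc + 1) :: stack)     (* PUSH k *)
      | 1 ->
          (match stack with
           | a :: b :: rest -> run code (pc + 1) ((a + b) :: rest)  (* ADD *)
           | _ -> failwith "stack underflow")
      | _ -> List.hd stack                                  (* HALT *)

    let () =
      (* PUSH 2; PUSH 3; ADD; HALT  =>  5 *)
      Printf.printf "%d\n" (run [| 0; 2; 0; 3; 1; 2 |] 0 [])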

Would be a cool experiment at least =)

What I want to do is run the OCaml bytecode interpreter on each core
and then feed the relevant bytecode to each one, roughly as in the
sketch below. That can be done, I suppose? Or am I missing something
crucial? :) The runtime library would have to be ported to OpenCL/CUDA
as well; isn't that possible?
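
The execution model I have in mind is SIMT-style: every core runs the
same interpreter over the same bytecode, each with its own arguments.
A toy emulation of that idea, where a sequential Array.map stands in
for the parallel kernel launch and interpret is just a placeholder for
the ported interpreter's entry point (both names are made up):

    (* Emulate "one interpreter instance per core": same bytecode,
       one argument per core.  Sequential map stands in for the
       parallel launch. *)
    let interpret (code : int array) (arg : int) : int =
      (* Placeholder for the real interpreter's entry point. *)
      Array.fold_left (+) arg code

    let launch_on_cores (code : int array) (args : int array) : int array =
      Array.map (interpret code) args

    let () =
      let results = launch_on_cores [| 1; 2; 3 |] [| 100; 200; 300 |] in
      Array.iter (Printf.printf "%d ") results;
      print_newline ()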

Best,

PS: Sorry for having mailed this to you personally; I intended to post
it to the mailing list.

-- 
Eray Ozkural, PhD candidate.  Comp. Sci. Dept., Bilkent University, Ankara
http://groups.yahoo.com/group/ai-philosophy
http://myspace.com/arizanesil http://myspace.com/malfunct

