From: Eray Ozkural
To: caml-list@yquem.inria.fr
Date: Thu, 17 Dec 2009 08:45:28 +0200
Subject: Re: [Caml-list] How to write a CUDA kernel in ocaml?

On Thu, Dec 17, 2009 at 2:34 AM, Philippe Wang wrote:
> On Wed, Dec 16, 2009 at 2:47 PM, Eray Ozkural wrote:
>
>> One trivial and low-performance solution that comes to mind is: make
>> an OCaml bytecode interpreter into a CUDA kernel and then pass the
>> bytecode to it, and then voila, at least we have some 512-way
>> parallelism on the GT300. How does that sound? We'd be losing some
>> performance, but massive parallelism will make up for some of that.
>
> With parallel processors, you very quickly move the performance
> bottleneck from the processor(s) to memory bandwidth, such that:
> - it is hell to program, because you have to manage concurrency, and
>   that has a real cost;
> - it is useful only for very specific programs that make very few
>   memory accesses compared to the amount of computation (some
>   compression algorithms, for example; a more specific and very
>   easy-to-write example is matrix multiplication).
>
> Imagine you have 3000 MHz of memory bandwidth, which is extremely
> good today (I think). And imagine you have 100 processors sharing
> that bandwidth. If they all want to access memory at the same time,
> then even ignoring the cost of managing the concurrency, you get
> 3000/100 = 30 MHz per processor, which is very, very low. So think
> of 10 processors instead of 100 to be more realistic: that is still
> only 300 MHz per processor, which looks like what we had about a
> decade ago...
>
> (IMHO) A not-too-bad-but-still-realistic way to benefit from GPUs
> today with OCaml (or any high-level language) is to write the
> computation functions in C (possibly with some assembly) and the
> composition functions in OCaml. Or (less realistic in a short amount
> of time) to write a compiler that does the job for you, but that is
> not quite easy...
>
> Good luck,

First, the GT300 will have great memory bandwidth, probably 256 GB/s.
That is half a gigabyte per second per core, which is not bad, I
think. With a smart OCaml bytecode interpreter we could derive some
performance from this (still hypothetical!) baby. The GT300 is great;
it makes the reconfigurable computing project I worked on mostly
obsolete =)

Of course, you are right that the "memory wall" is a serious obstacle
for *any* parallel architecture, not just this or that architecture.
I haven't read very thoroughly, but in NVIDIA's Fermi architecture
the caches and local memory naturally have severe limitations: with
512 cores, you cannot give each of them a huge cache.

In the context of the following comments, assume that a PRAM
algorithm is given. Obviously, we may expect the performance of a
memory-bound algorithm to suffer on *both* multicore architectures
and GPUs (that is where reconfigurable computing might take over...).
However, if the algorithm is compute-bound and can make do with a
moderate memory bandwidth per processor, then I think this becomes an
ideal architecture. Not necessarily just "embarrassingly parallel"
algorithms, though as seen on NVIDIA's CUDA pages, those will work
great! My application is a perfect match for NVIDIA: it needs just
1-2 MB of storage per processor, and it spends more time computing
than accessing memory, so I think it will do well.

The OCaml bytecode interpreter is written in C. For a baseline
implementation we could try to port it to OpenCL:

http://caml.inria.fr/cgi-bin/viewcvs.cgi/ocaml/trunk/byterun/

It would be a cool experiment at least =)

What I want to do is run the OCaml bytecode interpreter on each core
and then feed the relevant bytecode to each of them. It can be done,
I suppose? Or am I missing something crucial? :) The runtime library
would have to be ported to OpenCL/CUDA as well; isn't that possible?

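Just to make the idea concrete, here is a very rough (and completely
untested) sketch of the shape such a kernel could take: each thread
runs its own little dispatch loop over its own bytecode program. The
opcode set, the data layout and all the names below are invented for
illustration only; the real byterun interpreter, the OCaml value
representation and the runtime/GC are of course far more involved.

/* gpu_interp.cu -- hypothetical sketch, not the real byterun port.
   Each CUDA thread interprets its own toy stack-machine program. */

enum { OP_CONST, OP_ADD, OP_MUL, OP_HALT };  /* invented opcodes */

#define PROG_LEN  16   /* instructions in each thread's program */
#define STACK_LEN 32   /* per-thread operand stack depth        */

__global__ void interp(const int *code, const int *args, int *result)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    const int *pc = code + tid * PROG_LEN;   /* this thread's program */
    int stack[STACK_LEN];
    int sp = 0;

    stack[sp++] = args[tid];                 /* per-thread input value */

    for (;;) {                               /* the dispatch loop */
        switch (*pc++) {
        case OP_CONST: stack[sp++] = *pc++;            break;
        case OP_ADD:   sp--; stack[sp-1] += stack[sp]; break;
        case OP_MUL:   sp--; stack[sp-1] *= stack[sp]; break;
        case OP_HALT:  result[tid] = stack[sp-1];      return;
        }
    }
}

/* Host side, roughly: interp<<<blocks, threads>>>(d_code, d_args, d_res); */

Whether the real interpreter loop, with the OCaml value representation
and the runtime primitives behind it, fits into this mold is exactly
the part I am unsure about.
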
Best,

PS: Sorry for having mailed this to you personally; I intended to
post it to the mailing list.

-- 
Eray Ozkural, PhD candidate. Comp. Sci. Dept., Bilkent University, Ankara
http://groups.yahoo.com/group/ai-philosophy
http://myspace.com/arizanesil
http://myspace.com/malfunct