From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <examachine@gmail.com>
X-Original-To: caml-list@yquem.inria.fr
Delivered-To: caml-list@yquem.inria.fr
Received: from mail3-relais-sop.national.inria.fr (mail3-relais-sop.national.inria.fr [192.134.164.104])
	by yquem.inria.fr (Postfix) with ESMTP id 276AFBC37
	for <caml-list@yquem.inria.fr>; Wed, 16 Dec 2009 14:47:19 +0100 (CET)
X-IronPort-Anti-Spam-Filtered: true
X-IronPort-Anti-Spam-Result: AjABACtzKEvRVdrVkGdsb2JhbACQKIF1ghWGVz8BAQEBCQkMBxMDqSqBMYVSiGMBAgMFhCYE
X-IronPort-AV: E=Sophos;i="4.47,406,1257116400"; 
   d="scan'208";a="39992494"
Received: from mail-bw0-f213.google.com ([209.85.218.213])
  by mail3-smtp-sop.national.inria.fr with ESMTP; 16 Dec 2009 14:47:15 +0100
Received: by bwz5 with SMTP id 5so697705bwz.3
        for <caml-list@yquem.inria.fr>; Wed, 16 Dec 2009 05:47:15 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=gmail.com; s=gamma;
        h=domainkey-signature:mime-version:received:in-reply-to:references
         :date:message-id:subject:from:to:cc:content-type
         :content-transfer-encoding;
        bh=rH7sjyCb0CAevtqtwRhbDXrR0TgPgq/X1Voo98Pw0lU=;
        b=qlYHJ0wjlzqOICnD7pJ7rN3orjhb/zPSZ0Dm/s74RcKOWpKITIsb8AXHl/PNgdpi4x
         y8y6EmQsVk5fkrh5ztuXRA7EKlJ83qL1KjN7iyzh9J/MoRiocBdl3Qfz78Z1NvFyLZti
         Wg2s3i1jH0BnfH+3mHm7zDnUyye5b3wXW2XAo=
DomainKey-Signature: a=rsa-sha1; c=nofws;
        d=gmail.com; s=gamma;
        h=mime-version:in-reply-to:references:date:message-id:subject:from:to
         :cc:content-type:content-transfer-encoding;
        b=SpBTtFVX/p905+JFOUHsHxFO7e/xdIUnBl3Qhem3mw0TFegRe8DcgynzH29Sr/twAk
         T/RjJG++0TMkwDfFIUMwlplYeU0ibmtbFQxToDwr6nX4f2b1InTHM3DR5saZxDzCEeRA
         pI5la7GvwYDi/g8Gj0sbLGSWhwInLS/143SoY=
MIME-Version: 1.0
Received: by 10.204.34.209 with SMTP id m17mr566345bkd.34.1260971234979; Wed, 
	16 Dec 2009 05:47:14 -0800 (PST)
In-Reply-To: <20091216134138.411322D08BF@kicki.hq.vtech>
References: <320e992a0912150737t483a5946x9cad351fc344691f@mail.gmail.com>
	 <4B27B981.80700@starynkevitch.net>
	 <200912160039.51086.jon@ffconsultancy.com>
	 <20091216134138.411322D08BF@kicki.hq.vtech>
Date: Wed, 16 Dec 2009 15:47:14 +0200
Message-ID: <320e992a0912160547q7969ed30i129e5798456fbf84@mail.gmail.com>
Subject: Re: [Caml-list] How to write a CUDA kernel in ocaml?
From: Eray Ozkural <examachine@gmail.com>
To: =?ISO-8859-1?Q?Mattias_Engdeg=E5rd?= <mattias@virtutech.se>
Cc: jon@ffconsultancy.com, caml-list@yquem.inria.fr
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
X-Spam: no; 0.00; ocaml:01 eray:01 ozkural:01 mattias:01 mattias:01 gcc's:01 pointers:01 gcc:01 compilation:01 compilation:01 bytecode:01 compiler:01 bytecode:01 interfacing:01 compiler:01 

On Wed, Dec 16, 2009 at 3:41 PM, Mattias Engdeg=E5rd <mattias@virtutech.se>=
 wrote:
>>And trampolines to eliminate tail calls that cannot be eliminated using g=
oto.
>>However, trampolines are ~10x slower than TCO in the code gen.
>
> With some care, gcc's sibcall mechanism can be exploited. For example,
> by having one standard signature for all generated C functions, and
> taking care not to pass pointers to variables in the caller's stack
> frame. This should give fairly good performance (better than
> trampolines anyway), at the cost of portability (but gcc is good at
> that). It would give full TCO, even across compilation units. It
> should work well with a Cheney-on-the-MTA-style GC, too.
>
> How suitable it is depends on the reason why compilation to C is done in
> the first place. It might be one of:
>
> 1) portability to odd platforms with semi-decent performance (ie,
> =A0 better than interpreted bytecode)
> 2) a simple target for maintaining bootstrapping capability for the
> =A0 compiler (but bytecode works well for this too)
> 3) simpler (?) interfacing to libraries in C etc
> 4) flat-out maximum performance by exploiting the optimisations that
> =A0 modern C compilers are capable of
>
> Of course, these days we have llvm which has a lot going for it.

Well, the original question was to be able to use the CUDA or OpenCL
compiler on that generated C code.

Possible or impossible? :)

One trivial and low-performance solution that comes to mind is: make
an ocaml bytecode interpreter into a CUDA kernel and then pass the
bytecode to it, and then voila, at least we have some 512-way
parallelism on the GT300. How does that sound? We'd be losing some
performance but massive parallelism will cover up for some of that.

Best,

--=20
Eray Ozkural, PhD candidate.  Comp. Sci. Dept., Bilkent University, Ankara
http://groups.yahoo.com/group/ai-philosophy
http://myspace.com/arizanesil http://myspace.com/malfunct