From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Original-To: caml-list@yquem.inria.fr Delivered-To: caml-list@yquem.inria.fr Received: from mail3-relais-sop.national.inria.fr (mail3-relais-sop.national.inria.fr [192.134.164.104]) by yquem.inria.fr (Postfix) with ESMTP id 276AFBC37 for ; Wed, 16 Dec 2009 14:47:19 +0100 (CET) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AjABACtzKEvRVdrVkGdsb2JhbACQKIF1ghWGVz8BAQEBCQkMBxMDqSqBMYVSiGMBAgMFhCYE X-IronPort-AV: E=Sophos;i="4.47,406,1257116400"; d="scan'208";a="39992494" Received: from mail-bw0-f213.google.com ([209.85.218.213]) by mail3-smtp-sop.national.inria.fr with ESMTP; 16 Dec 2009 14:47:15 +0100 Received: by bwz5 with SMTP id 5so697705bwz.3 for ; Wed, 16 Dec 2009 05:47:15 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:received:in-reply-to:references :date:message-id:subject:from:to:cc:content-type :content-transfer-encoding; bh=rH7sjyCb0CAevtqtwRhbDXrR0TgPgq/X1Voo98Pw0lU=; b=qlYHJ0wjlzqOICnD7pJ7rN3orjhb/zPSZ0Dm/s74RcKOWpKITIsb8AXHl/PNgdpi4x y8y6EmQsVk5fkrh5ztuXRA7EKlJ83qL1KjN7iyzh9J/MoRiocBdl3Qfz78Z1NvFyLZti Wg2s3i1jH0BnfH+3mHm7zDnUyye5b3wXW2XAo= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type:content-transfer-encoding; b=SpBTtFVX/p905+JFOUHsHxFO7e/xdIUnBl3Qhem3mw0TFegRe8DcgynzH29Sr/twAk T/RjJG++0TMkwDfFIUMwlplYeU0ibmtbFQxToDwr6nX4f2b1InTHM3DR5saZxDzCEeRA pI5la7GvwYDi/g8Gj0sbLGSWhwInLS/143SoY= MIME-Version: 1.0 Received: by 10.204.34.209 with SMTP id m17mr566345bkd.34.1260971234979; Wed, 16 Dec 2009 05:47:14 -0800 (PST) In-Reply-To: <20091216134138.411322D08BF@kicki.hq.vtech> References: <320e992a0912150737t483a5946x9cad351fc344691f@mail.gmail.com> <4B27B981.80700@starynkevitch.net> <200912160039.51086.jon@ffconsultancy.com> <20091216134138.411322D08BF@kicki.hq.vtech> Date: Wed, 16 Dec 2009 15:47:14 +0200 Message-ID: <320e992a0912160547q7969ed30i129e5798456fbf84@mail.gmail.com> Subject: Re: [Caml-list] How to write a CUDA kernel in ocaml? From: Eray Ozkural To: =?ISO-8859-1?Q?Mattias_Engdeg=E5rd?= Cc: jon@ffconsultancy.com, caml-list@yquem.inria.fr Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable X-Spam: no; 0.00; ocaml:01 eray:01 ozkural:01 mattias:01 mattias:01 gcc's:01 pointers:01 gcc:01 compilation:01 compilation:01 bytecode:01 compiler:01 bytecode:01 interfacing:01 compiler:01 On Wed, Dec 16, 2009 at 3:41 PM, Mattias Engdeg=E5rd = wrote: >>And trampolines to eliminate tail calls that cannot be eliminated using g= oto. >>However, trampolines are ~10x slower than TCO in the code gen. > > With some care, gcc's sibcall mechanism can be exploited. For example, > by having one standard signature for all generated C functions, and > taking care not to pass pointers to variables in the caller's stack > frame. This should give fairly good performance (better than > trampolines anyway), at the cost of portability (but gcc is good at > that). It would give full TCO, even across compilation units. It > should work well with a Cheney-on-the-MTA-style GC, too. > > How suitable it is depends on the reason why compilation to C is done in > the first place. It might be one of: > > 1) portability to odd platforms with semi-decent performance (ie, > =A0 better than interpreted bytecode) > 2) a simple target for maintaining bootstrapping capability for the > =A0 compiler (but bytecode works well for this too) > 3) simpler (?) interfacing to libraries in C etc > 4) flat-out maximum performance by exploiting the optimisations that > =A0 modern C compilers are capable of > > Of course, these days we have llvm which has a lot going for it. Well, the original question was to be able to use the CUDA or OpenCL compiler on that generated C code. Possible or impossible? :) One trivial and low-performance solution that comes to mind is: make an ocaml bytecode interpreter into a CUDA kernel and then pass the bytecode to it, and then voila, at least we have some 512-way parallelism on the GT300. How does that sound? We'd be losing some performance but massive parallelism will cover up for some of that. Best, --=20 Eray Ozkural, PhD candidate. Comp. Sci. Dept., Bilkent University, Ankara http://groups.yahoo.com/group/ai-philosophy http://myspace.com/arizanesil http://myspace.com/malfunct