From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Original-To: caml-list@yquem.inria.fr Delivered-To: caml-list@yquem.inria.fr Received: from mail1-relais-roc.national.inria.fr (mail1-relais-roc.national.inria.fr [192.134.164.82]) by yquem.inria.fr (Postfix) with ESMTP id F19A0BC37 for ; Wed, 16 Dec 2009 14:41:39 +0100 (CET) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AvoAAMByKEs+FFrDmWdsb2JhbACbSAEBAQEBCAsKBxO5FIQrBA X-IronPort-AV: E=Sophos;i="4.47,406,1257116400"; d="scan'208";a="42133730" Received: from oden.vtab.com ([62.20.90.195]) by mail1-smtp-roc.national.inria.fr with ESMTP/TLS/DHE-RSA-AES256-SHA; 16 Dec 2009 14:41:39 +0100 Received: from oden.vtab.com (oden.vtab.com [127.0.0.1]) by oden.vtab.com (Postfix) with ESMTP id 7987526EF03; Wed, 16 Dec 2009 14:41:38 +0100 (CET) Received: from kicki.hq.vtech (kicki.e.vtech [10.40.0.23]) by oden.vtab.com (Postfix) with SMTP id 65D1226EEDD; Wed, 16 Dec 2009 14:41:38 +0100 (CET) Received: by kicki.hq.vtech (Postfix, from userid 7557) id 411322D08BF; Wed, 16 Dec 2009 14:41:38 +0100 (CET) From: "=?utf-8?b?TWF0dGlhcyBFbmdkZWfDpQ==?= =?utf-8?b?cmQ=?=" To: jon@ffconsultancy.com Cc: caml-list@yquem.inria.fr In-reply-to: <200912160039.51086.jon@ffconsultancy.com> (message from Jon Harrop on Wed, 16 Dec 2009 00:39:50 +0000) Subject: Re: [Caml-list] How to write a CUDA kernel in ocaml? References: <320e992a0912150737t483a5946x9cad351fc344691f@mail.gmail.com> <4B27B981.80700@starynkevitch.net> <017101ca7ddc$fb936b20$f2ba4160$@allsopp@metastack.com> <200912160039.51086.jon@ffconsultancy.com> Message-Id: <20091216134138.411322D08BF@kicki.hq.vtech> Date: Wed, 16 Dec 2009 14:41:38 +0100 (CET) X-Virus-Scanned: ClamAV using ClamSMTP X-Spam: no; 0.00; mattias:01 mattias:01 ocaml:01 gcc's:01 pointers:01 gcc:01 compilation:01 compilation:01 bytecode:01 compiler:01 bytecode:01 interfacing:01 stack:01 compilers:01 caml-list:01 >And trampolines to eliminate tail calls that cannot be eliminated using goto. >However, trampolines are ~10x slower than TCO in the code gen. With some care, gcc's sibcall mechanism can be exploited. For example, by having one standard signature for all generated C functions, and taking care not to pass pointers to variables in the caller's stack frame. This should give fairly good performance (better than trampolines anyway), at the cost of portability (but gcc is good at that). It would give full TCO, even across compilation units. It should work well with a Cheney-on-the-MTA-style GC, too. How suitable it is depends on the reason why compilation to C is done in the first place. It might be one of: 1) portability to odd platforms with semi-decent performance (ie, better than interpreted bytecode) 2) a simple target for maintaining bootstrapping capability for the compiler (but bytecode works well for this too) 3) simpler (?) interfacing to libraries in C etc 4) flat-out maximum performance by exploiting the optimisations that modern C compilers are capable of Of course, these days we have llvm which has a lot going for it.