From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.1.3 (2006-06-01) on yquem.inria.fr X-Spam-Level: X-Spam-Status: No, score=0.0 required=5.0 tests=AWL autolearn=disabled version=3.1.3 X-Original-To: caml-list@yquem.inria.fr Delivered-To: caml-list@yquem.inria.fr Received: from mail2-relais-roc.national.inria.fr (mail2-relais-roc.national.inria.fr [192.134.164.83]) by yquem.inria.fr (Postfix) with ESMTP id AFF4ABC6B for ; Mon, 7 Jan 2008 21:06:36 +0100 (CET) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: Ao8CAG4VgkfUnw6Eg2dsb2JhbACCNo1eAQEBCAQGIgeXZg X-IronPort-AV: E=Sophos;i="4.24,254,1196636400"; d="scan'208";a="5808202" Received: from pih-relay05.plus.net ([212.159.14.132]) by mail2-smtp-roc.national.inria.fr with ESMTP; 07 Jan 2008 21:06:36 +0100 Received: from [80.229.56.224] (helo=beast.local) by pih-relay05.plus.net with esmtp (Exim) id 1JByEy-0000TH-Ra for caml-list@yquem.inria.fr; Mon, 07 Jan 2008 20:06:33 +0000 From: Jon Harrop Organization: Flying Frog Consultancy Ltd. To: caml-list@yquem.inria.fr Subject: Re: [Caml-list] Performance questions, -inline, ... Date: Mon, 7 Jan 2008 19:58:23 +0000 User-Agent: KMail/1.9.7 References: <200801031128.30183.ober.14@osu.edu> <200801071441.48212.jon@ffconsultancy.com> <200801071022.41044.ober.14@osu.edu> In-Reply-To: <200801071022.41044.ober.14@osu.edu> MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Content-Disposition: inline Message-Id: <200801071958.23681.jon@ffconsultancy.com> X-Spam: no; 0.00; -inline:01 high-level:01 hacked:01 runtime:01 ocaml:01 inference:01 syntax:01 camlp:01 run-time:01 polymorphism:01 higher-order:01 ocaml:01 gcc:01 compiler:01 compiler:01 On Monday 07 January 2008 15:22:40 Kuba Ober wrote: > On Monday 07 January 2008, Jon Harrop wrote: > > You mean it might be possible to recover the performance of C from > > numerical code with high-level abstractions? Yes. Indeed, I would like to > > see this done. However, I've never heard of an implementation of any > > language that can do this. > > g++ does that reasonably well. Heck, I have a severely hacked in-house LISP > system which had grown out of an assembler-with-LISP-macros, and it does it > all the time. It's a LISP with static types at runtime, but general > LISPiness at compile time (macros just run in gcl). In fact I'm looking to > port it to OCaml just because type inference and commonplace LISP syntax > for type declarations don't mix too well - the code starts looking really > ugly. Maybe I stuck too much with LISP's usual declaim-this-and-that > approach, but that's what you get when you reuse most of underlying LISP's > implementation. Yes. If you're willing to do that kind of hacking then you could get a long way with camlp4 without too much trouble. > > I assume you didn't mimic run-time polymorphism, currying, closures and > > higher-order functions in the C though? > > No. But while those are very useful features to have, the code is messy if > you've got two languages to work with (hi-perf stuff in C, everything else > in OCaml). On 64-bit there is no performance benefit in dropping to C. > > Yes. Your "add1" function is entirely dead code and, therefore, is > > trivially reducible. This is not true of real programs and, therefore, > > your results will not be representative of real programs. Moreover, there > > is no sense in trying to optimize a program that does nothing anyway. > > If you code such a thing in gcc, it correctly becomes a no-op. OCaml isn't > all that clever ;) > > You're 100% academically-correct of course. As Xavier says, "the OCaml compiler is designed to compile good code". > > The simplest route to recovering C performance here is: > > > > . Inline "( +. )". > > . Inline "op1". > > . Type-specialize "op1". > > . Hoist bounds checks. > > You mean "the programmer has to do this"? OCaml compiler is *really* bad, > then. You're 100% academically-correct of course. In practice, the combination of OCaml's awesome native-code generation on AMD64 and the ease with which you can work around those missing optimizations mean that OCaml is *really* good compared to all other language implementations with comparable qualities. This is precisely why it is so popular. > > There are some trade-offs here though. Stalin-compiled Scheme, > > MLton-compiled SML and GHC-compiled Haskell do a lot more than OCaml in > > theory but, in practice, they are much less useful for high-performance > > numerics because their performance is so unpredictable. > > Good to know. OCaml's code generator is also much better than MLton's and GHC's (even 6.8). > > I've already spent a long time meticulously contructing benchmarks in > > various different languages along these lines and the most important > > optimizations missing from OCaml are: > > > > . Hoisting bounds checks. > > . Common subexpression elimination. > > . Type specialization. > > . More aggressive inlining. > > . Static optimization of div and mod. > > > > The nice thing about OCaml is that it has a superb 64-bit code generator > > so, once you've done those optimizations on the hot paths manually, OCaml > > lets you run code as quickly as C but with all of the high-level benefits > > everywhere else. > > > > A perfect language implementation would certainly do these optimizations > > for you and I wish OCaml did but I still haven't found anything better. > > After I gain some more experience I will hopefully add an OCaml-esque front > end to my hackish compiler. It's completely unportable, only works on > 12-bit-PIC-like architecture (SX from Parallax) and on Z8 Encore!, but hey, > maybe it can be a good starting point to something bigger and better. The > backend still has to be ported from Lisp, and I'm too time- and > knowledge-constrained to do it just yet. > ... I see. If you're not using OCaml as a general purpose language then you don't need its other benefits (e.g. GUI libraries). I'm in a similar situation. I'm toying with the idea of building a better FPL implementation for high-performance numerics that is commerce friendly using LLVM. If you take the path of least resistance then I believe that is tractable but you lose nice things like LablGTK2... -- Dr Jon D Harrop, Flying Frog Consultancy Ltd. http://www.ffconsultancy.com/products/?e