From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <jon@ffconsultancy.com>
X-Spam-Checker-Version: SpamAssassin 3.1.3 (2006-06-01) on yquem.inria.fr
X-Spam-Level: 
X-Spam-Status: No, score=0.0 required=5.0 tests=AWL autolearn=disabled 
	version=3.1.3
X-Original-To: caml-list@yquem.inria.fr
Delivered-To: caml-list@yquem.inria.fr
Received: from mail2-relais-roc.national.inria.fr (mail2-relais-roc.national.inria.fr [192.134.164.83])
	by yquem.inria.fr (Postfix) with ESMTP id AFF4ABC6B
	for <caml-list@yquem.inria.fr>; Mon,  7 Jan 2008 21:06:36 +0100 (CET)
X-IronPort-Anti-Spam-Filtered: true
X-IronPort-Anti-Spam-Result: Ao8CAG4VgkfUnw6Eg2dsb2JhbACCNo1eAQEBCAQGIgeXZg
X-IronPort-AV: E=Sophos;i="4.24,254,1196636400"; 
   d="scan'208";a="5808202"
Received: from pih-relay05.plus.net ([212.159.14.132])
  by mail2-smtp-roc.national.inria.fr with ESMTP; 07 Jan 2008 21:06:36 +0100
Received: from [80.229.56.224] (helo=beast.local)
	 by pih-relay05.plus.net with esmtp (Exim) id 1JByEy-0000TH-Ra
	for caml-list@yquem.inria.fr; Mon, 07 Jan 2008 20:06:33 +0000
From: Jon Harrop <jon@ffconsultancy.com>
Organization: Flying Frog Consultancy Ltd.
To: caml-list@yquem.inria.fr
Subject: Re: [Caml-list] Performance questions, -inline, ...
Date: Mon, 7 Jan 2008 19:58:23 +0000
User-Agent: KMail/1.9.7
References: <200801031128.30183.ober.14@osu.edu> <200801071441.48212.jon@ffconsultancy.com> <200801071022.41044.ober.14@osu.edu>
In-Reply-To: <200801071022.41044.ober.14@osu.edu>
MIME-Version: 1.0
Content-Type: text/plain;
  charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
Message-Id: <200801071958.23681.jon@ffconsultancy.com>
X-Spam: no; 0.00; -inline:01 high-level:01 hacked:01 runtime:01 ocaml:01 inference:01 syntax:01 camlp:01 run-time:01 polymorphism:01 higher-order:01 ocaml:01 gcc:01 compiler:01 compiler:01 

On Monday 07 January 2008 15:22:40 Kuba Ober wrote:
> On Monday 07 January 2008, Jon Harrop wrote:
> > You mean it might be possible to recover the performance of C from
> > numerical code with high-level abstractions? Yes. Indeed, I would like to
> > see this done. However, I've never heard of an implementation of any
> > language that can do this.
>
> g++ does that reasonably well. Heck, I have a severely hacked in-house LISP
> system which had grown out of an assembler-with-LISP-macros, and it does it
> all the time. It's a LISP with static types at runtime, but general
> LISPiness at compile time (macros just run in gcl). In fact I'm looking to
> port it to OCaml just because type inference and commonplace LISP syntax
> for type declarations don't mix too well - the code starts looking really
> ugly. Maybe I stuck too much with LISP's usual declaim-this-and-that
> approach, but that's what you get when you reuse most of underlying LISP's
> implementation.

Yes. If you're willing to do that kind of hacking then you could get a long 
way with camlp4 without too much trouble.

> > I assume you didn't mimic run-time polymorphism, currying, closures and
> > higher-order functions in the C though?
>
> No. But while those are very useful features to have, the code is messy if
> you've got two languages to work with (hi-perf stuff in C, everything else
> in OCaml).

On 64-bit there is no performance benefit in dropping to C.

> > Yes. Your "add1" function is entirely dead code and, therefore, is
> > trivially reducible. This is not true of real programs and, therefore,
> > your results will not be representative of real programs. Moreover, there
> > is no sense in trying to optimize a program that does nothing anyway.
>
> If you code such a thing in gcc, it correctly becomes a no-op. OCaml isn't
> all that clever ;)
>
> You're 100% academically-correct of course.

As Xavier says, "the OCaml compiler is designed to compile good code".

> > The simplest route to recovering C performance here is:
> >
> > . Inline "( +. )".
> > . Inline "op1".
> > . Type-specialize "op1".
> > . Hoist bounds checks.
>
> You mean "the programmer has to do this"? OCaml compiler is *really* bad,
> then.

You're 100% academically-correct of course.

In practice, the combination of OCaml's awesome native-code generation on 
AMD64 and the ease with which you can work around those missing optimizations 
mean that OCaml is *really* good compared to all other language 
implementations with comparable qualities. This is precisely why it is so 
popular.

> > There are some trade-offs here though. Stalin-compiled Scheme,
> > MLton-compiled SML and GHC-compiled Haskell do a lot more than OCaml in
> > theory but, in practice, they are much less useful for high-performance
> > numerics because their performance is so unpredictable.
>
> Good to know.

OCaml's code generator is also much better than MLton's and GHC's (even 6.8).

> > I've already spent a long time meticulously contructing benchmarks in
> > various different languages along these lines and the most important
> > optimizations missing from OCaml are:
> >
> > . Hoisting bounds checks.
> > . Common subexpression elimination.
> > . Type specialization.
> > . More aggressive inlining.
> > . Static optimization of div and mod.
> >
> > The nice thing about OCaml is that it has a superb 64-bit code generator
> > so, once you've done those optimizations on the hot paths manually, OCaml
> > lets you run code as quickly as C but with all of the high-level benefits
> > everywhere else.
> >
> > A perfect language implementation would certainly do these optimizations
> > for you and I wish OCaml did but I still haven't found anything better.
>
> After I gain some more experience I will hopefully add an OCaml-esque front
> end to my hackish compiler. It's completely unportable, only works on
> 12-bit-PIC-like architecture (SX from Parallax) and on Z8 Encore!, but hey,
> maybe it can be a good starting point to something bigger and better. The
> backend still has to be ported from Lisp, and I'm too time- and
> knowledge-constrained to do it just yet.
> ...

I see. If you're not using OCaml as a general purpose language then you don't 
need its other benefits (e.g. GUI libraries).

I'm in a similar situation. I'm toying with the idea of building a better FPL 
implementation for high-performance numerics that is commerce friendly using 
LLVM. If you take the path of least resistance then I believe that is 
tractable but you lose nice things like LablGTK2...

-- 
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/products/?e