From: Jon Harrop <jon@ffconsultancy.com>
Organization: Flying Frog Consultancy Ltd.
To: caml-list@yquem.inria.fr
Subject: Re: [Caml-list] Comparison of OCaml and MLton for numerics
Date: Thu, 31 May 2007 07:17:01 +0100

On Thursday 31 May 2007 06:50:05 Yuanchen Zhu wrote:
> The performance numbers were as following:
>
> Ocaml (unsafe) : user: 39.674s, real: 41.356s
> MLton (safe):  user:  17.981s, real: 21.968s

You may be interested to know that there are no optimizing SML compilers for 
AMD64, which is a much better platform for numerical work:

  http://www.ffconsultancy.com/languages/ray_tracer/results.html

OCaml is over 60% faster on this benchmark.

That said, I notice that twice as many people download the x86 demos from my 
site as download the x64 ones.

> let hconvolve kern (img:t) r =
>   let sup = Array.length kern - 1 in
>   let img' = create (size img) in
>     for y = 0 to height img - 1 do
>       for x = 0 to width img - 1 do
>         let s = ref 0.0 in
>           for i = 0 to sup do
>             let (kx, ky) = kern.(i) in
>               s := !s +. ky *. getReflected img y (x + kx) 1.0 r

I can think of various ways to rearrange this that might help performance.
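
As a rough sketch of one such rearrangement (not the original poster's code): 
split the kernel into separate offset and weight arrays so the inner loop 
reads floats from a float array rather than from tuples, and apply boundary 
reflection only near the left and right edges. The image record, the reflect 
rule and the two-array kernel layout are all assumptions made for 
illustration, since getReflected and the image type are not shown in the 
thread.

```ocaml
(* Hypothetical image representation: row-major flat float array. *)
type image = { width : int; height : int; data : float array }

(* Mirror an out-of-range column index back into [0, w-1].
   Assumed boundary rule; only valid while the kernel reach is <= w. *)
let reflect w x =
  let x = if x < 0 then -x - 1 else x in
  if x >= w then 2 * w - 1 - x else x

(* Horizontal convolution. offsets.(i) and weights.(i) together play
   the role of kern.(i) in the quoted code. *)
let hconvolve offsets weights img =
  let n = Array.length offsets in
  let w = img.width and h = img.height in
  let out = Array.make (w * h) 0.0 in
  (* The largest |offset| decides how wide the border region is. *)
  let reach = Array.fold_left (fun m k -> max m (abs k)) 0 offsets in
  for y = 0 to h - 1 do
    let row = y * w in
    for x = 0 to w - 1 do
      let s = ref 0.0 in
      if x >= reach && x < w - reach then
        (* Interior pixels: no boundary handling needed. *)
        for i = 0 to n - 1 do
          s := !s +. weights.(i) *. img.data.(row + x + offsets.(i))
        done
      else
        (* Border pixels: reflect indices that fall outside the row. *)
        for i = 0 to n - 1 do
          s := !s +. weights.(i)
                     *. img.data.(row + reflect w (x + offsets.(i)))
        done;
      out.(row + x) <- !s
    done
  done;
  { img with data = out }
```

Whether this actually helps would need measuring on the real image type and 
kernel sizes.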

> The new running time is:
>
> Ocaml (unsafe) : user: 21.477s, real: 23.366s

What is the running time for safe OCaml?

> which is much in line with MLton:
>
> MLton (safe):  user:  17.981s, real: 21.968s

What platforms and architectures did you benchmark on? May we have the code to 
benchmark it ourselves?

> Although note that the MLton version has array-bound check enabled and
> used the two-line high order function version of hconvolve.

You might also try an FFT-based convolution if your filter is dense.
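
To illustrate the idea only: direct convolution of N samples with a K-tap 
kernel costs on the order of N*K multiplications, while the FFT route costs 
on the order of N log N regardless of K, so it tends to pay off for large 
dense kernels. The sketch below is one-dimensional and uses a plain recursive 
radix-2 FFT from the standard Complex module; the function names are made up 
here, both inputs are assumed non-empty, and a tuned FFT library would be the 
sensible choice in practice.

```ocaml
let pi = 4.0 *. atan 1.0

(* Recursive radix-2 Cooley-Tukey FFT; the length must be a power of
   two. sign is -1.0 for the forward transform and 1.0 for the
   (unnormalised) inverse. *)
let rec fft sign (a : Complex.t array) =
  let n = Array.length a in
  if n <= 1 then Array.copy a
  else begin
    let half = n / 2 in
    let even = fft sign (Array.init half (fun i -> a.(2 * i))) in
    let odd = fft sign (Array.init half (fun i -> a.(2 * i + 1))) in
    let out = Array.make n Complex.zero in
    for k = 0 to half - 1 do
      let angle = sign *. 2.0 *. pi *. float_of_int k /. float_of_int n in
      let t = Complex.mul (Complex.polar 1.0 angle) odd.(k) in
      out.(k) <- Complex.add even.(k) t;
      out.(k + half) <- Complex.sub even.(k) t
    done;
    out
  end

(* Smallest power of two that is >= m. *)
let rec pow2_at_least n m = if n >= m then n else pow2_at_least (2 * n) m

(* Linear convolution of two non-empty float arrays via zero-padded
   FFTs: transform both, multiply pointwise, transform back. *)
let convolve (x : float array) (k : float array) =
  let m = Array.length x + Array.length k - 1 in
  let n = pow2_at_least 1 m in
  let pad v =
    Array.init n (fun i ->
      if i < Array.length v then { Complex.re = v.(i); im = 0.0 }
      else Complex.zero) in
  let fx = fft (-1.0) (pad x) and fk = fft (-1.0) (pad k) in
  let prod = Array.init n (fun i -> Complex.mul fx.(i) fk.(i)) in
  let y = fft 1.0 prod in
  (* Divide by n to normalise the inverse transform. *)
  Array.init m (fun i -> y.(i).Complex.re /. float_of_int n)
```

For an image, the same function could be applied to each row in turn for a 
horizontal filter.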

> So the moral of the story: To use Ocaml for numerically intensive
> work, code in C style in the inner loops.

Absolutely.

> This brings me to the next question: is there any plan to implement
> type specialization optimization for ocamlopt? For numerics, this is
> really crucial if you want write both in an elegant functional style
> and get good performance. Also, I remember reading somewhere that the
> current code base of Ocaml is ill-suited for implementing this kind of
> optimization. May I ask what exactly needs to be done to the current
> code base in order to support that? I have some compiler-writing
> background and this sounds like an interesting project to work in my
> past time.

Writing OCaml programs that generate OCaml programs is by far your best bet 
here. We use a replacement standard library that uses autogenerated code to 
eliminate boxing and perform unrolling and type specialization where 
possible.
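
A toy sketch of the approach, not taken from our library: a small OCaml 
program that prints the source of a dot product specialized to float arrays 
of a fixed length, with the loop fully unrolled. The name gen_dot and the 
shape of the emitted function are invented for this example.

```ocaml
(* Build the source text of a dot product over float arrays of
   length n. The type annotations in the emitted code pin the
   arguments to float array, and the loop is written out term by
   term. *)
let gen_dot n =
  let terms =
    List.init n (fun i -> Printf.sprintf "a.(%d) *. b.(%d)" i i) in
  let body = if n = 0 then "0.0" else String.concat " +. " terms in
  Printf.sprintf
    "let dot%d (a : float array) (b : float array) =\n  %s\n" n body

(* Print the generated definition; a build step could redirect this
   into a .ml file that is compiled with the rest of the program. *)
let () = print_string (gen_dot 3)
```

Running it prints a definition of dot3 whose body is the three products 
summed directly, which can then be compiled like any hand-written function.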

As I can autogenerate my code, I would much rather the OCaml developers 
concentrated on things that I cannot work around, like the lack of a 32-bit 
float storage type and of a more efficient internal representation of 
complex numbers.

-- 
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
OCaml for Scientists
http://www.ffconsultancy.com/products/ocaml_for_scientists/?e