From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <jon@ffconsultancy.com>
X-Spam-Checker-Version: SpamAssassin 3.1.3 (2006-06-01) on yquem.inria.fr
X-Spam-Level: 
X-Spam-Status: No, score=0.0 required=5.0 tests=AWL autolearn=disabled 
	version=3.1.3
X-Original-To: caml-list@yquem.inria.fr
Delivered-To: caml-list@yquem.inria.fr
Received: from discorde.inria.fr (discorde.inria.fr [192.93.2.38])
	by yquem.inria.fr (Postfix) with ESMTP id 158D6BC69
	for <caml-list@yquem.inria.fr>; Fri,  9 Mar 2007 19:56:43 +0100 (CET)
Received: from pih-relay05.plus.net (pih-relay05.plus.net [212.159.14.132])
	by discorde.inria.fr (8.13.6/8.13.6) with ESMTP id l29IugNi028371
	(version=TLSv1/SSLv3 cipher=AES256-SHA bits=256 verify=NO)
	for <caml-list@yquem.inria.fr>; Fri, 9 Mar 2007 19:56:42 +0100
Received: from [80.229.56.224] (helo=beast.local)
	 by pih-relay05.plus.net with esmtp (Exim) id 1HPkGf-0004TZ-RW
	for caml-list@yquem.inria.fr; Fri, 09 Mar 2007 18:56:42 +0000
From: Jon Harrop <jon@ffconsultancy.com>
Organization: Flying Frog Consultancy Ltd.
To: caml-list@yquem.inria.fr
Subject: Re: [Caml-list] Interactive technical computing
Date: Fri, 9 Mar 2007 18:50:52 +0000
User-Agent: KMail/1.9.5
References: <3D1E4D9CA9BCE04D8F2B55F203AE4CE30666AB87@selma.roomandboard.com>
In-Reply-To: <3D1E4D9CA9BCE04D8F2B55F203AE4CE30666AB87@selma.roomandboard.com>
MIME-Version: 1.0
Content-Type: text/plain;
  charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
Message-Id: <200703091850.52534.jon@ffconsultancy.com>
X-Miltered: at discorde with ID 45F1ADEA.000 by Joe's j-chkmail (http://j-chkmail . ensmp . fr)!
X-Spam: no; 0.00; ocaml's:01 bytecode:01 ocaml:01 ocaml:01 bytecode:01 native-code:01 fft:01 fft:01 sqrt:01 ocamlopt:01 ocamlopt:01 run-time:01 rewriting:01 stdlib:01 struct:01 

On Friday 09 March 2007 14:13, Robert Fischer wrote:
> Performance of Ocaml's bytecode is slower than F#?  Really?

Sum 1/x for x in [1 .. 10^6]. In OCaml:

time (Array.fold_left (+.) 0.)
  (Array.init 1000000 (fun i -> 1. /. float(i+1)));;

In F#:

time (Array.fold_left (+) 0.)
  (Array.map ((/) 1.) [|1. .. 1000000.|]);;

OCaml takes 0.256s, F# takes only 0.047s => F# more than 5x faster than OCaml 
bytecode.


On numerical code, F# can be faster than native-code compiled OCaml. For 
example, a naive 2^n FFT:

let fft a =
  let n = Array.length a in 
  let j = ref 0 in
  for i = 0 to n-2 do
    if i < !j then
      (let t = a.(!j) in
       a.(!j) <- a.(i);
       a.(i) <- t);
    let m = ref (n/2) in
    while !m <= !j do
      j := !j - !m;
      m := !m/2
    done;
    j := !j + !m
  done;
  let j = ref 1 in
  let b = ref (neg c1) in
  while !j<n do
    let w = ref c1 in
    for m = 0 to !j-1 do
      let i = ref m in
      while !i < n do 
	let t = !w *@ a.(!i + !j) in
	a.(!i + !j) <- a.(!i) -@ t;
	a.(!i) <- a.(!i) +@ t;
	w := !w *@ !b;
	i := !i + !j + !j
      done
    done;
    j := !j + !j;
    b := sqrt !b
  done;
  a;;

n=1<<15
ocamlopt: 0.164s
F#: 0.134s

F# is 22% faster, probably thanks to complex numbers in the language.


Discrete wavelet transform (4-tap Daubechies, n=2^20), OCaml is 25% faster:

ocamlopt: 2.03s
F#: 2.53s

Note that this is apples and oranges because my Windows environment is still 
only 32 bit. If .NET shows the same performance improvement moving from 

> From what I understand, F# has a major performance issue resulting from the
> way the .Net VM handles allocation.  Is that old info? 

Allocation is slower in F# for two main reasons:

1. The run-time is optimised for C# code that has quite different expected 
value lifetimes (far fewer very short-lived objects compared to F#).

2. F# supports concurrency, which incurs a big performance cost in allocation 
and GC.

but the consequence of this is that very allocation-heavy code (e.g. symbolic 
rewriting) is up to 4x slower in F#. Lists are also more heavyweight in F#.

However, F# regains a lot of performance by having a much faster stdlib.

For example, creating a 2^16-element set is an allocation-intensive task. In 
OCaml:

# module Int = struct
    type t = int
    let compare = compare
  end;;
# module IntSet = Set.Make(Int);;
...
# time (Array.fold_right IntSet.add (Array.init 65536 (fun i -> i)))
    IntSet.empty;;
0.744047s

Compiled with ocamlopt I get 0.065s.

For F# I get 0.240s. That's 3.7x slower than native-code OCaml.

However, it is worth noting that the F# equivalent is much more concise, just:

  time (Array.fold_right Set.add [|0 .. 65535|])
    Set.empty;;

primarily because no functors are involved when making a set. The comparison 
function is taken from the element type.

> > I've got a killer high-performance 2D and 3D visualization library
> > written in OCaml and I'd like to sell it, but I don't want to sell the
> > source code because I value it too much. What can I do? Well, I can port
> > it to F# and sell it there. In the mean time, OCaml users are stuck with
> > GNUPlot.
>
> Do you have metrics showing that performance is better with F# than OCaml
> in these two cases?

In theory, performance should be very close because so much work is done by 
the graphics card and not the CPU. In practice, I only just figured out how 
to render static geometry optimally from DirectX, so my F# version still 
sucks.

Once I've integrated that into my purely-functional scene graph I'll let you 
know what the performance is like. I expect F# to win because I'm exploiting 
concurrency and the application is more numerical than allocation intensive.

-- 
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
OCaml for Scientists
http://www.ffconsultancy.com/products/ocaml_for_scientists