From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <jon@ffconsultancy.com>
X-Original-To: caml-list@yquem.inria.fr
Delivered-To: caml-list@yquem.inria.fr
Received: from nez-perce.inria.fr (nez-perce.inria.fr [192.93.2.78])
	by yquem.inria.fr (Postfix) with ESMTP id 4871BBB94
	for <caml-list@yquem.inria.fr>; Tue,  4 Oct 2005 01:22:18 +0200 (CEST)
Received: from pih-relay05.plus.net (pih-relay05.plus.net [212.159.14.132])
	by nez-perce.inria.fr (8.13.0/8.13.0) with ESMTP id j93NMHTC020910
	(version=TLSv1/SSLv3 cipher=AES256-SHA bits=256 verify=NO)
	for <caml-list@yquem.inria.fr>; Tue, 4 Oct 2005 01:22:17 +0200
Received: from [80.229.56.224] (helo=chetara)
	 by pih-relay05.plus.net with esmtp (Exim) id 1EMZdM-00016D-IJ
	for caml-list@yquem.inria.fr; Tue, 04 Oct 2005 00:22:12 +0100
From: Jon Harrop <jon@ffconsultancy.com>
Organization: Flying Frog Consultancy Ltd.
To: caml-list@yquem.inria.fr
Subject: Ray tracer language comparison
Date: Tue, 4 Oct 2005 00:18:24 +0100
User-Agent: KMail/1.7.2
MIME-Version: 1.0
Content-Type: text/plain;
  charset="us-ascii"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
Message-Id: <200510040018.24932.jon@ffconsultancy.com>
X-Miltered: at nez-perce with ID 4341BD29.000 by Joe's j-chkmail (http://j-chkmail.ensmp.fr)!
X-Spam: no; 0.00; run-time:01 ocamlopt:01 ocamlopt:01 verbose:01 ocaml:01 unboxing:01 ocaml:01 unboxing:01 denotes:01 denotes:01 transformed:01 vectors:01 inlining:01 -inline:01 radius:98 
X-Spam-Checker-Version: SpamAssassin 3.0.3 (2005-04-27) on yquem.inria.fr
X-Spam-Level: 
X-Spam-Status: No, score=0.0 required=5.0 tests=none autolearn=disabled 
	version=3.0.3


I've updated my language comparison with four implementations in Scheme and 
one in Lisp:

  http://www.ffconsultancy.com/free/ray_tracer/languages.html

In short, Stalin's run-time performance is excellent (36% faster than 
ocamlopt) but its compile times are poor (2,000x slower than ocamlopt!) and 
SBCL-compiled Lisp is 6x slower than ocamlopt. Both Scheme and Lisp are >2x 
as verbose as OCaml.

Juho Snellman posted an interesting Lisp variant that used a macro to factor 
out unboxing, greatly reducing the stress on the GC and improving Lisp's 
performance to that of OCaml. Applying the same optimisation to the OCaml 
implementations makes them much faster again but I have yet to factor this 
out into a camlp4 macro.

This raises two interesting questions:

1. Can a camlp4 macro be written to factor out this unboxing transformation:

  let v = center -| orig in
  let b = dot v dir in
  let disc = b *. b -. dot v v +. radius *. radius in

where "+|" denotes vector addition and "*|" denotes scalar*vector 
multiplication. Transformed to:

  let vx = center.x -. orig.x in
  let vy = center.y -. orig.y in
  let vz = center.z -. orig.z in
  let b = vx *. dir.x +. vy *. dir.y +. vz *. dir.z in
  let vv = vx *. vx +. vy *. vy +. vz *. vz in
  let disc = b *. b -. vv +. radius *. radius in

So the intermediate vectors are not allocated and collected.

2. Manually inlining the code gives a huge performance improvement but using 
-inline does not appear to give this improvement. I assume ocamlopt can 
inline this but will not remove the boxing and subsequent unboxing. Is that 
right?

-- 
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
Objective CAML for Scientists
http://www.ffconsultancy.com/products/ocaml_for_scientists