From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <skaller@users.sourceforge.net>
X-Original-To: Caml-list@yquem.inria.fr
Delivered-To: Caml-list@yquem.inria.fr
Received: from nez-perce.inria.fr (nez-perce.inria.fr [192.93.2.78])
	by yquem.inria.fr (Postfix) with ESMTP id BB134BB81
	for <Caml-list@yquem.inria.fr>; Mon,  6 Mar 2006 14:08:08 +0100 (CET)
Received: from smtp3.adl2.internode.on.net (smtp3.adl2.internode.on.net [203.16.214.203])
	by nez-perce.inria.fr (8.13.0/8.13.0) with ESMTP id k26D86Nl014417
	for <Caml-list@yquem.inria.fr>; Mon, 6 Mar 2006 14:08:07 +0100
Received: from [192.168.1.200] (ppp9-113.lns1.syd7.internode.on.net [59.167.9.113])
	by smtp3.adl2.internode.on.net (8.13.5/8.13.5) with ESMTP id k26D80MZ070033;
	Mon, 6 Mar 2006 23:38:00 +1030 (CST)
	(envelope-from skaller@users.sourceforge.net)
Subject: Re: [Caml-list] I'm sure it's been discussed a few times, but here
	we go.... single-precision floats
From: skaller <skaller@users.sourceforge.net>
To: Asfand Yar Qazi <email@asfandyar.cjb.net>
Cc: Caml-list@yquem.inria.fr
In-Reply-To: <440C29DC.4000701@asfandyar.cjb.net>
References: <440C29DC.4000701@asfandyar.cjb.net>
Content-Type: text/plain
Organization: Async P/L
Date: Tue, 07 Mar 2006 00:13:18 +1100
Message-Id: <1141650798.13144.138.camel@budgie.wigram>
Mime-Version: 1.0
X-Mailer: Evolution 2.4.1 
Content-Transfer-Encoding: 7bit
X-Miltered: at nez-perce with ID 440C3436.000 by Joe's j-chkmail (http://j-chkmail.ensmp.fr)!
X-Spam: no; 0.00; caml-list:01 ocaml:01 reloaded:01 recursive:01 gcc:01 ocaml:01 gcc:01 stack:01 doubles:01 iterated:01 haskell:01 mlton:01 async:01 competitor:98 consultants:98 
X-Spam-Checker-Version: SpamAssassin 3.0.3 (2005-04-27) on yquem.inria.fr
X-Spam-Level: 
X-Spam-Status: No, score=0.0 required=5.0 tests=none autolearn=disabled 
	version=3.0.3

On Mon, 2006-03-06 at 12:23 +0000, Asfand Yar Qazi wrote:

> All the OCaml discussions about floating point precision I have seen so far
> evolve around how fast operations are performed on them - but the critical
> thing for things like collision detection, etc. in games is the amount of data
> that can fit into the CPU cache and be operated on before the cache must be
> reloaded.  Obviously, twice as many single precision floats can fit into any
> CPU's cache than double precision floats.

No, it isn't obvious. I have some routines using the Taka algorithm
which is heavily recursive and therefore depends heavily on
cache use.

Code for gcc and Felix uses single precision floats.
It is competing with Ocaml .. which is not only using
double precision .. it might be boxing as well!???
[Perhaps gcc is passing the single precision floats
as double anyhow ..?]

http://felix.sourceforge.net/current/speed/en_flx_perf_0012.html

This is an older set of results. On my newer AMD64x2 3800,
gcc opt wins, but Ocaml is second.

The effect of cache usage ignoring FP calculation speed is
seen here:

http://felix.sourceforge.net/current/speed/en_flx_perf_0005.html

where Felix trashes everything by a clear margin simply because
it happens to use one less word on the stack than it's nearest
competitor.

Perhaps Xavier can explain how Ocaml manages to be so dang
fast on the FP stuff .. if you try C or Felix with
doubles they drop right off the chart.

> We're talking huge dynamic data structures with millions of floating point
> coordinates that all have to be iterated over many times a second - preferably
> by using multithreaded algorithms, so that multiple CPUs can be used
> efficiently. 

Ocaml doesn't currently permit multi-processing.
Felix does, it might be a better alternative for games
and possibly High Performance Computing apps.
Other options include Haskell and MLton.

-- 
John Skaller <skaller at users dot sourceforge dot net>
Async PL, Realtime software consultants
Checkout Felix: http://felix.sourceforge.net