I'm sure it's been discussed a few times, but here we go.... single-precision floats

caml-list - the Caml user's mailing list

* I'm sure it's been discussed a few times, but here we go.... single-precision floats
@ 2006-03-06 12:23 Asfand Yar Qazi
  2006-03-06 13:13 ` [Caml-list] " skaller
  0 siblings, 1 reply; 5+ messages in thread
From: Asfand Yar Qazi @ 2006-03-06 12:23 UTC
  To: Caml-list

Hi,

I recently performed some tests with GNU C++, and found that (for a small fast
Fourier transform operation, anyway) double precision out-performs single
precision floating point.

However, on the Ogre 3D engine forum, I had a lengthy discussion and a
conclusion was reached that the reason single-precision floats are preferred
for games (and I assume all high-performance applications that require huge
amounts of data) is that more of them can fit into the cache of the processor.

All the OCaml discussions about floating point precision I have seen so far
revolve around how fast operations are performed on them - but the critical
thing for things like collision detection, etc. in games is the amount of data
that can fit into the CPU cache and be operated on before the cache must be
reloaded.  Obviously, any CPU's cache can hold twice as many single precision
floats as double precision floats.

We're talking huge dynamic data structures with millions of floating point
coordinates that all have to be iterated over many times a second - preferably
by using multithreaded algorithms, so that multiple CPUs can be used
efficiently.  Since doing this sort of work (i.e. parallel computing) in C++
is a pain in the **** ('scuse my French :-), I want to learn a language that
will make it easy and less error-prone - hence my study of OCaml.

So, is there any way (I'm thinking similar to 'nativeint') to use floats in
OCaml to maximize the data that can be stored and operated on in the CPU's
cache such that system memory is accessed as little as possible?

Thanks


* Re: [Caml-list] I'm sure it's been discussed a few times, but here we go.... single-precision floats
  2006-03-06 12:23 I'm sure it's been discussed a few times, but here we go.... single-precision floats Asfand Yar Qazi
@ 2006-03-06 13:13 ` skaller
  2006-03-06 17:01   ` Asfand Yar Qazi
  0 siblings, 1 reply; 5+ messages in thread
From: skaller @ 2006-03-06 13:13 UTC
  To: Asfand Yar Qazi; +Cc: Caml-list

On Mon, 2006-03-06 at 12:23 +0000, Asfand Yar Qazi wrote:

> All the OCaml discussions about floating point precision I have seen so far
> revolve around how fast operations are performed on them - but the critical
> thing for things like collision detection, etc. in games is the amount of data
> that can fit into the CPU cache and be operated on before the cache must be
> reloaded.  Obviously, any CPU's cache can hold twice as many single precision
> floats as double precision floats.

No, it isn't obvious. I have some routines using the Taka algorithm
which is heavily recursive and therefore depends heavily on
cache use.

Code for gcc and Felix uses single precision floats.
It is competing with Ocaml .. which is not only using
double precision .. it might be boxing as well!???
[Perhaps gcc is passing the single precision floats
as double anyhow ..?]

http://felix.sourceforge.net/current/speed/en_flx_perf_0012.html

This is an older set of results. On my newer AMD64x2 3800,
gcc opt wins, but Ocaml is second.
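For readers unfamiliar with it, the Takeuchi ("tak") function skaller mentions can be sketched in OCaml as below. This follows the classic Gabriel benchmark form with float arguments; the exact variant used in the Felix benchmark is an assumption. Every call takes and returns floats, which is where the boxing question above comes in.

```ocaml
(* Takeuchi function ("tak") on floats, in the style of Gabriel's
   benchmark.  The exact variant used in the Felix benchmark is an
   assumption; this only shows the shape of the recursion. *)
let rec tak (x : float) (y : float) (z : float) : float =
  if y < x then
    tak (tak (x -. 1.0) y z)
        (tak (y -. 1.0) z x)
        (tak (z -. 1.0) x y)
  else z

let () =
  (* Classic benchmark arguments. *)
  Printf.printf "tak 18 12 6 = %g\n" (tak 18.0 12.0 6.0)
```

With the classic arguments 18, 12, 6 the answer is 7; the point of the benchmark is the tens of thousands of recursive calls needed to get there, not the result itself.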

The effect of cache usage ignoring FP calculation speed is
seen here:

http://felix.sourceforge.net/current/speed/en_flx_perf_0005.html

where Felix thrashes everything by a clear margin simply because
it happens to use one fewer word on the stack than its nearest
competitor.

Perhaps Xavier can explain how Ocaml manages to be so dang
fast on the FP stuff .. if you try C or Felix with
doubles they drop right off the chart.

> We're talking huge dynamic data structures with millions of floating point
> coordinates that all have to be iterated over many times a second - preferably
> by using multithreaded algorithms, so that multiple CPUs can be used
> efficiently. 

Ocaml doesn't currently permit multi-processing.
Felix does, it might be a better alternative for games
and possibly High Performance Computing apps.
Other options include Haskell and MLton.

-- 
John Skaller <skaller at users dot sourceforge dot net>
Async PL, Realtime software consultants
Checkout Felix: http://felix.sourceforge.net


* Re: [Caml-list] I'm sure it's been discussed a few times, but here we go.... single-precision floats
  2006-03-06 13:13 ` [Caml-list] " skaller
@ 2006-03-06 17:01   ` Asfand Yar Qazi
  0 siblings, 0 replies; 5+ messages in thread
From: Asfand Yar Qazi @ 2006-03-06 17:01 UTC
  To: Caml-list

skaller wrote:
> On Mon, 2006-03-06 at 12:23 +0000, Asfand Yar Qazi wrote:
> 
> 
>>All the OCaml discussions about floating point precision I have seen so far
>>revolve around how fast operations are performed on them - but the critical
>>thing for things like collision detection, etc. in games is the amount of data
>>that can fit into the CPU cache and be operated on before the cache must be
>>reloaded.  Obviously, any CPU's cache can hold twice as many single precision
>>floats as double precision floats.
> 
> 
> No, it isn't obvious. I have some routines using the Taka algorithm
> which is heavily recursive and therefore depends heavily on
> cache use.
> 
> Code for gcc and Felix uses single precision floats.
> It is competing with Ocaml .. which is not only using
> double precision .. it might be boxing as well!???
> [Perhaps gcc is passing the single precision floats
> as double anyhow ..?]
> 
> http://felix.sourceforge.net/current/speed/en_flx_perf_0012.html
> 
> This is an older set of results. On my newer AMD64x2 3800,
> gcc opt wins, but Ocaml is second.

Point taken - but I'm not experienced enough to comment - perhaps John Carmack
would be a good person to contact about this :-)

> 
> The effect of cache usage ignoring FP calculation speed is
> seen here:
> 
> http://felix.sourceforge.net/current/speed/en_flx_perf_0005.html
> 
> where Felix thrashes everything by a clear margin simply because
> it happens to use one fewer word on the stack than its nearest
> competitor.
> 
> Perhaps Xavier can explain how Ocaml manages to be so dang
> fast on the FP stuff .. if you try C or Felix with
> doubles they drop right off the chart.
>
> 
>>We're talking huge dynamic data structures with millions of floating point
>>coordinates that all have to be iterated over many times a second - preferably
>>by using multithreaded algorithms, so that multiple CPUs can be used
>>efficiently. 
> 
> 
> Ocaml doesn't currently permit multi-processing.
> Felix does, it might be a better alternative for games
> and possibly High Performance Computing apps.
> Other options include Haskell and MLton.
> 

Ah, didn't know that.  From some comments made by Tim Sweeney (of Epic Games,
Unreal Engine fame), I thought functional programming made multi-threaded
coding easy.

I'm referring to some slides found here: 
http://www.st.cs.uni-saarland.de/edu/seminare/2005/advanced-fp/docs/sweeny.pdf

Here's what he says about 'Performance' for example:
  - When updating 10,000 objects at 60 FPS, everything is
    performance-sensitive
  - But:
    - Productivity is just as important
    - Will gladly sacrifice 10% of our performance for 10% higher
      productivity
    - We never use assembly language
  - There is not a simple set of “hotspots” to optimize!
That’s all!

Somebody commented about it here 
(http://www.gamedev.net/community/forums/topic.asp?topic_id=373751) by saying:
The bottom line is that C/C++ is becoming way too costly to be used in the 
future. Both in terms of money (bugs cost money) but also in terms of 
performance. It's very, very hard to leverage multiple threads in C/C++, and 
it's only getting worse as we get more cores in our systems -- in contrast, 
concurrency in languages such as Haskell is almost as simple as 
single-threaded programming (due to it being purely functional, and having STM).

That's why I thought about learning OCaml - I thought it had the same 
functional programming ability as Haskell, but with more (non-functional) 
features.

But I still want to learn it; it seems a thousand light years away from C++
(in a good way).


* Re: [Caml-list] I'm sure it's been discussed a few times, but here we go.... single-precision floats
  2006-03-06 15:22 Jonathan Harrop
@ 2006-03-06 16:34 ` Brian Hurt
  0 siblings, 0 replies; 5+ messages in thread
From: Brian Hurt @ 2006-03-06 16:34 UTC
  To: Jonathan Harrop; +Cc: Caml-list, Asfand Yar Qazi

On Mon, 6 Mar 2006, Jonathan Harrop wrote:

> OCaml's advantages center around the ability to design and use 
> sophisticated data structures easily - the precise opposite of iterating 
> over long arrays.

Which is why, were I writing a video game for Linux, I'd be writing the
game logic in Ocaml- and the rendering logic in C/assembly.  Or at least
I'd be looking to write some sort of C/assembly library to handle large
arrays of vectors of single precision floats, and do operations on them
using the various SIMD extensions.
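On the OCaml side, the layout Brian has in mind could look something like this: a single flat Bigarray of interleaved single-precision x, y, z components, which a C routine (hypothetical here, reached through an `external` declaration and a C stub) could receive without copying. The type and function names are illustrative only, the sketch assumes OCaml 4.07 or later (Bigarray in the standard library), and the loop is plain OCaml rather than SIMD.

```ocaml
(* Hypothetical packed array of 3D vectors stored as interleaved
   single-precision floats: x0 y0 z0 x1 y1 z1 ...
   A C/SIMD routine could be handed the same Bigarray through an
   [external] declaration; here the work is done in plain OCaml. *)
open Bigarray

type vec3_array = (float, float32_elt, c_layout) Array1.t

(* Allocate [n] vectors, all set to (0, 0, 0). *)
let create_vec3_array (n : int) : vec3_array =
  let a = Array1.create float32 c_layout (3 * n) in
  Array1.fill a 0.0;
  a

(* Number of vectors stored. *)
let vec_count (a : vec3_array) : int = Array1.dim a / 3

(* Add (dx, dy, dz) to every vector in place.  Each result is rounded
   back to single precision when it is stored. *)
let translate (a : vec3_array) (dx : float) (dy : float) (dz : float) : unit =
  for i = 0 to vec_count a - 1 do
    let j = 3 * i in
    a.{j} <- a.{j} +. dx;
    a.{j + 1} <- a.{j + 1} +. dy;
    a.{j + 2} <- a.{j + 2} +. dz
  done

let () =
  (* A million vectors take 12 MB here, versus 24 MB as doubles. *)
  let pts = create_vec3_array 1_000_000 in
  translate pts 1.0 2.0 3.0;
  Printf.printf "first point: (%g, %g, %g)\n" pts.{0} pts.{1} pts.{2}
```

Interleaving the components is just one choice; a structure-of-arrays layout (separate x, y and z arrays) is often friendlier to SIMD code, and switching between the two only changes the indexing.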

The problems that Jak and Daxter ran into doing pretty much precisely this 
mostly came from one of two problems: 1) they were inventing, and 
maintaining, their own language- and thus had the usual assortment of 
compiler bugs to work out.  Using a mature, debugged language like Ocaml 
would solve this.  And 2) unfamiliarity of the development staff with
functional programming and its patterns.  See:
http://www.gamasutra.com/features/20020710/white_02.htm

A couple of other, generic, comments on this topic:

1) IMHO, most game developers are focusing too much on technology, and not
enough on game play.  Games with great game play but very low
CPU/graphics requirements, like Tetris, SpaceWar, Asteroids, PacMan,
Nethack, etc., are still great fun to play, and are still played by large
numbers of people.  This is despite the exceptionally crude aspects of
many of them (Nethack and Tetris can even be played on text consoles).
Games which are technical achievements but weaker on game play tend
to be flashes in the pan, at best- who still plays Myst in any serious way?
Spending your time adding even more realistic blood splatter to a first
person shooter strikes me as a suboptimal use of time.

2) If Ocaml isn't the best possible language to use for game design, so
what?  Outside of game design, the vast majority of numerical programs are
much more about data structures and algorithms (two things that Ocaml
kicks ass on) than they are about raw FLOPS.  A classic example: two
programs (A and B), both solving the same problem.  Program A has 10x the
FLOPS rate of program B- and yet program B is 10x faster.  Why?  Because
the problem is multiplying two sparse matrices, each matrix having only
10% non-zero elements.  Program A is doing it as dense multiplication,
taking full advantage of all SIMD extensions, etc., while program B has
implemented a sparse matrix data structure.  Program B is boxing its
floats and carrying a lot of data structure overhead, which means it only
issues 1 floating point instruction for every 10 that program A issues;
but it only needs to issue 0.1 * 0.1 = 1/100th as many floating point
instructions, and thus is still 10x faster.  That's a simple case, but it
illustrates the real advantage Ocaml has.
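Brian's arithmetic can be checked with a small experiment. The representation below (each row as a list of (column, value) pairs) is one simple choice made for illustration, not anything from the thread; note that the tuples box each float, much like his program B. The sketch counts the multiplications actually performed so they can be compared with the n^3 a dense algorithm always does. The helper names `random_sparse` and `multiply` are hypothetical.

```ocaml
(* Sparse matrix: each row keeps only its non-zero entries as
   (column, value) pairs, in increasing column order. *)
type sparse = { n : int; rows : (int * float) list array }

(* Build a random n x n matrix in which roughly [density] of the
   entries are non-zero. *)
let random_sparse n density =
  let rows =
    Array.init n (fun _ ->
      let acc = ref [] in
      for j = n - 1 downto 0 do
        if Random.float 1.0 < density then
          acc := (j, Random.float 1.0) :: !acc
      done;
      !acc)
  in
  { n; rows }

(* Multiply a by b, returning the product and the number of
   floating-point multiplications actually performed. *)
let multiply a b =
  let mults = ref 0 in
  let row = Array.make b.n 0.0 in (* dense scratch row for the result *)
  let rows =
    Array.map (fun arow ->
      Array.fill row 0 b.n 0.0;
      (* Only non-zero a_ik meet non-zero b_kj. *)
      List.iter (fun (k, aik) ->
        List.iter (fun (j, bkj) ->
          row.(j) <- row.(j) +. aik *. bkj;
          incr mults) b.rows.(k))
        arow;
      (* Collect the non-zero entries of the finished row. *)
      let acc = ref [] in
      for j = b.n - 1 downto 0 do
        if row.(j) <> 0.0 then acc := (j, row.(j)) :: !acc
      done;
      !acc)
      a.rows
  in
  ({ n = a.n; rows }, !mults)

let () =
  Random.init 42;
  let n = 300 in
  let a = random_sparse n 0.1 and b = random_sparse n 0.1 in
  let _, mults = multiply a b in
  (* A dense algorithm performs n^3 multiplications regardless. *)
  Printf.printf "sparse: %d multiplications, dense: %d\n" mults (n * n * n)
```

With two 10%-dense matrices the expected count is about n^3/100, which is Brian's 1/100 figure; how much of that becomes wall-clock speedup depends on how costly the per-entry overhead is on a real machine.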

But there is a wider point here.  A number of languages (and C++ is
particularly bad for this) try to be golden hammers.  They're a floor
wax and a dessert topping!  I disbelieve in golden hammers- in my
experience, languages that try to be all things to all people end up being
the wrong tool for all jobs.  Numeric programming and game programming put
together are a small corner of programming.  By any measure you want to
apply, business logic programming is at least 10 times as big as both of
those put together- whether it be the number of programmers working on that
code, the amount of money spent on it, the number of users, the total time
all users spend using that code, etc.  In this market, absolute
performance is less of an issue (witness Java) than things like
correctness, maintainability, speed of development, etc.  I
wouldn't try to write the rendering engine for a video game in Java either-
but that doesn't mean that Java isn't a successful and useful language.

Brian


* Re: [Caml-list] I'm sure it's been discussed a few times, but here we go.... single-precision floats
@ 2006-03-06 15:22 Jonathan Harrop
  2006-03-06 16:34 ` Brian Hurt
  0 siblings, 1 reply; 5+ messages in thread
From: Jonathan Harrop @ 2006-03-06 15:22 UTC
  To: Caml-list, Asfand Yar Qazi

On Mon Mar  6 12:23 , Asfand Yar Qazi <email@asfandyar.cjb.net> sent:
>All the OCaml discussions about floating point precision I have seen so far
>revolve around how fast operations are performed on them - but the critical
>thing for things like collision detection, etc. in games is the amount of data
>that can fit into the CPU cache and be operated on before the cache must be
>reloaded.  Obviously, any CPU's cache can hold twice as many single precision
>floats as double precision floats.

Yes.

>We're talking huge dynamic data structures with millions of floating point
>coordinates that all have to be iterated over many times a second - preferably
>by using multithreaded algorithms, so that multiple CPUs can be used
>efficiently.  Since doing this sort of work (i.e. parallel computing) in C++
>is a pain in the **** ('scuse my French :-), I want to learn a language that
>will make it easy and less error-prone - hence my study of OCaml.

Due to OCaml's lack of a concurrent GC, there is no good way to parallelise OCaml programs at a low level.
You can, of course, use message passing between separate OCaml processes to parallelise at a higher 
level.

OCaml's advantages center around the ability to design and use sophisticated data structures easily - 
the precise opposite of iterating over long arrays.

>So, is there any way (I'm thinking similar to 'nativeint') to use floats in
>OCaml to maximize the data that can be stored and operated on in the CPU's
>cache such that system memory is accessed as little as possible?

Currently, your only choice is to use big arrays of 32-bit floats. There is no other way to store a 
single 32-bit float in an OCaml data structure. Such functionality would be useful in the case of my 
ray tracer, for example:

  http://www.ffconsultancy.com/free/ray_tracer

where efficient use of a big array would require fundamental alterations. However, my AMD64 wastes a 
lot of memory on pointers as well...
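A minimal sketch of the Bigarray approach Jon describes, assuming OCaml 4.07 or later (where `Bigarray` is part of the standard library). It compares the storage for a million floats held in an ordinary float array (unboxed 64-bit doubles) with a Bigarray of kind `float32`, and shows that a value read back is the single-precision rounding widened to a double.

```ocaml
(* Storage cost of n floats: a regular float array versus a Bigarray of
   32-bit floats.  Assumes OCaml >= 4.07, where Bigarray is in the stdlib. *)
open Bigarray

let n = 1_000_000

(* A float array stores its elements unboxed as 64-bit doubles,
   8 bytes each. *)
let doubles : float array = Array.make n 0.0

(* A float32 Bigarray stores 4 bytes per element outside the OCaml heap;
   elements are converted to and from OCaml's 64-bit float on access. *)
let singles : (float, float32_elt, c_layout) Array1.t =
  Array1.create float32 c_layout n

let () =
  Array1.fill singles 0.0;          (* create does not initialise *)
  doubles.(0) <- 0.1;
  singles.{0} <- 0.1;               (* rounded to the nearest float32 *)
  Printf.printf "float array payload: %d bytes\n" (n * 8);
  Printf.printf "float32 Bigarray:    %d bytes\n" (Array1.size_in_bytes singles);
  (* The value read back is the float32 rounding of 0.1, widened to a
     double, so it is not equal to the double 0.1. *)
  Printf.printf "doubles.(0) = %.17g, singles.{0} = %.17g\n"
    doubles.(0) singles.{0}
```

The saving applies to storage only: arithmetic on the elements is still done in double precision, and, as Jon notes, this does not help when the floats live inside records or other heap-allocated structures.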

Cheers,
Jon.


end of thread

Thread overview: 5+ messages
2006-03-06 12:23 I'm sure it's been discussed a few times, but here we go.... single-precision floats Asfand Yar Qazi
2006-03-06 13:13 ` [Caml-list] " skaller
2006-03-06 17:01   ` Asfand Yar Qazi
2006-03-06 15:22 Jonathan Harrop
2006-03-06 16:34 ` Brian Hurt
