caml-list - the Caml user's mailing list
* Re: HLVM ray tracer performance
From: shawjef3 @ 2010-01-10 18:29 UTC (permalink / raw)
  To: caml-list

Jon,
I wanted to run the raytracing benchmark myself to see if Haskell really was that slow. I'm using ghc 6.10 because that's what ubuntu comes with. I don't know if ghc 6.12 generates slower executables than 6.10 or what else might be going on. I ran each several times and the numbers I pasted are typical (+/- 0.2 seconds, say).

jeff@ubuntu:~/Desktop$ ghc --version
The Glorious Glasgow Haskell Compilation System, version 6.10.4
jeff@ubuntu:~/Desktop$ g++ --version
g++ (Ubuntu 4.4.1-4ubuntu8) 4.4.1
Copyright (C) 2009 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

jeff@ubuntu:~/Desktop$ ocamlopt -v
The Objective Caml native-code compiler, version 3.11.1
Standard library directory: /usr/lib/ocaml

I compiled the raytracers for c++, haskell and ocaml from

http://www.ffconsultancy.com/languages/ray_tracer/code/5

and used the compile instructions at

http://www.ffconsultancy.com/languages/ray_tracer/benchmark.html

though I had to change the haskell one to use just ghc instead of specifying a version. I also ran the ocaml and haskell code in the 1/ directory, and they completed within 0.1 seconds of each other.

c++
jeff@ubuntu:~/Desktop$ time ./ray 9 512 > /dev/null

real    0m3.515s
user    0m3.440s
sys    0m0.016s

haskell
jeff@ubuntu:~/Desktop$ time ./ray 9 512 > /dev/null

real    0m5.811s
user    0m5.752s
sys    0m0.032s

ocaml
jeff@ubuntu:~/Desktop$ time ./ray 9 512 > /dev/null

real    0m6.572s
user    0m6.544s
sys    0m0.016s

Jeff


* Re: [Caml-list] Re: HLVM ray tracer performance
From: Jon Harrop @ 2010-01-10 20:14 UTC (permalink / raw)
  To: caml-list; +Cc: shawjef3

On Sunday 10 January 2010 18:29:42 shawjef3@msu.edu wrote:
> Jon,
>
> I wanted to run the raytracing benchmark myself to see if Haskell really
> was that slow. I'm using ghc 6.10 because that's what ubuntu comes with.
> I don't know if ghc 6.12 generates slower executables than 6.10 or what
> else might be going on.

I used GHC 6.12 with --make -O2 to get the results from the recent article 
because it generated results faster than GHC 6.10. However, I failed to 
detect that only the Haskell was generating garbage output. Rerunning the 
benchmark with GHC 6.10 here, Haskell does give the correct answer but the 
times are even worse than those I quoted.

> I ran each several times and the numbers I pasted 
> are typical (+/- 0.2 seconds, say).
>
> jeff@ubuntu:~/Desktop$ ghc --version
> The Glorious Glasgow Haskell Compilation System, version 6.10.4
> jeff@ubuntu:~/Desktop$ g++ --version
> g++ (Ubuntu 4.4.1-4ubuntu8) 4.4.1
> Copyright (C) 2009 Free Software Foundation, Inc.
> This is free software; see the source for copying conditions.  There is NO
> warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
> jeff@ubuntu:~/Desktop$ ocamlopt -v
> The Objective Caml native-code compiler, version 3.11.1
> Standard library directory: /usr/lib/ocaml

I used g++ 4.3.3 and OCaml 3.11.1 on a 64-bit Linux kernel running 32-bit 
userland. The machine is an 8-core with two Quad-Core AMD Opteron(tm) 2352 
Processors running at 2.1GHz. AFAICT they have 512 KB L2 caches each and 2 MB 
L3 caches per quad-core CPU.

> I compiled the raytracers for c++, haskell and ocaml from
>
> http://www.ffconsultancy.com/languages/ray_tracer/code/5
>
> and used the compile instructions at
>
> http://www.ffconsultancy.com/languages/ray_tracer/benchmark.html
>
> though I had to change the haskell one to use just ghc instead of
> specifying a version. I also ran the ocaml and haskell code in the 1/
> directory, and they completed within 0.1 seconds of each other.
>
> c++
> jeff@ubuntu:~/Desktop$ time ./ray 9 512 > /dev/null
>
> real    0m3.515s
> user    0m3.440s
> sys    0m0.016s
>
> haskell
> jeff@ubuntu:~/Desktop$ time ./ray 9 512 > /dev/null
>
> real    0m5.811s
> user    0m5.752s
> sys    0m0.032s
>
> ocaml
> jeff@ubuntu:~/Desktop$ time ./ray 9 512 > /dev/null
>
> real    0m6.572s
> user    0m6.544s
> sys    0m0.016s

Are you running x64 or on Intel hardware? What results do you get for 12, 13 
or 14 instead of 9?

-- 
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e



* Re: [Caml-list] Re: HLVM ray tracer performance
From: Richard Jones @ 2010-01-10 20:37 UTC (permalink / raw)
  To: Jon Harrop; +Cc: caml-list, shawjef3

On Sun, Jan 10, 2010 at 08:14:29PM +0000, Jon Harrop wrote:
> on a 64-bit Linux kernel running 32-bit userland

I'm assuming you mean x86 (not eg ppc64), in which case that's a very
unusual choice.  Any reason for this?

Rich.

-- 
Richard Jones
Red Hat



* Re: [Caml-list] Re: HLVM ray tracer performance
From: Jeff Shaw @ 2010-01-11  0:47 UTC (permalink / raw)
  To: Jon Harrop; +Cc: caml-list


> Are you running x64 or on Intel hardware? What results do you get for 12, 13
> or 14 instead of 9?
>
>    
I am running an AMD Phenom 9950, but the Ubuntu I'm using is just 
32-bit. I tried 5/ray.hs with level=12 instead of 9 but it ran into a 
stack overflow problem. When I increased the stack size it completed but 
it also took more time than 1/ray.hs, which required no stack size 
increase. I made sure that the other arguments I fed it were the same. I 
think there is some problem that needs to be worked out in the 5/ray.hs. 
Maybe the problem is in ghc, I'm not sure. Below, ./ray5 is 5/ray.hs, 
and ./ray is 1/ray.hs

jeff@ubuntu:~/Desktop$ time ./ray 12 512 > /dev/null

real    0m21.479s
user    0m21.093s
sys    0m0.180s
jeff@ubuntu:~/Desktop$ time ./ray5 12 512 +RTS -K2000000000 > /dev/null

real    0m28.366s
user    0m25.674s
sys    0m2.608s
jeff@ubuntu:~/Desktop$ time ./ray 14 512 > /dev/null

real    0m23.544s
user    0m23.021s
sys    0m0.500s

I tried level=14 but I ran out of memory for 5/ray.ml and 5/ray.hs.

I considered that maybe I had saved the files from your website wrong, 
or mixed them up during compilation. So I ran the timer again with 
level=9 and level=12 and got all the same results. That is, level=9 is 
faster on 5/ray.hs but level=12 is faster with 1/ray.hs. So I don't 
think I'm making a simple manual labor error.

It seems that 5/ray.ml and 5/ray.hs aren't quite equivalent in some 
important way since 1/ray.ml is faster than 5/ray.ml for both level=9 
and level=12. Whether it's a code problem or compiler problem, I cannot say.

The stack size problem does not go away when I remove all the extra 
optimization arguments to ghc.

--Jeff



* Re: [Caml-list] Re: HLVM ray tracer performance
From: Jon Harrop @ 2010-01-11 10:48 UTC (permalink / raw)
  To: caml-list; +Cc: Jeff Shaw

On Monday 11 January 2010 00:47:26 Jeff Shaw wrote:
> > Are you running x64 or on Intel hardware? What results do you get for 12,
> > 13 or 14 instead of 9?
>
> I am running an AMD Phenom 9950, but the Ubuntu I'm using is just
> 32-bit.

Then I'm even more surprised that you would see significantly different 
results to mine, given that we're running the same architecture.

> I tried 5/ray.hs with level=12 instead of 9 but it ran into a 
> stack overflow problem.

Yes. Many of the Haskell versions regularly die with stack overflows. They are 
not predictable.

> When I increased the stack size it completed but 
> it also took more time than 1/ray.hs, which required no stack size
> increase.

This is an interesting result. I hadn't noticed that the most optimized 
Haskell implementation is not necessarily the fastest. However, I think I can 
explain the phenomenon: with a huge number of spheres, some groups of spheres 
(branches of scene tree) are always occluded and never need to be explicitly 
generated, but only the Haskell generates the scene tree lazily. In fact, 
it may be the case that as level->infinity only the Haskell requires 
bounded space.
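
A minimal sketch of that effect, written in OCaml with explicit `lazy` values
rather than in Haskell. The `scene` type, the `built` counter and the
first-child-only `trace` are simplifications assumed for illustration; they
are not the benchmark's code:

```ocaml
(* Hypothetical, simplified scene tree: not the benchmark's actual types. *)
type scene =
  | Sphere of int                 (* leaf, identified by an id *)
  | Group of scene Lazy.t list    (* children are built only when forced *)

(* Counts how many nodes have actually been constructed. *)
let built = ref 0

(* Describe a tree of the given depth with 4 children per group.
   Only the node itself is built here; each child is a suspended thunk. *)
let rec create level =
  incr built;
  if level = 0 then Sphere 0
  else
    let child () = lazy (create (level - 1)) in
    Group [child (); child (); child (); child ()]

(* Visit only the first child of each group, standing in for a ray that
   hits one bounding sphere and misses its siblings. Returns the number
   of nodes visited. *)
let rec trace = function
  | Sphere _ -> 1
  | Group [] -> 0
  | Group (first :: _) -> 1 + trace (Lazy.force first)

let () =
  let visited = trace (create 12) in
  Printf.printf "visited %d nodes, built %d nodes\n" visited !built
```

Under these assumptions only the 13 nodes along the visited path are ever
constructed, whereas an eager version of `create 12` would build all
22369621 nodes of the tree up front.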

For example, at level=13 the 1/ray.hs Haskell takes 25.8s, 2/ray.hs takes 93s 
and the 5/ray.ml OCaml takes 118s. Presumably Lennart made the more optimized 
Haskell implementations eager in order to improve performance at level=9 but, 
in doing so, he degraded performance for level>9.

Unpredictable...

> I made sure that the other arguments I fed it were the same. I 
> think there is some problem that needs to be worked out in the 5/ray.hs.

There is no easy solution to this because the performance is a non-trivial 
function of "level" and "n".

> I tried level=14 but I ran out of memory for 5/ray.ml and 5/ray.hs.

But 1/ray.hs can handle level=14 and 15:

$ time ./ray 14 512 >image.pgm

real    0m27.581s
user    0m26.790s
sys     0m0.764s

$ time ./ray 15 512 >image.pgm

real    0m29.532s
user    0m28.982s
sys     0m0.552s

In fact, that is faster than any other version.

> It seems that 5/ray.ml and 5/ray.hs aren't quite equivalent in some
> important way since 1/ray.ml is faster than 5/ray.ml for both level=9
> and level=12.

Did you mean .hs instead of .ml here?

> Whether it's a code problem or compiler problem, I cannot 
> say.

The relative performance of the Haskell implementations also varies with 
compiler versions, of course. I cannot tell when it will run out of memory or 
even out of stack space. You just have to try it and, when Haskell dies with 
a stack overflow after several minutes, you just have to tweak the 
command-line parameters to try again until it happens to work.

Finally, I'd add that this "benefit" of the Haskell will almost certainly 
destroy its scalability in the parallel case because you'll have threads 
competing to force the evaluation of thunks in the shared scene tree which 
incurs global synchronization in wholly unpredictable ways (it even depends 
upon the layout of the scene!). So, while this is academically interesting, 
I'd argue that it is practically useless.

-- 
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e



* Re: [Caml-list] Re: HLVM ray tracer performance
From: Jon Harrop @ 2010-01-11 11:03 UTC (permalink / raw)
  To: caml-list


Richard asked me to draw a comparison on 64-bit as well because OCaml 
sometimes does relatively better there. With level=13, I get:

OCaml 32-bit: 118s
OCaml 64-bit:  95s
HLVM 32-bit:   34.8s
HLVM 64-bit:   30.4s
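
To put these figures side by side, the ratios can be computed directly. A
small sketch, where `ratio` and the `timings` list are names introduced here
for illustration and the numbers are the ones listed above:

```ocaml
(* Timings in seconds at level=13, copied from the figures above:
   (architecture, OCaml time, HLVM time). *)
let timings = [
  ("32-bit", 118.0, 34.8);
  ("64-bit", 95.0, 30.4);
]

(* How many times longer the OCaml run takes than the HLVM run. *)
let ratio ocaml hlvm = ocaml /. hlvm

let () =
  List.iter
    (fun (arch, ocaml, hlvm) ->
      Printf.printf "%s: OCaml/HLVM = %.1f\n" arch (ratio ocaml hlvm))
    timings
```

On these numbers HLVM comes out roughly 3.4 times faster than OCaml on 32-bit
and roughly 3.1 times faster on 64-bit.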

-- 
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e

