> For fun I have implemented an nbody simulation following
> http://shootout.alioth.debian.org/benchmark.php?test=nbody&lang=all&sort=cpu
> (code is attached).

Ah, another micro-benchmark. Great pastime!

Your OCaml code is about as good as you can write. All the unboxing
optimizations are triggered.

> ocamlopt -o nbody.com -inline 3 -unsafe -ccopt -O2 nbody.ml

On x86, you can get a bit more speed with -ffast-math, which turns the
call to sqrt() into inline assembly. As others mentioned, "-ccopt -O2"
is useless: it only passes -O2 to the C compiler, which ocamlopt does
not use to compile OCaml source.

> I've compared with the Java program they give. I get (on a Pentium(R)
> 4 CPU 2.40GHz Debian):
>
>           n    OCaml    Java
>        1000    0.004   0.112
>       10000    0.016   0.112
>      100000    0.159   0.218
>      200000    0.284   0.370
>      500000    0.707   0.702
>     1000000    1.410   1.359
>     2000000    2.884   2.453
>     3000000    4.294   3.590
>     4000000    5.735   4.774
>
> I am interested in explanations why OCaml seems asymptotically slower
> than Java and ways to improve that.

You don't say which Java implementation you used (there are several).
The 0.112 overhead of Java corresponds to start-up time, which includes
JIT compilation.

As to why Java is asymptotically faster, we'd need to look at the
generated assembly code. Good luck doing that with a JIT compiler.

So, to understand OCaml's performance here, one has to turn to a
different baseline. I translated your Caml code to C and looked at the
gcc output. The best gcc output is faster than the best OCaml output by
about 30%. Looking at the asm code, the main difference is that gcc
keeps some float variables (dx, dy, dz, etc.) in the x87 floating-point
register stack, while OCaml stores them (unboxed) to slots in the memory
stack. Maybe the Java implementation you used manages to use the float
stack. Who knows.

The x86 floating-point stack is an awfully bad match for the
register-based OCaml code generation model, so, no, I'm not going to go
to the great lengths the gcc folks went to in order to extract some
performance from that model.
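For readers without the attachment, here is a minimal sketch of the kind of inner loop under discussion, assuming a shootout-style representation. The `body` record and the `advance` function are hypothetical names, not the poster's code. Two things make it unboxing-friendly: a record whose fields are all floats is stored flat, so mutating its fields does not allocate, and the local float temporaries (`dx`, `dy`, `dz`, `dist2`, `mag`) are used only in arithmetic, so ocamlopt can keep them unboxed. On 32-bit x86 those are the values that end up in stack slots rather than in the x87 register stack.

```ocaml
(* Hypothetical body representation: an all-float record is stored
   flat (like a float array), so field updates do not allocate. *)
type body = {
  mutable x : float; mutable y : float; mutable z : float;
  mutable vx : float; mutable vy : float; mutable vz : float;
  mass : float;
}

(* One time step of a pairwise n-body update (sketch). The float
   temporaries below are candidates for unboxing by ocamlopt. *)
let advance (bodies : body array) (dt : float) =
  let n = Array.length bodies in
  for i = 0 to n - 1 do
    let b = bodies.(i) in
    for j = i + 1 to n - 1 do
      let b' = bodies.(j) in
      let dx = b.x -. b'.x and dy = b.y -. b'.y and dz = b.z -. b'.z in
      let dist2 = dx *. dx +. dy *. dy +. dz *. dz in
      let mag = dt /. (dist2 *. sqrt dist2) in
      (* Equal and opposite velocity changes for the pair (i, j). *)
      b.vx <- b.vx -. dx *. b'.mass *. mag;
      b.vy <- b.vy -. dy *. b'.mass *. mag;
      b.vz <- b.vz -. dz *. b'.mass *. mag;
      b'.vx <- b'.vx +. dx *. b.mass *. mag;
      b'.vy <- b'.vy +. dy *. b.mass *. mag;
      b'.vz <- b'.vz +. dz *. b.mass *. mag
    done
  done;
  (* Move every body with its updated velocity. *)
  Array.iter
    (fun b ->
      b.x <- b.x +. dt *. b.vx;
      b.y <- b.y +. dt *. b.vy;
      b.z <- b.z +. dt *. b.vz)
    bodies
```

Compiled with the flags quoted above (minus the useless "-ccopt -O2"), the arithmetic in `advance` is intended not to allocate at all; what differs between architectures is only where the unboxed temporaries live (SSE2 registers on x86_64, memory slots on 32-bit x86).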
(Besides, being 1.3 times slower than gcc on numerical code is within the
design envelope for OCaml. My performance goal has always been "never
more than twice as slow as C".)

On a "normal" (register-based) float architecture like PowerPC or
x86_64, the OCaml-generated code is essentially identical to what gcc
generates.

The C translation is attached for your amusement.

- Xavier Leroy