Dear all,

find attached two simple Runge-Kutta iteration schemes. One is written in C, the other in OCaml. I compared the runtime of both, and gcc (-O2) produces an executable that is roughly 30% faster (to be more precise: 3.52s for OCaml vs. 2.63s for C). That is in itself quite pleasing, I think. However, I do not understand what causes this difference.

Admittedly, the generated assembly looks completely different, but both compilers inline all functions and generate one big loop. OCaml generates a lot more scaffolding, but that is to be expected.

There is, however, an interesting particularity: OCaml generates 6 calls to cos, while gcc only needs 3 (and one direct jump). Surprisingly, there are also calls to cosh, acos and pretty much any other trigonometric function (initialization of constants, maybe?).

However, the true culprit seems to be an excess of instructions between the different calls to cos. This is what happens between the first two calls to cos:

gcc:

    jmpq   400530
    nop
    nopw   %cs:0x0(%rax,%rax,1)
    sub    $0x38,%rsp
    movsd  %xmm0,0x10(%rsp)
    movapd %xmm1,%xmm0
    movsd  %xmm2,0x18(%rsp)
    movsd  %xmm1,0x8(%rsp)
    callq  400530

ocamlopt:

    callq  401a60
    mulsd  (%r12),%xmm0
    movsd  %xmm0,0x10(%rsp)
    sub    $0x10,%r15
    lea    0x25c7b6(%rip),%rax
    cmp    (%rax),%r15
    jb     404a8a
    lea    0x8(%r15),%rax
    movq   $0x4fd,-0x8(%rax)
    movsd  0x32319(%rip),%xmm1
    movapd %xmm1,%xmm2
    mulsd  %xmm0,%xmm2
    addsd  0x0(%r13),%xmm2
    movsd  %xmm2,(%rax)
    movapd %xmm1,%xmm0
    mulsd  (%r12),%xmm0
    addsd  (%rbx),%xmm0
    callq  401a60

Is this caused by some underlying difference in the representation of numeric values (e.g. tagged ints), or is it reasonable to attack this issue as a hobby experiment?

Thanks for any advice,

Christoph

--
Christoph Höger

Technische Universität Berlin
Fakultät IV - Elektrotechnik und Informatik
Übersetzerbau und Programmiersprachen

Sekr. TEL12-2, Ernst-Reuter-Platz 7, 10587 Berlin

Tel.: +49 (30) 314-24890
E-Mail: christoph.hoeger@tu-berlin.de