caml-list - the Caml user's mailing list
* [Caml-list] Closing the performance gap to C
@ 2016-12-17 13:01 Christoph Höger
From: Christoph Höger @ 2016-12-17 13:01 UTC (permalink / raw)
  To: caml users


Dear all,

please find attached two simple Runge-Kutta iteration schemes. One is
written in C, the other in OCaml. I compared the runtimes of both, and
gcc (-O2) produces an executable that is roughly 30% faster (to be more
precise: 3.52s for OCaml vs. 2.63s for C). That is in itself quite
pleasing, I think. However, I do not understand what causes this
difference. Admittedly, the generated assembly looks completely
different, but both compilers inline all functions and generate one big
loop. OCaml generates a lot more scaffolding, but that is to be expected.
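
The attachments are not reproduced in this archive. For orientation only, here is a minimal sketch of the kind of scheme being discussed: a classical fourth-order Runge-Kutta step applied to a scalar ODE. The names `rk4_step` and `integrate` and the test problem y' = cos t are hypothetical stand-ins and may well differ from the attached programs.

```ocaml
(* Hypothetical stand-in for the attached OCaml program, not the original.
   Classical RK4 for a scalar ODE y' = f t y with fixed step size h. *)
let rk4_step f h t y =
  let k1 = f t y in
  let k2 = f (t +. h /. 2.) (y +. h /. 2. *. k1) in
  let k3 = f (t +. h /. 2.) (y +. h /. 2. *. k2) in
  let k4 = f (t +. h) (y +. h *. k3) in
  y +. h /. 6. *. (k1 +. 2. *. k2 +. 2. *. k3 +. k4)

(* Take n steps of size h starting from y0 at t = 0. *)
let integrate f h n y0 =
  let y = ref y0 in
  for i = 0 to n - 1 do
    y := rk4_step f h (float_of_int i *. h) !y
  done;
  !y

let () =
  (* Illustrative test problem: y' = cos t, y(0) = 0, exact solution sin t. *)
  let y1 = integrate (fun t _ -> cos t) 1e-3 1000 0. in
  Printf.printf "y(1) = %.6f, sin 1 = %.6f\n" y1 (sin 1.)
```

Each step evaluates the right-hand side four times, which is consistent with a handful of calls to cos appearing in the inlined loop body.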

There is, however, an interesting peculiarity: OCaml generates 6 calls
to cos, while gcc only needs 3 (and one direct jump). Surprisingly,
there are also calls to cosh, acos and pretty much every other
trigonometric function (initialization of constants, maybe?).

However, the true culprit seems to be an excess of instructions between
the different calls to cos. This is what happens between the first two
calls to cos:

gcc:
jmpq   400530 <cos@plt>
nop
nopw   %cs:0x0(%rax,%rax,1)

sub    $0x38,%rsp
movsd  %xmm0,0x10(%rsp)
movapd %xmm1,%xmm0
movsd  %xmm2,0x18(%rsp)
movsd  %xmm1,0x8(%rsp)
callq  400530 <cos@plt>

ocamlopt:

callq  401a60 <cos@plt>
mulsd  (%r12),%xmm0
movsd  %xmm0,0x10(%rsp)
sub    $0x10,%r15
lea    0x25c7b6(%rip),%rax
cmp    (%rax),%r15
jb     404a8a <dlerror@plt+0x2d0a>
lea    0x8(%r15),%rax
movq   $0x4fd,-0x8(%rax)

movsd  0x32319(%rip),%xmm1

movapd %xmm1,%xmm2
mulsd  %xmm0,%xmm2
addsd  0x0(%r13),%xmm2
movsd  %xmm2,(%rax)
movapd %xmm1,%xmm0
mulsd  (%r12),%xmm0
addsd  (%rbx),%xmm0
callq  401a60 <cos@plt>
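
One possible reading of the ocamlopt listing, offered as an assumption rather than a confirmed diagnosis: the `sub $0x10,%r15` / `cmp` / `jb` sequence looks like a two-word allocation on the minor heap with a limit check, and the constant `0x4fd` written just below the new pointer would then be a block header. The sketch below decodes that word using the standard 64-bit header layout (size in words above bit 10, two colour bits, tag in the low byte); `decode_header` is a hypothetical helper name.

```ocaml
(* Decode an OCaml block header word, assuming the usual 64-bit layout:
   bits 0-7 hold the tag, bits 8-9 the GC colour, bits 10 and up the
   block size in words. decode_header is an illustrative helper. *)
let decode_header h =
  let tag = h land 0xff in
  let color = (h lsr 8) land 0x3 in
  let wosize = h lsr 10 in
  (wosize, color, tag)

let () =
  let wosize, color, tag = decode_header 0x4fd in
  Printf.printf "wosize=%d color=%d tag=%d (Obj.double_tag=%d)\n"
    wosize color tag Obj.double_tag
```

Under that reading the header describes a one-word block tagged as a double, i.e. a freshly boxed float into which the following `movsd %xmm2,(%rax)` stores its result.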


Is this caused by some underlying difference in the representation of
numeric values (e.g. tagged ints), or is it reasonable to attack this
issue as a hobby experiment?


Thanks for any advice,

Christoph
-- 
Christoph Höger

Technische Universität Berlin
Fakultät IV - Elektrotechnik und Informatik
Übersetzerbau und Programmiersprachen

Sekr. TEL12-2, Ernst-Reuter-Platz 7, 10587 Berlin

Tel.: +49 (30) 314-24890
E-Mail: christoph.hoeger@tu-berlin.de


