caml-list - the Caml user's mailing list
From: Gerd Stolpmann <info@gerd-stolpmann.de>
To: "Christoph Höger" <christoph.hoeger@tu-berlin.de>, caml-list@inria.fr
Subject: Re: [Caml-list] Closing the performance gap to C
Date: Mon, 19 Dec 2016 12:51:37 +0100
Message-ID: <1482148297.4629.19.camel@gerd-stolpmann.de>
In-Reply-To: <adf19464-c995-0e02-48e9-100f0efd26b6@tu-berlin.de>

Hi Christoph,

the extra code looks very much like an allocation on the minor heap:

sub    $0x10,%r15
lea    0x25c7b6(%rip),%rax
cmp    (%rax),%r15
jb     404a8a <dlerror@plt+0x2d0a>
lea    0x8(%r15),%rax
movq   $0x4fd,-0x8(%rax)

r15 is the allocation pointer of the minor heap; decrementing it by
0x10 (16 bytes) reserves a new block of memory. The new pointer is
compared against the beginning of the heap to check whether a GC is
needed. The constant 0x4fd is the header of the new block (which must
always be initialized).
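
As a rough illustration of the sequence above, here is a minimal C sketch of a downward bump allocation followed by a header decode. The struct and function names (minor_heap, bump_alloc, hd_wosize, hd_color, hd_tag) are made up for this example and are not the runtime's own API; the sketch assumes the 64-bit OCaml header layout (size in words in bits 10 and up, color in bits 8-9, tag in bits 0-7).

```c
#include <stddef.h>
#include <stdint.h>

/* Header fields of a 64-bit OCaml block (assumed layout, see lead-in). */
static uint64_t hd_wosize(uint64_t hd) { return hd >> 10; }
static unsigned hd_color(uint64_t hd)  { return (unsigned)((hd >> 8) & 3); }
static unsigned hd_tag(uint64_t hd)    { return (unsigned)(hd & 0xff); }

/* Hypothetical stand-in for the minor heap state. In the listing the
   allocation pointer lives in r15 and the limit is loaded from memory. */
struct minor_heap {
    uint64_t *limit;  /* lowest address that may be handed out */
    uint64_t *ptr;    /* allocation pointer, moves downwards */
};

/* Reserve one header word plus `wosize` fields. Returns a pointer to the
   first field, or NULL where the generated code would branch to the GC.
   The listing subtracts first and compares afterwards; the check is done
   first here to keep the pointer arithmetic within bounds. */
static uint64_t *bump_alloc(struct minor_heap *h, uint64_t wosize,
                            uint64_t header)
{
    size_t avail = (size_t)(h->ptr - h->limit);
    if (avail < wosize + 1)        /* cmp (%rax),%r15 ; jb ... */
        return NULL;
    h->ptr -= wosize + 1;          /* sub $0x10,%r15 when wosize is 1 */
    h->ptr[0] = header;            /* movq $0x4fd,-0x8(%rax) */
    return h->ptr + 1;             /* lea 0x8(%r15),%rax */
}
```

Decoding 0x4fd with these helpers gives a size of 1 word, color 0 and tag 253. In the OCaml runtime, tag 253 is Double_tag, which suggests that the block holds a single boxed float; this fits the 16 bytes reserved (one header word plus one data word).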

From the source code, it remains unclear what this is used for.
Presumably, the compiler runs out of registers and moves some values to
the minor heap (temporarily). When you call a C function like cos, it is
likely that this happens because the C calling conventions do not
preserve the FP registers (xmm*). This could be improved if the OCaml
compiler tried alternative places for temporarily storing FP values:

 - int registers (which is perfectly possible on 64-bit platforms).
   A number of int registers survive C calls.
 - stack

To my knowledge, the OCaml compiler never tries this (but my information
could be out of date). This is a fairly specific optimization that mostly
makes sense for purely iterating or aggregating functions like yours that
do not store FP values away.
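
To illustrate the int-register idea, here is a small C sketch showing that a double can be carried bit-for-bit in a 64-bit integer and recovered unchanged. The helper names double_to_bits and bits_to_double are invented for this example, and the sketch assumes a 64-bit IEEE 754 double. On the x86-64 System V ABI, rbx, rbp and r12 to r15 are preserved across calls, whereas none of the xmm registers are.

```c
#include <stdint.h>
#include <string.h>

/* The trick only works if a double fits exactly into one 64-bit word. */
_Static_assert(sizeof(double) == sizeof(uint64_t),
               "double must be 64 bits wide");

/* Reinterpret the bits of a double as an integer without changing them.
   Compilers typically reduce this memcpy to a single register move. */
static uint64_t double_to_bits(double d)
{
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);
    return bits;
}

/* Inverse operation: recover the double from its bit pattern. */
static double bits_to_double(uint64_t bits)
{
    double d;
    memcpy(&d, &bits, sizeof d);
    return d;
}
```

A value parked this way in a callee-saved integer register would survive a call to cos without any store to the heap or the stack; whether a given compiler actually performs such a transformation is a separate question.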

Gerd

On Saturday, 17.12.2016, at 14:02 +0100, Christoph Höger wrote:
> Oops. Forgot the actual examples.
> 
> On 17.12.2016 at 14:01, Christoph Höger wrote:
> > 
> > Dear all,
> > 
> > find attached two simple Runge-Kutta iteration schemes. One is written
> > in C, the other in OCaml. I compared the runtime of both, and gcc (-O2)
> > produces an executable that is roughly 30% faster (to be more precise:
> > 3.52s for ocamlopt vs. 2.63s for gcc). That is in itself quite pleasing,
> > I think. However, I do not understand what causes this difference.
> > Admittedly, the generated assembly looks completely different, but both
> > compilers inline all functions and generate one big loop. OCaml
> > generates a lot more scaffolding, but that is to be expected.
> > 
> > There is, however, an interesting particularity: OCaml generates 6 calls
> > to cos, while gcc only needs 3 (and one direct jump). Surprisingly,
> > there are also calls to cosh, acos and pretty much any other
> > trigonometric function (initialization of constants, maybe?)
> > 
> > However, the true culprit seems to be an excess of instructions between
> > the different calls to cos. This is what happens between the first two
> > calls to cos:
> > 
> > gcc:
> > jmpq   400530 <cos@plt>
> > nop
> > nopw   %cs:0x0(%rax,%rax,1)
> > 
> > sub    $0x38,%rsp
> > movsd  %xmm0,0x10(%rsp)
> > movapd %xmm1,%xmm0
> > movsd  %xmm2,0x18(%rsp)
> > movsd  %xmm1,0x8(%rsp)
> > callq  400530 <cos@plt>
> > 
> > ocamlopt:
> > 
> > callq  401a60 <cos@plt>
> > mulsd  (%r12),%xmm0
> > movsd  %xmm0,0x10(%rsp)
> > sub    $0x10,%r15
> > lea    0x25c7b6(%rip),%rax
> > cmp    (%rax),%r15
> > jb     404a8a <dlerror@plt+0x2d0a>
> > lea    0x8(%r15),%rax
> > movq   $0x4fd,-0x8(%rax)
> > 
> > movsd  0x32319(%rip),%xmm1
> > 
> > movapd %xmm1,%xmm2
> > mulsd  %xmm0,%xmm2
> > addsd  0x0(%r13),%xmm2
> > movsd  %xmm2,(%rax)
> > movapd %xmm1,%xmm0
> > mulsd  (%r12),%xmm0
> > addsd  (%rbx),%xmm0
> > callq  401a60 <cos@plt>
> > 
> > 
> > Is this caused by some underlying difference in the representation of
> > numeric values (e.g. tagged ints), or is it reasonable to attack this
> > issue as a hobby experiment?
> > 
> > 
> > thanks for any advice,
> > 
> > Christoph
> > 
> 
-- 
------------------------------------------------------------
Gerd Stolpmann, Darmstadt, Germany    gerd@gerd-stolpmann.de
My OCaml site:          http://www.camlcity.org
Contact details:        http://www.camlcity.org/contact.html
Company homepage:       http://www.gerd-stolpmann.de
------------------------------------------------------------




