From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Original-To: caml-list@sympa.inria.fr Delivered-To: caml-list@sympa.inria.fr Received: from mail3-relais-sop.national.inria.fr (mail3-relais-sop.national.inria.fr [192.134.164.104]) by sympa.inria.fr (Postfix) with ESMTPS id 63C227F615 for ; Mon, 19 Dec 2016 12:51:52 +0100 (CET) Authentication-Results: mail3-smtp-sop.national.inria.fr; spf=None smtp.pra=info@gerd-stolpmann.de; spf=None smtp.mailfrom=info@gerd-stolpmann.de; spf=None smtp.helo=postmaster@mout.kundenserver.de Received-SPF: None (mail3-smtp-sop.national.inria.fr: no sender authenticity information available from domain of info@gerd-stolpmann.de) identity=pra; client-ip=212.227.126.187; receiver=mail3-smtp-sop.national.inria.fr; envelope-from="info@gerd-stolpmann.de"; x-sender="info@gerd-stolpmann.de"; x-conformance=sidf_compatible Received-SPF: None (mail3-smtp-sop.national.inria.fr: no sender authenticity information available from domain of info@gerd-stolpmann.de) identity=mailfrom; client-ip=212.227.126.187; receiver=mail3-smtp-sop.national.inria.fr; envelope-from="info@gerd-stolpmann.de"; x-sender="info@gerd-stolpmann.de"; x-conformance=sidf_compatible Received-SPF: None (mail3-smtp-sop.national.inria.fr: no sender authenticity information available from domain of postmaster@mout.kundenserver.de) identity=helo; client-ip=212.227.126.187; receiver=mail3-smtp-sop.national.inria.fr; envelope-from="info@gerd-stolpmann.de"; x-sender="postmaster@mout.kundenserver.de"; x-conformance=sidf_compatible IronPort-PHdr: =?us-ascii?q?9a23=3ApQ1dgh94xARfTf9uRHKM819IXTAuvvDOBiVQ1KB3?= =?us-ascii?q?2+ocTK2v8tzYMVDF4r011RmSDN6dtK0P0LCempujcFRI2YyGvnEGfc4EfD4+ou?= =?us-ascii?q?JSoTYdBtWYA1bwNv/gYn9yNs1DUFh44yPzahANS47xaFLIv3K98yMZFAnhOgpp?= =?us-ascii?q?POT1HZPZg9iq2+yo9ZDeZwtFiCC/bL5wIxm7oxvdvdQKjIV/Lao81gHHqWZSde?= =?us-ascii?q?RMwmNoK1OTnxLi6cq14ZVu7Sdete8/+sBZSan1cLg2QrJeDDQ9LmA6/9brugXZ?= =?us-ascii?q?TQuO/XQTTGMbmQdVDgff7RH6WpDxsjbmtud4xSKXM9H6QawyVD+/6apgVR3mhz?= =?us-ascii?q?odNzMh82/Zl8x+grxVrh2jqRxw34nab46OOfpxYq/QZ8kXSHBdUstVUSFKH4Oy?= =?us-ascii?q?b5EID+oEJetWrpfyp0ETohCjGAesGOTvyjtQhn/zx6I61eIhGhzB0QwhGdIOvn?= =?us-ascii?q?PUoc76NKgMS+C60bDEzS7fb/NR3Tf98I3IfQonofGKR75/bNTexFApGgjYgFuQ?= =?us-ascii?q?ronlMCmU1uQLq2Wb4PRvVfiyhGI+sAFxvj+vxsM0ionMnI0VzFbE+T9kz4krPd?= =?us-ascii?q?G3VFR0YdugEJRMtiGaK4t3TtklQ2FytyY3zKANt52jfCUS1pgr2gDTZ+aZf4SW?= =?us-ascii?q?4B/vTvudLSl5iX5/Zb6yhBS//VCjx+D8TMW4zVVHoyRfntTNsn0BzQHf58maRv?= =?us-ascii?q?Z740yvwyyA1xrJ5eFBOU00lbTUK5omwrMok5oTtlnDHjPslEX1ka+WcFgr9fau?= =?us-ascii?q?6+T8fLrmvIGcOJFuig3kL6shhNSzAeU+MgcQQ2iW4fqw2KD98UHjXrlGkP87nr?= =?us-ascii?q?PEvJzEJMkXvLO1DgxX34o77hawFTam0NAWnXkdK1JFfQqKj4nvO1HAJ/D1Fvi/?= =?us-ascii?q?jEq2kDh23vzGJaHhApLJLnjblbfuZ7B960hGxAUu099T/4hUBa0ZIPLvRk/xs8?= =?us-ascii?q?TVAQMjPAyxx+brEdF91oIFWWKTGaKZK6PTsVqQ5u01OeWMZYkVuCz8K/c//fLu?= =?us-ascii?q?g2U5yhchevyC3YEWc2y/BvQuA9uWbGCk1twBC2YRog0mTKrqj1CNXCR7e2v3Va?= =?us-ascii?q?8m4jA9To6rW8OLTYmohPmF3TynNpxQfGFPTF6WQlnycIDRdPoWZGqpPshlijkN?= =?us-ascii?q?U77pH44n2xaGuwLgx/98Mu3Q4igRs5Sl2NUjtL6brg076TEhV5fV6GqKVWwhxm?= =?us-ascii?q?4=3D?= X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: =?us-ascii?q?A0A6AAB4yVdYbbt+49RdHAEBBAEBCgEBF?= =?us-ascii?q?wEBBAEBCgEBgwwBAQEBAR9agQaNT5ZYlQ6CCiaFfAKBdj8UAQEBAQEBAQEBAQE?= =?us-ascii?q?SCw0JBx8wgjMYgh4GVTQLRlcGARIJiGYBCa0ciwsBAQEBBgEBAQEBFA+FRYVKh?= =?us-ascii?q?EyCQAuDCgWIVA6QTIFCgVECmXqGNIgOhguEDx+BXCuFV3EBhh4qghMBAQE?= X-IPAS-Result: =?us-ascii?q?A0A6AAB4yVdYbbt+49RdHAEBBAEBCgEBFwEBBAEBCgEBgww?= =?us-ascii?q?BAQEBAR9agQaNT5ZYlQ6CCiaFfAKBdj8UAQEBAQEBAQEBAQESCw0JBx8wgjMYg?= =?us-ascii?q?h4GVTQLRlcGARIJiGYBCa0ciwsBAQEBBgEBAQEBFA+FRYVKhEyCQAuDCgWIVA6?= =?us-ascii?q?QTIFCgVECmXqGNIgOhguEDx+BXCuFV3EBhh4qghMBAQE?= X-IronPort-AV: E=Sophos;i="5.33,373,1477954800"; d="asc'?scan'208";a="205088819" Received: from mout.kundenserver.de ([212.227.126.187]) by mail3-smtp-sop.national.inria.fr with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 19 Dec 2016 12:51:51 +0100 Received: from office1.lan.sumadev.de ([85.183.69.161]) by mrelayeu.kundenserver.de (mreue003 [212.227.15.167]) with ESMTPSA (Nemesis) id 0LaYbD-1d16jr30md-00mJzj; Mon, 19 Dec 2016 12:51:48 +0100 Received: from e130 (unknown [192.168.65.10]) by office1.lan.sumadev.de (Postfix) with ESMTPSA id AEE39DC05D; Mon, 19 Dec 2016 12:51:47 +0100 (CET) Message-ID: <1482148297.4629.19.camel@gerd-stolpmann.de> From: Gerd Stolpmann To: Christoph =?ISO-8859-1?Q?H=F6ger?= , caml-list@inria.fr Date: Mon, 19 Dec 2016 12:51:37 +0100 In-Reply-To: References: <7bc766a2-d460-524b-35ca-89609a34b719@tu-berlin.de> Content-Type: multipart/signed; micalg="pgp-sha256"; protocol="application/pgp-signature"; boundary="=-9rtDDnFFEXsHqhED1gpd" X-Mailer: Evolution 3.18.5.2-0ubuntu3 Mime-Version: 1.0 X-Provags-ID: V03:K0:AgrGoWSCdJaqyoJVPEs863Tmn0nnzXosnIPGyRO4mJGXR2zeHX2 fxVJmK93KcuUeEUjZi4oWSg0qOt7SPsFeBbVOl+QYjYIFhNDsf52epXZlNaWZ72V8dSvRir MpuMnwmLe5YFZRrxk93TRy3h/2gMvDFJuit7NmMDmwcm0Ct3J5J4St8e+FyhkNMLl7s4IPe uPi6+lW1pHXEMZeAb1i7g== X-UI-Out-Filterresults: notjunk:1;V01:K0:K1YnHsJBE94=:1Tj7eqmHXiFfIyk6aT7EdW NjMrk4nhBO4a4SsLNdslrf96MxCUXIeWmnmA4uOOnhSJdQNKK/uPcLyW+zQR7S7az83WEuDuf 96gP/oMWb/cG38c8X0Cf7dg+noFSJG+NCL3OREIjvax7F+/FR46MjVtcHlz/W4hpcG7Ym16Ez nuQtz8lqLkmTjvecsrLND5NHXootqnRW0yu+5K8auTTpYw9ePbru3D/WQAX+iLl2KOd9cpX4M Dcndyk+PE+Jbyy+RBJKsv7cxaQrTPm368oQ8Ys5b2aUvFTD9n/Xhz3QDVJJPLKnXYk2Qd+ZgG wpPaDuSe/uyPIx7pQDBnZne2vh9s26bncXSrGtUEmbGgJN6P00hCPVmY+u14cmbnbCwJd3U8u SuV+qogj4KmblnwWx+SUPR/CJu7Y5gYu5xM5PO8FbEPlgbxRm0ldwKu293kSa++JEvrQANjoQ AoyJ339y4t40iwtPI3I/BuDWtW7TwHqmQfRZ6NjikxGuRZswEaCwo1C61xySnWdxGP3YRtXQt cAJiQrUbLEQJxfzF+gUBfqFlRVuAqAQwn+MV7ZTxHC/zSPSkSrSIEed5Etr4f1jous+nueAGb XALOpAVgYVaCuehlPIa+WUJRth9zZj3j2sXgrunjVtpxU/jkgU/yecqOXf8Oa6pwmCvghdjpD hti/UmDDIuN2dIA5tTK2CGLlkhYzVcgDMd3kZ5BIRoi7RSESRMlih+RUZ5aXkcx1R+xw= Subject: Re: [Caml-list] Closing the performance gap to C --=-9rtDDnFFEXsHqhED1gpd Content-Type: text/plain; charset="ISO-8859-15" Content-Transfer-Encoding: quoted-printable Hi Christoph, the extra code looks very much like an allocation on the minor heap: sub=A0=A0=A0=A0$0x10,%r15 lea=A0=A0=A0=A00x25c7b6(%rip),%rax cmp=A0=A0=A0=A0(%rax),%r15 jb=A0=A0=A0=A0=A0404a8a lea=A0=A0=A0=A00x8(%r15),%rax movq=A0=A0=A0$0x4fd,-0x8(%rax) r15 points to the used area of the minor heap - by decrementing it you get an additional block of memory. It is compared against the beginning of the heap to check whether GC is needed. The constant 0x4fd is the header of the new block (which must be always initialized). =46rom the source code, it remains unclear for what this is used. Obviously, the compiler runs out of registers, and moves some values to the minor heap (temporarily). When you call a C function like cos it is likely that this happens because the C calling conventions do not preserve the FP registers (xmm*). This could be improved if the OCaml compiler tried alternate places for temporarily storing FP values: =A0- int registers (which is perfectly possible on 64 bit platforms). =A0 =A0A number of int registers survive C calls. =A0- stack To my knowledge, the OCaml compiler never tries this (but this could be out of date). This is a fairly specific optimization that makes mostly sense for purely iterating or aggregating functions like yours that do not store FP values away. Gerd Am Samstag, den 17.12.2016, 14:02 +0100 schrieb Christoph H=F6ger: > Ups. Forgot the actual examples. >=20 > Am 17.12.2016 um 14:01 schrieb Christoph H=F6ger: > >=20 > > Dear all, > >=20 > > find attached two simple runge-kutta iteration schemes. One is > > written > > in C, the other in OCaml. I compared the runtime of both and gcc (- > > O2) > > produces an executable that is roughly 30% faster (to be more > > precise: > > 3.52s vs. 2.63s). That is in itself quite pleasing, I think. I do > > not > > understand however, what causes this difference. Admittedly, the > > generated assembly looks completely different, but both compilers > > inline > > all functions and generate one big loop. Ocaml generates a lot more > > scaffolding, but that is to be expected. > >=20 > > There is however an interesting particularity: OCaml generates 6 > > calls > > to cos, while gcc only needs 3 (and one direct jump). Surprisingly, > > there are also calls to cosh, acos and pretty much any other > > trigonometric function (initialization of constants, maybe?) > >=20 > > However, the true culprit seems to be an excess of instructions > > between > > the different calls to cos. This is what happens between the first > > two > > calls to cos: > >=20 > > gcc: > > jmpq=A0=A0=A0400530 > > nop > > nopw=A0=A0=A0%cs:0x0(%rax,%rax,1) > >=20 > > sub=A0=A0=A0=A0$0x38,%rsp > > movsd=A0=A0%xmm0,0x10(%rsp) > > movapd %xmm1,%xmm0 > > movsd=A0=A0%xmm2,0x18(%rsp) > > movsd=A0=A0%xmm1,0x8(%rsp) > > callq=A0=A0400530 > >=20 > > ocamlopt: > >=20 > > callq=A0=A0401a60 > > mulsd=A0=A0(%r12),%xmm0 > > movsd=A0=A0%xmm0,0x10(%rsp) > > sub=A0=A0=A0=A0$0x10,%r15 > > lea=A0=A0=A0=A00x25c7b6(%rip),%rax > > cmp=A0=A0=A0=A0(%rax),%r15 > > jb=A0=A0=A0=A0=A0404a8a > > lea=A0=A0=A0=A00x8(%r15),%rax > > movq=A0=A0=A0$0x4fd,-0x8(%rax) > >=20 > > movsd=A0=A00x32319(%rip),%xmm1 > >=20 > > movapd %xmm1,%xmm2 > > mulsd=A0=A0%xmm0,%xmm2 > > addsd=A0=A00x0(%r13),%xmm2 > > movsd=A0=A0%xmm2,(%rax) > > movapd %xmm1,%xmm0 > > mulsd=A0=A0(%r12),%xmm0 > > addsd=A0=A0(%rbx),%xmm0 > > callq=A0=A0401a60 > >=20 > >=20 > > Is this caused by some underlying difference in the representation > > of > > numeric values (i.e. tagged ints) or is it reasonable to attack > > this > > issue as a hobby experiment? > >=20 > >=20 > > thanks for any advice, > >=20 > > Christoph > >=20 >=20 --=20 ------------------------------------------------------------ Gerd Stolpmann, Darmstadt, Germany gerd@gerd-stolpmann.de My OCaml site: http://www.camlcity.org Contact details: http://www.camlcity.org/contact.html Company homepage: http://www.gerd-stolpmann.de ------------------------------------------------------------ --=-9rtDDnFFEXsHqhED1gpd Content-Type: application/pgp-signature; name="signature.asc" Content-Description: This is a digitally signed message part Content-Transfer-Encoding: 7bit -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQEcBAABCAAGBQJYV8nKAAoJEAaM4b9ZLB5TXJ8H/AmR6VIXsnzI+xIV2lkFgC7S vChgyhDE+vwDCe5JezS3HZA7c2O0SFyenDFFbwYrBYltOPyolJRrwXLdbOxpDchO BFZMak+8B2Si2jhMNrgkNC1COz8izBVYFCVAHAR10lxcHSXNLQvSxggOvoW1jwTC 4tM65Z+UKt+jKisBHiZGQ62OW0KWLr1geQ6k7WCXfHEKyOXOPtYREBYAUIfkopqT j7HuA3kMmXQ6PU9cTu/mOILzIC6kx+FTQkJqBrdw6rTOncue7k4xKBzBNQOxVbxU +KpAuYE2Xj+vIm9clkDPlpOkbG4x3nOOtDm/qQ1rOOYHNCAMaAKj30diNeVSGss= =Rycz -----END PGP SIGNATURE----- --=-9rtDDnFFEXsHqhED1gpd--