From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Original-To: caml-list@sympa.inria.fr Delivered-To: caml-list@sympa.inria.fr Received: from mail2-relais-roc.national.inria.fr (mail2-relais-roc.national.inria.fr [192.134.164.83]) by sympa.inria.fr (Postfix) with ESMTPS id B049E7FD90 for ; Sat, 17 Dec 2016 14:03:14 +0100 (CET) Authentication-Results: mail2-smtp-roc.national.inria.fr; spf=None smtp.pra=christoph.hoeger@tu-berlin.de; spf=None smtp.mailfrom=christoph.hoeger@tu-berlin.de; spf=None smtp.helo=postmaster@mail.tu-berlin.de Received-SPF: None (mail2-smtp-roc.national.inria.fr: no sender authenticity information available from domain of christoph.hoeger@tu-berlin.de) identity=pra; client-ip=130.149.7.33; receiver=mail2-smtp-roc.national.inria.fr; envelope-from="christoph.hoeger@tu-berlin.de"; x-sender="christoph.hoeger@tu-berlin.de"; x-conformance=sidf_compatible Received-SPF: None (mail2-smtp-roc.national.inria.fr: no sender authenticity information available from domain of christoph.hoeger@tu-berlin.de) identity=mailfrom; client-ip=130.149.7.33; receiver=mail2-smtp-roc.national.inria.fr; envelope-from="christoph.hoeger@tu-berlin.de"; x-sender="christoph.hoeger@tu-berlin.de"; x-conformance=sidf_compatible Received-SPF: None (mail2-smtp-roc.national.inria.fr: no sender authenticity information available from domain of postmaster@mail.tu-berlin.de) identity=helo; client-ip=130.149.7.33; receiver=mail2-smtp-roc.national.inria.fr; envelope-from="christoph.hoeger@tu-berlin.de"; x-sender="postmaster@mail.tu-berlin.de"; x-conformance=sidf_compatible IronPort-PHdr: =?us-ascii?q?9a23=3AgiXW5xz5yvkJcjLXCy+O+j09IxM/srCxBDY+r6Qd?= =?us-ascii?q?0uISIJqq85mqBkHD//Il1AaPBtSAra0fwLaL++C4ACpbvsbH6ChDOLV3FDY7yu?= =?us-ascii?q?wu1zQ6B8CEDUCpZNXLVAcdWPp4aVl+4nugOlJUEsutL3fbo3m18CJAUk6nbVk9?= =?us-ascii?q?dazJHdvZhsGzkuSz4IH7YgNShTP7b6khAg+xqFD6ttMXmpdlMqYG6oXGr2EAL+?= =?us-ascii?q?9W32JzOVWLn1D84cq/8YRL7zkVsf87889GF6n3KfdrBYdEBSgrZjhmrPbgsgPO?= =?us-ascii?q?GFOC?= X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: =?us-ascii?q?A0AYAAAeN1VYhyEHlYJcHAEXAQwBBQEBF?= =?us-ascii?q?gEBBgEEAQGCeQEBAQEBgX+kJ5UOggqGIgKCGD8UAQEBAQEBAQEBAQESAQEBCgs?= =?us-ascii?q?JCR0wgjMYgh4GI2YLQgICAlUTBgIBAYhnBKpAgiiLGiMPiDOCXIRXgjULLYJdB?= =?us-ascii?q?YhUkFqBQoNjgXyVboY0iA6GC4QPH4IHg26BaXGIdwEBAQ?= X-IPAS-Result: =?us-ascii?q?A0AYAAAeN1VYhyEHlYJcHAEXAQwBBQEBFgEBBgEEAQGCeQE?= =?us-ascii?q?BAQEBgX+kJ5UOggqGIgKCGD8UAQEBAQEBAQEBAQESAQEBCgsJCR0wgjMYgh4GI?= =?us-ascii?q?2YLQgICAlUTBgIBAYhnBKpAgiiLGiMPiDOCXIRXgjULLYJdBYhUkFqBQoNjgXy?= =?us-ascii?q?VboY0iA6GC4QPH4IHg26BaXGIdwEBAQ?= X-IronPort-AV: E=Sophos;i="5.33,363,1477954800"; d="asc'?ml'?c'?scan'208";a="250506566" Received: from mail.tu-berlin.de ([130.149.7.33]) by mail2-smtp-roc.national.inria.fr with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 17 Dec 2016 14:02:58 +0100 X-tubIT-Incoming-IP: 91.66.22.179 Received: from ip5b4216b3.dynamic.kabel-deutschland.de ([91.66.22.179] helo=[192.168.178.42]) by mail.tu-berlin.de (exim-4.84_2/mailfrontend-8) with esmtpa for id 1cIEdd-0005uN-lG; Sat, 17 Dec 2016 14:02:58 +0100 To: caml-list@inria.fr References: <7bc766a2-d460-524b-35ca-89609a34b719@tu-berlin.de> From: =?UTF-8?Q?Christoph_H=c3=b6ger?= Organization: =?UTF-8?Q?Technische_Universit=c3=a4t_Berlin?= Message-ID: Date: Sat, 17 Dec 2016 14:02:57 +0100 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.5.1 MIME-Version: 1.0 In-Reply-To: <7bc766a2-d460-524b-35ca-89609a34b719@tu-berlin.de> Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="Mss5QkO61HMJLrJsbGGcT1OrXKsVwFhmJ" X-PMX-Version: 6.0.0.2142326, Antispam-Engine: 2.7.2.2107409, Antispam-Data: 2016.12.17.125715 X-PMX-Spam: Gauge=IIIIIII, Probability=0%, Report='' Subject: Re: [Caml-list] Closing the performance gap to C This is an OpenPGP/MIME signed message (RFC 4880 and 3156) --Mss5QkO61HMJLrJsbGGcT1OrXKsVwFhmJ Content-Type: multipart/mixed; boundary="uHd2LKFusipX3t3KWUSqIuiMcM6T1vdgB"; protected-headers="v1" From: =?UTF-8?Q?Christoph_H=c3=b6ger?= To: caml-list@inria.fr Message-ID: Subject: Re: [Caml-list] Closing the performance gap to C References: <7bc766a2-d460-524b-35ca-89609a34b719@tu-berlin.de> In-Reply-To: <7bc766a2-d460-524b-35ca-89609a34b719@tu-berlin.de> --uHd2LKFusipX3t3KWUSqIuiMcM6T1vdgB Content-Type: multipart/mixed; boundary="------------ABC8441E5A3BA9141C1CD635" This is a multi-part message in MIME format. --------------ABC8441E5A3BA9141C1CD635 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Ups. Forgot the actual examples. Am 17.12.2016 um 14:01 schrieb Christoph H=C3=B6ger: > Dear all, >=20 > find attached two simple runge-kutta iteration schemes. One is written > in C, the other in OCaml. I compared the runtime of both and gcc (-O2) > produces an executable that is roughly 30% faster (to be more precise: > 3.52s vs. 2.63s). That is in itself quite pleasing, I think. I do not > understand however, what causes this difference. Admittedly, the > generated assembly looks completely different, but both compilers inline > all functions and generate one big loop. Ocaml generates a lot more > scaffolding, but that is to be expected. >=20 > There is however an interesting particularity: OCaml generates 6 calls > to cos, while gcc only needs 3 (and one direct jump). Surprisingly, > there are also calls to cosh, acos and pretty much any other > trigonometric function (initialization of constants, maybe?) >=20 > However, the true culprit seems to be an excess of instructions between > the different calls to cos. This is what happens between the first two > calls to cos: >=20 > gcc: > jmpq 400530 > nop > nopw %cs:0x0(%rax,%rax,1) >=20 > sub $0x38,%rsp > movsd %xmm0,0x10(%rsp) > movapd %xmm1,%xmm0 > movsd %xmm2,0x18(%rsp) > movsd %xmm1,0x8(%rsp) > callq 400530 >=20 > ocamlopt: >=20 > callq 401a60 > mulsd (%r12),%xmm0 > movsd %xmm0,0x10(%rsp) > sub $0x10,%r15 > lea 0x25c7b6(%rip),%rax > cmp (%rax),%r15 > jb 404a8a > lea 0x8(%r15),%rax > movq $0x4fd,-0x8(%rax) >=20 > movsd 0x32319(%rip),%xmm1 >=20 > movapd %xmm1,%xmm2 > mulsd %xmm0,%xmm2 > addsd 0x0(%r13),%xmm2 > movsd %xmm2,(%rax) > movapd %xmm1,%xmm0 > mulsd (%r12),%xmm0 > addsd (%rbx),%xmm0 > callq 401a60 >=20 >=20 > Is this caused by some underlying difference in the representation of > numeric values (i.e. tagged ints) or is it reasonable to attack this > issue as a hobby experiment? >=20 >=20 > thanks for any advice, >=20 > Christoph >=20 --=20 Christoph H=C3=B6ger Technische Universit=C3=A4t Berlin Fakult=C3=A4t IV - Elektrotechnik und Informatik =C3=9Cbersetzerbau und Programmiersprachen Sekr. TEL12-2, Ernst-Reuter-Platz 7, 10587 Berlin Tel.: +49 (30) 314-24890 E-Mail: christoph.hoeger@tu-berlin.de --------------ABC8441E5A3BA9141C1CD635 Content-Type: text/plain; charset=UTF-8; name="rk4.c" Content-Transfer-Encoding: base64 Content-Disposition: attachment; filename="rk4.c" I2luY2x1ZGUgPHN0ZGlvLmg+CiNpbmNsdWRlIDxtYXRoLmg+Cgpkb3VibGUg ZXhhY3QoZG91YmxlIHQpIHsgcmV0dXJuIHNpbih0KTsgfQoKZG91YmxlIGR5 KGRvdWJsZSB0LCBkb3VibGUgeSkgeyByZXR1cm4gY29zKHQpOyB9Cgpkb3Vi bGUgcms0X3N0ZXAoZG91YmxlIHksIGRvdWJsZSB0LCBkb3VibGUgaCkgewog ICAgZG91YmxlIGsxID0gaCAqIGR5KHQsIHkpOwogICAgZG91YmxlIGsyID0g aCAqIGR5KHQgKyAwLjUgKiBoLCB5ICsgMC41ICogazEpOwogICAgZG91Ymxl IGszID0gaCAqIGR5KHQgKyAwLjUgKiBoLCB5ICsgMC41ICogazIpOwogICAg ZG91YmxlIGs0ID0gaCAqIGR5KHQgKyBoLCB5ICsgazMpOwogICAgcmV0dXJu ICB5ICsgKGsxICsgazQpLyA2LjAgKyAoazIrazMpIC8gMy4wOwp9Cgpkb3Vi bGUgbG9vcCAoaW50IHN0ZXBzLCBkb3VibGUgaCwgaW50IG4sIGRvdWJsZSB5 LCBkb3VibGUgdCkgewogICAgaWYgKG4gPCBzdGVwcykKICAgICAgICByZXR1 cm4gbG9vcChzdGVwcywgaCwgbisxLCByazRfc3RlcCh5LHQsaCksIHQraCk7 CiAgICBlbHNlIHJldHVybiB5Owp9CgppbnQgbWFpbigpIHsKICAgIGRvdWJs ZSBoID0gMC4xOwogICAgZG91YmxlIHkgPSBsb29wKDEwMiwgaCwgMSwgMS4w LCAwLjApOwogICAgZG91YmxlIGVyciA9IGZhYnMoeSAtIGV4YWN0KDEwMiAq IGgpKTsKICAgIGludCBsYXJnZSA9IDEwMDAwMDAwOwogICAgZG91YmxlIHky ID0gbG9vcChsYXJnZSwgaCwgMSwgMS4wLCAwLjApOwogICAgcHJpbnRmKCIl ZFxuIiwKICAgICAgICAgICAoZmFicyh5MiAtIChleGFjdChsYXJnZSAqIGgp KSkgPCAyLiAqIGVycikpOwogICAgcmV0dXJuIDA7Cn0K --------------ABC8441E5A3BA9141C1CD635 Content-Type: text/plain; charset=UTF-8; name="testrk4.ml" Content-Transfer-Encoding: base64 Content-Disposition: attachment; filename="testrk4.ml" bGV0IHknIHQgeSA9IGNvcyB0CmxldCBleGFjdCB0ID0gc2luIHQKCmxldCBy azRfc3RlcCB5IHQgaCA9CiAgbGV0IGsxID0gaCAqLiB5JyB0IHkgaW4KICBs ZXQgazIgPSBoICouIHknICh0ICsuIDAuNSouaCkgKHkgKy4gMC41Ki5rMSkg aW4KICBsZXQgazMgPSBoICouIHknICh0ICsuIDAuNSouaCkgKHkgKy4gMC41 Ki5rMikgaW4KICBsZXQgazQgPSBoICouIHknICh0ICsuIGgpICh5ICsuIGsz KSBpbgogIHkgKy4gKGsxKy5rNCkvLjYuMCArLiAoazIrLmszKS8uMy4wCgps ZXQgcmVjIGxvb3Agc3RlcHMgaCBuIHkgdCA9CiAgaWYgbiA8IHN0ZXBzIHRo ZW4gbG9vcCBzdGVwcyBoIChuKzEpIChyazRfc3RlcCB5IHQgaCkgKHQgKy4g aCkgZWxzZQogICAgeQpsZXQgXyA9CiAgbGV0IGggPSAwLjEgaW4KICBsZXQg eSA9IGxvb3AgMTAyIGggMSAxLjAgMC4wIGluCiAgbGV0IGVyciA9IGFic19m bG9hdCAoeSAtLiAoZXhhY3QgKChmbG9hdF9vZl9pbnQgMTAyKSAqLiBoKSkp IGluCiAgbGV0IGxhcmdlID0gMTAwMDAwMDAgaW4KICBsZXQgeSA9IGxvb3Ag bGFyZ2UgaCAxIDEuMCAwLjAgaW4KICBQcmludGYucHJpbnRmICIlYlxuIgog ICAgKGFic19mbG9hdCAoeSAtLiAoZXhhY3QgKGZsb2F0X29mX2ludCBsYXJn ZSkgKi4gaCkpIDwgMi4gKi4gZXJyKQo= --------------ABC8441E5A3BA9141C1CD635-- --uHd2LKFusipX3t3KWUSqIuiMcM6T1vdgB-- --Mss5QkO61HMJLrJsbGGcT1OrXKsVwFhmJ Content-Type: application/pgp-signature; name="signature.asc" Content-Description: OpenPGP digital signature Content-Disposition: attachment; filename="signature.asc" -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iEYEARECAAYFAlhVN4EACgkQhMBO4cVSGS+CjwCgsU3f5aLDo6vu9b9BfUTrMUm6 M3IAnjK5ElXYEx2lYmNqUXDxGpTPFWMI =C4JJ -----END PGP SIGNATURE----- --Mss5QkO61HMJLrJsbGGcT1OrXKsVwFhmJ--