From: Frédéric Bour
Date: Mon, 19 Dec 2016 18:09:18 +0100
To: Gerd Stolpmann
Cc: "Soegtrop, Michael", Christoph Höger, caml-list@inria.fr
Subject: Re: [Caml-list] Closing the performance gap to C

OCamlopt is able to spill floating-point registers. You can even see with -dalloc that the code spills a floating-point register in the loop.
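To see such a spill in isolation, here is a minimal sketch. The function and file names are hypothetical and do not come from this thread.

In the function, y is computed before the call to cos and is needed after it. The sketch assumes the System V amd64 calling convention used on Linux, where no xmm register is preserved across a C call. Under that assumption, the register allocator has to keep y in a stack slot around the call. Compiling with ocamlopt -dalloc -c spill.ml should show the spill, although the exact output depends on the compiler version.

```ocaml
(* Hypothetical example, not taken from the original thread.
   [y] is live across the call to [cos]. Assuming the System V amd64
   convention (Linux), no xmm register survives a C call, so ocamlopt
   has to spill [y] to the stack around it.
   To inspect: ocamlopt -dalloc -c spill.ml *)
let f x =
  let y = x *. 2.0 in  (* computed before the C call *)
  let z = cos x in     (* C call: caller-saved xmm registers are clobbered *)
  y +. z               (* y must be available again here *)

let () = Printf.printf "f 0.5 = %f\n" (f 0.5)
```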

The problem observed here is not due to spilling but simply to float boxing and to the way recursive calls are compiled.
The loop does appear to compile down to efficient code ending in a jump, but float unboxing is decided in a much earlier pass of the compiler (Cmm).
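To make this concrete, here is a hypothetical loop with a float accumulator. The names and loop body are assumptions, not the code from this thread.

The self tail call ends up as a jump in the generated assembly. The accumulator, however, is still passed as an argument of an ordinary call at the Cmm level, so it is boxed on every iteration. ocamlopt -dcmm -c loop.ml should show the allocation just before the recursive call.

```ocaml
(* Hypothetical loop with a float accumulator (not from the thread).
   The self tail call to [loop] becomes a jump in the final code, but at
   the Cmm level it is an ordinary call whose float argument is boxed,
   so [acc +. ...] is allocated as a fresh boxed float each iteration
   (as observed with a standard, non-flambda ocamlopt).
   To inspect: ocamlopt -dcmm -c loop.ml *)
let rec loop i acc =
  if i = 0 then acc
  else loop (i - 1) (acc +. cos (float_of_int i))

let () = Printf.printf "loop 1_000_000 0.0 = %f\n" (loop 1_000_000 0.0)
```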

Passing -dcmm to the compiler, we can see that the float is boxed again before the recursive call to loop.
At that point it is just a regular OCaml function call, taking a boxed float.
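Since the boxing is tied to the function-call boundary, a commonly used alternative is to keep the accumulator in a local float ref that never escapes and to iterate with a while loop. This is a sketch under an assumption based on general knowledge of ocamlopt, not on anything stated in this thread.

The assumption is that such a non-escaping ref becomes a mutable local variable that Cmm can keep unboxed. The function name is hypothetical. Checking the -dcmm output is the way to confirm the behavior for a given compiler version.

```ocaml
(* Hypothetical imperative counterpart of a float-accumulating loop.
   [acc] and [i] are local refs that never escape the function, so
   ocamlopt can turn them into mutable local variables; the assumption
   here is that [acc] then stays unboxed inside the while loop.
   Verify with: ocamlopt -dcmm -c loop_imp.ml *)
let loop_imperative n =
  let acc = ref 0.0 in
  let i = ref n in
  while !i > 0 do
    acc := !acc +. cos (float_of_int !i);
    decr i
  done;
  !acc  (* the result is boxed when it is returned *)

let () =
  Printf.printf "loop_imperative 1_000_000 = %f\n" (loop_imperative 1_000_000)
```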

A simpler piece of code that shows this behavior:

let rec test f =
  (* never returns; f is boxed again before every recursive call *)
  test (f +. 1.0)

let () = test 0.0

boxes the float at every iteration.
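One way to observe this allocation without reading compiler dumps is to count minor-heap words with Gc.minor_words, which is available since OCaml 4.04. The function count_up below is a hypothetical, terminating variant of test.

The exact numbers depend on the compiler version and on native versus bytecode compilation. If every iteration boxes the float, though, the count should grow roughly linearly with n.

```ocaml
(* Hypothetical terminating variant of [test]: the float is passed to a
   recursive call exactly as in [test], but it stops after [n] steps. *)
let rec count_up n f =
  if n = 0 then f
  else count_up (n - 1) (f +. 1.0)

(* Returns the result of [count_up n 0.0] together with the number of
   minor-heap words allocated while computing it. *)
let words_allocated n =
  let before = Gc.minor_words () in
  let result = count_up n 0.0 in
  let after = Gc.minor_words () in
  (result, after -. before)

let () =
  List.iter
    (fun n ->
      let result, words = words_allocated n in
      Printf.printf "n = %d: result = %.1f, minor words = %.0f\n" n result words)
    [ 1_000; 1_000_000 ]
```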

On 19 Dec 2016, at 17:41, Gerd Stolpmann <info@gerd-stolpmann.de> wrote:
>
> Michael,
>
> look here, it's the "definitive source":
> https://github.com/ocaml/ocaml/blob/trunk/asmcomp/amd64/proc.ml
>
> Windows is indeed different. I don't have enough insight into the
> register spilling algorithm to say whether this has a significant
> effect, though. OCaml code never keeps values in registers across
> function calls, and thus I would bet the spilling algorithm is
> designed for that and probably does not try to place values
> preferably in xmm6-15 in order to avoid spilling during C calls.
> But that's just a hypothesis... Does somebody know?
>
> Gerd
>
> On Monday, 19.12.2016, at 14:52 +0000, Soegtrop, Michael wrote:
>> Dear Gerd,
>>
>>> When you call a C function like cos, it is likely that this
>>> happens because the C calling conventions do not preserve the FP
>>> registers (xmm*). This could be improved if the OCaml compiler
>>> tried alternate places for temporarily storing FP values:
>>
>> For Windows this doesn't seem to be true. See e.g.:
>>
>> https://msdn.microsoft.com/en-us/library/ms235286.aspx
>>
>> which states that XMM0..XMM5 are volatile, while XMM6..XMM15 must be
>> preserved.
>>
>> I think for Linux you are right. I couldn't find a better reference
>> than Wikipedia:
>>
>> https://en.wikipedia.org/wiki/X86_calling_conventions
>>
>> (see "System V AMD64 ABI" there).
>>
>> This reference contains a good overview, which matches the above data
>> in table 4:
>>
>> http://www.agner.org/optimize/calling_conventions.pdf
>>
>> So on Windows there is definitely no need to save XMM6..XMM15 around
>> a C call, while on Linux this does seem to be an issue.
>>
>> Best regards,
>>
>> Michael
>
> -- 
> ------------------------------------------------------------
> Gerd Stolpmann, Darmstadt, Germany    gerd@gerd-stolpmann.de
> My OCaml site:          http://www.camlcity.org
> Contact details:        http://www.camlcity.org/contact.html
> Company homepage:       http://www.gerd-stolpmann.de
> ------------------------------------------------------------