From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Original-To: caml-list@sympa.inria.fr Delivered-To: caml-list@sympa.inria.fr Received: from mail2-relais-roc.national.inria.fr (mail2-relais-roc.national.inria.fr [192.134.164.83]) by sympa.inria.fr (Postfix) with ESMTPS id B43B180198 for ; Tue, 11 Jul 2017 20:55:30 +0200 (CEST) Authentication-Results: mail2-smtp-roc.national.inria.fr; spf=None smtp.pra=ivg@ieee.org; spf=Pass smtp.mailfrom=ivg@ieee.org; spf=None smtp.helo=postmaster@mail-yb0-f170.google.com Received-SPF: None (mail2-smtp-roc.national.inria.fr: no sender authenticity information available from domain of ivg@ieee.org) identity=pra; client-ip=209.85.213.170; receiver=mail2-smtp-roc.national.inria.fr; envelope-from="ivg@ieee.org"; x-sender="ivg@ieee.org"; x-conformance=sidf_compatible Received-SPF: Pass (mail2-smtp-roc.national.inria.fr: domain of ivg@ieee.org designates 209.85.213.170 as permitted sender) identity=mailfrom; client-ip=209.85.213.170; receiver=mail2-smtp-roc.national.inria.fr; envelope-from="ivg@ieee.org"; x-sender="ivg@ieee.org"; x-conformance=sidf_compatible; x-record-type="v=spf1" Received-SPF: None (mail2-smtp-roc.national.inria.fr: no sender authenticity information available from domain of postmaster@mail-yb0-f170.google.com) identity=helo; client-ip=209.85.213.170; receiver=mail2-smtp-roc.national.inria.fr; envelope-from="ivg@ieee.org"; x-sender="postmaster@mail-yb0-f170.google.com"; x-conformance=sidf_compatible IronPort-PHdr: =?us-ascii?q?9a23=3AIuJIqhBNPKtTP7bZy6pHUyQJP3N1i/DPJgcQr6Af?= =?us-ascii?q?oPdwSPTzoMbcNUDSrc9gkEXOFd2CrakV1KyO6+jJYi8p2d65qncMcZhBBVcuqP?= =?us-ascii?q?49uEgeOvODElDxN/XwbiY3T4xoXV5h+GynYwAOQJ6tL1LdrWev4jEMBx7xKRR6?= =?us-ascii?q?JvjvGo7Vks+7y/2+94fdbghMhjexe69+IAmrpgjNq8cahpdvJLwswRXTuHtIfO?= =?us-ascii?q?pWxWJsJV2Nmhv3+9m98p1+/SlOovwt78FPX7n0cKQ+VrxYES8pM3sp683xtBnM?= =?us-ascii?q?VhWA630BWWgLiBVIAgzF7BbnXpfttybxq+Rw1DWGMcDwULs5Qiqp4bt1RxD0iS?= =?us-ascii?q?cHLz85/3/Risxsl6JQvRatqwViz4LIfI2ZMfxzcaTAc9MHXmpBRtheWDBdAo2y?= =?us-ascii?q?aIsPCvAOPeder4Lgo1cDoh+zCQyqCejyyDFHm2X20LU43OQvEQ/I0g8uEc8Qvn?= =?us-ascii?q?vIt9j6LrseXPqvwaXU0TnObfVb0ir95ojSdRAhpOmBU7FuccXLz0kkCgLLjlKM?= =?us-ascii?q?qYziITOayuQNs2mH7+p7SOmijG8nqx9+ojW0x8cjlJfGiZwPxlDD7yV5z584KN?= =?us-ascii?q?ulQ0B1Zt6kFYFftyCcN4ZuQ8MiRXtouCcgxbEct567ZjAGyJAgxx7Yc/yLaY2I?= =?us-ascii?q?4hb7WOaeIDd4mHJleK+kiBqo7Uegzej8WtG00VlQripFld7MumoR2BzU78iLUv?= =?us-ascii?q?V88Vm82TaUyQ/T8u5EIVgumaraLZ4hzLkwm5wOukrABi/7gFv6gLOSe0k++eWl?= =?us-ascii?q?6/7rbqv7qpKSLYN4lwPzPrgol8eiG+o3KBIOUHKe+emk1L3s40n5QLJSg/0ziK?= =?us-ascii?q?bZsZTaKd0bp6GiHwNZy4gj5wu9Aju6ytgYkn4HLFVKeBKDkYflIU3BIPf9Dfun?= =?us-ascii?q?glSslilkx+zeM7H/HpnAKmLPnbThcLpn9UJQ1QQ+wcpC659WFr0NOPfzVVXwtN?= =?us-ascii?q?zcAB85KQu0w+P/BdV8yIMeVnmCAq6HP6zMr1CE/OUvI/ODZIMNojbyN+Al5+Ly?= =?us-ascii?q?jX8+gVISYbOm3Z4TaHyhGvRmIl6ZYWb3j9caEWYKuxI+Q/bwhF2DVz5TfXeyUL?= =?us-ascii?q?gm6jE1EoL1RbvEE7GqnLWElA2yBJtRdygSG1eQGHLifp+CXOYkZyebI8snmTsB?= =?us-ascii?q?A+uPUYgkgDCwvQmy5L17KfTf+iwErtq31cZ6z+zemBx08iZ7WZfOm1qRRn15yz?= =?us-ascii?q?tbDwQ927py9Akgk1o=3D?= X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: =?us-ascii?q?A0C4AAD+HWVZhqrVVdFdGwEBAQMBAQEJA?= =?us-ascii?q?QEBFwEBBAEBCgEBhBOBFAeEbTKaUoJChWyPBwNcLIVKAoM3B0IVAQEBAQEBAQE?= =?us-ascii?q?BAQESAQEBCAsLCCgvgjMkgkEBAQEBAgEjBBkBASwCCQEECwsLDQ0dAgIhARIBB?= =?us-ascii?q?QEKEgYTEooFAwgFCBCfMD+LH2uBbDqDBQEBBYQlAwqDZAEBAQEBAQQBAQEBAQE?= =?us-ascii?q?aCBKDFoNNg3mBDIJXgkCCZoJhiAEMlmE7h0iDRYQRhG6CY49Ci32IAhQfgRUPJ?= =?us-ascii?q?oEsgQIISRoGhDoqDxAMggMkNgGILQEBAQ?= X-IPAS-Result: =?us-ascii?q?A0C4AAD+HWVZhqrVVdFdGwEBAQMBAQEJAQEBFwEBBAEBCgE?= =?us-ascii?q?BhBOBFAeEbTKaUoJChWyPBwNcLIVKAoM3B0IVAQEBAQEBAQEBAQESAQEBCAsLC?= =?us-ascii?q?CgvgjMkgkEBAQEBAgEjBBkBASwCCQEECwsLDQ0dAgIhARIBBQEKEgYTEooFAwg?= =?us-ascii?q?FCBCfMD+LH2uBbDqDBQEBBYQlAwqDZAEBAQEBAQQBAQEBAQEaCBKDFoNNg3mBD?= =?us-ascii?q?IJXgkCCZoJhiAEMlmE7h0iDRYQRhG6CY49Ci32IAhQfgRUPJoEsgQIISRoGhDo?= =?us-ascii?q?qDxAMggMkNgGILQEBAQ?= X-IronPort-AV: E=Sophos;i="5.40,347,1496095200"; d="scan'208,217";a="283005928" Received: from mail-yb0-f170.google.com ([209.85.213.170]) by mail2-smtp-roc.national.inria.fr with ESMTP/TLS/AES128-GCM-SHA256; 11 Jul 2017 20:55:28 +0200 Received: by mail-yb0-f170.google.com with SMTP id f194so326567yba.3 for ; Tue, 11 Jul 2017 11:55:28 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ieee-org.20150623.gappssmtp.com; s=20150623; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :cc; bh=YpDKc3H+TUz6VI0+ZrDAz9nthFHsh30gNTZvuknJ9XA=; b=lzkkiyBYDtC8fffPnODTrRoeZDak/AUjKzB5dUDk6p7xbqodJwnXJYiGI5Jn5PCr7T dPPX7ehxc/1L4PELER6h9kGVnzMsuGwvsaMPXmYAAF5utoC6Lu/stXRrxPyQHig5GjWI 3svtrgcswQi6yuHWK9isCeyA459DQPs7m55cKtbHfR2Grs1fHOO0OIccnGseB5HOQTQv WoG4MzJxF6RB8nVxaJ2LlBZUFgWeSrRTwY0MTEPleEQtuP0cP3eOaOjZhojoIa30tPrH qod+FnnhArGsP0jA0GDLh5khE/moCwl5rd2APC/QFa8cA0prFqnH8nSFELNeXCfWfE3W 0NoQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to:cc; bh=YpDKc3H+TUz6VI0+ZrDAz9nthFHsh30gNTZvuknJ9XA=; b=l7JqGxPaMPmX+rjNYTMa0SVA/5zRm7A3WsVtJqSqdX66qECPb57PLL+CtJjRuuP5/B +c2mH2WkmQGBrdMSyFqtiCVi7lU0wxsNBXCWWkgStlJg98CrUOHyRtdm+sYWFaXyjcFP 40s0ja6edj54WWu0rTuOapwLaGhdv+PFgDK3wQApqxTDp1mpELGHEKBX3wtduYW3Gdn7 FOF2LL9Q+QSvL1YXWr0kQSym3cNG/dwegm+vYvKMbbVI49IAEzp9OZBnip31dxHYE2WW h4nWM9KWV+tgIst0rDAUXJ6exhG1Vd30MVA8ZMcPNBkxSL9lg6D+l2CQnVVbWbEe6GZS TeVg== X-Gm-Message-State: AIVw113x1WutPBvU7LaI5NkU7tc8Oa7/RBdlea/n+9XdkKCNvpjMxO44 zBCu1WE6tj2AuS0A9uet2dORRaymE8Fz X-Received: by 10.37.179.27 with SMTP id l27mr7476ybj.10.1499799327361; Tue, 11 Jul 2017 11:55:27 -0700 (PDT) MIME-Version: 1.0 Received: by 10.129.39.7 with HTTP; Tue, 11 Jul 2017 11:55:26 -0700 (PDT) In-Reply-To: References: <20170711125438.GP21228@nunchakus.loria.fr> From: Ivan Gotovchits Date: Tue, 11 Jul 2017 14:55:26 -0400 Message-ID: To: Yotam Barnoy Cc: Gabriel Scherer , Simon Cruanes , Alexey Egorov , caml users Content-Type: multipart/alternative; boundary="f403045f3f0a24967605540f3e65" Subject: Re: [Caml-list] Optimizing pure-functional streams --f403045f3f0a24967605540f3e65 Content-Type: text/plain; charset="UTF-8" Yep, moving to boxing increased the time by ten times. However, I found a strange behavior, that might be a bug, the following program: open Int64 let (=) = equal let (+) = add let (mod) = rem let ( * ) = mul let loop high = let rec loop i = function | t when i > high -> t | t when i mod 2L = 0L -> loop (i + 1L) (t + i * 2L) | t -> loop (i + 1L) t in loop 1L 0L let high = 1000L * 1000L * 1000L let _ = Printf.printf "%Ld\n" (loop high) Didn't compile with the following _tags file: true : optimize(3), unbox_closures, optimization_rounds(8), inline_max_depth(8), inline_max_unroll(2) And terminates with the "Fatal error: exception Stack overflow" message. With higher values of the inline_max_unroll parameter, it doesn't terminate on mine machine in reasonable time and space. The most surprising is not that, but that the offending line is `let (=) = equal`. Without this line, the program compiles without any issues. Best, Ivan P.S. Actually, I'm playing with flambda and our monads library, that is highly functorized and abstracted. I reimplemented this loop example using the State monad and got about 10 times better performance with flambda in comparison to a compiler without flambda enabled (from 40 seconds to 4 seconds). That makes my happy panda :) It is still 4 times slower than the regular version, but I'm ready to pay this cost. On Tue, Jul 11, 2017 at 2:15 PM, Yotam Barnoy wrote: > > Moving from tagging to boxing does not make much sense to me, because > the techniques to avoid either are similar and boxing is more > expensive. It might be the case that the compiler today is better at > unboxing than untagging (it currently tries to do both of these > locally), but that would only be because boxing is more expensive and > thus more effort was spent on the unboxing, but medium-term one could > hope for a uniform approach to both transformations. > > Absolutely. I believe there was some skepticism expressed about the > need for untagging in the past by some members of the dev team, which > is why I brought it up. I suggested Int64 only because it's possible > that Flambda will *currently* do better with it than with tagged > integers. > > On Tue, Jul 11, 2017 at 2:04 PM, Gabriel Scherer > wrote: > > Moving from tagging to boxing does not make much sense to me, because > > the techniques to avoid either are similar and boxing is more > > expensive. It might be the case that the compiler today is better at > > unboxing than untagging (it currently tries to do both of these > > locally), but that would only be because boxing is more expensive and > > thus more effort was spent on the unboxing, but medium-term one could > > hope for a uniform approach to both transformations. > > > > > > On Tue, Jul 11, 2017 at 1:46 PM, Yotam Barnoy > wrote: > >>> I've played a little bit with different optimization options in > flambda 4.04, and finally, all three versions of the loop: curried, > uncurried, and the for-loop, have the same performance, though they still > loose about 30% to the C version, due to tagging. > >> > >> Would it perhaps make sense to try Int64 in order to avoid the tagging? > >> > >> Also, I believe it would make sense to have Flambda try to switch to > >> an untagged type in tight loops to avoid this tagging cost. > >> > >> On Tue, Jul 11, 2017 at 1:37 PM, Ivan Gotovchits wrote: > >>> TWIMC, > >>> > >>> I've played a little bit with different optimization options in flambda > >>> 4.04, and finally, all three versions of the loop: curried, uncurried, > and > >>> the for-loop, have the same performance, though they still loose about > 30% > >>> to the C version, due to tagging. > >>> > >>> Basically, this means, that flambda was able to get rid of the > allocation. I > >>> don't actually know which of the options finally made the difference, > but > >>> this is how I compiled it. > >>> > >>> ocamlopt.opt -c -S -inlining-report -unbox-closures -O3 -rounds 8 > >>> -inline-max-depth 256 -inline-max-unroll 1024 -o loop.cmx loop.ml > >>> ocamlopt.opt loop.cmx -o loop.native > >>> > >>> > >>> Regards, > >>> Ivan > >>> > >>> > >>> > >>> > >>> On Tue, Jul 11, 2017 at 8:54 AM, Simon Cruanes < > simon.cruanes.2007@m4x.org> > >>> wrote: > >>>> > >>>> Hello, > >>>> > >>>> Iterators in OCaml have been the topic of many discussions. Another > >>>> option for fast iterators is https://github.com/c-cube/sequence , > >>>> which (with flambda) should compile down to loops and tests on this > kind > >>>> of benchmark. With the attached additional file on 4.04.0+flambda, > >>>> I obtain the following (where sequence is test-seq): > >>>> > >>>> $ for i in test-* ; do echo $i ; time ./$i ; done > >>>> test-c_loop > >>>> 5000000100000000 > >>>> ./$i 0.08s user 0.00s system 97% cpu 0.085 total > >>>> test-f_loop > >>>> 5000000100000000 > >>>> ./$i 0.10s user 0.00s system 96% cpu 0.100 total > >>>> test-loop > >>>> 5000000100000000 > >>>> ./$i 0.18s user 0.00s system 97% cpu 0.184 total > >>>> test-seq > >>>> 5000000100000000 > >>>> ./$i 0.11s user 0.00s system 97% cpu 0.113 total > >>>> test-stream > >>>> 5000000100000000 > >>>> ./$i 0.44s user 0.00s system 98% cpu 0.449 total > >>>> > >>>> > >>>> Note that sequence is imperative underneath, but can be safely used > as a > >>>> functional structure. > >>>> > >>>> -- > >>>> Simon Cruanes > >>>> > >>>> http://weusepgp.info/ > >>>> key 49AA62B6, fingerprint 949F EB87 8F06 59C6 D7D3 7D8D 4AC0 1D08 > 49AA > >>>> 62B6 > >>> > >>> > >> > >> -- > >> Caml-list mailing list. Subscription management and archives: > >> https://sympa.inria.fr/sympa/arc/caml-list > >> Beginner's list: http://groups.yahoo.com/group/ocaml_beginners > >> Bug reports: http://caml.inria.fr/bin/caml-bugs > --f403045f3f0a24967605540f3e65 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
Yep, moving to boxing increased the time by ten times. How= ever, I found a strange behavior, that might be a bug, the following progra= m:

open Int64=

=
let (=3D) =3D equal
let (+) =3D ad= d
let (mod)= =3D rem
le= t ( * ) =3D mul

l= et loop high =3D
=C2=A0 let rec loop i =3D function
=C2=A0 =C2=A0 | t when i > high -> t
=C2=A0 =C2=A0= | t when i mod 2L =3D 0L -> loop (i + 1L) (t + i * 2L)
=C2=A0 =C2=A0 | t -> loo= p (i + 1L) t in
=C2=A0 loop 1L 0L

let high =3D 1000L * 1000L * 1000L
let _ =3D Printf.printf "%Ld\n" (lo= op high)

Didn't comp= ile with the following _tags file:

true : optimize(3), unbox_closures, optimization_rounds(8),= inline_max_depth(8), inline_max_unroll(2)

And terminates with the "Fatal error: exce= ption Stack overflow" message. With higher values of the inline_max_un= roll parameter, it doesn't terminate on mine machine in reasonable time= and space.=C2=A0


The most surprisi= ng is not that, but that the offending line=C2=A0is `let (=3D) =3D equal`. = Without this line, the program compiles without any issues.=C2=A0

Best,
Ivan

P.S. Actually= , I'm playing with flambda=C2=A0and our monads library, that is highly = functorized=C2=A0and abstracted. I reimplemented this loop example using th= e State monad and got about 10 times better performance with flambda in com= parison to a compiler without flambda=C2=A0enabled (from 40 seconds to 4 se= conds). That makes my happy panda :) It is still 4 times slower than the re= gular version, but I'm ready to pay this cost.=C2=A0




On Tue, Jul 11, 2017 at 2:15 PM, Yotam Barnoy <yo= tambarnoy@gmail.com> wrote:
> Moving from tagging to boxing does not make much sen= se to me, because
the techniques to avoid either are similar and boxing is more
expensive. It might be the case that the compiler today is better at
unboxing than untagging (it currently tries to do both of these
locally), but that would only be because boxing is more expensive and
thus more effort was spent on the unboxing, but medium-term one could
hope for a uniform approach to both transformations.

Absolutely. I believe there was some skepticism expressed about the<= br> need for untagging in the past by some members of the dev team, which
is why I brought it up. I suggested Int64 only because it's possible
that Flambda will *currently* do better with it than with tagged
integers.

On Tue, Jul 11, 2017 at 2:04 PM, Gabriel Scherer
<gabriel.scherer@gmail.com<= /a>> wrote:
> Moving from tagging to boxing does not make much sense to me, because<= br> > the techniques to avoid either are similar and boxing is more
> expensive. It might be the case that the compiler today is better at > unboxing than untagging (it currently tries to do both of these
> locally), but that would only be because boxing is more expensive and<= br> > thus more effort was spent on the unboxing, but medium-term one could<= br> > hope for a uniform approach to both transformations.
>
>
> On Tue, Jul 11, 2017 at 1:46 PM, Yotam Barnoy <
yotambarnoy@gmail.com> wrote:
>>> I've played a little bit with different optimization optio= ns in flambda 4.04, and finally, all three versions of the loop: curried, u= ncurried, and the for-loop, have the same performance, though they still lo= ose about 30% to the C version, due to tagging.
>>
>> Would it perhaps make sense to try Int64 in order to avoid the tag= ging?
>>
>> Also, I believe it would make sense to have Flambda try to switch = to
>> an untagged type in tight loops to avoid this tagging cost.
>>
>> On Tue, Jul 11, 2017 at 1:37 PM, Ivan Gotovchits <ivg@ieee.org> wrote:
>>> TWIMC,
>>>
>>> I've played a little bit with different optimization optio= ns in flambda
>>> 4.04, and finally, all three versions of the loop: curried, un= curried, and
>>> the for-loop, have the same performance, though they still loo= se about 30%
>>> to the C version, due to tagging.
>>>
>>> Basically, this means, that flambda was able to get rid of the= allocation. I
>>> don't actually know which of the options finally made the = difference, but
>>> this is how I compiled it.
>>>
>>> ocamlopt.opt -c -S -inlining-report -unbox-closures -O3 -round= s 8
>>> -inline-max-depth 256 -inline-max-unroll 1024 -o loop.cmx loop.ml
>>> ocamlopt.opt loop.cmx -o loop.native
>>>
>>>
>>> Regards,
>>> Ivan
>>>
>>>
>>>
>>>
>>> On Tue, Jul 11, 2017 at 8:54 AM, Simon Cruanes <simon.cruanes.2007@m4x.org>
>>> wrote:
>>>>
>>>> Hello,
>>>>
>>>> Iterators in OCaml have been the topic of many discussions= . Another
>>>> option for fast iterators is https://github.com/c= -cube/sequence ,
>>>> which (with flambda) should compile down to loops and test= s on this kind
>>>> of benchmark. With the attached additional file on 4.04.0+= flambda,
>>>> I obtain the following (where sequence is test-seq):
>>>>
>>>> $ for i in test-* ; do echo $i ; time ./$i ; done
>>>> test-c_loop
>>>> 5000000100000000
>>>> ./$i=C2=A0 0.08s user 0.00s system 97% cpu 0.085 total
>>>> test-f_loop
>>>> 5000000100000000
>>>> ./$i=C2=A0 0.10s user 0.00s system 96% cpu 0.100 total
>>>> test-loop
>>>> 5000000100000000
>>>> ./$i=C2=A0 0.18s user 0.00s system 97% cpu 0.184 total
>>>> test-seq
>>>> 5000000100000000
>>>> ./$i=C2=A0 0.11s user 0.00s system 97% cpu 0.113 total
>>>> test-stream
>>>> 5000000100000000
>>>> ./$i=C2=A0 0.44s user 0.00s system 98% cpu 0.449 total
>>>>
>>>>
>>>> Note that sequence is imperative underneath, but can be sa= fely used as a
>>>> functional structure.
>>>>
>>>> --
>>>> Simon Cruanes
>>>>
>>>> http://weusepgp.info/
>>>> key 49AA62B6, fingerprint 949F EB87 8F06 59C6 D7D3=C2=A0 7= D8D 4AC0 1D08 49AA
>>>> 62B6
>>>
>>>
>>
>> --
>> Caml-list mailing list.=C2=A0 Subscription management and archives= :
>> https://sympa.inria.fr/sympa/arc/caml-list
>> Beginner's list:
http://groups.yahoo.com/g= roup/ocaml_beginners
>> Bug reports: http://caml.inria.fr/bin/caml-bugs<= br>

--f403045f3f0a24967605540f3e65--