From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <ivg@ieee.org>
X-Original-To: caml-list@sympa.inria.fr
Delivered-To: caml-list@sympa.inria.fr
Received: from mail2-relais-roc.national.inria.fr (mail2-relais-roc.national.inria.fr [192.134.164.83])
	by sympa.inria.fr (Postfix) with ESMTPS id B43B180198
	for <caml-list@sympa.inria.fr>; Tue, 11 Jul 2017 20:55:30 +0200 (CEST)
Authentication-Results: mail2-smtp-roc.national.inria.fr; spf=None smtp.pra=ivg@ieee.org; spf=Pass smtp.mailfrom=ivg@ieee.org; spf=None smtp.helo=postmaster@mail-yb0-f170.google.com
Received-SPF: None (mail2-smtp-roc.national.inria.fr: no sender
  authenticity information available from domain of
  ivg@ieee.org) identity=pra; client-ip=209.85.213.170;
  receiver=mail2-smtp-roc.national.inria.fr;
  envelope-from="ivg@ieee.org"; x-sender="ivg@ieee.org";
  x-conformance=sidf_compatible
Received-SPF: Pass (mail2-smtp-roc.national.inria.fr: domain of
  ivg@ieee.org designates 209.85.213.170 as permitted sender)
  identity=mailfrom; client-ip=209.85.213.170;
  receiver=mail2-smtp-roc.national.inria.fr;
  envelope-from="ivg@ieee.org"; x-sender="ivg@ieee.org";
  x-conformance=sidf_compatible; x-record-type="v=spf1"
Received-SPF: None (mail2-smtp-roc.national.inria.fr: no sender
  authenticity information available from domain of
  postmaster@mail-yb0-f170.google.com) identity=helo;
  client-ip=209.85.213.170;
  receiver=mail2-smtp-roc.national.inria.fr;
  envelope-from="ivg@ieee.org";
  x-sender="postmaster@mail-yb0-f170.google.com";
  x-conformance=sidf_compatible
IronPort-PHdr: =?us-ascii?q?9a23=3AIuJIqhBNPKtTP7bZy6pHUyQJP3N1i/DPJgcQr6Af?=
 =?us-ascii?q?oPdwSPTzoMbcNUDSrc9gkEXOFd2CrakV1KyO6+jJYi8p2d65qncMcZhBBVcuqP?=
 =?us-ascii?q?49uEgeOvODElDxN/XwbiY3T4xoXV5h+GynYwAOQJ6tL1LdrWev4jEMBx7xKRR6?=
 =?us-ascii?q?JvjvGo7Vks+7y/2+94fdbghMhjexe69+IAmrpgjNq8cahpdvJLwswRXTuHtIfO?=
 =?us-ascii?q?pWxWJsJV2Nmhv3+9m98p1+/SlOovwt78FPX7n0cKQ+VrxYES8pM3sp683xtBnM?=
 =?us-ascii?q?VhWA630BWWgLiBVIAgzF7BbnXpfttybxq+Rw1DWGMcDwULs5Qiqp4bt1RxD0iS?=
 =?us-ascii?q?cHLz85/3/Risxsl6JQvRatqwViz4LIfI2ZMfxzcaTAc9MHXmpBRtheWDBdAo2y?=
 =?us-ascii?q?aIsPCvAOPeder4Lgo1cDoh+zCQyqCejyyDFHm2X20LU43OQvEQ/I0g8uEc8Qvn?=
 =?us-ascii?q?vIt9j6LrseXPqvwaXU0TnObfVb0ir95ojSdRAhpOmBU7FuccXLz0kkCgLLjlKM?=
 =?us-ascii?q?qYziITOayuQNs2mH7+p7SOmijG8nqx9+ojW0x8cjlJfGiZwPxlDD7yV5z584KN?=
 =?us-ascii?q?ulQ0B1Zt6kFYFftyCcN4ZuQ8MiRXtouCcgxbEct567ZjAGyJAgxx7Yc/yLaY2I?=
 =?us-ascii?q?4hb7WOaeIDd4mHJleK+kiBqo7Uegzej8WtG00VlQripFld7MumoR2BzU78iLUv?=
 =?us-ascii?q?V88Vm82TaUyQ/T8u5EIVgumaraLZ4hzLkwm5wOukrABi/7gFv6gLOSe0k++eWl?=
 =?us-ascii?q?6/7rbqv7qpKSLYN4lwPzPrgol8eiG+o3KBIOUHKe+emk1L3s40n5QLJSg/0ziK?=
 =?us-ascii?q?bZsZTaKd0bp6GiHwNZy4gj5wu9Aju6ytgYkn4HLFVKeBKDkYflIU3BIPf9Dfun?=
 =?us-ascii?q?glSslilkx+zeM7H/HpnAKmLPnbThcLpn9UJQ1QQ+wcpC659WFr0NOPfzVVXwtN?=
 =?us-ascii?q?zcAB85KQu0w+P/BdV8yIMeVnmCAq6HP6zMr1CE/OUvI/ODZIMNojbyN+Al5+Ly?=
 =?us-ascii?q?jX8+gVISYbOm3Z4TaHyhGvRmIl6ZYWb3j9caEWYKuxI+Q/bwhF2DVz5TfXeyUL?=
 =?us-ascii?q?gm6jE1EoL1RbvEE7GqnLWElA2yBJtRdygSG1eQGHLifp+CXOYkZyebI8snmTsB?=
 =?us-ascii?q?A+uPUYgkgDCwvQmy5L17KfTf+iwErtq31cZ6z+zemBx08iZ7WZfOm1qRRn15yz?=
 =?us-ascii?q?tbDwQ927py9Akgk1o=3D?=
X-IronPort-Anti-Spam-Filtered: true
X-IronPort-Anti-Spam-Result: =?us-ascii?q?A0C4AAD+HWVZhqrVVdFdGwEBAQMBAQEJA?=
 =?us-ascii?q?QEBFwEBBAEBCgEBhBOBFAeEbTKaUoJChWyPBwNcLIVKAoM3B0IVAQEBAQEBAQE?=
 =?us-ascii?q?BAQESAQEBCAsLCCgvgjMkgkEBAQEBAgEjBBkBASwCCQEECwsLDQ0dAgIhARIBB?=
 =?us-ascii?q?QEKEgYTEooFAwgFCBCfMD+LH2uBbDqDBQEBBYQlAwqDZAEBAQEBAQQBAQEBAQE?=
 =?us-ascii?q?aCBKDFoNNg3mBDIJXgkCCZoJhiAEMlmE7h0iDRYQRhG6CY49Ci32IAhQfgRUPJ?=
 =?us-ascii?q?oEsgQIISRoGhDoqDxAMggMkNgGILQEBAQ?=
X-IPAS-Result: =?us-ascii?q?A0C4AAD+HWVZhqrVVdFdGwEBAQMBAQEJAQEBFwEBBAEBCgE?=
 =?us-ascii?q?BhBOBFAeEbTKaUoJChWyPBwNcLIVKAoM3B0IVAQEBAQEBAQEBAQESAQEBCAsLC?=
 =?us-ascii?q?CgvgjMkgkEBAQEBAgEjBBkBASwCCQEECwsLDQ0dAgIhARIBBQEKEgYTEooFAwg?=
 =?us-ascii?q?FCBCfMD+LH2uBbDqDBQEBBYQlAwqDZAEBAQEBAQQBAQEBAQEaCBKDFoNNg3mBD?=
 =?us-ascii?q?IJXgkCCZoJhiAEMlmE7h0iDRYQRhG6CY49Ci32IAhQfgRUPJoEsgQIISRoGhDo?=
 =?us-ascii?q?qDxAMggMkNgGILQEBAQ?=
X-IronPort-AV: E=Sophos;i="5.40,347,1496095200"; 
   d="scan'208,217";a="283005928"
Received: from mail-yb0-f170.google.com ([209.85.213.170])
  by mail2-smtp-roc.national.inria.fr with ESMTP/TLS/AES128-GCM-SHA256; 11 Jul 2017 20:55:28 +0200
Received: by mail-yb0-f170.google.com with SMTP id f194so326567yba.3
        for <caml-list@inria.fr>; Tue, 11 Jul 2017 11:55:28 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=ieee-org.20150623.gappssmtp.com; s=20150623;
        h=mime-version:in-reply-to:references:from:date:message-id:subject:to
         :cc;
        bh=YpDKc3H+TUz6VI0+ZrDAz9nthFHsh30gNTZvuknJ9XA=;
        b=lzkkiyBYDtC8fffPnODTrRoeZDak/AUjKzB5dUDk6p7xbqodJwnXJYiGI5Jn5PCr7T
         dPPX7ehxc/1L4PELER6h9kGVnzMsuGwvsaMPXmYAAF5utoC6Lu/stXRrxPyQHig5GjWI
         3svtrgcswQi6yuHWK9isCeyA459DQPs7m55cKtbHfR2Grs1fHOO0OIccnGseB5HOQTQv
         WoG4MzJxF6RB8nVxaJ2LlBZUFgWeSrRTwY0MTEPleEQtuP0cP3eOaOjZhojoIa30tPrH
         qod+FnnhArGsP0jA0GDLh5khE/moCwl5rd2APC/QFa8cA0prFqnH8nSFELNeXCfWfE3W
         0NoQ==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20161025;
        h=x-gm-message-state:mime-version:in-reply-to:references:from:date
         :message-id:subject:to:cc;
        bh=YpDKc3H+TUz6VI0+ZrDAz9nthFHsh30gNTZvuknJ9XA=;
        b=l7JqGxPaMPmX+rjNYTMa0SVA/5zRm7A3WsVtJqSqdX66qECPb57PLL+CtJjRuuP5/B
         +c2mH2WkmQGBrdMSyFqtiCVi7lU0wxsNBXCWWkgStlJg98CrUOHyRtdm+sYWFaXyjcFP
         40s0ja6edj54WWu0rTuOapwLaGhdv+PFgDK3wQApqxTDp1mpELGHEKBX3wtduYW3Gdn7
         FOF2LL9Q+QSvL1YXWr0kQSym3cNG/dwegm+vYvKMbbVI49IAEzp9OZBnip31dxHYE2WW
         h4nWM9KWV+tgIst0rDAUXJ6exhG1Vd30MVA8ZMcPNBkxSL9lg6D+l2CQnVVbWbEe6GZS
         TeVg==
X-Gm-Message-State: AIVw113x1WutPBvU7LaI5NkU7tc8Oa7/RBdlea/n+9XdkKCNvpjMxO44
	zBCu1WE6tj2AuS0A9uet2dORRaymE8Fz
X-Received: by 10.37.179.27 with SMTP id l27mr7476ybj.10.1499799327361; Tue,
 11 Jul 2017 11:55:27 -0700 (PDT)
MIME-Version: 1.0
Received: by 10.129.39.7 with HTTP; Tue, 11 Jul 2017 11:55:26 -0700 (PDT)
In-Reply-To: <CAN6ygO=k-GP6HwgpdsNgr2dU794oz_SpDcqND+LB07LQjUTfUw@mail.gmail.com>
References: <CAJannG4mcHu6rDsb+LBfydCmhhN1F+OW7pUWDyWTnkvRP6r46w@mail.gmail.com>
 <20170711125438.GP21228@nunchakus.loria.fr> <CALdWJ+y9Ez=26zpYYNb9OTYZ=aOZm-x4P14i-PmH3oYax0xxaA@mail.gmail.com>
 <CAN6ygOm+Jodg4YSUPDQruVMCZ-50Xnh7bO_XDaAzT_9MqaZOfQ@mail.gmail.com>
 <CAPFanBEJ0yAkf2E_B2kDX24jx7g4kKYkk8qkysJzuc81ntT1Dw@mail.gmail.com> <CAN6ygO=k-GP6HwgpdsNgr2dU794oz_SpDcqND+LB07LQjUTfUw@mail.gmail.com>
From: Ivan Gotovchits <ivg@ieee.org>
Date: Tue, 11 Jul 2017 14:55:26 -0400
Message-ID: <CALdWJ+ypjkGUvbHs_WrvPyF__q+PQZ8mTEicLWr-cuTNvhqoYA@mail.gmail.com>
To: Yotam Barnoy <yotambarnoy@gmail.com>
Cc: Gabriel Scherer <gabriel.scherer@gmail.com>, Simon Cruanes <simon.cruanes.2007@m4x.org>, 
	Alexey Egorov <alex.only.d@gmail.com>, caml users <caml-list@inria.fr>
Content-Type: multipart/alternative; boundary="f403045f3f0a24967605540f3e65"
Subject: Re: [Caml-list] Optimizing pure-functional streams

--f403045f3f0a24967605540f3e65
Content-Type: text/plain; charset="UTF-8"

Yep, moving to boxing increased the time by ten times. However, I found a
strange behavior, that might be a bug, the following program:

open Int64

let (=) = equal
let (+) = add
let (mod) = rem
let ( * ) = mul

let loop high =
  let rec loop i = function
    | t when i > high -> t
    | t when i mod 2L = 0L -> loop (i + 1L) (t + i * 2L)
    | t -> loop (i + 1L) t in
  loop 1L 0L

let high = 1000L * 1000L * 1000L
let _ = Printf.printf "%Ld\n" (loop high)


Didn't compile with the following _tags file:

true : optimize(3), unbox_closures, optimization_rounds(8),
inline_max_depth(8), inline_max_unroll(2)


And terminates with the "Fatal error: exception Stack overflow" message.
With higher values of the inline_max_unroll parameter, it doesn't terminate
on mine machine in reasonable time and space.


The most surprising is not that, but that the offending line is `let (=) =
equal`. Without this line, the program compiles without any issues.

Best,
Ivan

P.S. Actually, I'm playing with flambda and our monads library, that is
highly functorized and abstracted. I reimplemented this loop example using
the State monad and got about 10 times better performance with flambda in
comparison to a compiler without flambda enabled (from 40 seconds to 4
seconds). That makes my happy panda :) It is still 4 times slower than the
regular version, but I'm ready to pay this cost.


On Tue, Jul 11, 2017 at 2:15 PM, Yotam Barnoy <yotambarnoy@gmail.com> wrote:

> > Moving from tagging to boxing does not make much sense to me, because
> the techniques to avoid either are similar and boxing is more
> expensive. It might be the case that the compiler today is better at
> unboxing than untagging (it currently tries to do both of these
> locally), but that would only be because boxing is more expensive and
> thus more effort was spent on the unboxing, but medium-term one could
> hope for a uniform approach to both transformations.
>
> Absolutely. I believe there was some skepticism expressed about the
> need for untagging in the past by some members of the dev team, which
> is why I brought it up. I suggested Int64 only because it's possible
> that Flambda will *currently* do better with it than with tagged
> integers.
>
> On Tue, Jul 11, 2017 at 2:04 PM, Gabriel Scherer
> <gabriel.scherer@gmail.com> wrote:
> > Moving from tagging to boxing does not make much sense to me, because
> > the techniques to avoid either are similar and boxing is more
> > expensive. It might be the case that the compiler today is better at
> > unboxing than untagging (it currently tries to do both of these
> > locally), but that would only be because boxing is more expensive and
> > thus more effort was spent on the unboxing, but medium-term one could
> > hope for a uniform approach to both transformations.
> >
> >
> > On Tue, Jul 11, 2017 at 1:46 PM, Yotam Barnoy <yotambarnoy@gmail.com>
> wrote:
> >>> I've played a little bit with different optimization options in
> flambda 4.04, and finally, all three versions of the loop: curried,
> uncurried, and the for-loop, have the same performance, though they still
> loose about 30% to the C version, due to tagging.
> >>
> >> Would it perhaps make sense to try Int64 in order to avoid the tagging?
> >>
> >> Also, I believe it would make sense to have Flambda try to switch to
> >> an untagged type in tight loops to avoid this tagging cost.
> >>
> >> On Tue, Jul 11, 2017 at 1:37 PM, Ivan Gotovchits <ivg@ieee.org> wrote:
> >>> TWIMC,
> >>>
> >>> I've played a little bit with different optimization options in flambda
> >>> 4.04, and finally, all three versions of the loop: curried, uncurried,
> and
> >>> the for-loop, have the same performance, though they still loose about
> 30%
> >>> to the C version, due to tagging.
> >>>
> >>> Basically, this means, that flambda was able to get rid of the
> allocation. I
> >>> don't actually know which of the options finally made the difference,
> but
> >>> this is how I compiled it.
> >>>
> >>> ocamlopt.opt -c -S -inlining-report -unbox-closures -O3 -rounds 8
> >>> -inline-max-depth 256 -inline-max-unroll 1024 -o loop.cmx loop.ml
> >>> ocamlopt.opt loop.cmx -o loop.native
> >>>
> >>>
> >>> Regards,
> >>> Ivan
> >>>
> >>>
> >>>
> >>>
> >>> On Tue, Jul 11, 2017 at 8:54 AM, Simon Cruanes <
> simon.cruanes.2007@m4x.org>
> >>> wrote:
> >>>>
> >>>> Hello,
> >>>>
> >>>> Iterators in OCaml have been the topic of many discussions. Another
> >>>> option for fast iterators is https://github.com/c-cube/sequence ,
> >>>> which (with flambda) should compile down to loops and tests on this
> kind
> >>>> of benchmark. With the attached additional file on 4.04.0+flambda,
> >>>> I obtain the following (where sequence is test-seq):
> >>>>
> >>>> $ for i in test-* ; do echo $i ; time ./$i ; done
> >>>> test-c_loop
> >>>> 5000000100000000
> >>>> ./$i  0.08s user 0.00s system 97% cpu 0.085 total
> >>>> test-f_loop
> >>>> 5000000100000000
> >>>> ./$i  0.10s user 0.00s system 96% cpu 0.100 total
> >>>> test-loop
> >>>> 5000000100000000
> >>>> ./$i  0.18s user 0.00s system 97% cpu 0.184 total
> >>>> test-seq
> >>>> 5000000100000000
> >>>> ./$i  0.11s user 0.00s system 97% cpu 0.113 total
> >>>> test-stream
> >>>> 5000000100000000
> >>>> ./$i  0.44s user 0.00s system 98% cpu 0.449 total
> >>>>
> >>>>
> >>>> Note that sequence is imperative underneath, but can be safely used
> as a
> >>>> functional structure.
> >>>>
> >>>> --
> >>>> Simon Cruanes
> >>>>
> >>>> http://weusepgp.info/
> >>>> key 49AA62B6, fingerprint 949F EB87 8F06 59C6 D7D3  7D8D 4AC0 1D08
> 49AA
> >>>> 62B6
> >>>
> >>>
> >>
> >> --
> >> Caml-list mailing list.  Subscription management and archives:
> >> https://sympa.inria.fr/sympa/arc/caml-list
> >> Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
> >> Bug reports: http://caml.inria.fr/bin/caml-bugs
>

--f403045f3f0a24967605540f3e65
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

<div dir=3D"ltr">Yep, moving to boxing increased the time by ten times. How=
ever, I found a strange behavior, that might be a bug, the following progra=
m:<div><br></div><blockquote style=3D"margin:0px 0px 0px 40px;border:none;p=
adding:0px"><div><div><font face=3D"monospace, monospace">open Int64</font>=
</div></div><div><div><font face=3D"monospace, monospace"><br></font></div>=
</div><div><div><font face=3D"monospace, monospace">let (=3D) =3D equal</fo=
nt></div></div><div><div><font face=3D"monospace, monospace">let (+) =3D ad=
d</font></div></div><div><div><font face=3D"monospace, monospace">let (mod)=
 =3D rem</font></div></div><div><div><font face=3D"monospace, monospace">le=
t ( * ) =3D mul</font></div></div><div><div><font face=3D"monospace, monosp=
ace"><br></font></div></div><div><div><font face=3D"monospace, monospace">l=
et loop high =3D</font></div></div><div><div><font face=3D"monospace, monos=
pace">=C2=A0 let rec loop i =3D function</font></div></div><div><div><font =
face=3D"monospace, monospace">=C2=A0 =C2=A0 | t when i &gt; high -&gt; t</f=
ont></div></div><div><div><font face=3D"monospace, monospace">=C2=A0 =C2=A0=
 | t when i mod 2L =3D 0L -&gt; loop (i + 1L) (t + i * 2L)</font></div></di=
v><div><div><font face=3D"monospace, monospace">=C2=A0 =C2=A0 | t -&gt; loo=
p (i + 1L) t in</font></div></div><div><div><font face=3D"monospace, monosp=
ace">=C2=A0 loop 1L 0L</font></div></div><div><div><font face=3D"monospace,=
 monospace"><br></font></div></div><div><div><font face=3D"monospace, monos=
pace">let high =3D 1000L * 1000L * 1000L</font></div></div><div><div><font =
face=3D"monospace, monospace">let _ =3D Printf.printf &quot;%Ld\n&quot; (lo=
op high)</font></div></div></blockquote><div><br></div><div>Didn&#39;t comp=
ile with the following _tags file:</div><div><br></div><blockquote style=3D=
"margin:0px 0px 0px 40px;border:none;padding:0px"><div><font face=3D"monosp=
ace, monospace">true : optimize(3), unbox_closures, optimization_rounds(8),=
 inline_max_depth(8), inline_max_unroll(2)</font></div></blockquote><div><b=
r></div><div><br></div><div>And terminates with the &quot;Fatal error: exce=
ption Stack overflow&quot; message. With higher values of the inline_max_un=
roll parameter, it doesn&#39;t terminate on mine machine in reasonable time=
 and space.=C2=A0</div><div><br></div><div><br></div><div>The most surprisi=
ng is not that, but that the offending line=C2=A0is `let (=3D) =3D equal`. =
Without this line, the program compiles without any issues.=C2=A0</div><div=
><br></div><div>Best,</div><div>Ivan</div><div><br></div><div>P.S. Actually=
, I&#39;m playing with flambda=C2=A0and our monads library, that is highly =
functorized=C2=A0and abstracted. I reimplemented this loop example using th=
e State monad and got about 10 times better performance with flambda in com=
parison to a compiler without flambda=C2=A0enabled (from 40 seconds to 4 se=
conds). That makes my happy panda :) It is still 4 times slower than the re=
gular version, but I&#39;m ready to pay this cost.=C2=A0</div><div><br></di=
v><div><br></div><div><br></div></div><div class=3D"gmail_extra"><br><div c=
lass=3D"gmail_quote">On Tue, Jul 11, 2017 at 2:15 PM, Yotam Barnoy <span di=
r=3D"ltr">&lt;<a href=3D"mailto:yotambarnoy@gmail.com" target=3D"_blank">yo=
tambarnoy@gmail.com</a>&gt;</span> wrote:<br><blockquote class=3D"gmail_quo=
te" style=3D"margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"=
><span class=3D"">&gt; Moving from tagging to boxing does not make much sen=
se to me, because<br>
the techniques to avoid either are similar and boxing is more<br>
expensive. It might be the case that the compiler today is better at<br>
unboxing than untagging (it currently tries to do both of these<br>
locally), but that would only be because boxing is more expensive and<br>
thus more effort was spent on the unboxing, but medium-term one could<br>
hope for a uniform approach to both transformations.<br>
<br>
</span>Absolutely. I believe there was some skepticism expressed about the<=
br>
need for untagging in the past by some members of the dev team, which<br>
is why I brought it up. I suggested Int64 only because it&#39;s possible<br>
that Flambda will *currently* do better with it than with tagged<br>
integers.<br>
<div class=3D"HOEnZb"><div class=3D"h5"><br>
On Tue, Jul 11, 2017 at 2:04 PM, Gabriel Scherer<br>
&lt;<a href=3D"mailto:gabriel.scherer@gmail.com">gabriel.scherer@gmail.com<=
/a>&gt; wrote:<br>
&gt; Moving from tagging to boxing does not make much sense to me, because<=
br>
&gt; the techniques to avoid either are similar and boxing is more<br>
&gt; expensive. It might be the case that the compiler today is better at<b=
r>
&gt; unboxing than untagging (it currently tries to do both of these<br>
&gt; locally), but that would only be because boxing is more expensive and<=
br>
&gt; thus more effort was spent on the unboxing, but medium-term one could<=
br>
&gt; hope for a uniform approach to both transformations.<br>
&gt;<br>
&gt;<br>
&gt; On Tue, Jul 11, 2017 at 1:46 PM, Yotam Barnoy &lt;<a href=3D"mailto:yo=
tambarnoy@gmail.com">yotambarnoy@gmail.com</a>&gt; wrote:<br>
&gt;&gt;&gt; I&#39;ve played a little bit with different optimization optio=
ns in flambda 4.04, and finally, all three versions of the loop: curried, u=
ncurried, and the for-loop, have the same performance, though they still lo=
ose about 30% to the C version, due to tagging.<br>
&gt;&gt;<br>
&gt;&gt; Would it perhaps make sense to try Int64 in order to avoid the tag=
ging?<br>
&gt;&gt;<br>
&gt;&gt; Also, I believe it would make sense to have Flambda try to switch =
to<br>
&gt;&gt; an untagged type in tight loops to avoid this tagging cost.<br>
&gt;&gt;<br>
&gt;&gt; On Tue, Jul 11, 2017 at 1:37 PM, Ivan Gotovchits &lt;<a href=3D"ma=
ilto:ivg@ieee.org">ivg@ieee.org</a>&gt; wrote:<br>
&gt;&gt;&gt; TWIMC,<br>
&gt;&gt;&gt;<br>
&gt;&gt;&gt; I&#39;ve played a little bit with different optimization optio=
ns in flambda<br>
&gt;&gt;&gt; 4.04, and finally, all three versions of the loop: curried, un=
curried, and<br>
&gt;&gt;&gt; the for-loop, have the same performance, though they still loo=
se about 30%<br>
&gt;&gt;&gt; to the C version, due to tagging.<br>
&gt;&gt;&gt;<br>
&gt;&gt;&gt; Basically, this means, that flambda was able to get rid of the=
 allocation. I<br>
&gt;&gt;&gt; don&#39;t actually know which of the options finally made the =
difference, but<br>
&gt;&gt;&gt; this is how I compiled it.<br>
&gt;&gt;&gt;<br>
&gt;&gt;&gt; ocamlopt.opt -c -S -inlining-report -unbox-closures -O3 -round=
s 8<br>
&gt;&gt;&gt; -inline-max-depth 256 -inline-max-unroll 1024 -o loop.cmx <a h=
ref=3D"http://loop.ml" rel=3D"noreferrer" target=3D"_blank">loop.ml</a><br>
&gt;&gt;&gt; ocamlopt.opt loop.cmx -o loop.native<br>
&gt;&gt;&gt;<br>
&gt;&gt;&gt;<br>
&gt;&gt;&gt; Regards,<br>
&gt;&gt;&gt; Ivan<br>
&gt;&gt;&gt;<br>
&gt;&gt;&gt;<br>
&gt;&gt;&gt;<br>
&gt;&gt;&gt;<br>
&gt;&gt;&gt; On Tue, Jul 11, 2017 at 8:54 AM, Simon Cruanes &lt;<a href=3D"=
mailto:simon.cruanes.2007@m4x.org">simon.cruanes.2007@m4x.org</a>&gt;<br>
&gt;&gt;&gt; wrote:<br>
&gt;&gt;&gt;&gt;<br>
&gt;&gt;&gt;&gt; Hello,<br>
&gt;&gt;&gt;&gt;<br>
&gt;&gt;&gt;&gt; Iterators in OCaml have been the topic of many discussions=
. Another<br>
&gt;&gt;&gt;&gt; option for fast iterators is <a href=3D"https://github.com=
/c-cube/sequence" rel=3D"noreferrer" target=3D"_blank">https://github.com/c=
-cube/<wbr>sequence</a> ,<br>
&gt;&gt;&gt;&gt; which (with flambda) should compile down to loops and test=
s on this kind<br>
&gt;&gt;&gt;&gt; of benchmark. With the attached additional file on 4.04.0+=
flambda,<br>
&gt;&gt;&gt;&gt; I obtain the following (where sequence is test-seq):<br>
&gt;&gt;&gt;&gt;<br>
&gt;&gt;&gt;&gt; $ for i in test-* ; do echo $i ; time ./$i ; done<br>
&gt;&gt;&gt;&gt; test-c_loop<br>
&gt;&gt;&gt;&gt; 5000000100000000<br>
&gt;&gt;&gt;&gt; ./$i=C2=A0 0.08s user 0.00s system 97% cpu 0.085 total<br>
&gt;&gt;&gt;&gt; test-f_loop<br>
&gt;&gt;&gt;&gt; 5000000100000000<br>
&gt;&gt;&gt;&gt; ./$i=C2=A0 0.10s user 0.00s system 96% cpu 0.100 total<br>
&gt;&gt;&gt;&gt; test-loop<br>
&gt;&gt;&gt;&gt; 5000000100000000<br>
&gt;&gt;&gt;&gt; ./$i=C2=A0 0.18s user 0.00s system 97% cpu 0.184 total<br>
&gt;&gt;&gt;&gt; test-seq<br>
&gt;&gt;&gt;&gt; 5000000100000000<br>
&gt;&gt;&gt;&gt; ./$i=C2=A0 0.11s user 0.00s system 97% cpu 0.113 total<br>
&gt;&gt;&gt;&gt; test-stream<br>
&gt;&gt;&gt;&gt; 5000000100000000<br>
&gt;&gt;&gt;&gt; ./$i=C2=A0 0.44s user 0.00s system 98% cpu 0.449 total<br>
&gt;&gt;&gt;&gt;<br>
&gt;&gt;&gt;&gt;<br>
&gt;&gt;&gt;&gt; Note that sequence is imperative underneath, but can be sa=
fely used as a<br>
&gt;&gt;&gt;&gt; functional structure.<br>
&gt;&gt;&gt;&gt;<br>
&gt;&gt;&gt;&gt; --<br>
&gt;&gt;&gt;&gt; Simon Cruanes<br>
&gt;&gt;&gt;&gt;<br>
&gt;&gt;&gt;&gt; <a href=3D"http://weusepgp.info/" rel=3D"noreferrer" targe=
t=3D"_blank">http://weusepgp.info/</a><br>
&gt;&gt;&gt;&gt; key 49AA62B6, fingerprint 949F EB87 8F06 59C6 D7D3=C2=A0 7=
D8D 4AC0 1D08 49AA<br>
&gt;&gt;&gt;&gt; 62B6<br>
&gt;&gt;&gt;<br>
&gt;&gt;&gt;<br>
&gt;&gt;<br>
&gt;&gt; --<br>
&gt;&gt; Caml-list mailing list.=C2=A0 Subscription management and archives=
:<br>
&gt;&gt; <a href=3D"https://sympa.inria.fr/sympa/arc/caml-list" rel=3D"nore=
ferrer" target=3D"_blank">https://sympa.inria.fr/sympa/<wbr>arc/caml-list</=
a><br>
&gt;&gt; Beginner&#39;s list: <a href=3D"http://groups.yahoo.com/group/ocam=
l_beginners" rel=3D"noreferrer" target=3D"_blank">http://groups.yahoo.com/g=
roup/<wbr>ocaml_beginners</a><br>
&gt;&gt; Bug reports: <a href=3D"http://caml.inria.fr/bin/caml-bugs" rel=3D=
"noreferrer" target=3D"_blank">http://caml.inria.fr/bin/caml-<wbr>bugs</a><=
br>
</div></div></blockquote></div><br></div>

--f403045f3f0a24967605540f3e65--