From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <ivg@ieee.org>
X-Original-To: caml-list@sympa.inria.fr
Delivered-To: caml-list@sympa.inria.fr
Received: from mail3-relais-sop.national.inria.fr (mail3-relais-sop.national.inria.fr [192.134.164.104])
	by sympa.inria.fr (Postfix) with ESMTPS id DBC2A80198
	for <caml-list@sympa.inria.fr>; Mon, 10 Jul 2017 23:29:23 +0200 (CEST)
Authentication-Results: mail3-smtp-sop.national.inria.fr; spf=None smtp.pra=ivg@ieee.org; spf=Pass smtp.mailfrom=ivg@ieee.org; spf=None smtp.helo=postmaster@mail-yw0-f175.google.com
Received-SPF: None (mail3-smtp-sop.national.inria.fr: no sender
  authenticity information available from domain of
  ivg@ieee.org) identity=pra; client-ip=209.85.161.175;
  receiver=mail3-smtp-sop.national.inria.fr;
  envelope-from="ivg@ieee.org"; x-sender="ivg@ieee.org";
  x-conformance=sidf_compatible
Received-SPF: Pass (mail3-smtp-sop.national.inria.fr: domain of
  ivg@ieee.org designates 209.85.161.175 as permitted sender)
  identity=mailfrom; client-ip=209.85.161.175;
  receiver=mail3-smtp-sop.national.inria.fr;
  envelope-from="ivg@ieee.org"; x-sender="ivg@ieee.org";
  x-conformance=sidf_compatible; x-record-type="v=spf1"
Received-SPF: None (mail3-smtp-sop.national.inria.fr: no sender
  authenticity information available from domain of
  postmaster@mail-yw0-f175.google.com) identity=helo;
  client-ip=209.85.161.175;
  receiver=mail3-smtp-sop.national.inria.fr;
  envelope-from="ivg@ieee.org";
  x-sender="postmaster@mail-yw0-f175.google.com";
  x-conformance=sidf_compatible
IronPort-PHdr: =?us-ascii?q?9a23=3AP8+f4R84xT4nRP9uRHKM819IXTAuvvDOBiVQ1KB2?=
 =?us-ascii?q?0+McTK2v8tzYMVDF4r011RmSDNqds6oMotGVmpioYXYH75eFvSJKW713fDhBt/?=
 =?us-ascii?q?8rmRc9CtWOE0zxIa2iRSU7GMNfSA0tpCnjYgBaF8nkelLdvGC54yIMFRXjLwp1?=
 =?us-ascii?q?Ifn+FpLPg8it2e2//5/ebx9UiDahfLh/MAi4oQLNu8cMnIBsMLwxyhzHontJf+?=
 =?us-ascii?q?RZ22ZlLk+Nkhj/+8m94odt/zxftPw9+cFAV776f7kjQrxDEDsmKWE169b1uhTF?=
 =?us-ascii?q?UACC+2ETUmQSkhpPHgjF8BT3VYr/vyfmquZw3jSRMsrrQ7ApQjSi97lkRwP0iC?=
 =?us-ascii?q?kJMD459XvYis12jKlGpB6sqBhyz4vSbYqINvRxY7ndcMsYSmpPXshfWS9PDJ6i?=
 =?us-ascii?q?YYQTFOcOJ/pUopPnqlcSsRezBw+hD/7vxD9SgX/22LU33ec/EQ7c2gwrAtMAsH?=
 =?us-ascii?q?PIrNXyKqcdTeC1zKjUzTXYcvhb3jb96JbHch06oPGDQ6x/ftTLxUkoDQPFgUyd?=
 =?us-ascii?q?pIr4ND2b0eQNtnKU7+tmVe+3im4nrRtxojm1ycs2hInJnJoZylTD9SV+2IY5P9?=
 =?us-ascii?q?i4SEpjbd6/DJtQrT+VOJFzQs84RmFovD42yrIHuZ6nfCgK1Y8oywTDZPyAdoiE?=
 =?us-ascii?q?+hLiW/yRITd/g3JpYq6whxG38US4xO3zTs200FFNripdiNXMs3QN2wTd6sifVv?=
 =?us-ascii?q?R9+UKh2S6L1w/N9uFLP1o4mrbcK54k2rIwkZkTsUHCHi/0gkn2i7WWdkoi9+O1?=
 =?us-ascii?q?6Orneq3rqoGAO4JwkA3zMaQjltaiDek5LwQCRXWX9Oa82bDl4Eb3Wq9Fjucsna?=
 =?us-ascii?q?ncqJ3aJdoUpqq+AwJN14Ys8Re/DzO/3NUYk3gLMEtJeByag4XrO1zCOv/4DfC4?=
 =?us-ascii?q?g1SjlDdk2erKMaHmApXINnTDkbHhcqhh60NE1gY/0dRS64hXB7wBOv7/R078uM?=
 =?us-ascii?q?HCAhMkMQG5w/7rCNBn2YMfXWKPDLWZMKTXsVKQ5+IvPeaMaZQUuDnjNfcq+eTi?=
 =?us-ascii?q?jXgjmV8SZaWpx4cYaGikHvR6JEWUeWbjjc0EEWcOpwY+SO3qiEaeUTNIfHazX6?=
 =?us-ascii?q?c85ikhB468DIfDQJqtgL2b0yuhEJ1WfDMONlfZOHPlZ4iVE9oDbziVPIc1mzgJ?=
 =?us-ascii?q?Rf6mTYswkx2Guwrzyr4hJe3RrH42r5Xmgflr7uubuhYu8iJ/D8WByCnZTn97tm?=
 =?us-ascii?q?IFSjJw27pw9x8ugmyf2LR11qQLXedY4OlEB1trOA=3D=3D?=
X-IronPort-Anti-Spam-Filtered: true
X-IronPort-Anti-Spam-Result: =?us-ascii?q?A0DjAACb8GNZhq+hVdFdHQEFAQsBGAEFA?=
 =?us-ascii?q?QsBhBOBFAeEbTKaToJChWyPBwNcLIVLAoMlB0MUAQEBAQEBAQEBAQESAQEBCAs?=
 =?us-ascii?q?LCCgvgjMMgloBAgEBASMEGQEBLAsBBAsLCwMXHQICIQESAQUBChIGExINiXgDC?=
 =?us-ascii?q?AUIEJ8MP4sfa4FsOoMFAQEFhCwDCoN4AQEBAQEFAQEBAQEBARkIEoMWL4Mdg3m?=
 =?us-ascii?q?BDIJXKYIXgmaCYYgADIFblQE7h0iHCUuEboIlPoEPjjCLeYd/FB+BFQ8ngSuBA?=
 =?us-ascii?q?ghJGgaEOiofgg8kNgEEiF8BAQE?=
X-IPAS-Result: =?us-ascii?q?A0DjAACb8GNZhq+hVdFdHQEFAQsBGAEFAQsBhBOBFAeEbTK?=
 =?us-ascii?q?aToJChWyPBwNcLIVLAoMlB0MUAQEBAQEBAQEBAQESAQEBCAsLCCgvgjMMgloBA?=
 =?us-ascii?q?gEBASMEGQEBLAsBBAsLCwMXHQICIQESAQUBChIGExINiXgDCAUIEJ8MP4sfa4F?=
 =?us-ascii?q?sOoMFAQEFhCwDCoN4AQEBAQEFAQEBAQEBARkIEoMWL4Mdg3mBDIJXKYIXgmaCY?=
 =?us-ascii?q?YgADIFblQE7h0iHCUuEboIlPoEPjjCLeYd/FB+BFQ8ngSuBAghJGgaEOiofgg8?=
 =?us-ascii?q?kNgEEiF8BAQE?=
X-IronPort-AV: E=Sophos;i="5.40,342,1496095200"; 
   d="scan'208,217";a="231133871"
Received: from mail-yw0-f175.google.com ([209.85.161.175])
  by mail3-smtp-sop.national.inria.fr with ESMTP/TLS/AES128-GCM-SHA256; 10 Jul 2017 23:29:22 +0200
Received: by mail-yw0-f175.google.com with SMTP id v193so41352921ywg.2
        for <caml-list@inria.fr>; Mon, 10 Jul 2017 14:29:21 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=ieee-org.20150623.gappssmtp.com; s=20150623;
        h=mime-version:in-reply-to:references:from:date:message-id:subject:to
         :cc;
        bh=x3R1usWDbTNyov6jU9sHaHB4pJqYFCHwAl8X4VYtmk8=;
        b=Kt0NaYkJYliZO/nK/2GP0jxl70cWwGLiYPhdA3iOUvbhIsxUXUBrV0zNanJBQIRMY1
         sTWob1mBchWVO6Hh583MmjXAgXUYUfzekI/cFr8HSlbntlpsF5ze1azRy5fpvSjuOF4z
         yrqr+eotPamMz6JvYM//j6gGpaN6k2mZaZBT8ULNpWZAgWhqlN8kVmOpsKegy4T87dDc
         hIshwN0QReJTYPcBqKd3qmuD9pgm3wCGWztGefsxSXKCZ4dGvdGwM58Iz5JH7zekS1yr
         YLeGGZl6lwXqy3ZwmbeoT9ojwqEYdD2U2Ir+uD4MNG5kbwJ+RVp/jnvzSjiRyGmoq/aW
         +BLw==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20161025;
        h=x-gm-message-state:mime-version:in-reply-to:references:from:date
         :message-id:subject:to:cc;
        bh=x3R1usWDbTNyov6jU9sHaHB4pJqYFCHwAl8X4VYtmk8=;
        b=uMlzk+r2WFBDjwGFmpJzPRRq33NZ3fppV1z92hBW/AKm90YxpQX53dU4G9c6f3mw4N
         h6oS9DkYyhDjDmQIrl5Kg9kJomZfN/BQStusD9EhGxy1XxIjNxyqRGWwHlI+HLDz4RBB
         HjSyRmVJhwyepDkGtoc0DS7JEbc2toPCAH3qc5JraEdLHf6pi6LdhVhIizw9DkkE1t8j
         aCb9LjQy9ydFvReFwmm8J/O7A9aL1Uv3+alqz3yxZKA2jEAghEwo3LLUSRXvHT7o/h2l
         bWPIL10aqiLwsN8CA1EDu8GRc/zTBQDwtFZnDjpAoCq+TsZECD5KZdKV1IseTQiaYChZ
         NI1A==
X-Gm-Message-State: AIVw1134PTZOFOeMS7re4fEjNjrj+FTqYlk0vSzCq5Ig4gg4I4uIRPAQ
	s1HyFUqoKRNaez4GLUeNB9tLz1H+h25t
X-Received: by 10.129.107.198 with SMTP id g189mr2265245ywc.314.1499722160126;
 Mon, 10 Jul 2017 14:29:20 -0700 (PDT)
MIME-Version: 1.0
Received: by 10.129.39.7 with HTTP; Mon, 10 Jul 2017 14:29:19 -0700 (PDT)
In-Reply-To: <CAJannG4mcHu6rDsb+LBfydCmhhN1F+OW7pUWDyWTnkvRP6r46w@mail.gmail.com>
References: <CAJannG4mcHu6rDsb+LBfydCmhhN1F+OW7pUWDyWTnkvRP6r46w@mail.gmail.com>
From: Ivan Gotovchits <ivg@ieee.org>
Date: Mon, 10 Jul 2017 17:29:19 -0400
Message-ID: <CALdWJ+wFOTRDtLxAVwqbox-6YHL8OK0t5atBKnEmK9pHHBDMig@mail.gmail.com>
To: Alexey Egorov <alex.only.d@gmail.com>
Cc: caml users <caml-list@inria.fr>
Content-Type: multipart/alternative; boundary="001a11473b369dfbbb0553fd46bd"
Subject: Re: [Caml-list] Optimizing pure-functional streams

--001a11473b369dfbbb0553fd46bd
Content-Type: text/plain; charset="UTF-8"

Hi Alexey,

The recursion version is slower because you're allocating a tuple on each
recursive call. Try to use a curried form of a function, and you will a
performance that is even better than the for-loop (10% better in my case).
E.g.,


let loop high =
  let rec loop i = function
    | t when i > high -> t
    | t when i mod 2 = 0 -> loop (i + 1) (t + i * 2)
    | t -> loop (i + 1) t in
  loop 1 0


The for-loop version is indeed slower than the C version because of
tagging, c.f., the OCaml implementation:

movq $1, %rbx
movq $3, %rdi
cmpq %rax, %rdi
jg .L100


.L101:
movq %rdi, %rsi
sarq $1, %rsi
movq %rsi, %rdx
shr movq $1, %rbx
movq $3, %rdi
cmpq %rax, %rdi
jg .L100

.L101:
movq %rdi, %rsi
sarq $1, %rsi
moq $63, %rdx
movq %rsi, %rcx
addq %rdx, %rcx
andq $-2, %rcx
subq %rcx, %rsi
leaq 1(%rsi,%rsi), %rsi
cmpq $1, %rsi
jne .L102
leaq -2(%rbx,%rdi,2), %rbx
.L102:
movq %rdi, %rsi
addq $2, %rdi
cmpq %rax, %rsi
jne .L101
.L100:
movq %rbx, %rax
ret


With the GCC output:


testl %edi, %edi
jle .L6
addl $1, %edi
movl $4, %ecx
movl $1, %edx
xorl %eax, %eax
jmp .L3
.L5:
leaq (%rax,%rcx), %rsi
testb $1, %dl
cmove %rsi, %rax
addq $2, %rcx
.L3:
addl $1, %edx
cmpl %edi, %edx
jne .L5
rep ret
.L6:
xorl %eax, %eax
ret


Notice this `sar` and `shr` instructions, they are here due to the tagging.
The GCC compiler was also able to leverage the conditional move instruction
`cmove`, though it looks nice I'm not sure how much it actually won. There
is no need to use flambda for the recursive and for-loop as they are
already optimized to the maximum.

Concerning your stream implementation, you may compare it to the Core
Kernel's Sequence. It is basically the same, so you can play the
benchmarking game :)

The general way to implement a stream with a zero performance penalty
(i.e., as fast as a for loop) is to use staging. This topic is explored in
detail in the Oleg Kiselyov et alas Strymonad [1] paper. The
implementation, as well as other interesting discussions about streams, are
available on his homepage [2] (there are tons of very interesting stuff
there!).

Best,
Ivan

[1]: http://okmij.org/ftp/meta-programming/strymonas.pdf
[2]: http://okmij.org/ftp/Streams.html


On Mon, Jul 10, 2017 at 4:26 PM, Alexey Egorov <alex.only.d@gmail.com>
wrote:

> Hello,
>
> Michael Snoyman recently posted very interesting article about Haskell
> streams and Rust iterators (
> https://www.fpcomplete.com/blog/2017/07/iterators-streams-rust-haskell
> ) and I tried to do the same in OCaml.
>
> You can find my code here -
> https://gist.github.com/anonymous/127e9116b362d561c5dfa9cce6b453f3
>
> There is four different versions:
>
> 1) c_loop.c - plain C loop
> 2) f_loop.ml - for-loop in OCaml
> 3) loop.ml - OCaml loop using recursion
> 4) stream.ml - OCaml loop using streams
>
> Here is the results:
>
> c_loop.c:   0m0.105s
> f_loop.ml:  0m0.184s
> loop.ml:     0m0.207s
> stream.ml: 0m0.605s
>
> It's not suprise that C version is fastest - at least, it uses untagged
> ints.
> What surprises me is that recursion is slower than for-loop - can't
> OCaml compile it down to the same imperative loop?
>
> And it's very dissapointing that stream version is 3 times slower than
> recursion-based.
> The problem, I believe, is that OCaml is unable to optimize
> intermediate stream constructors away (as GHC do).
>
> I tried to add inline annotations, which helps a little - is there
> something I can do to improve it further?
>
> --
> Caml-list mailing list.  Subscription management and archives:
> https://sympa.inria.fr/sympa/arc/caml-list
> Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
> Bug reports: http://caml.inria.fr/bin/caml-bugs
>

--001a11473b369dfbbb0553fd46bd
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

<div dir=3D"ltr">Hi Alexey,<div><br></div><div>The recursion version is slo=
wer=C2=A0because you&#39;re allocating a tuple on each recursive call. Try =
to use a curried form of a function, and you will a performance that is eve=
n better than the for-loop (10% better in my case). E.g.,=C2=A0</div><block=
quote style=3D"margin:0px 0px 0px 40px;border:none;padding:0px"><div><br></=
div><div><br></div><div><font face=3D"monospace, monospace">let loop high =
=3D</font></div><div><font face=3D"monospace, monospace">=C2=A0 let rec loo=
p i =3D function</font></div><div><font face=3D"monospace, monospace">=C2=
=A0 =C2=A0 | t when i &gt; high -&gt; t</font></div><div><font face=3D"mono=
space, monospace">=C2=A0 =C2=A0 | t when i mod 2 =3D 0 -&gt; loop (i + 1) (=
t + i * 2)</font></div><div><font face=3D"monospace, monospace">=C2=A0 =C2=
=A0 | t -&gt; loop (i + 1) t in</font></div><div><font face=3D"monospace, m=
onospace">=C2=A0 loop 1 0</font></div></blockquote><div><br></div><div>The =
for-loop version is indeed slower=C2=A0than the C version because of taggin=
g, c.f., the OCaml implementation:</div><div><br></div><blockquote style=3D=
"margin:0px 0px 0px 40px;border:none;padding:0px"><div><div><font face=3D"m=
onospace, monospace"><span style=3D"white-space:pre">	</span>movq<span styl=
e=3D"white-space:pre">	</span>$1, %rbx</font></div></div><div><div><font fa=
ce=3D"monospace, monospace"><span style=3D"white-space:pre">	</span>movq<sp=
an style=3D"white-space:pre">	</span>$3, %rdi</font></div></div><div><div><=
font face=3D"monospace, monospace"><span style=3D"white-space:pre">	</span>=
cmpq<span style=3D"white-space:pre">	</span>%rax, %rdi</font></div></div><d=
iv><div><font face=3D"monospace, monospace"><span style=3D"white-space:pre"=
>	</span>jg<span style=3D"white-space:pre">	</span>.L100</font></div></div>=
</blockquote><div><font face=3D"monospace, monospace"><br></font></div><blo=
ckquote style=3D"margin:0px 0px 0px 40px;border:none;padding:0px"><div><div=
><font face=3D"monospace, monospace">.L101:</font></div></div><div><div><fo=
nt face=3D"monospace, monospace"><span style=3D"white-space:pre">	</span>mo=
vq<span style=3D"white-space:pre">	</span>%rdi, %rsi</font></div></div><div=
><div><font face=3D"monospace, monospace"><span style=3D"white-space:pre">	=
</span>sarq<span style=3D"white-space:pre">	</span>$1, %rsi</font></div></d=
iv><div><div><font face=3D"monospace, monospace"><span style=3D"white-space=
:pre">	</span>movq<span style=3D"white-space:pre">	</span>%rsi, %rdx</font>=
</div></div><div><div><font face=3D"monospace, monospace"><span style=3D"wh=
ite-space:pre">	</span>shr<span style=3D"white-space:pre">	</span>movq<span=
 style=3D"white-space:pre">	</span>$1, %rbx</font></div><div><font face=3D"=
monospace, monospace"><span style=3D"white-space:pre">	</span>movq<span sty=
le=3D"white-space:pre">	</span>$3, %rdi</font></div><div><font face=3D"mono=
space, monospace"><span style=3D"white-space:pre">	</span>cmpq<span style=
=3D"white-space:pre">	</span>%rax, %rdi</font></div><div><font face=3D"mono=
space, monospace"><span style=3D"white-space:pre">	</span>jg<span style=3D"=
white-space:pre">	</span>.L100</font></div><div><font face=3D"monospace, mo=
nospace"><br></font></div><div><font face=3D"monospace, monospace">.L101:</=
font></div><div><font face=3D"monospace, monospace"><span style=3D"white-sp=
ace:pre">	</span>movq<span style=3D"white-space:pre">	</span>%rdi, %rsi</fo=
nt></div><div><font face=3D"monospace, monospace"><span style=3D"white-spac=
e:pre">	</span>sarq<span style=3D"white-space:pre">	</span>$1, %rsi</font><=
/div><div><font face=3D"monospace, monospace"><span style=3D"white-space:pr=
e">	</span>moq<span style=3D"white-space:pre">	</span>$63, %rdx</font></div=
></div><div><div><font face=3D"monospace, monospace"><span style=3D"white-s=
pace:pre">	</span>movq<span style=3D"white-space:pre">	</span>%rsi, %rcx</f=
ont></div></div><div><div><font face=3D"monospace, monospace"><span style=
=3D"white-space:pre">	</span>addq<span style=3D"white-space:pre">	</span>%r=
dx, %rcx</font></div></div><div><div><font face=3D"monospace, monospace"><s=
pan style=3D"white-space:pre">	</span>andq<span style=3D"white-space:pre">	=
</span>$-2, %rcx</font></div></div><div><div><font face=3D"monospace, monos=
pace"><span style=3D"white-space:pre">	</span>subq<span style=3D"white-spac=
e:pre">	</span>%rcx, %rsi</font></div></div><div><div><font face=3D"monospa=
ce, monospace"><span style=3D"white-space:pre">	</span>leaq<span style=3D"w=
hite-space:pre">	</span>1(%rsi,%rsi), %rsi</font></div></div><div><div><fon=
t face=3D"monospace, monospace"><span style=3D"white-space:pre">	</span>cmp=
q<span style=3D"white-space:pre">	</span>$1, %rsi</font></div></div><div><d=
iv><font face=3D"monospace, monospace"><span style=3D"white-space:pre">	</s=
pan>jne<span style=3D"white-space:pre">	</span>.L102</font></div></div><div=
><div><font face=3D"monospace, monospace"><span style=3D"white-space:pre">	=
</span>leaq<span style=3D"white-space:pre">	</span>-2(%rbx,%rdi,2), %rbx</f=
ont></div></div><div><div><font face=3D"monospace, monospace">.L102:</font>=
</div></div><div><div><font face=3D"monospace, monospace"><span style=3D"wh=
ite-space:pre">	</span>movq<span style=3D"white-space:pre">	</span>%rdi, %r=
si</font></div></div><div><div><font face=3D"monospace, monospace"><span st=
yle=3D"white-space:pre">	</span>addq<span style=3D"white-space:pre">	</span=
>$2, %rdi</font></div></div><div><div><font face=3D"monospace, monospace"><=
span style=3D"white-space:pre">	</span>cmpq<span style=3D"white-space:pre">=
	</span>%rax, %rsi</font></div></div><div><div><font face=3D"monospace, mon=
ospace"><span style=3D"white-space:pre">	</span>jne<span style=3D"white-spa=
ce:pre">	</span>.L101</font></div></div><div><div><font face=3D"monospace, =
monospace">.L100:</font></div></div><div><div><font face=3D"monospace, mono=
space"><span style=3D"white-space:pre">	</span>movq<span style=3D"white-spa=
ce:pre">	</span>%rbx, %rax</font></div></div><div><div><font face=3D"monosp=
ace, monospace"><span style=3D"white-space:pre">	</span>ret</font></div></d=
iv></blockquote><div><br></div><div>With the GCC output:</div><div><br></di=
v><div><br></div><blockquote style=3D"margin:0px 0px 0px 40px;border:none;p=
adding:0px"><div><div><font face=3D"monospace, monospace"><span style=3D"wh=
ite-space:pre">	</span>testl<span style=3D"white-space:pre">	</span>%edi, %=
edi</font></div></div><div><div><font face=3D"monospace, monospace"><span s=
tyle=3D"white-space:pre">	</span>jle<span style=3D"white-space:pre">	</span=
>.L6</font></div></div><div><div><font face=3D"monospace, monospace"><span =
style=3D"white-space:pre">	</span>addl<span style=3D"white-space:pre">	</sp=
an>$1, %edi</font></div></div><div><div><font face=3D"monospace, monospace"=
><span style=3D"white-space:pre">	</span>movl<span style=3D"white-space:pre=
">	</span>$4, %ecx</font></div></div><div><div><font face=3D"monospace, mon=
ospace"><span style=3D"white-space:pre">	</span>movl<span style=3D"white-sp=
ace:pre">	</span>$1, %edx</font></div></div><div><div><font face=3D"monospa=
ce, monospace"><span style=3D"white-space:pre">	</span>xorl<span style=3D"w=
hite-space:pre">	</span>%eax, %eax</font></div></div><div><div><font face=
=3D"monospace, monospace"><span style=3D"white-space:pre">	</span>jmp<span =
style=3D"white-space:pre">	</span>.L3</font></div></div><div><div><font fac=
e=3D"monospace, monospace">.L5:</font></div></div><div><div><font face=3D"m=
onospace, monospace"><span style=3D"white-space:pre">	</span>leaq<span styl=
e=3D"white-space:pre">	</span>(%rax,%rcx), %rsi</font></div></div><div><div=
><font face=3D"monospace, monospace"><span style=3D"white-space:pre">	</spa=
n>testb<span style=3D"white-space:pre">	</span>$1, %dl</font></div></div><d=
iv><div><font face=3D"monospace, monospace"><span style=3D"white-space:pre"=
>	</span>cmove<span style=3D"white-space:pre">	</span>%rsi, %rax</font></di=
v></div><div><div><font face=3D"monospace, monospace"><span style=3D"white-=
space:pre">	</span>addq<span style=3D"white-space:pre">	</span>$2, %rcx</fo=
nt></div></div><div><div><font face=3D"monospace, monospace">.L3:</font></d=
iv></div><div><div><font face=3D"monospace, monospace"><span style=3D"white=
-space:pre">	</span>addl<span style=3D"white-space:pre">	</span>$1, %edx</f=
ont></div></div><div><div><font face=3D"monospace, monospace"><span style=
=3D"white-space:pre">	</span>cmpl<span style=3D"white-space:pre">	</span>%e=
di, %edx</font></div></div><div><div><font face=3D"monospace, monospace"><s=
pan style=3D"white-space:pre">	</span>jne<span style=3D"white-space:pre">	<=
/span>.L5</font></div></div><div><div><font face=3D"monospace, monospace"><=
span style=3D"white-space:pre">	</span>rep ret</font></div></div><div><div>=
<font face=3D"monospace, monospace">.L6:</font></div></div><div><div><font =
face=3D"monospace, monospace"><span style=3D"white-space:pre">	</span>xorl<=
span style=3D"white-space:pre">	</span>%eax, %eax</font></div></div><div><d=
iv><font face=3D"monospace, monospace"><span style=3D"white-space:pre">	</s=
pan>ret</font></div></div></blockquote><div><br></div><div>Notice this `sar=
`=C2=A0and `shr`=C2=A0instructions, they are here due to the tagging. The G=
CC compiler was also able to leverage the conditional move instruction `cmo=
ve`, though it looks nice I&#39;m not sure how much it actually won. There =
is no need to use flambda for the recursive and for-loop as they are alread=
y optimized to the maximum.=C2=A0</div><div><br></div><div>Concerning your =
stream implementation, you may compare it to the Core Kernel&#39;s Sequence=
. It is basically the same, so you can play the benchmarking game :)</div><=
div><br></div><div>The general way to implement a stream with a zero perfor=
mance penalty (i.e., as fast as a for loop) is to use staging. This topic i=
s explored in detail in the Oleg Kiselyov et alas Strymonad [1] paper. The =
implementation,=C2=A0as well as other interesting discussions about streams=
, are available on his homepage [2] (there=C2=A0are tons of very interestin=
g stuff there!).</div><div><br></div><div>Best,</div><div>Ivan</div><div><b=
r></div><div>[1]:=C2=A0<a href=3D"http://okmij.org/ftp/meta-programming/str=
ymonas.pdf">http://okmij.org/ftp/meta-programming/strymonas.pdf</a></div><d=
iv>[2]: <a href=3D"http://okmij.org/ftp/Streams.html">http://okmij.org/ftp/=
Streams.html</a></div><div><br></div><div><br></div><div><br><div><br></div=
></div></div><div class=3D"gmail_extra"><br><div class=3D"gmail_quote">On M=
on, Jul 10, 2017 at 4:26 PM, Alexey Egorov <span dir=3D"ltr">&lt;<a href=3D=
"mailto:alex.only.d@gmail.com" target=3D"_blank">alex.only.d@gmail.com</a>&=
gt;</span> wrote:<br><blockquote class=3D"gmail_quote" style=3D"margin:0 0 =
0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hello,<br>
<br>
Michael Snoyman recently posted very interesting article about Haskell<br>
streams and Rust iterators (<br>
<a href=3D"https://www.fpcomplete.com/blog/2017/07/iterators-streams-rust-h=
askell" rel=3D"noreferrer" target=3D"_blank">https://www.fpcomplete.com/<wb=
r>blog/2017/07/iterators-<wbr>streams-rust-haskell</a><br>
) and I tried to do the same in OCaml.<br>
<br>
You can find my code here -<br>
<a href=3D"https://gist.github.com/anonymous/127e9116b362d561c5dfa9cce6b453=
f3" rel=3D"noreferrer" target=3D"_blank">https://gist.github.com/<wbr>anony=
mous/<wbr>127e9116b362d561c5dfa9cce6b453<wbr>f3</a><br>
<br>
There is four different versions:<br>
<br>
1) c_loop.c - plain C loop<br>
2) <a href=3D"http://f_loop.ml" rel=3D"noreferrer" target=3D"_blank">f_loop=
.ml</a> - for-loop in OCaml<br>
3) <a href=3D"http://loop.ml" rel=3D"noreferrer" target=3D"_blank">loop.ml<=
/a> - OCaml loop using recursion<br>
4) <a href=3D"http://stream.ml" rel=3D"noreferrer" target=3D"_blank">stream=
.ml</a> - OCaml loop using streams<br>
<br>
Here is the results:<br>
<br>
c_loop.c:=C2=A0 =C2=A00m0.105s<br>
<a href=3D"http://f_loop.ml" rel=3D"noreferrer" target=3D"_blank">f_loop.ml=
</a>:=C2=A0 0m0.184s<br>
<a href=3D"http://loop.ml" rel=3D"noreferrer" target=3D"_blank">loop.ml</a>=
:=C2=A0 =C2=A0 =C2=A00m0.207s<br>
<a href=3D"http://stream.ml" rel=3D"noreferrer" target=3D"_blank">stream.ml=
</a>: 0m0.605s<br>
<br>
It&#39;s not suprise that C version is fastest - at least, it uses untagged=
 ints.<br>
What surprises me is that recursion is slower than for-loop - can&#39;t<br>
OCaml compile it down to the same imperative loop?<br>
<br>
And it&#39;s very dissapointing that stream version is 3 times slower than<=
br>
recursion-based.<br>
The problem, I believe, is that OCaml is unable to optimize<br>
intermediate stream constructors away (as GHC do).<br>
<br>
I tried to add inline annotations, which helps a little - is there<br>
something I can do to improve it further?<br>
<span class=3D"HOEnZb"><font color=3D"#888888"><br>
--<br>
Caml-list mailing list.=C2=A0 Subscription management and archives:<br>
<a href=3D"https://sympa.inria.fr/sympa/arc/caml-list" rel=3D"noreferrer" t=
arget=3D"_blank">https://sympa.inria.fr/sympa/<wbr>arc/caml-list</a><br>
Beginner&#39;s list: <a href=3D"http://groups.yahoo.com/group/ocaml_beginne=
rs" rel=3D"noreferrer" target=3D"_blank">http://groups.yahoo.com/group/<wbr=
>ocaml_beginners</a><br>
Bug reports: <a href=3D"http://caml.inria.fr/bin/caml-bugs" rel=3D"noreferr=
er" target=3D"_blank">http://caml.inria.fr/bin/caml-<wbr>bugs</a><br>
</font></span></blockquote></div><br></div>

--001a11473b369dfbbb0553fd46bd--