TWIMC,

I've experimented a bit with different optimization options in
flambda 4.04, and in the end all three versions of the loop (curried,
uncurried, and the for-loop) have the same performance, though they still
lose about 30% to the C version, due to tagging.
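
To illustrate what tagging costs, here is a minimal model (not compiler
output) of how OCaml represents an immediate integer n as the machine
word 2n+1, so that an addition carries one extra correction step compared
with a raw C addition. The names tag, untag and tagged_add are
hypothetical helpers for this sketch only.

```ocaml
(* Model of OCaml's immediate-integer encoding: n is stored as 2n + 1.
   These helpers are illustrative only; they are not runtime functions. *)
let tag n = (n lsl 1) lor 1

(* Arithmetic shift right drops the tag bit and keeps the sign. *)
let untag w = w asr 1

(* Adding two tagged words: (2a+1) + (2b+1) - 1 = 2(a+b) + 1.
   The "- 1" is the extra work relative to an untagged addition. *)
let tagged_add x y = x + y - 1

let () =
  let r = untag (tagged_add (tag 3) (tag 4)) in
  Printf.printf "%d\n" r
```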

Basically, this means that flambda was able to get rid of the allocation.
I don't actually know which of the options made the difference, but
this is how I compiled it:

ocamlopt.opt -c -S -inlining-report -unbox-closures -O3 -rounds 8 \
  -inline-max-depth 256 -inline-max-unroll 1024 -o loop.cmx loop.ml
ocamlopt.opt loop.cmx -o loop.native
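
For readers without the earlier messages at hand, the three loop shapes
could look roughly like the sketch below. This is an assumed
reconstruction, not the actual loop.ml: the function names, the summation
body and the small bound are all hypothetical. The uncurried variant
passes its state as a tuple, which is the kind of allocation an optimizer
may or may not remove.

```ocaml
(* Hypothetical reconstruction; the real loop.ml is not shown in this mail. *)

(* Curried: counter and accumulator are separate arguments. *)
let sum_curried n =
  let rec go i acc = if i > n then acc else go (i + 1) (acc + i) in
  go 1 0

(* Uncurried: the state travels as a tuple, which may be heap-allocated
   on every iteration unless the optimizer unboxes it. *)
let sum_uncurried n =
  let rec go (i, acc) = if i > n then acc else go (i + 1, acc + i) in
  go (1, 0)

(* Imperative for-loop with a mutable accumulator. *)
let sum_for n =
  let acc = ref 0 in
  for i = 1 to n do
    acc := !acc + i
  done;
  !acc

let () =
  let n = 1_000 in
  Printf.printf "%d %d %d\n" (sum_curried n) (sum_uncurried n) (sum_for n)
```

All three compute the same sum, so any timing difference between them
comes from how the loop state is passed, not from the arithmetic.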


Regards,
Ivan


On Tue, Jul 11, 2017 at 8:54 AM, Simon Cruanes <simon.cruanes.2007@m4x.org>
wrote:

> Hello,
>
> Iterators in OCaml have been the topic of many discussions. Another
> option for fast iterators is https://github.com/c-cube/sequence ,
> which (with flambda) should compile down to loops and tests on this kind
> of benchmark. With the attached additional file on 4.04.0+flambda,
> I obtain the following (where sequence is test-seq):
>
> $ for i in test-* ; do echo $i ; time ./$i ; done
> test-c_loop
> 5000000100000000
> ./$i  0.08s user 0.00s system 97% cpu 0.085 total
> test-f_loop
> 5000000100000000
> ./$i  0.10s user 0.00s system 96% cpu 0.100 total
> test-loop
> 5000000100000000
> ./$i  0.18s user 0.00s system 97% cpu 0.184 total
> test-seq
> 5000000100000000
> ./$i  0.11s user 0.00s system 97% cpu 0.113 total
> test-stream
> 5000000100000000
> ./$i  0.44s user 0.00s system 98% cpu 0.449 total
>
>
> Note that sequence is imperative underneath, but can be safely used as a
> functional structure.
>
> --
> Simon Cruanes