caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed
* [Caml-list] Local functions with arguments or closures ?
@ 2012-06-07 17:34 Daniel Bünzli
  2012-06-07 18:54 ` Yitzhak Mandelbaum
                   ` (3 more replies)
  0 siblings, 4 replies; 7+ messages in thread
From: Daniel Bünzli @ 2012-06-07 17:34 UTC (permalink / raw)
  To: caml-list

Hello, 

In the past I remember having indirectly benchmarked two different styles for writing local functions :

    let f x = 
      let v = ... in
      let rec loop x v = ... in 
      loop x v 

or 

    let f x = 
        let v = ... in
        let rec loop () = ... (* uses v and x *) in 
        loop ()

without being able to reach a conclusion. Is there any particular style that is definitively faster ? Does it maybe depend on the number of arguments and/or on whether loop is recursive or not ? 

This question keeps coming back in my mind when I write local functions... I'm sure someone with some knowledge of the compiler's internals can provide a more reasonable answer than benchmarks.  

Best,

Daniel





^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Caml-list] Local functions with arguments or closures ?
  2012-06-07 17:34 [Caml-list] Local functions with arguments or closures ? Daniel Bünzli
@ 2012-06-07 18:54 ` Yitzhak Mandelbaum
  2012-06-07 19:07 ` Gabriel Scherer
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 7+ messages in thread
From: Yitzhak Mandelbaum @ 2012-06-07 18:54 UTC (permalink / raw)
  To: Daniel Bünzli; +Cc: caml-list

Daniel,

I can't speak definitively, but in my experience, the cost of using a closure had a noticeable performance impact when writing loops as recursive functions. 

Yitzhak

On Jun 7, 2012, at 1:34 PM, Daniel Bünzli wrote:

> Hello, 
> 
> In the past I remember having indirectly benchmarked two different styles for writing local functions :
> 
>    let f x = 
>      let v = ... in
>      let rec loop x v = ... in 
>      loop x v 
> 
> or 
> 
>    let f x = 
>        let v = ... in
>        let rec loop () = ... (* uses v and x *) in 
>        loop ()
> 
> without being able to reach a conclusion. Is there any particular style that is definitively faster ? Does it maybe depend on the number of arguments and/or on whether loop is recursive or not ? 
> 
> This question keeps coming back in my mind when I write local functions... I'm sure someone with some knowledge of the compiler's internals can provide a more reasonable answer than benchmarks.  
> 
> Best,
> 
> Daniel
> 
> 
> 
> 
> 
> -- 
> Caml-list mailing list.  Subscription management and archives:
> https://sympa-roc.inria.fr/wws/info/caml-list
> Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
> Bug reports: http://caml.inria.fr/bin/caml-bugs
> 

-----------------------------
Yitzhak Mandelbaum




^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Caml-list] Local functions with arguments or closures ?
  2012-06-07 17:34 [Caml-list] Local functions with arguments or closures ? Daniel Bünzli
  2012-06-07 18:54 ` Yitzhak Mandelbaum
@ 2012-06-07 19:07 ` Gabriel Scherer
  2012-06-07 20:09   ` AW: " Gerd Stolpmann
  2012-06-07 23:45 ` Ville-Pertti Keinonen
  2012-06-16 23:14 ` Goswin von Brederlow
  3 siblings, 1 reply; 7+ messages in thread
From: Gabriel Scherer @ 2012-06-07 19:07 UTC (permalink / raw)
  To: Daniel Bünzli; +Cc: caml-list

I do not know these parts of the compiler well, but here is how
I investigated your question.

TL;DR: for toy tail-recursive code that all fits in register, the
first one might be slightly faster. It's crazy to take this into
account to write code, except in a hot loop, premature optimization
blah blah blah.

1. I had to come up with a compilable piece of code that would be
reasonably representative of what you're trying to do. It actually was
the hardest part. Here is my attempt:

  let f1 x =
    let v = !x in
    let rec loop1 x v = function
      | 0 -> ()
      | n ->
        x := !x + v;
        loop1 x v (n - 1);
    in
    loop1 x v 10

  let f2 x =
    let v = !x in
    let rec loop2 = function
      | 0 -> ()
      | n ->
        x := !x + v;
        loop2 (n - 1);
    in
    loop2 10

2. Using the "print the intermediate representation" features of the
native compiler, I looked at my favorite intermediate
representations: -dlambda and -dcmm. In -dlambda, there is no
difference (but the passing of more arguments), variables are accessed
by names like in the source code (-dlambda output is
very readable). -dcmm shows a difference:

  (function camlOptim__loop1_1033 (x/1034: addr v/1035: addr n/1036: addr)
   (if (!= n/1036 1)
     (seq (store x/1034 (+ (+ (load x/1034) v/1035) -1))
       (app "camlOptim__loop1_1033" x/1034 v/1035 (+ n/1036 -2) addr))
     1a))

  (function camlOptim__loop2_1040 (n/1041: addr env/1049: addr)
   (if (!= n/1041 1)
     (seq
       (store (load (+a env/1049 8))
         (+ (+ (load (load (+a env/1049 8))) (load (+a env/1049 12))) -1))
       (app "camlOptim__loop2_1040" (+ n/1041 -2) env/1049 addr))
     1a))

The first function uses the argument directly, while the second one
loads them from an "env" argument corresponding, I suppose, to closure
conversion.

3. I finally checked that lower-level code preserved this difference
(it could have optimized it away). Instead of reading the assembly
output of -S, I read -dlinear which has remnants of variable names and
is therefore more readable.

  *** Linearized code
  camlOptim__loop1_1033:
    if n/10[%ecx] ==s 1 goto L100
    A/12[%edx] := [x/8[%eax]]
    I/13[%edx] := A/12[%edx] + v/9[%ebx] + -1
    [x/8[%eax]] := I/13[%edx]
    I/14[%ecx] := I/14[%ecx] + -2
    tailcall "camlOptim__loop1_1033" R/0[%eax] R/1[%ebx] R/2[%ecx]
    L100:
    A/11[%eax] := 1
    return R/0[%eax]

  *** Linearized code
  camlOptim__loop2_1040:
    if n/8[%eax] ==s 1 goto L102
    A/11[%esi] := [env/9[%ebx] + 8]
    A/12[%edx] := [env/9[%ebx] + 12]
    A/13[%ecx] := [env/9[%ebx] + 8]
    A/14[%ecx] := [A/13[%ecx]]
    I/15[%ecx] := A/14[%ecx] + A/12[%edx] + -1
    [A/11[%esi]] := I/15[%ecx]
    I/16[%eax] := I/16[%eax] + -2
    tailcall "camlOptim__loop2_1040" R/0[%eax] R/1[%ebx]
    L102:
    A/10[%eax] := 1
    return R/0[%eax]

The first code "looks" more efficient, because it does less loads.

4. Reflexion phase. Is the difference noticeable? Your question seems
to indicate that it is note. Here are a few reasons why that could be
the case
:
- magic magic, talking about performance without benchmarking is silly
- the processor is good at caching/fetching/whatever and reduces the
  cost of those loads drastically
- for non-toy code, the performance different becomes neglectible
  (after all, it's just a few loads)
- for a non-tail-recursive function, there is an argument passing cost
  that is comparable to the cost of the loads

The last hypothesis is easy to check (if you call reading low-level
code without running it "checking"): just swap the order of the
recursive call and the asignment:

  let f1 x =
    let v = !x in
    let rec loop1 x v = function
      | 0 -> ()
      | n ->
        loop1 x v (n - 1);
        x := !x + v;
    in
    loop1 x v 10

  let f2 x =
    let v = !x in
    let rec loop2 = function
      | 0 -> ()
      | n ->
        loop2 (n - 1);
        x := !x + v;
    in
    loop2 10

As was to be expected, the non-tail-call messes with the fine register
allocation we had, and the first version gets two spills (one for "x"
and one for "v"), while the second one only spills the environment
(basically a record of "x" and "v"):

  camlOptim__loop1_1033:
    if n/10[%ecx] ==s 1 goto L100
    spilled-v/16[s1] := v/9[%ebx] (spill)
    spilled-x/17[s0] := x/8[%eax] (spill)
    I/12[%ecx] := I/12[%ecx] + -2
    {spilled-v/16[s1]* spilled-x/17[s0]*}
    call "camlOptim__loop1_1033" R/0[%eax] R/1[%ebx] R/2[%ecx]
    x/18[%ebx] := spilled-x/17[s0] (reload)
    A/13[%ecx] := [x/18[%ebx]]
    v/19[%eax] := spilled-v/16[s1] (reload)
    I/14[%eax] := A/13[%ecx] + v/19[%eax] + -1
    [x/18[%ebx]] := I/14[%eax]
    A/15[%eax] := 1
    reload retaddr
    return R/0[%eax]
    L100:
    A/11[%eax] := 1
    reload retaddr
    return R/0[%eax]

  camlOptim__loop2_1040:
    if n/8[%eax] ==s 1 goto L103
    spilled-env/18[s0] := env/9[%ebx] (spill)
    I/11[%eax] := I/11[%eax] + -2
    {spilled-env/18[s0]*}
    call "camlOptim__loop2_1040" R/0[%eax] R/1[%ebx]
    env/19[%eax] := spilled-env/18[s0] (reload)
    A/12[%ecx] := [env/19[%eax] + 8]
    A/13[%ebx] := [env/19[%eax] + 12]
    A/14[%eax] := [env/19[%eax] + 8]
    A/15[%eax] := [A/14[%eax]]
    I/16[%eax] := A/15[%eax] + A/13[%ebx] + -1
    [A/12[%ecx]] := I/16[%eax]
    A/17[%eax] := 1
    reload retaddr
    return R/0[%eax]
    L103:
    A/10[%eax] := 1
    reload retaddr
    return R/0[%eax]

You get two spills and two reloads in the first version, vs. one
spill, one releoad, and two loads from the env in the second
version. It is not the case anymore that the first one is "clearly"
better.

On Thu, Jun 7, 2012 at 7:34 PM, Daniel Bünzli
<daniel.buenzli@erratique.ch> wrote:
>
> Hello,
>
> In the past I remember having indirectly benchmarked two different styles for writing local functions :
>
>    let f x =
>      let v = ... in
>      let rec loop x v = ... in
>      loop x v
>
> or
>
>    let f x =
>        let v = ... in
>        let rec loop () = ... (* uses v and x *) in
>        loop ()
>
> without being able to reach a conclusion. Is there any particular style that is definitively faster ? Does it maybe depend on the number of arguments and/or on whether loop is recursive or not ?
>
> This question keeps coming back in my mind when I write local functions... I'm sure someone with some knowledge of the compiler's internals can provide a more reasonable answer than benchmarks.
>
> Best,
>
> Daniel
>
>
>
>
>
> --
> Caml-list mailing list.  Subscription management and archives:
> https://sympa-roc.inria.fr/wws/info/caml-list
> Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
> Bug reports: http://caml.inria.fr/bin/caml-bugs
>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* AW: [Caml-list] Local functions with arguments or closures ?
  2012-06-07 19:07 ` Gabriel Scherer
@ 2012-06-07 20:09   ` Gerd Stolpmann
  2012-06-07 22:10     ` Alain Coste
  0 siblings, 1 reply; 7+ messages in thread
From: Gerd Stolpmann @ 2012-06-07 20:09 UTC (permalink / raw)
  To: Gabriel Scherer; +Cc: Daniel Bünzli, caml-list

I think it depends very much would you do inside your loop. In  
Gabriel's example, there is only an addition, so you will probably see  
a difference in runtime. But that's not what you do in practice here.  
Imagine the loop iterates over a data structure, and let's assume it is  
big enough that it is not completely cached. Then, the few extra  
register loads will make no difference, they are just hidden by the  
speculative execution done by modern CPUs (i.e. the loads are done  
while the CPU waits for the data from RAM).

So yes, passing the arguments explicitly is probably only worth for  
extremely simple cases that do not access data massively, but otherwise  
use the style you like.

Gerd

Am 07.06.2012 21:07:17 schrieb(en) Gabriel Scherer:
> I do not know these parts of the compiler well, but here is how
> I investigated your question.
> 
> TL;DR: for toy tail-recursive code that all fits in register, the
> first one might be slightly faster. It's crazy to take this into
> account to write code, except in a hot loop, premature optimization
> blah blah blah.
> 
> 1. I had to come up with a compilable piece of code that would be
> reasonably representative of what you're trying to do. It actually was
> the hardest part. Here is my attempt:
> 
>   let f1 x =
>     let v = !x in
>     let rec loop1 x v = function
>       | 0 -> ()
>       | n ->
>         x := !x + v;
>         loop1 x v (n - 1);
>     in
>     loop1 x v 10
> 
>   let f2 x =
>     let v = !x in
>     let rec loop2 = function
>       | 0 -> ()
>       | n ->
>         x := !x + v;
>         loop2 (n - 1);
>     in
>     loop2 10
> 
> 2. Using the "print the intermediate representation" features of the
> native compiler, I looked at my favorite intermediate
> representations: -dlambda and -dcmm. In -dlambda, there is no
> difference (but the passing of more arguments), variables are accessed
> by names like in the source code (-dlambda output is
> very readable). -dcmm shows a difference:
> 
>   (function camlOptim__loop1_1033 (x/1034: addr v/1035: addr n/1036:  
> addr)
>    (if (!= n/1036 1)
>      (seq (store x/1034 (+ (+ (load x/1034) v/1035) -1))
>        (app "camlOptim__loop1_1033" x/1034 v/1035 (+ n/1036 -2) addr))
>      1a))
> 
>   (function camlOptim__loop2_1040 (n/1041: addr env/1049: addr)
>    (if (!= n/1041 1)
>      (seq
>        (store (load (+a env/1049 8))
>          (+ (+ (load (load (+a env/1049 8))) (load (+a env/1049 12)))  
> -1))
>        (app "camlOptim__loop2_1040" (+ n/1041 -2) env/1049 addr))
>      1a))
> 
> The first function uses the argument directly, while the second one
> loads them from an "env" argument corresponding, I suppose, to closure
> conversion.
> 
> 3. I finally checked that lower-level code preserved this difference
> (it could have optimized it away). Instead of reading the assembly
> output of -S, I read -dlinear which has remnants of variable names and
> is therefore more readable.
> 
>   *** Linearized code
>   camlOptim__loop1_1033:
>     if n/10[%ecx] ==s 1 goto L100
>     A/12[%edx] := [x/8[%eax]]
>     I/13[%edx] := A/12[%edx] + v/9[%ebx] + -1
>     [x/8[%eax]] := I/13[%edx]
>     I/14[%ecx] := I/14[%ecx] + -2
>     tailcall "camlOptim__loop1_1033" R/0[%eax] R/1[%ebx] R/2[%ecx]
>     L100:
>     A/11[%eax] := 1
>     return R/0[%eax]
> 
>   *** Linearized code
>   camlOptim__loop2_1040:
>     if n/8[%eax] ==s 1 goto L102
>     A/11[%esi] := [env/9[%ebx] + 8]
>     A/12[%edx] := [env/9[%ebx] + 12]
>     A/13[%ecx] := [env/9[%ebx] + 8]
>     A/14[%ecx] := [A/13[%ecx]]
>     I/15[%ecx] := A/14[%ecx] + A/12[%edx] + -1
>     [A/11[%esi]] := I/15[%ecx]
>     I/16[%eax] := I/16[%eax] + -2
>     tailcall "camlOptim__loop2_1040" R/0[%eax] R/1[%ebx]
>     L102:
>     A/10[%eax] := 1
>     return R/0[%eax]
> 
> The first code "looks" more efficient, because it does less loads.
> 
> 4. Reflexion phase. Is the difference noticeable? Your question seems
> to indicate that it is note. Here are a few reasons why that could be
> the case
> :
> - magic magic, talking about performance without benchmarking is silly
> - the processor is good at caching/fetching/whatever and reduces the
>   cost of those loads drastically
> - for non-toy code, the performance different becomes neglectible
>   (after all, it's just a few loads)
> - for a non-tail-recursive function, there is an argument passing cost
>   that is comparable to the cost of the loads
> 
> The last hypothesis is easy to check (if you call reading low-level
> code without running it "checking"): just swap the order of the
> recursive call and the asignment:
> 
>   let f1 x =
>     let v = !x in
>     let rec loop1 x v = function
>       | 0 -> ()
>       | n ->
>         loop1 x v (n - 1);
>         x := !x + v;
>     in
>     loop1 x v 10
> 
>   let f2 x =
>     let v = !x in
>     let rec loop2 = function
>       | 0 -> ()
>       | n ->
>         loop2 (n - 1);
>         x := !x + v;
>     in
>     loop2 10
> 
> As was to be expected, the non-tail-call messes with the fine register
> allocation we had, and the first version gets two spills (one for "x"
> and one for "v"), while the second one only spills the environment
> (basically a record of "x" and "v"):
> 
>   camlOptim__loop1_1033:
>     if n/10[%ecx] ==s 1 goto L100
>     spilled-v/16[s1] := v/9[%ebx] (spill)
>     spilled-x/17[s0] := x/8[%eax] (spill)
>     I/12[%ecx] := I/12[%ecx] + -2
>     {spilled-v/16[s1]* spilled-x/17[s0]*}
>     call "camlOptim__loop1_1033" R/0[%eax] R/1[%ebx] R/2[%ecx]
>     x/18[%ebx] := spilled-x/17[s0] (reload)
>     A/13[%ecx] := [x/18[%ebx]]
>     v/19[%eax] := spilled-v/16[s1] (reload)
>     I/14[%eax] := A/13[%ecx] + v/19[%eax] + -1
>     [x/18[%ebx]] := I/14[%eax]
>     A/15[%eax] := 1
>     reload retaddr
>     return R/0[%eax]
>     L100:
>     A/11[%eax] := 1
>     reload retaddr
>     return R/0[%eax]
> 
>   camlOptim__loop2_1040:
>     if n/8[%eax] ==s 1 goto L103
>     spilled-env/18[s0] := env/9[%ebx] (spill)
>     I/11[%eax] := I/11[%eax] + -2
>     {spilled-env/18[s0]*}
>     call "camlOptim__loop2_1040" R/0[%eax] R/1[%ebx]
>     env/19[%eax] := spilled-env/18[s0] (reload)
>     A/12[%ecx] := [env/19[%eax] + 8]
>     A/13[%ebx] := [env/19[%eax] + 12]
>     A/14[%eax] := [env/19[%eax] + 8]
>     A/15[%eax] := [A/14[%eax]]
>     I/16[%eax] := A/15[%eax] + A/13[%ebx] + -1
>     [A/12[%ecx]] := I/16[%eax]
>     A/17[%eax] := 1
>     reload retaddr
>     return R/0[%eax]
>     L103:
>     A/10[%eax] := 1
>     reload retaddr
>     return R/0[%eax]
> 
> You get two spills and two reloads in the first version, vs. one
> spill, one releoad, and two loads from the env in the second
> version. It is not the case anymore that the first one is "clearly"
> better.
> 
> On Thu, Jun 7, 2012 at 7:34 PM, Daniel Bünzli
> <daniel.buenzli@erratique.ch> wrote:
> >
> > Hello,
> >
> > In the past I remember having indirectly benchmarked two different  
> styles for writing local functions :
> >
> >    let f x =
> >      let v = ... in
> >      let rec loop x v = ... in
> >      loop x v
> >
> > or
> >
> >    let f x =
> >        let v = ... in
> >        let rec loop () = ... (* uses v and x *) in
> >        loop ()
> >
> > without being able to reach a conclusion. Is there any particular  
> style that is definitively faster ? Does it maybe depend on the  
> number of arguments and/or on whether loop is recursive or not ?
> >
> > This question keeps coming back in my mind when I write local  
> functions... I'm sure someone with some knowledge of the compiler's  
> internals can provide a more reasonable answer than benchmarks.
> >
> > Best,
> >
> > Daniel
> >
> >
> >
> >
> >
> > --
> > Caml-list mailing list.  Subscription management and archives:
> > https://sympa-roc.inria.fr/wws/info/caml-list
> > Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
> > Bug reports: http://caml.inria.fr/bin/caml-bugs
> >
> 
> --
> Caml-list mailing list.  Subscription management and archives:
> https://sympa-roc.inria.fr/wws/info/caml-list
> Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
> Bug reports: http://caml.inria.fr/bin/caml-bugs
> 
> 
> 



-- 
------------------------------------------------------------
Gerd Stolpmann, Darmstadt, Germany    gerd@gerd-stolpmann.de
Creator of GODI and camlcity.org.
Contact details:        http://www.camlcity.org/contact.html
Company homepage:       http://www.gerd-stolpmann.de
------------------------------------------------------------

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Caml-list] Local functions with arguments or closures ?
  2012-06-07 20:09   ` AW: " Gerd Stolpmann
@ 2012-06-07 22:10     ` Alain Coste
  0 siblings, 0 replies; 7+ messages in thread
From: Alain Coste @ 2012-06-07 22:10 UTC (permalink / raw)
  To: caml-list

Hello,

This question recalls me a related one I am often asking myself  when 
writing large functions (say 50 to 100 lines) which use several auxilliary 
functions sharing the same parameters. For example

let f x =
    let g1 () = .. reference to x ... in
    .......
    let gn () = .... reference to x .... in
    calls to to g1() ... gn()

I would like to have an idea of the cost of putting the code of g1...gn 
"inside" f  rather than defining them separately and passing x as a 
parameter, or using a reference

With x as parameter:

let g1 x = .....
....
let gn x = ....
let f x = calls to g1 x .... gn x

Or with a reference:

let f =
    let xr = ref 0 in
    let g1 () = ...!xr ... in
    ....
    let gn () = ... !xr... in
    fun x -> xr := x; .. calls to g1()...gn()

Some years ago I had the same problem with an interpreted language, and the 
answer was clearly in favor of the second or third solution, as the body of 
the function f was evaluated only when the function was applied (and as many 
times it was applied).
But it seems that with a compiler things are different as the body of the 
large function is treated at compile time, and only once. So I suppose the 
extra cost is only to build a closure, but does not depend on the size of 
the body of the large function f.
I generally use the first solution, but I sometimes have some doubts 
concerning efficiency...

Alain Coste

----- Original Message ----- 
From: "Gerd Stolpmann" <info@gerd-stolpmann.de>
To: "Gabriel Scherer" <gabriel.scherer@gmail.com>
Cc: "Daniel Bünzli" <daniel.buenzli@erratique.ch>; "caml-list" 
<caml-list@inria.fr>
Sent: Thursday, June 07, 2012 10:09 PM
Subject: AW: [Caml-list] Local functions with arguments or closures ?


I think it depends very much would you do inside your loop. In
Gabriel's example, there is only an addition, so you will probably see
a difference in runtime. But that's not what you do in practice here.
Imagine the loop iterates over a data structure, and let's assume it is
big enough that it is not completely cached. Then, the few extra
register loads will make no difference, they are just hidden by the
speculative execution done by modern CPUs (i.e. the loads are done
while the CPU waits for the data from RAM).

So yes, passing the arguments explicitly is probably only worth for
extremely simple cases that do not access data massively, but otherwise
use the style you like.

Gerd


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Caml-list] Local functions with arguments or closures ?
  2012-06-07 17:34 [Caml-list] Local functions with arguments or closures ? Daniel Bünzli
  2012-06-07 18:54 ` Yitzhak Mandelbaum
  2012-06-07 19:07 ` Gabriel Scherer
@ 2012-06-07 23:45 ` Ville-Pertti Keinonen
  2012-06-16 23:14 ` Goswin von Brederlow
  3 siblings, 0 replies; 7+ messages in thread
From: Ville-Pertti Keinonen @ 2012-06-07 23:45 UTC (permalink / raw)
  To: Daniel Bünzli; +Cc: caml-list


On Jun 7, 2012, at 8:34 PM, Daniel Bünzli wrote:

> In the past I remember having indirectly benchmarked two different styles for writing local functions :
> without being able to reach a conclusion. Is there any particular style that is definitively faster ? Does it maybe depend on the number of arguments and/or on whether loop is recursive or not ? 

As others have said the difference on modern CPUs shouldn't be too big for the loop itself.

However, if the inner function is a helper function (or a loop that only iterates a few times) and it's f that is called in a tight loop…it could be the case that without closures, f would allocate no memory, in which case there will be a definite difference.

It's still premature optimization unless you're actually running into problems.

Lambda lifting should be done by the compiler, not the programmer.  It probably wasn't worth it for OCaml because closures are dirt cheap compared to many other language implementations, and spending any significant time not allocating memory is quite rare.


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Caml-list] Local functions with arguments or closures ?
  2012-06-07 17:34 [Caml-list] Local functions with arguments or closures ? Daniel Bünzli
                   ` (2 preceding siblings ...)
  2012-06-07 23:45 ` Ville-Pertti Keinonen
@ 2012-06-16 23:14 ` Goswin von Brederlow
  3 siblings, 0 replies; 7+ messages in thread
From: Goswin von Brederlow @ 2012-06-16 23:14 UTC (permalink / raw)
  To: Daniel Buenzli; +Cc: caml-list

Daniel Bünzli <daniel.buenzli@erratique.ch> writes:

> Hello, 
>
> In the past I remember having indirectly benchmarked two different styles for writing local functions :
>
>     let f x = 
>       let v = ... in
>       let rec loop x v = ... in 
>       loop x v 
>
> or 
>
>     let f x = 
>         let v = ... in
>         let rec loop () = ... (* uses v and x *) in 
>         loop ()
>
> without being able to reach a conclusion. Is there any particular style that is definitively faster ? Does it maybe depend on the number of arguments and/or on whether loop is recursive or not ? 
>
> This question keeps coming back in my mind when I write local functions... I'm sure someone with some knowledge of the compiler's internals can provide a more reasonable answer than benchmarks.  
>
> Best,
>
> Daniel

Closure arguments prevent what little optimization there is in ocaml. So
better avoid them where time counts. But your example doesn't have any
closure arguments.

MfG
        Goswin

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2012-06-16 23:14 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-06-07 17:34 [Caml-list] Local functions with arguments or closures ? Daniel Bünzli
2012-06-07 18:54 ` Yitzhak Mandelbaum
2012-06-07 19:07 ` Gabriel Scherer
2012-06-07 20:09   ` AW: " Gerd Stolpmann
2012-06-07 22:10     ` Alain Coste
2012-06-07 23:45 ` Ville-Pertti Keinonen
2012-06-16 23:14 ` Goswin von Brederlow

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).