caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed
From: Gerd Stolpmann <info@gerd-stolpmann.de>
To: Gabriel Scherer <gabriel.scherer@gmail.com>
Cc: "Daniel Bünzli" <daniel.buenzli@erratique.ch>,
	caml-list <caml-list@inria.fr>
Subject: AW: [Caml-list] Local functions with arguments or closures ?
Date: Thu, 07 Jun 2012 22:09:54 +0200	[thread overview]
Message-ID: <1339099794.4950.8@samsung> (raw)
In-Reply-To: <CAPFanBHNR7zhMmmZ4qeiyhuF9nyQisDqBdqLQX258LGx-SogpQ@mail.gmail.com> (from gabriel.scherer@gmail.com on Thu Jun  7 21:07:17 2012)

I think it depends very much would you do inside your loop. In  
Gabriel's example, there is only an addition, so you will probably see  
a difference in runtime. But that's not what you do in practice here.  
Imagine the loop iterates over a data structure, and let's assume it is  
big enough that it is not completely cached. Then, the few extra  
register loads will make no difference, they are just hidden by the  
speculative execution done by modern CPUs (i.e. the loads are done  
while the CPU waits for the data from RAM).

So yes, passing the arguments explicitly is probably only worth for  
extremely simple cases that do not access data massively, but otherwise  
use the style you like.

Gerd

Am 07.06.2012 21:07:17 schrieb(en) Gabriel Scherer:
> I do not know these parts of the compiler well, but here is how
> I investigated your question.
> 
> TL;DR: for toy tail-recursive code that all fits in register, the
> first one might be slightly faster. It's crazy to take this into
> account to write code, except in a hot loop, premature optimization
> blah blah blah.
> 
> 1. I had to come up with a compilable piece of code that would be
> reasonably representative of what you're trying to do. It actually was
> the hardest part. Here is my attempt:
> 
>   let f1 x =
>     let v = !x in
>     let rec loop1 x v = function
>       | 0 -> ()
>       | n ->
>         x := !x + v;
>         loop1 x v (n - 1);
>     in
>     loop1 x v 10
> 
>   let f2 x =
>     let v = !x in
>     let rec loop2 = function
>       | 0 -> ()
>       | n ->
>         x := !x + v;
>         loop2 (n - 1);
>     in
>     loop2 10
> 
> 2. Using the "print the intermediate representation" features of the
> native compiler, I looked at my favorite intermediate
> representations: -dlambda and -dcmm. In -dlambda, there is no
> difference (but the passing of more arguments), variables are accessed
> by names like in the source code (-dlambda output is
> very readable). -dcmm shows a difference:
> 
>   (function camlOptim__loop1_1033 (x/1034: addr v/1035: addr n/1036:  
> addr)
>    (if (!= n/1036 1)
>      (seq (store x/1034 (+ (+ (load x/1034) v/1035) -1))
>        (app "camlOptim__loop1_1033" x/1034 v/1035 (+ n/1036 -2) addr))
>      1a))
> 
>   (function camlOptim__loop2_1040 (n/1041: addr env/1049: addr)
>    (if (!= n/1041 1)
>      (seq
>        (store (load (+a env/1049 8))
>          (+ (+ (load (load (+a env/1049 8))) (load (+a env/1049 12)))  
> -1))
>        (app "camlOptim__loop2_1040" (+ n/1041 -2) env/1049 addr))
>      1a))
> 
> The first function uses the argument directly, while the second one
> loads them from an "env" argument corresponding, I suppose, to closure
> conversion.
> 
> 3. I finally checked that lower-level code preserved this difference
> (it could have optimized it away). Instead of reading the assembly
> output of -S, I read -dlinear which has remnants of variable names and
> is therefore more readable.
> 
>   *** Linearized code
>   camlOptim__loop1_1033:
>     if n/10[%ecx] ==s 1 goto L100
>     A/12[%edx] := [x/8[%eax]]
>     I/13[%edx] := A/12[%edx] + v/9[%ebx] + -1
>     [x/8[%eax]] := I/13[%edx]
>     I/14[%ecx] := I/14[%ecx] + -2
>     tailcall "camlOptim__loop1_1033" R/0[%eax] R/1[%ebx] R/2[%ecx]
>     L100:
>     A/11[%eax] := 1
>     return R/0[%eax]
> 
>   *** Linearized code
>   camlOptim__loop2_1040:
>     if n/8[%eax] ==s 1 goto L102
>     A/11[%esi] := [env/9[%ebx] + 8]
>     A/12[%edx] := [env/9[%ebx] + 12]
>     A/13[%ecx] := [env/9[%ebx] + 8]
>     A/14[%ecx] := [A/13[%ecx]]
>     I/15[%ecx] := A/14[%ecx] + A/12[%edx] + -1
>     [A/11[%esi]] := I/15[%ecx]
>     I/16[%eax] := I/16[%eax] + -2
>     tailcall "camlOptim__loop2_1040" R/0[%eax] R/1[%ebx]
>     L102:
>     A/10[%eax] := 1
>     return R/0[%eax]
> 
> The first code "looks" more efficient, because it does less loads.
> 
> 4. Reflexion phase. Is the difference noticeable? Your question seems
> to indicate that it is note. Here are a few reasons why that could be
> the case
> :
> - magic magic, talking about performance without benchmarking is silly
> - the processor is good at caching/fetching/whatever and reduces the
>   cost of those loads drastically
> - for non-toy code, the performance different becomes neglectible
>   (after all, it's just a few loads)
> - for a non-tail-recursive function, there is an argument passing cost
>   that is comparable to the cost of the loads
> 
> The last hypothesis is easy to check (if you call reading low-level
> code without running it "checking"): just swap the order of the
> recursive call and the asignment:
> 
>   let f1 x =
>     let v = !x in
>     let rec loop1 x v = function
>       | 0 -> ()
>       | n ->
>         loop1 x v (n - 1);
>         x := !x + v;
>     in
>     loop1 x v 10
> 
>   let f2 x =
>     let v = !x in
>     let rec loop2 = function
>       | 0 -> ()
>       | n ->
>         loop2 (n - 1);
>         x := !x + v;
>     in
>     loop2 10
> 
> As was to be expected, the non-tail-call messes with the fine register
> allocation we had, and the first version gets two spills (one for "x"
> and one for "v"), while the second one only spills the environment
> (basically a record of "x" and "v"):
> 
>   camlOptim__loop1_1033:
>     if n/10[%ecx] ==s 1 goto L100
>     spilled-v/16[s1] := v/9[%ebx] (spill)
>     spilled-x/17[s0] := x/8[%eax] (spill)
>     I/12[%ecx] := I/12[%ecx] + -2
>     {spilled-v/16[s1]* spilled-x/17[s0]*}
>     call "camlOptim__loop1_1033" R/0[%eax] R/1[%ebx] R/2[%ecx]
>     x/18[%ebx] := spilled-x/17[s0] (reload)
>     A/13[%ecx] := [x/18[%ebx]]
>     v/19[%eax] := spilled-v/16[s1] (reload)
>     I/14[%eax] := A/13[%ecx] + v/19[%eax] + -1
>     [x/18[%ebx]] := I/14[%eax]
>     A/15[%eax] := 1
>     reload retaddr
>     return R/0[%eax]
>     L100:
>     A/11[%eax] := 1
>     reload retaddr
>     return R/0[%eax]
> 
>   camlOptim__loop2_1040:
>     if n/8[%eax] ==s 1 goto L103
>     spilled-env/18[s0] := env/9[%ebx] (spill)
>     I/11[%eax] := I/11[%eax] + -2
>     {spilled-env/18[s0]*}
>     call "camlOptim__loop2_1040" R/0[%eax] R/1[%ebx]
>     env/19[%eax] := spilled-env/18[s0] (reload)
>     A/12[%ecx] := [env/19[%eax] + 8]
>     A/13[%ebx] := [env/19[%eax] + 12]
>     A/14[%eax] := [env/19[%eax] + 8]
>     A/15[%eax] := [A/14[%eax]]
>     I/16[%eax] := A/15[%eax] + A/13[%ebx] + -1
>     [A/12[%ecx]] := I/16[%eax]
>     A/17[%eax] := 1
>     reload retaddr
>     return R/0[%eax]
>     L103:
>     A/10[%eax] := 1
>     reload retaddr
>     return R/0[%eax]
> 
> You get two spills and two reloads in the first version, vs. one
> spill, one releoad, and two loads from the env in the second
> version. It is not the case anymore that the first one is "clearly"
> better.
> 
> On Thu, Jun 7, 2012 at 7:34 PM, Daniel Bünzli
> <daniel.buenzli@erratique.ch> wrote:
> >
> > Hello,
> >
> > In the past I remember having indirectly benchmarked two different  
> styles for writing local functions :
> >
> >    let f x =
> >      let v = ... in
> >      let rec loop x v = ... in
> >      loop x v
> >
> > or
> >
> >    let f x =
> >        let v = ... in
> >        let rec loop () = ... (* uses v and x *) in
> >        loop ()
> >
> > without being able to reach a conclusion. Is there any particular  
> style that is definitively faster ? Does it maybe depend on the  
> number of arguments and/or on whether loop is recursive or not ?
> >
> > This question keeps coming back in my mind when I write local  
> functions... I'm sure someone with some knowledge of the compiler's  
> internals can provide a more reasonable answer than benchmarks.
> >
> > Best,
> >
> > Daniel
> >
> >
> >
> >
> >
> > --
> > Caml-list mailing list.  Subscription management and archives:
> > https://sympa-roc.inria.fr/wws/info/caml-list
> > Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
> > Bug reports: http://caml.inria.fr/bin/caml-bugs
> >
> 
> --
> Caml-list mailing list.  Subscription management and archives:
> https://sympa-roc.inria.fr/wws/info/caml-list
> Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
> Bug reports: http://caml.inria.fr/bin/caml-bugs
> 
> 
> 



-- 
------------------------------------------------------------
Gerd Stolpmann, Darmstadt, Germany    gerd@gerd-stolpmann.de
Creator of GODI and camlcity.org.
Contact details:        http://www.camlcity.org/contact.html
Company homepage:       http://www.gerd-stolpmann.de
------------------------------------------------------------

  reply	other threads:[~2012-06-07 20:09 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-06-07 17:34 Daniel Bünzli
2012-06-07 18:54 ` Yitzhak Mandelbaum
2012-06-07 19:07 ` Gabriel Scherer
2012-06-07 20:09   ` Gerd Stolpmann [this message]
2012-06-07 22:10     ` Alain Coste
2012-06-07 23:45 ` Ville-Pertti Keinonen
2012-06-16 23:14 ` Goswin von Brederlow

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1339099794.4950.8@samsung \
    --to=info@gerd-stolpmann.de \
    --cc=caml-list@inria.fr \
    --cc=daniel.buenzli@erratique.ch \
    --cc=gabriel.scherer@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).