caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed
From: Gabriel Scherer <gabriel.scherer@gmail.com>
To: "Daniel Bünzli" <daniel.buenzli@erratique.ch>
Cc: caml-list <caml-list@inria.fr>
Subject: Re: [Caml-list] Local functions with arguments or closures ?
Date: Thu, 7 Jun 2012 21:07:17 +0200	[thread overview]
Message-ID: <CAPFanBHNR7zhMmmZ4qeiyhuF9nyQisDqBdqLQX258LGx-SogpQ@mail.gmail.com> (raw)
In-Reply-To: <CE0EBC4C252E4B178BD00227099BA4D4@erratique.ch>

I do not know these parts of the compiler well, but here is how
I investigated your question.

TL;DR: for toy tail-recursive code that all fits in register, the
first one might be slightly faster. It's crazy to take this into
account to write code, except in a hot loop, premature optimization
blah blah blah.

1. I had to come up with a compilable piece of code that would be
reasonably representative of what you're trying to do. It actually was
the hardest part. Here is my attempt:

  let f1 x =
    let v = !x in
    let rec loop1 x v = function
      | 0 -> ()
      | n ->
        x := !x + v;
        loop1 x v (n - 1);
    in
    loop1 x v 10

  let f2 x =
    let v = !x in
    let rec loop2 = function
      | 0 -> ()
      | n ->
        x := !x + v;
        loop2 (n - 1);
    in
    loop2 10

2. Using the "print the intermediate representation" features of the
native compiler, I looked at my favorite intermediate
representations: -dlambda and -dcmm. In -dlambda, there is no
difference (but the passing of more arguments), variables are accessed
by names like in the source code (-dlambda output is
very readable). -dcmm shows a difference:

  (function camlOptim__loop1_1033 (x/1034: addr v/1035: addr n/1036: addr)
   (if (!= n/1036 1)
     (seq (store x/1034 (+ (+ (load x/1034) v/1035) -1))
       (app "camlOptim__loop1_1033" x/1034 v/1035 (+ n/1036 -2) addr))
     1a))

  (function camlOptim__loop2_1040 (n/1041: addr env/1049: addr)
   (if (!= n/1041 1)
     (seq
       (store (load (+a env/1049 8))
         (+ (+ (load (load (+a env/1049 8))) (load (+a env/1049 12))) -1))
       (app "camlOptim__loop2_1040" (+ n/1041 -2) env/1049 addr))
     1a))

The first function uses the argument directly, while the second one
loads them from an "env" argument corresponding, I suppose, to closure
conversion.

3. I finally checked that lower-level code preserved this difference
(it could have optimized it away). Instead of reading the assembly
output of -S, I read -dlinear which has remnants of variable names and
is therefore more readable.

  *** Linearized code
  camlOptim__loop1_1033:
    if n/10[%ecx] ==s 1 goto L100
    A/12[%edx] := [x/8[%eax]]
    I/13[%edx] := A/12[%edx] + v/9[%ebx] + -1
    [x/8[%eax]] := I/13[%edx]
    I/14[%ecx] := I/14[%ecx] + -2
    tailcall "camlOptim__loop1_1033" R/0[%eax] R/1[%ebx] R/2[%ecx]
    L100:
    A/11[%eax] := 1
    return R/0[%eax]

  *** Linearized code
  camlOptim__loop2_1040:
    if n/8[%eax] ==s 1 goto L102
    A/11[%esi] := [env/9[%ebx] + 8]
    A/12[%edx] := [env/9[%ebx] + 12]
    A/13[%ecx] := [env/9[%ebx] + 8]
    A/14[%ecx] := [A/13[%ecx]]
    I/15[%ecx] := A/14[%ecx] + A/12[%edx] + -1
    [A/11[%esi]] := I/15[%ecx]
    I/16[%eax] := I/16[%eax] + -2
    tailcall "camlOptim__loop2_1040" R/0[%eax] R/1[%ebx]
    L102:
    A/10[%eax] := 1
    return R/0[%eax]

The first code "looks" more efficient, because it does less loads.

4. Reflexion phase. Is the difference noticeable? Your question seems
to indicate that it is note. Here are a few reasons why that could be
the case
:
- magic magic, talking about performance without benchmarking is silly
- the processor is good at caching/fetching/whatever and reduces the
  cost of those loads drastically
- for non-toy code, the performance different becomes neglectible
  (after all, it's just a few loads)
- for a non-tail-recursive function, there is an argument passing cost
  that is comparable to the cost of the loads

The last hypothesis is easy to check (if you call reading low-level
code without running it "checking"): just swap the order of the
recursive call and the asignment:

  let f1 x =
    let v = !x in
    let rec loop1 x v = function
      | 0 -> ()
      | n ->
        loop1 x v (n - 1);
        x := !x + v;
    in
    loop1 x v 10

  let f2 x =
    let v = !x in
    let rec loop2 = function
      | 0 -> ()
      | n ->
        loop2 (n - 1);
        x := !x + v;
    in
    loop2 10

As was to be expected, the non-tail-call messes with the fine register
allocation we had, and the first version gets two spills (one for "x"
and one for "v"), while the second one only spills the environment
(basically a record of "x" and "v"):

  camlOptim__loop1_1033:
    if n/10[%ecx] ==s 1 goto L100
    spilled-v/16[s1] := v/9[%ebx] (spill)
    spilled-x/17[s0] := x/8[%eax] (spill)
    I/12[%ecx] := I/12[%ecx] + -2
    {spilled-v/16[s1]* spilled-x/17[s0]*}
    call "camlOptim__loop1_1033" R/0[%eax] R/1[%ebx] R/2[%ecx]
    x/18[%ebx] := spilled-x/17[s0] (reload)
    A/13[%ecx] := [x/18[%ebx]]
    v/19[%eax] := spilled-v/16[s1] (reload)
    I/14[%eax] := A/13[%ecx] + v/19[%eax] + -1
    [x/18[%ebx]] := I/14[%eax]
    A/15[%eax] := 1
    reload retaddr
    return R/0[%eax]
    L100:
    A/11[%eax] := 1
    reload retaddr
    return R/0[%eax]

  camlOptim__loop2_1040:
    if n/8[%eax] ==s 1 goto L103
    spilled-env/18[s0] := env/9[%ebx] (spill)
    I/11[%eax] := I/11[%eax] + -2
    {spilled-env/18[s0]*}
    call "camlOptim__loop2_1040" R/0[%eax] R/1[%ebx]
    env/19[%eax] := spilled-env/18[s0] (reload)
    A/12[%ecx] := [env/19[%eax] + 8]
    A/13[%ebx] := [env/19[%eax] + 12]
    A/14[%eax] := [env/19[%eax] + 8]
    A/15[%eax] := [A/14[%eax]]
    I/16[%eax] := A/15[%eax] + A/13[%ebx] + -1
    [A/12[%ecx]] := I/16[%eax]
    A/17[%eax] := 1
    reload retaddr
    return R/0[%eax]
    L103:
    A/10[%eax] := 1
    reload retaddr
    return R/0[%eax]

You get two spills and two reloads in the first version, vs. one
spill, one releoad, and two loads from the env in the second
version. It is not the case anymore that the first one is "clearly"
better.

On Thu, Jun 7, 2012 at 7:34 PM, Daniel Bünzli
<daniel.buenzli@erratique.ch> wrote:
>
> Hello,
>
> In the past I remember having indirectly benchmarked two different styles for writing local functions :
>
>    let f x =
>      let v = ... in
>      let rec loop x v = ... in
>      loop x v
>
> or
>
>    let f x =
>        let v = ... in
>        let rec loop () = ... (* uses v and x *) in
>        loop ()
>
> without being able to reach a conclusion. Is there any particular style that is definitively faster ? Does it maybe depend on the number of arguments and/or on whether loop is recursive or not ?
>
> This question keeps coming back in my mind when I write local functions... I'm sure someone with some knowledge of the compiler's internals can provide a more reasonable answer than benchmarks.
>
> Best,
>
> Daniel
>
>
>
>
>
> --
> Caml-list mailing list.  Subscription management and archives:
> https://sympa-roc.inria.fr/wws/info/caml-list
> Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
> Bug reports: http://caml.inria.fr/bin/caml-bugs
>

  parent reply	other threads:[~2012-06-07 19:07 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-06-07 17:34 Daniel Bünzli
2012-06-07 18:54 ` Yitzhak Mandelbaum
2012-06-07 19:07 ` Gabriel Scherer [this message]
2012-06-07 20:09   ` AW: " Gerd Stolpmann
2012-06-07 22:10     ` Alain Coste
2012-06-07 23:45 ` Ville-Pertti Keinonen
2012-06-16 23:14 ` Goswin von Brederlow

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAPFanBHNR7zhMmmZ4qeiyhuF9nyQisDqBdqLQX258LGx-SogpQ@mail.gmail.com \
    --to=gabriel.scherer@gmail.com \
    --cc=caml-list@inria.fr \
    --cc=daniel.buenzli@erratique.ch \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).