From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Original-To: caml-list@sympa.inria.fr Delivered-To: caml-list@sympa.inria.fr Received: from mail4-relais-sop.national.inria.fr (mail4-relais-sop.national.inria.fr [192.134.164.105]) by sympa.inria.fr (Postfix) with ESMTPS id 5DC137ED26 for ; Thu, 7 Jun 2012 21:07:59 +0200 (CEST) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AkcBAJL70E/RVaC2kGdsb2JhbABFo02QVggiAQEBAQkJDQcUBCOCGAEBAQQSAhMZARsSCwEDDAYFCw0NCBkiAREBBQEKEgYTEhCHWgEDCwuaZwkDjCKCcIR+ChknAwpXiHEBBQyLEYJngyUDlR2BEo0LPoQB X-IronPort-AV: E=Sophos;i="4.75,732,1330902000"; d="scan'208";a="146953251" Received: from mail-gh0-f182.google.com ([209.85.160.182]) by mail4-smtp-sop.national.inria.fr with ESMTP/TLS/RC4-MD5; 07 Jun 2012 21:07:58 +0200 Received: by ghbz22 with SMTP id z22so1069879ghb.27 for ; Thu, 07 Jun 2012 12:07:57 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :cc:content-type:content-transfer-encoding; bh=EwzYEsVTVTdVrq//veEiVQmuhRb9j9svG0/pC320a80=; b=BI51SaIZrOkRwmhC8weCF3QWQgaruKruHeBXhjI7GSxubaWYqnGEPZimxgKCR1n4vi iWL2IeEq1rJ1WaXSPxv09VNldgtUQ5q7rum9UNOhikSsvCDU0i32Uc0CWNOMfaoN+97H VJHGC/mUHWyWU/u+uEiRzB8YL0LorRS0EA5IbK4uiP1NeooCo+pMkpj1JYOjas8VTA07 KtYw6lE2WyeL2oHp4WdtHpstB5gBYeyVGU0ZFDncGd08AkG205j2MaCmE5iAopmdjgYa 9isLD5ryjn2OOe5oZAmZpHc/dxxYGY9cpcjsdmPUQYMKz+EFVQW/gy2+JjCV0ajbOceL Ezhg== Received: by 10.50.217.137 with SMTP id oy9mr1948420igc.56.1339096077247; Thu, 07 Jun 2012 12:07:57 -0700 (PDT) MIME-Version: 1.0 Received: by 10.50.94.39 with HTTP; Thu, 7 Jun 2012 12:07:17 -0700 (PDT) In-Reply-To: References: From: Gabriel Scherer Date: Thu, 7 Jun 2012 21:07:17 +0200 Message-ID: To: =?ISO-8859-1?Q?Daniel_B=FCnzli?= Cc: caml-list Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable Subject: Re: [Caml-list] Local functions with arguments or closures ? I do not know these parts of the compiler well, but here is how I investigated your question. TL;DR: for toy tail-recursive code that all fits in register, the first one might be slightly faster. It's crazy to take this into account to write code, except in a hot loop, premature optimization blah blah blah. 1. I had to come up with a compilable piece of code that would be reasonably representative of what you're trying to do. It actually was the hardest part. Here is my attempt: let f1 x =3D let v =3D !x in let rec loop1 x v =3D function | 0 -> () | n -> x :=3D !x + v; loop1 x v (n - 1); in loop1 x v 10 let f2 x =3D let v =3D !x in let rec loop2 =3D function | 0 -> () | n -> x :=3D !x + v; loop2 (n - 1); in loop2 10 2. Using the "print the intermediate representation" features of the native compiler, I looked at my favorite intermediate representations: -dlambda and -dcmm. In -dlambda, there is no difference (but the passing of more arguments), variables are accessed by names like in the source code (-dlambda output is very readable). -dcmm shows a difference: (function camlOptim__loop1_1033 (x/1034: addr v/1035: addr n/1036: addr) (if (!=3D n/1036 1) (seq (store x/1034 (+ (+ (load x/1034) v/1035) -1)) (app "camlOptim__loop1_1033" x/1034 v/1035 (+ n/1036 -2) addr)) 1a)) (function camlOptim__loop2_1040 (n/1041: addr env/1049: addr) (if (!=3D n/1041 1) (seq (store (load (+a env/1049 8)) (+ (+ (load (load (+a env/1049 8))) (load (+a env/1049 12))) -1)) (app "camlOptim__loop2_1040" (+ n/1041 -2) env/1049 addr)) 1a)) The first function uses the argument directly, while the second one loads them from an "env" argument corresponding, I suppose, to closure conversion. 3. I finally checked that lower-level code preserved this difference (it could have optimized it away). Instead of reading the assembly output of -S, I read -dlinear which has remnants of variable names and is therefore more readable. *** Linearized code camlOptim__loop1_1033: if n/10[%ecx] =3D=3Ds 1 goto L100 A/12[%edx] :=3D [x/8[%eax]] I/13[%edx] :=3D A/12[%edx] + v/9[%ebx] + -1 [x/8[%eax]] :=3D I/13[%edx] I/14[%ecx] :=3D I/14[%ecx] + -2 tailcall "camlOptim__loop1_1033" R/0[%eax] R/1[%ebx] R/2[%ecx] L100: A/11[%eax] :=3D 1 return R/0[%eax] *** Linearized code camlOptim__loop2_1040: if n/8[%eax] =3D=3Ds 1 goto L102 A/11[%esi] :=3D [env/9[%ebx] + 8] A/12[%edx] :=3D [env/9[%ebx] + 12] A/13[%ecx] :=3D [env/9[%ebx] + 8] A/14[%ecx] :=3D [A/13[%ecx]] I/15[%ecx] :=3D A/14[%ecx] + A/12[%edx] + -1 [A/11[%esi]] :=3D I/15[%ecx] I/16[%eax] :=3D I/16[%eax] + -2 tailcall "camlOptim__loop2_1040" R/0[%eax] R/1[%ebx] L102: A/10[%eax] :=3D 1 return R/0[%eax] The first code "looks" more efficient, because it does less loads. 4. Reflexion phase. Is the difference noticeable? Your question seems to indicate that it is note. Here are a few reasons why that could be the case : - magic magic, talking about performance without benchmarking is silly - the processor is good at caching/fetching/whatever and reduces the cost of those loads drastically - for non-toy code, the performance different becomes neglectible (after all, it's just a few loads) - for a non-tail-recursive function, there is an argument passing cost that is comparable to the cost of the loads The last hypothesis is easy to check (if you call reading low-level code without running it "checking"): just swap the order of the recursive call and the asignment: let f1 x =3D let v =3D !x in let rec loop1 x v =3D function | 0 -> () | n -> loop1 x v (n - 1); x :=3D !x + v; in loop1 x v 10 let f2 x =3D let v =3D !x in let rec loop2 =3D function | 0 -> () | n -> loop2 (n - 1); x :=3D !x + v; in loop2 10 As was to be expected, the non-tail-call messes with the fine register allocation we had, and the first version gets two spills (one for "x" and one for "v"), while the second one only spills the environment (basically a record of "x" and "v"): camlOptim__loop1_1033: if n/10[%ecx] =3D=3Ds 1 goto L100 spilled-v/16[s1] :=3D v/9[%ebx] (spill) spilled-x/17[s0] :=3D x/8[%eax] (spill) I/12[%ecx] :=3D I/12[%ecx] + -2 {spilled-v/16[s1]* spilled-x/17[s0]*} call "camlOptim__loop1_1033" R/0[%eax] R/1[%ebx] R/2[%ecx] x/18[%ebx] :=3D spilled-x/17[s0] (reload) A/13[%ecx] :=3D [x/18[%ebx]] v/19[%eax] :=3D spilled-v/16[s1] (reload) I/14[%eax] :=3D A/13[%ecx] + v/19[%eax] + -1 [x/18[%ebx]] :=3D I/14[%eax] A/15[%eax] :=3D 1 reload retaddr return R/0[%eax] L100: A/11[%eax] :=3D 1 reload retaddr return R/0[%eax] camlOptim__loop2_1040: if n/8[%eax] =3D=3Ds 1 goto L103 spilled-env/18[s0] :=3D env/9[%ebx] (spill) I/11[%eax] :=3D I/11[%eax] + -2 {spilled-env/18[s0]*} call "camlOptim__loop2_1040" R/0[%eax] R/1[%ebx] env/19[%eax] :=3D spilled-env/18[s0] (reload) A/12[%ecx] :=3D [env/19[%eax] + 8] A/13[%ebx] :=3D [env/19[%eax] + 12] A/14[%eax] :=3D [env/19[%eax] + 8] A/15[%eax] :=3D [A/14[%eax]] I/16[%eax] :=3D A/15[%eax] + A/13[%ebx] + -1 [A/12[%ecx]] :=3D I/16[%eax] A/17[%eax] :=3D 1 reload retaddr return R/0[%eax] L103: A/10[%eax] :=3D 1 reload retaddr return R/0[%eax] You get two spills and two reloads in the first version, vs. one spill, one releoad, and two loads from the env in the second version. It is not the case anymore that the first one is "clearly" better. On Thu, Jun 7, 2012 at 7:34 PM, Daniel B=FCnzli wrote: > > Hello, > > In the past I remember having indirectly benchmarked two different styles= for writing local functions : > > =A0 =A0let f x =3D > =A0 =A0 =A0let v =3D ... in > =A0 =A0 =A0let rec loop x v =3D ... in > =A0 =A0 =A0loop x v > > or > > =A0 =A0let f x =3D > =A0 =A0 =A0 =A0let v =3D ... in > =A0 =A0 =A0 =A0let rec loop () =3D ... (* uses v and x *) in > =A0 =A0 =A0 =A0loop () > > without being able to reach a conclusion. Is there any particular style t= hat is definitively faster ? Does it maybe depend on the number of argument= s and/or on whether loop is recursive or not ? > > This question keeps coming back in my mind when I write local functions..= . I'm sure someone with some knowledge of the compiler's internals can prov= ide a more reasonable answer than benchmarks. > > Best, > > Daniel > > > > > > -- > Caml-list mailing list. =A0Subscription management and archives: > https://sympa-roc.inria.fr/wws/info/caml-list > Beginner's list: http://groups.yahoo.com/group/ocaml_beginners > Bug reports: http://caml.inria.fr/bin/caml-bugs >