caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed
* Benchmarking different dispatch types
@ 2007-01-18  1:12 Nathaniel Gray
  2007-01-18  2:17 ` [Caml-list] " Edgar Friendly
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Nathaniel Gray @ 2007-01-18  1:12 UTC (permalink / raw)
  To: Caml List

As somebody trying to understand the performance of OCaml, I've often
wondered about the performance of the different forms of function
dispatch.  How do method calls compare to function calls?  How about
closure calls?  So I tried using the Benchmark library[1] to do a
quick test:

========
(* Test method dispatch vs. function dispatch vs. closure dispatch
   Make sure to compile with -inline 0
 *)

let f x =
   x + 100
let call_f () = f 1

let o = object
   method f_o x = x + 100
end
let call_o () = o#f_o 1

let f_c () x = x + 100
let f_c' = f_c ()
let call_fc () = f_c' 1

let o_c = object
   method f_oc () x = x + 100
end
let f_oc' = o_c#f_oc ()
let call_foc () = f_oc' 1


open Benchmark

let _ =
   let results = latencyN 40000
      [("function", call_f, ());
       ("method", call_o, ());
       ("closure", call_fc, ());
       ("obj. closure", call_foc, ())]
   in
   tabulate results

========

Here's the output (on a PPC G4 1.25 GHz):

========
Latencies for 40000 iterations of function, method, closure, obj. closure:
  function:  0 WALL ( 0.00 usr + -0.00 sys =  0.00 CPU) @
305343511.45/s (n=40000)
            (warning: too few iterations for a reliable count)
    method:  0 WALL ( 0.00 usr + -0.00 sys =  0.00 CPU) @
27081922.82/s (n=40000)
            (warning: too few iterations for a reliable count)
   closure:  0 WALL ( 0.00 usr +  0.00 sys =  0.00 CPU) @
30280090.84/s (n=40000)
            (warning: too few iterations for a reliable count)
obj. closure:  0 WALL ( 0.00 usr +  0.00 sys =  0.00 CPU) @
26058631.92/s (n=40000)
            (warning: too few iterations for a reliable count)
                    Rate       method obj. closure      closure     function
      method  25974026/s           --          -5%         -16%         -90%
obj. closure  27210884/s           5%           --         -12%         -89%
     closure  31007752/s          19%          14%           --         -88%
    function 254777070/s         881%         836%         722%           --

Interesting, but are they meaningful?  The warnings from Benchmark are
troubling, but I didn't have any immediate ideas on how to get rid of
them.  Any suggestions?

Thanks,
-n8

[1] http://ocaml-benchmark.sourceforge.net

-- 
>>>-- Nathaniel Gray -- Caltech Computer Science ------>
>>>-- Mojave Project -- http://mojave.cs.caltech.edu -->


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Caml-list] Benchmarking different dispatch types
  2007-01-18  1:12 Benchmarking different dispatch types Nathaniel Gray
@ 2007-01-18  2:17 ` Edgar Friendly
  2007-01-18  3:03   ` Jonathan Roewen
                     ` (3 more replies)
  2007-01-18 16:56 ` William D. Neumann
  2007-01-19  0:50 ` Jacques Garrigue
  2 siblings, 4 replies; 11+ messages in thread
From: Edgar Friendly @ 2007-01-18  2:17 UTC (permalink / raw)
  To: Nathaniel Gray; +Cc: Caml List

Nathaniel Gray wrote:
> 
> Here's the output (on a PPC G4 1.25 GHz):
> 
> ========
> Latencies for 40000 iterations of function, method, closure, obj. closure:
>  function:  0 WALL ( 0.00 usr + -0.00 sys =  0.00 CPU) @
> 305343511.45/s (n=40000)
>            (warning: too few iterations for a reliable count)
>    method:  0 WALL ( 0.00 usr + -0.00 sys =  0.00 CPU) @
> 27081922.82/s (n=40000)
>            (warning: too few iterations for a reliable count)
>   closure:  0 WALL ( 0.00 usr +  0.00 sys =  0.00 CPU) @
> 30280090.84/s (n=40000)
>            (warning: too few iterations for a reliable count)
> obj. closure:  0 WALL ( 0.00 usr +  0.00 sys =  0.00 CPU) @
> 26058631.92/s (n=40000)
>            (warning: too few iterations for a reliable count)
>                    Rate       method obj. closure      closure     function
>      method  25974026/s           --          -5%         -16%         -90%
> obj. closure  27210884/s           5%           --         -12%        
> -89%
>     closure  31007752/s          19%          14%           --         -88%
>    function 254777070/s         881%         836%         722%           --
> 
> Interesting, but are they meaningful?  The warnings from Benchmark are
> troubling, but I didn't have any immediate ideas on how to get rid of
> them.  Any suggestions?
> 
> Thanks,
> -n8
> 
> [1] http://ocaml-benchmark.sourceforge.net
> 

well, running only 40,000 iterations is way too low because timing
errors are going to get in the way of an accurate answer.  On my
computer, I bumped the iterations up to max_int, and still the function
call was still taking less than one CPU second of time (which I guess is
the requirement for the warning to disappear).

Here's my numbers from an Athlon XP-M 2000+ (1.53GHz), compiled with
ocaml 3.09.3, cmd. line:
$ ocamlfind ocamlopt -package "benchmark" -inline 0 unix.cmxa
benchmark.cmxa  dispatch.ml


Latencies for 1073741823 iterations of function, method, closure, obj.
closure:
  function:  0 WALL (-0.02 usr + -0.00 sys = -0.02 CPU)
            (warning: too few iterations for a reliable count)
    method: 15 WALL (11.34 usr +  0.49 sys = 11.83 CPU) @ 90764313.02/s
(n=1073741823)
   closure:  4 WALL ( 2.60 usr + -0.60 sys =  2.00 CPU) @ 536870911.50/s
(n=1073741823)
obj. closure:  8 WALL ( 4.31 usr +  0.03 sys =  4.34 CPU) @
247405950.00/s (n=1073741823)
                       Rate     function       method obj. closure
closure
    function -5.36871e+10/s           --      -59250%      -21800%
-10100%
      method     90764313/s        -100%           --         -63%
   -83%
obj. closure    247405950/s        -100%         173%           --
   -54%
     closure    536870911/s        -101%         491%         117%
     --

Either function calls are just that stupidly efficient, or there's some
optimization still going on. I'm guessing the second.

E.


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Caml-list] Benchmarking different dispatch types
  2007-01-18  2:17 ` [Caml-list] " Edgar Friendly
@ 2007-01-18  3:03   ` Jonathan Roewen
  2007-01-18 23:57     ` Nathaniel Gray
  2007-01-18 15:52   ` Remi Vanicat
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 11+ messages in thread
From: Jonathan Roewen @ 2007-01-18  3:03 UTC (permalink / raw)
  To: Edgar Friendly; +Cc: Nathaniel Gray, Caml List

>From what I understand, anything other than a function is bound to be
fairly slow in comparison.

For closures, you first have to build the closure from the
environment, and then invoke it; and methods require some sort of
lookup. Functions, on the other hand, are fairly simply: it's just a
call to a known address (putting aside passing the actual arguments to
the function/method call).

Sure, there may be some optimisations to be gained in some places, or
a bit over-simplistic in other cases, but the basic premise is the
same for supporting the overall view the numbers give (not so much the
ratios) IMO.

Jonathan

On 1/18/07, Edgar Friendly <thelema314@gmail.com> wrote:
> Nathaniel Gray wrote:
> >
> > Here's the output (on a PPC G4 1.25 GHz):
> >
> > ========
> > Latencies for 40000 iterations of function, method, closure, obj. closure:
> >  function:  0 WALL ( 0.00 usr + -0.00 sys =  0.00 CPU) @
> > 305343511.45/s (n=40000)
> >            (warning: too few iterations for a reliable count)
> >    method:  0 WALL ( 0.00 usr + -0.00 sys =  0.00 CPU) @
> > 27081922.82/s (n=40000)
> >            (warning: too few iterations for a reliable count)
> >   closure:  0 WALL ( 0.00 usr +  0.00 sys =  0.00 CPU) @
> > 30280090.84/s (n=40000)
> >            (warning: too few iterations for a reliable count)
> > obj. closure:  0 WALL ( 0.00 usr +  0.00 sys =  0.00 CPU) @
> > 26058631.92/s (n=40000)
> >            (warning: too few iterations for a reliable count)
> >                    Rate       method obj. closure      closure     function
> >      method  25974026/s           --          -5%         -16%         -90%
> > obj. closure  27210884/s           5%           --         -12%
> > -89%
> >     closure  31007752/s          19%          14%           --         -88%
> >    function 254777070/s         881%         836%         722%           --
> >
> > Interesting, but are they meaningful?  The warnings from Benchmark are
> > troubling, but I didn't have any immediate ideas on how to get rid of
> > them.  Any suggestions?
> >
> > Thanks,
> > -n8
> >
> > [1] http://ocaml-benchmark.sourceforge.net
> >
>
> well, running only 40,000 iterations is way too low because timing
> errors are going to get in the way of an accurate answer.  On my
> computer, I bumped the iterations up to max_int, and still the function
> call was still taking less than one CPU second of time (which I guess is
> the requirement for the warning to disappear).
>
> Here's my numbers from an Athlon XP-M 2000+ (1.53GHz), compiled with
> ocaml 3.09.3, cmd. line:
> $ ocamlfind ocamlopt -package "benchmark" -inline 0 unix.cmxa
> benchmark.cmxa  dispatch.ml
>
>
> Latencies for 1073741823 iterations of function, method, closure, obj.
> closure:
>  function:  0 WALL (-0.02 usr + -0.00 sys = -0.02 CPU)
>            (warning: too few iterations for a reliable count)
>    method: 15 WALL (11.34 usr +  0.49 sys = 11.83 CPU) @ 90764313.02/s
> (n=1073741823)
>   closure:  4 WALL ( 2.60 usr + -0.60 sys =  2.00 CPU) @ 536870911.50/s
> (n=1073741823)
> obj. closure:  8 WALL ( 4.31 usr +  0.03 sys =  4.34 CPU) @
> 247405950.00/s (n=1073741823)
>                       Rate     function       method obj. closure
> closure
>    function -5.36871e+10/s           --      -59250%      -21800%
> -10100%
>      method     90764313/s        -100%           --         -63%
>   -83%
> obj. closure    247405950/s        -100%         173%           --
>   -54%
>     closure    536870911/s        -101%         491%         117%
>     --
>
> Either function calls are just that stupidly efficient, or there's some
> optimization still going on. I'm guessing the second.
>
> E.
>
> _______________________________________________
> Caml-list mailing list. Subscription management:
> http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list
> Archives: http://caml.inria.fr
> Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
> Bug reports: http://caml.inria.fr/bin/caml-bugs
>


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Caml-list] Benchmarking different dispatch types
  2007-01-18  2:17 ` [Caml-list] " Edgar Friendly
  2007-01-18  3:03   ` Jonathan Roewen
@ 2007-01-18 15:52   ` Remi Vanicat
  2007-01-18 22:33   ` Nathaniel Gray
  2007-01-31 17:03   ` Christophe TROESTLER
  3 siblings, 0 replies; 11+ messages in thread
From: Remi Vanicat @ 2007-01-18 15:52 UTC (permalink / raw)
  To: Caml List

2007/1/18, Edgar Friendly <thelema314@gmail.com>:
> Either function calls are just that stupidly efficient, or there's some
> optimization still going on. I'm guessing the second.

well, here, the application are termial application that are optimized.

One should probably better do something like :
let f x =
  x + 100
let call_f () = 1 + (f 1)

let o = object
  method f_o x = x + 100
end
let call_o () = 1 + (o#f_o 1)

.... to force a non terminal application.


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Caml-list] Benchmarking different dispatch types
  2007-01-18  1:12 Benchmarking different dispatch types Nathaniel Gray
  2007-01-18  2:17 ` [Caml-list] " Edgar Friendly
@ 2007-01-18 16:56 ` William D. Neumann
  2007-01-19  0:50 ` Jacques Garrigue
  2 siblings, 0 replies; 11+ messages in thread
From: William D. Neumann @ 2007-01-18 16:56 UTC (permalink / raw)
  To: Nathaniel Gray; +Cc: Caml List

On Wed, 17 Jan 2007, Nathaniel Gray wrote:

> Interesting, but are they meaningful?  The warnings from Benchmark are
> troubling, but I didn't have any immediate ideas on how to get rid of
> them.  Any suggestions?

Well, looking at the benchmark module, it appears that message if one or 
more of the following occurs:
The number of iterations is too low (default minimum = 4)
The amount of cpu time taken is too low (default minimum = 0.4 seconds)
The wallclock time was less than 1 second and the number of iterations was 
less than 1000.

So either make the functions more intensive (which obscures the calling 
time and thus is bad here), do more calls, or supply a smaller value for 
min_cpu:

let _ =
   let results = latencyN 4000000 ~min_cpu:0.1
      [("function", call_f, ());
       ("method", call_o, ());
       ("closure", call_fc, ());
       ("obj. closure", call_foc, ())]
   in
   tabulate results;;

William D. Neumann

---

"There's just so many extra children, we could just feed the
children to these tigers.  We don't need them, we're not doing 
anything with them.

Tigers are noble and sleek; children are loud and messy."

         -- Neko Case

Life is unfair.  Kill yourself or get over it.
 	-- Black Box Recorder


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Caml-list] Benchmarking different dispatch types
  2007-01-18  2:17 ` [Caml-list] " Edgar Friendly
  2007-01-18  3:03   ` Jonathan Roewen
  2007-01-18 15:52   ` Remi Vanicat
@ 2007-01-18 22:33   ` Nathaniel Gray
  2007-01-19  0:03     ` Robert Roessler
  2007-01-31 17:03   ` Christophe TROESTLER
  3 siblings, 1 reply; 11+ messages in thread
From: Nathaniel Gray @ 2007-01-18 22:33 UTC (permalink / raw)
  To: Edgar Friendly; +Cc: Caml List

On 1/17/07, Edgar Friendly <thelema314@gmail.com> wrote:
>
> well, running only 40,000 iterations is way too low because timing
> errors are going to get in the way of an accurate answer.

I forgot to mention that I also tried 400,000 and 4,000,000.  400K
produced similar results to 40K, while 4M produced some strange
results that didn't make sense.

> On my
> computer, I bumped the iterations up to max_int, and still the function
> call was still taking less than one CPU second of time (which I guess is
> the requirement for the warning to disappear).
>
> Here's my numbers from an Athlon XP-M 2000+ (1.53GHz), compiled with
> ocaml 3.09.3, cmd. line:
> $ ocamlfind ocamlopt -package "benchmark" -inline 0 unix.cmxa
> benchmark.cmxa  dispatch.ml
>
>
> Latencies for 1073741823 iterations of function, method, closure, obj.
> closure:
>   function:  0 WALL (-0.02 usr + -0.00 sys = -0.02 CPU)
>             (warning: too few iterations for a reliable count)
>     method: 15 WALL (11.34 usr +  0.49 sys = 11.83 CPU) @ 90764313.02/s
> (n=1073741823)
>    closure:  4 WALL ( 2.60 usr + -0.60 sys =  2.00 CPU) @ 536870911.50/s
> (n=1073741823)
> obj. closure:  8 WALL ( 4.31 usr +  0.03 sys =  4.34 CPU) @
> 247405950.00/s (n=1073741823)
>                        Rate     function       method obj. closure
> closure
>     function -5.36871e+10/s           --      -59250%      -21800%
> -10100%
>       method     90764313/s        -100%           --         -63%
>    -83%
> obj. closure    247405950/s        -100%         173%           --
>    -54%
>      closure    536870911/s        -101%         491%         117%
>      --
>
> Either function calls are just that stupidly efficient, or there's some
> optimization still going on. I'm guessing the second.

These results are clearly garbage, since the rate of function calls is
negative.  Or perhaps there's some time-travel going on...

Cheers,
-n8

-- 
>>>-- Nathaniel Gray -- Caltech Computer Science ------>
>>>-- Mojave Project -- http://mojave.cs.caltech.edu -->


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Caml-list] Benchmarking different dispatch types
  2007-01-18  3:03   ` Jonathan Roewen
@ 2007-01-18 23:57     ` Nathaniel Gray
  0 siblings, 0 replies; 11+ messages in thread
From: Nathaniel Gray @ 2007-01-18 23:57 UTC (permalink / raw)
  To: Jonathan Roewen; +Cc: Edgar Friendly, Caml List

On 1/17/07, Jonathan Roewen <jonathan.roewen@gmail.com> wrote:
> From what I understand, anything other than a function is bound to be
> fairly slow in comparison.

Sure, but I'm really interested in understanding the "penalty" for
using objects instead of, say, closures, and how much (if anything)
you get back by turning a method into a closure (as I did with the
obj. closure test).  It looks like there isn't much penalty, which
makes objects much more attractive to me for certain applications, and
there's not much gain to be had by "closurizing" method calls.

> For closures, you first have to build the closure from the
> environment, and then invoke it; and methods require some sort of
> lookup. Functions, on the other hand, are fairly simply: it's just a
> call to a known address (putting aside passing the actual arguments to
> the function/method call).

Well, for closures you don't need to build the closure at call-time,
that happens when it is created.

Cheers,
-n8

-- 
>>>-- Nathaniel Gray -- Caltech Computer Science ------>
>>>-- Mojave Project -- http://mojave.cs.caltech.edu -->


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Caml-list] Benchmarking different dispatch types
  2007-01-18 22:33   ` Nathaniel Gray
@ 2007-01-19  0:03     ` Robert Roessler
  0 siblings, 0 replies; 11+ messages in thread
From: Robert Roessler @ 2007-01-19  0:03 UTC (permalink / raw)
  To: Caml-list

Nathaniel Gray wrote:
> On 1/17/07, Edgar Friendly <thelema314@gmail.com> wrote:
>>
>> well, running only 40,000 iterations is way too low because timing
>> errors are going to get in the way of an accurate answer.
> 
> I forgot to mention that I also tried 400,000 and 4,000,000.  400K
> produced similar results to 40K, while 4M produced some strange
> results that didn't make sense.
> ...
> These results are clearly garbage, since the rate of function calls is
> negative.  Or perhaps there's some time-travel going on...

Or maybe there is a slight issue with overflow of whatever numeric
representation is being used (I haven't examined the benchmarking
package in any detail)?

Robert Roessler
roessler@rftp.com
http://www.rftp.com


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Caml-list] Benchmarking different dispatch types
  2007-01-18  1:12 Benchmarking different dispatch types Nathaniel Gray
  2007-01-18  2:17 ` [Caml-list] " Edgar Friendly
  2007-01-18 16:56 ` William D. Neumann
@ 2007-01-19  0:50 ` Jacques Garrigue
  2007-01-19  8:30   ` Nathaniel Gray
  2 siblings, 1 reply; 11+ messages in thread
From: Jacques Garrigue @ 2007-01-19  0:50 UTC (permalink / raw)
  To: n8gray; +Cc: caml-list

From: "Nathaniel Gray" <n8gray@gmail.com>
> As somebody trying to understand the performance of OCaml, I've often
> wondered about the performance of the different forms of function
> dispatch.  How do method calls compare to function calls?  How about
> closure calls?  So I tried using the Benchmark library[1] to do a
> quick test:
> 
> ========
> (* Test method dispatch vs. function dispatch vs. closure dispatch
>    Make sure to compile with -inline 0
>  *)
> 
> let f x =
>    x + 100
> let call_f () = f 1
> 
> let o = object
>    method f_o x = x + 100
> end
> let call_o () = o#f_o 1
> 
> let f_c () x = x + 100
> let f_c' = f_c ()
> let call_fc () = f_c' 1
> 
> let o_c = object
>    method f_oc () x = x + 100
> end
> let f_oc' = o_c#f_oc ()
> let call_foc () = f_oc' 1
[...]
>                     Rate       method obj. closure      closure     function
>       method  25974026/s           --          -5%         -16%         -90%
> obj. closure  27210884/s           5%           --         -12%         -89%
>      closure  31007752/s          19%          14%           --         -88%
>     function 254777070/s         881%         836%         722%           --

There are a few problems in your methodology.
One is that you are running your test only once inside a function.
So what you are measuring ends up being (at least) the cost a calling
a closure + the real cost of your test. Usually the wrapping function
should itself be a loop.
  let call_f () = for i = 1 to 1000 do ignore (f 1 + 1) done

Another problem is that with such micro-benchmarks, all kinds of
optimizations may skew results, either by the compiler or the CPU.
You disabled one with -inline 0, but there is noway to discard others
if you don't know what triggers them.

For instance, when calling a method, normally you would have to search for
it in the method list stored inside the object. This is done by a
binary search, with logarithmic cost in the number of methods in the
list. Since having to do it for every method call would badly impact
performance, each call point caches the offset in the list for the
last object called. If the last object was from the same class, then
no search is done. There are only a few extra memory reads, to verify
that indeed this is the right offset.
So if want to measure the cost in the worst situation, you have to
alternate calls (at the same point) between objects from different
classes, for which the offset is different.
In practice, hopefully this worst pattern doesn't occur too often, so
it is still safe to assume that method calls 

You should also look at the generated assembler (obtained with -S) to
verify that no strange optimization happens.

My own measurements on a Pentium M and PPC (using a slightly different
benchmark, using loops and several different methods and functions)
give (comparing to a direct function call):
                    Pentium M   PPC G4
Closure:            1.2x        3.2x
Method:             2.9x        5.6x
Unoptimized method: 6.9x        13x

I'm a bit surprised by the low cost of a closure, particularly on
pentium M, but this may be related to some CPU optimization.
Note that with inlining you get a more than 10x speedup.
This suggests that even in the best case method calls are actually
about twice as expensive as closure calls, and 5 times in a
particularly bad case.

Jacques Garrigue


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Caml-list] Benchmarking different dispatch types
  2007-01-19  0:50 ` Jacques Garrigue
@ 2007-01-19  8:30   ` Nathaniel Gray
  0 siblings, 0 replies; 11+ messages in thread
From: Nathaniel Gray @ 2007-01-19  8:30 UTC (permalink / raw)
  To: Jacques Garrigue; +Cc: caml-list

Jacques, thanks for the very useful reply!  A few more comments below...

On 1/18/07, Jacques Garrigue <garrigue@math.nagoya-u.ac.jp> wrote:
> There are a few problems in your methodology.
> One is that you are running your test only once inside a function.
> So what you are measuring ends up being (at least) the cost a calling
> a closure + the real cost of your test. Usually the wrapping function
> should itself be a loop.
>   let call_f () = for i = 1 to 1000 do ignore (f 1 + 1) done

Right.  I realized this myself and made an attempt to mitigate the
problem, but probably not nearly enough.

> Another problem is that with such micro-benchmarks, all kinds of
> optimizations may skew results, either by the compiler or the CPU.
> You disabled one with -inline 0, but there is noway to discard others
> if you don't know what triggers them.
>
> For instance, when calling a method, normally you would have to search for
> it in the method list stored inside the object. This is done by a
> binary search, with logarithmic cost in the number of methods in the
> list. Since having to do it for every method call would badly impact
> performance, each call point caches the offset in the list for the
> last object called. If the last object was from the same class, then
> no search is done. There are only a few extra memory reads, to verify
> that indeed this is the right offset.

Ah, now this is the juicy stuff!

> So if want to measure the cost in the worst situation, you have to
> alternate calls (at the same point) between objects from different
> classes, for which the offset is different.
> In practice, hopefully this worst pattern doesn't occur too often, so
> it is still safe to assume that method calls
>
> You should also look at the generated assembler (obtained with -S) to
> verify that no strange optimization happens.

I did glance at it but haven't had time to look in any detail.

>
> My own measurements on a Pentium M and PPC (using a slightly different
> benchmark, using loops and several different methods and functions)
> give (comparing to a direct function call):
>                     Pentium M   PPC G4
> Closure:            1.2x        3.2x
> Method:             2.9x        5.6x
> Unoptimized method: 6.9x        13x
>
> I'm a bit surprised by the low cost of a closure, particularly on
> pentium M, but this may be related to some CPU optimization.
> Note that with inlining you get a more than 10x speedup.
> This suggests that even in the best case method calls are actually
> about twice as expensive as closure calls, and 5 times in a
> particularly bad case.

Perhaps you could share your benchmark code?  :-)

Thanks,
-n8

-- 
>>>-- Nathaniel Gray -- Caltech Computer Science ------>
>>>-- Mojave Project -- http://mojave.cs.caltech.edu -->


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Caml-list] Benchmarking different dispatch types
  2007-01-18  2:17 ` [Caml-list] " Edgar Friendly
                     ` (2 preceding siblings ...)
  2007-01-18 22:33   ` Nathaniel Gray
@ 2007-01-31 17:03   ` Christophe TROESTLER
  3 siblings, 0 replies; 11+ messages in thread
From: Christophe TROESTLER @ 2007-01-31 17:03 UTC (permalink / raw)
  To: O'Caml Mailing List

[-- Attachment #1: Type: Text/Plain, Size: 2594 bytes --]

>             (warning: too few iterations for a reliable count)
>
> Interesting, but are they meaningful?  The warnings from Benchmark are
> troubling, but I didn't have any immediate ideas on how to get rid of
> them.

The warnings mean what they say.  The running time of the calls (as
opposed to the whole process: calls + repetition loop), is estimated
to be less than 0.1 second and is therefore too imprecise to produce
meaningful results.

On Wed, 17 Jan 2007, "Nathaniel Gray" <n8gray@gmail.com> wrote:
> 
>    let results = latencyN 40000
>       [("function", call_f, ());
>        ("method", call_o, ());
>        ("closure", call_fc, ());
>        ("obj. closure", call_foc, ())]

Knowing the right number of calls to have a meaningful measurment is
not always easy, espcially for fast running code like yours.
Actually, as Edgar Friendly pointed out, in your case, even max_int is
not enough...  In order to get rid of this you should:

- Use the throughputN function which allows you to set the running
  time of the function instead of the number of calls;

- Use the new version Benchmark module which now uses Int64.t to store
  the number of calls.  Moreover, with this new version,...

On Wed, 17 Jan 2007, Edgar Friendly <thelema314@gmail.com> wrote:
> 
>                        Rate
>     function -5.36871e+10/s
               ^
  ...this cannot happen anymore.

You are also advised to set the ~repeat flag to a value of 5-10 in
order to trigger some statistical tests that will put the relative
rates of the tested functions in between brackets if they are not
significantly different.

For your problem, your code (attached) gives the following results (on
a Intel 2Ghz dual core):

                    Rate               method obj. closure    closure   function
      method 156624689+-  738397/s         --         -60%       -68%       -84%
obj. closure 387882394+- 4665352/s       148%           --       -20%       -61%
     closure 486459254+- 1330609/s       211%          25%         --       -51%
    function 984016184+-24771072/s       528%         154%       102%         --

Since the (i,j) entry is rij = (ri / rj - 1) where ri is the rate of
line i and rj the rate of col j, if we want to compare the times ti =
1/ri, one gets tj = (1 + rij) ti.  Thus one has to look at the last
line to have the times relative to the function one:

t(closure)     = (1 + 102%) t(function) ~ 2x
t(obj.closure) = (1 + 154%) t(function) ~ 2.5x
t(method)      = (1 + 528%) t(function) ~ 6.3x

This is quite coherent with Jacques Garrigue results.

Hope this helps,
ChriS

[-- Attachment #2: gray.ml --]
[-- Type: Text/Plain, Size: 779 bytes --]

(* Test method dispatch vs. function dispatch vs. closure dispatch *)

let f x = x + 100
let call_f () = f 1

let o = object
   method f_o x = x + 100
end
let call_o () = o#f_o 1

let f_c () x = x + 100
let f_c' = f_c ()
let call_fc () = f_c' 1

let o_c = object
   method f_oc () x = x + 100
end
let f_oc' = o_c#f_oc ()
let call_foc () = f_oc' 1


open Benchmark

let () =
  let results = latencyN ~repeat:5 1100000000L
(*   let results = throughputN ~repeat:5 1 *)
    [("function", call_f, ());
     ("method", call_o, ());
     ("closure", call_fc, ());
     ("obj. closure", call_foc, ())]
  in
  tabulate results

(* ocamlopt -o gray.com -inline 0 -I benchmark unix.cmxa benchmark.cmxa gray.ml
   ocamlc -o gray.exe -inline 0 -I benchmark unix.cma benchmark.cma gray.ml
*)

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2007-01-31 17:03 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-01-18  1:12 Benchmarking different dispatch types Nathaniel Gray
2007-01-18  2:17 ` [Caml-list] " Edgar Friendly
2007-01-18  3:03   ` Jonathan Roewen
2007-01-18 23:57     ` Nathaniel Gray
2007-01-18 15:52   ` Remi Vanicat
2007-01-18 22:33   ` Nathaniel Gray
2007-01-19  0:03     ` Robert Roessler
2007-01-31 17:03   ` Christophe TROESTLER
2007-01-18 16:56 ` William D. Neumann
2007-01-19  0:50 ` Jacques Garrigue
2007-01-19  8:30   ` Nathaniel Gray

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).