From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.1.3 (2006-06-01) on yquem.inria.fr X-Spam-Level: X-Spam-Status: No, score=0.4 required=5.0 tests=AWL,DNS_FROM_RFC_ABUSE autolearn=disabled version=3.1.3 X-Original-To: caml-list@yquem.inria.fr Delivered-To: caml-list@yquem.inria.fr Received: from concorde.inria.fr (concorde.inria.fr [192.93.2.39]) by yquem.inria.fr (Postfix) with ESMTP id 328A4BC0A for ; Fri, 19 Jan 2007 01:50:39 +0100 (CET) Received: from kurims.kurims.kyoto-u.ac.jp (kurims.kurims.kyoto-u.ac.jp [130.54.16.1]) by concorde.inria.fr (8.13.6/8.13.6) with ESMTP id l0J0oacw024157 for ; Fri, 19 Jan 2007 01:50:38 +0100 Received: from localhost (orion [130.54.16.5]) by kurims.kurims.kyoto-u.ac.jp (8.13.7/8.13.7) with ESMTP id l0J0oXsS029117; Fri, 19 Jan 2007 09:50:33 +0900 (JST) Date: Fri, 19 Jan 2007 09:50:31 +0900 (JST) Message-Id: <20070119.095031.85416862.garrigue@math.nagoya-u.ac.jp> To: n8gray@gmail.com Cc: caml-list@inria.fr Subject: Re: [Caml-list] Benchmarking different dispatch types From: Jacques Garrigue In-Reply-To: References: X-Mailer: Mew version 4.2 on Emacs 21.3 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-Miltered: at concorde with ID 45B015DC.001 by Joe's j-chkmail (http://j-chkmail . ensmp . fr)! X-Spam: no; 0.00; ocaml:01 -inline:01 foc:01 compiler:01 -inline:01 logarithmic:01 unoptimized:01 inlining:01 speedup:01 16%:98 12%:98 89%:98 19%:98 1.2:98 3.2:98 From: "Nathaniel Gray" > As somebody trying to understand the performance of OCaml, I've often > wondered about the performance of the different forms of function > dispatch. How do method calls compare to function calls? How about > closure calls? So I tried using the Benchmark library[1] to do a > quick test: > > ======== > (* Test method dispatch vs. function dispatch vs. closure dispatch > Make sure to compile with -inline 0 > *) > > let f x = > x + 100 > let call_f () = f 1 > > let o = object > method f_o x = x + 100 > end > let call_o () = o#f_o 1 > > let f_c () x = x + 100 > let f_c' = f_c () > let call_fc () = f_c' 1 > > let o_c = object > method f_oc () x = x + 100 > end > let f_oc' = o_c#f_oc () > let call_foc () = f_oc' 1 [...] > Rate method obj. closure closure function > method 25974026/s -- -5% -16% -90% > obj. closure 27210884/s 5% -- -12% -89% > closure 31007752/s 19% 14% -- -88% > function 254777070/s 881% 836% 722% -- There are a few problems in your methodology. One is that you are running your test only once inside a function. So what you are measuring ends up being (at least) the cost a calling a closure + the real cost of your test. Usually the wrapping function should itself be a loop. let call_f () = for i = 1 to 1000 do ignore (f 1 + 1) done Another problem is that with such micro-benchmarks, all kinds of optimizations may skew results, either by the compiler or the CPU. You disabled one with -inline 0, but there is noway to discard others if you don't know what triggers them. For instance, when calling a method, normally you would have to search for it in the method list stored inside the object. This is done by a binary search, with logarithmic cost in the number of methods in the list. Since having to do it for every method call would badly impact performance, each call point caches the offset in the list for the last object called. If the last object was from the same class, then no search is done. There are only a few extra memory reads, to verify that indeed this is the right offset. So if want to measure the cost in the worst situation, you have to alternate calls (at the same point) between objects from different classes, for which the offset is different. In practice, hopefully this worst pattern doesn't occur too often, so it is still safe to assume that method calls You should also look at the generated assembler (obtained with -S) to verify that no strange optimization happens. My own measurements on a Pentium M and PPC (using a slightly different benchmark, using loops and several different methods and functions) give (comparing to a direct function call): Pentium M PPC G4 Closure: 1.2x 3.2x Method: 2.9x 5.6x Unoptimized method: 6.9x 13x I'm a bit surprised by the low cost of a closure, particularly on pentium M, but this may be related to some CPU optimization. Note that with inlining you get a more than 10x speedup. This suggests that even in the best case method calls are actually about twice as expensive as closure calls, and 5 times in a particularly bad case. Jacques Garrigue