From: "Nathaniel Gray"
To: "Jacques Garrigue"
Cc: caml-list@inria.fr
Subject: Re: [Caml-list] Benchmarking different dispatch types
Date: Fri, 19 Jan 2007 00:30:35 -0800
In-Reply-To: <20070119.095031.85416862.garrigue@math.nagoya-u.ac.jp>

Jacques, thanks for the very useful reply! A few more comments below...

On 1/18/07, Jacques Garrigue wrote:
> There are a few problems in your methodology.
> One is that you are running your test only once inside a function.
> So what you are measuring ends up being (at least) the cost of calling
> a closure + the real cost of your test. Usually the wrapping function
> should itself be a loop.
>   let call_f () = for i = 1 to 1000 do ignore (f 1 + 1) done

Right. I realized this myself and made an attempt to mitigate the
problem, but probably not nearly enough.

> Another problem is that with such micro-benchmarks, all kinds of
> optimizations may skew results, either by the compiler or the CPU.
> You disabled one with -inline 0, but there is no way to discard others
> if you don't know what triggers them.
>
> For instance, when calling a method, normally you would have to search for
> it in the method list stored inside the object. This is done by a
> binary search, with logarithmic cost in the number of methods in the
> list. Since having to do it for every method call would badly impact
> performance, each call point caches the offset in the list for the
> last object called. If the last object was from the same class, then
> no search is done. There are only a few extra memory reads, to verify
> that indeed this is the right offset.

Ah, now this is the juicy stuff!

> So if you want to measure the cost in the worst situation, you have to
> alternate calls (at the same point) between objects from different
> classes, for which the offset is different.
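
A minimal sketch of that alternating pattern, for anyone who wants to try it. This is not the benchmark discussed in this thread: the classes `small` and `big`, the type `callee` and the helpers `sum_calls` and `time` are all hypothetical names made up for illustration. The extra methods in `big` are only an attempt to give `f` a different offset than in `small`; whether they do is an assumption to verify against the -S output.

```ocaml
(* Hypothetical classes, not taken from the benchmark under discussion.
   [big] carries extra methods in the hope that [f] ends up at a different
   offset in its method table than in [small]; that is an assumption
   worth checking against the -S output. *)
class small = object
  method f (x : int) = x + 1
end

class big = object
  method a = 0
  method b = 0
  method c = 0
  method f (x : int) = x + 2
  method z = 0
end

type callee = < f : int -> int >

(* One loop, one call site. The receiver alternates between objs.(0) and
   objs.(1) on every iteration. *)
let sum_calls (objs : callee array) n =
  let acc = ref 0 in
  for i = 1 to n do
    let o = objs.(i land 1) in
    acc := !acc + o#f i
  done;
  !acc

(* Same code path in both cases; only the classes of the receivers differ. *)
let same_class = [| (new small :> callee); (new small :> callee) |]
let mixed = [| (new small :> callee); (new big :> callee) |]

(* Crude CPU-time measurement; returns the result so the work is used. *)
let time label f =
  let t0 = Sys.time () in
  let r = f () in
  Printf.printf "%-12s %.3fs\n" label (Sys.time () -. t0);
  r

let () =
  let n = 10_000_000 in
  ignore (time "same class" (fun () -> sum_calls same_class n));
  ignore (time "mixed" (fun () -> sum_calls mixed n))
```

Both runs go through the same loop and the same `o#f` call site, so any difference in timing should come from the receiver's class changing between calls rather than from the surrounding code. Compiling with -inline 0 and reading the -S output, as suggested above, is still needed before trusting the numbers.
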
> In practice, hopefully this worst pattern doesn't occur too often, so
> it is still safe to assume that method calls
>
> You should also look at the generated assembler (obtained with -S) to
> verify that no strange optimization happens.

I did glance at it but haven't had time to look in any detail.

>
> My own measurements on a Pentium M and PPC (using a slightly different
> benchmark, using loops and several different methods and functions)
> give (comparing to a direct function call):
>                       Pentium M   PPC G4
> Closure:                 1.2x      3.2x
> Method:                  2.9x      5.6x
> Unoptimized method:      6.9x      13x
>
> I'm a bit surprised by the low cost of a closure, particularly on
> Pentium M, but this may be related to some CPU optimization.
> Note that with inlining you get a more than 10x speedup.
> This suggests that even in the best case method calls are actually
> about twice as expensive as closure calls, and 5 times in a
> particularly bad case.

Perhaps you could share your benchmark code? :-)

Thanks,
-n8

-- 
>>>-- Nathaniel Gray -- Caltech Computer Science ------>
>>>-- Mojave Project -- http://mojave.cs.caltech.edu -->