From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <n8gray@gmail.com>
X-Spam-Checker-Version: SpamAssassin 3.1.3 (2006-06-01) on yquem.inria.fr
X-Spam-Level: *
X-Spam-Status: No, score=1.2 required=5.0 tests=AWL,SPF_NEUTRAL 
	autolearn=disabled version=3.1.3
X-Original-To: caml-list@yquem.inria.fr
Delivered-To: caml-list@yquem.inria.fr
Received: from discorde.inria.fr (discorde.inria.fr [192.93.2.38])
	by yquem.inria.fr (Postfix) with ESMTP id 5777FBC0A
	for <caml-list@yquem.inria.fr>; Fri, 19 Jan 2007 09:30:36 +0100 (CET)
Received: from ug-out-1314.google.com (ug-out-1314.google.com [66.249.92.171])
	by discorde.inria.fr (8.13.6/8.13.6) with ESMTP id l0J8UZ14010399
	for <caml-list@inria.fr>; Fri, 19 Jan 2007 09:30:36 +0100
Received: by ug-out-1314.google.com with SMTP id k3so372492ugf
        for <caml-list@inria.fr>; Fri, 19 Jan 2007 00:30:35 -0800 (PST)
DomainKey-Signature: a=rsa-sha1; c=nofws;
        d=gmail.com; s=beta;
        h=received:message-id:date:from:to:subject:cc:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references;
        b=hZ5Paa1qBocQIkI+7FEq4kRbKml/a7DpJ2QHaBjO2HHM9ZxqlzzSnce41W3yeF2ceiszGzbuxwI3S2hGhmXQ8scZxoO25Kd4Ce1k8T5PfpFlRFMVhPbi2Dij7LW38Do6XeRKKyfbyUP2DbRvu0LxCZMosrUyiEvH/K5qzHEbkAc=
Received: by 10.78.160.2 with SMTP id i2mr1794425hue.1169195435373;
        Fri, 19 Jan 2007 00:30:35 -0800 (PST)
Received: by 10.78.198.14 with HTTP; Fri, 19 Jan 2007 00:30:35 -0800 (PST)
Message-ID: <aee06c9e0701190030k229ab33dr9361fe58458627c2@mail.gmail.com>
Date: Fri, 19 Jan 2007 00:30:35 -0800
From: "Nathaniel Gray" <n8gray@gmail.com>
To: "Jacques Garrigue" <garrigue@math.nagoya-u.ac.jp>
Subject: Re: [Caml-list] Benchmarking different dispatch types
Cc: caml-list@inria.fr
In-Reply-To: <20070119.095031.85416862.garrigue@math.nagoya-u.ac.jp>
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
References: <aee06c9e0701171712k33761ed8r6dc30c46a9df6de7@mail.gmail.com>
	 <20070119.095031.85416862.garrigue@math.nagoya-u.ac.jp>
X-j-chkmail-Score: MSGID : 45B081AB.001 on discorde : j-chkmail score : X : 0/20 1 0.000 -> 1
X-Miltered: at discorde with ID 45B081AB.001 by Joe's j-chkmail (http://j-chkmail . ensmp . fr)!
X-Spam: no; 0.00; compiler:01 -inline:01 logarithmic:01 unoptimized:01 inlining:01 speedup:01 mitigate:98 1.2:98 3.2:98 wrote:01 caml-list:01 functions:01 binary:01 measurements:01 closure:01 

Jacques, thanks for the very useful reply!  A few more comments below...

On 1/18/07, Jacques Garrigue <garrigue@math.nagoya-u.ac.jp> wrote:
> There are a few problems in your methodology.
> One is that you are running your test only once inside a function.
> So what you are measuring ends up being (at least) the cost a calling
> a closure + the real cost of your test. Usually the wrapping function
> should itself be a loop.
>   let call_f () = for i = 1 to 1000 do ignore (f 1 + 1) done

Right.  I realized this myself and made an attempt to mitigate the
problem, but probably not nearly enough.

> Another problem is that with such micro-benchmarks, all kinds of
> optimizations may skew results, either by the compiler or the CPU.
> You disabled one with -inline 0, but there is noway to discard others
> if you don't know what triggers them.
>
> For instance, when calling a method, normally you would have to search for
> it in the method list stored inside the object. This is done by a
> binary search, with logarithmic cost in the number of methods in the
> list. Since having to do it for every method call would badly impact
> performance, each call point caches the offset in the list for the
> last object called. If the last object was from the same class, then
> no search is done. There are only a few extra memory reads, to verify
> that indeed this is the right offset.

Ah, now this is the juicy stuff!

> So if want to measure the cost in the worst situation, you have to
> alternate calls (at the same point) between objects from different
> classes, for which the offset is different.
> In practice, hopefully this worst pattern doesn't occur too often, so
> it is still safe to assume that method calls
>
> You should also look at the generated assembler (obtained with -S) to
> verify that no strange optimization happens.

I did glance at it but haven't had time to look in any detail.

>
> My own measurements on a Pentium M and PPC (using a slightly different
> benchmark, using loops and several different methods and functions)
> give (comparing to a direct function call):
>                     Pentium M   PPC G4
> Closure:            1.2x        3.2x
> Method:             2.9x        5.6x
> Unoptimized method: 6.9x        13x
>
> I'm a bit surprised by the low cost of a closure, particularly on
> pentium M, but this may be related to some CPU optimization.
> Note that with inlining you get a more than 10x speedup.
> This suggests that even in the best case method calls are actually
> about twice as expensive as closure calls, and 5 times in a
> particularly bad case.

Perhaps you could share your benchmark code?  :-)

Thanks,
-n8

-- 
>>>-- Nathaniel Gray -- Caltech Computer Science ------>
>>>-- Mojave Project -- http://mojave.cs.caltech.edu -->