From mboxrd@z Thu Jan  1 00:00:00 1970
Received: (from majordomo@localhost) by pauillac.inria.fr (8.7.6/8.7.3) id VAA31981; Mon, 8 Dec 2003 21:36:46 +0100 (MET)
X-Authentication-Warning: pauillac.inria.fr: majordomo set sender to owner-caml-list@pauillac.inria.fr using -f
Received: from nez-perce.inria.fr (nez-perce.inria.fr [192.93.2.78]) by pauillac.inria.fr (8.7.6/8.7.3) with ESMTP id VAA31181 for <caml-list@pauillac.inria.fr>; Mon, 8 Dec 2003 21:36:45 +0100 (MET)
Received: from herd.plethora.net (herd.plethora.net [205.166.146.1])
	by nez-perce.inria.fr (8.11.1/8.11.1) with ESMTP id hB8Kag121292;
	Mon, 8 Dec 2003 21:36:42 +0100 (MET)
Received: from bhurt.plethora.net (bhurt.plethora.net [205.166.146.49])
	by herd.plethora.net (8.11.6/8.10.1) with ESMTP id hB8KaaC02775;
	Mon, 8 Dec 2003 14:36:38 -0600 (CST)
Date: Mon, 8 Dec 2003 15:37:17 -0600 (CST)
From: Brian Hurt <bhurt@spnz.org>
X-X-Sender: bhurt@localhost.localdomain
To: Xavier Leroy <xavier.leroy@inria.fr>
cc: Ocaml Mailing List <caml-list@inria.fr>
Subject: Re: [Caml-list] Object-oriented access bottleneck
In-Reply-To: <20031208200228.A26466@pauillac.inria.fr>
Message-ID: <Pine.LNX.4.44.0312081514270.5009-101000@localhost.localdomain>
MIME-Version: 1.0
Content-Type: MULTIPART/MIXED; BOUNDARY="8323328-425715405-1070919437=:5009"
X-Loop: caml-list@inria.fr
X-Spam: no; 0.00; caml-list:01 bottleneck:01 mime-aware:01 docserver:01 slower:01 slower:01 statically:01 run-time:01 powerpc:01 603:99 inlined:01 invocation:01 timings:01 gcc:01 usr:01 
X-Attachments: type="APPLICATION/x-gzip" name="fcall.tar.gz" name="fcall.tar.gz" 
Sender: owner-caml-list@pauillac.inria.fr
Precedence: bulk

  This message is in MIME format.  The first part should be readable text,
  while the remaining parts are likely unreadable without MIME-aware tools.
  Send mail to mime@docserver.cac.washington.edu for more info.

--8323328-425715405-1070919437=:5009
Content-Type: TEXT/PLAIN; charset=US-ASCII

On Mon, 8 Dec 2003, Xavier Leroy wrote:

> > How much slower is a member function call compared to a non-member 
> > function call?  I was given the impression that it was signifigantly 
> > slower, but you're making it sound like it isn't.
> 
> In the non-member case, it depends very much whether the function
> being called is statically known or is computed at run-time.  In the
> first case, a call to a static address is generated.  In the latter
> case, a call to a function pointer is generated.
> 
> Here are some figures I collected circa 1998 on a PowerPC 603 processor:
> 
>    type of call              cost of call
> 
> inlined function               0 cycles
> call to known function         4 cycles
> call to unknown function      11 cycles
> method invocation             18 cycles

Attached is some microbenchmarking code I just whomped together.  It gives 
both C and Ocaml timings on tight loops of functions calls.  The results:

$ ./fcall2
time0: 1000000000 loops in 1.680000 seconds = 1.680000 nanoseconds/loop
time1: 1000000000 loops in 1.680000 seconds = 1.680000 nanoseconds/loop
time2: 1000000000 loops in 5.700000 seconds = 5.700000 nanoseconds/loop
time3: 1000000000 loops in 10.110000 seconds = 10.110000 nanoseconds/loop
$ ./fcall
time0(): 1000000000 loops in 2.570000 seconds = 2.570000 nanoseconds/loops
time1(): 1000000000 loops in 5.170000 seconds = 5.170000 nanoseconds/loops
time2(): 1000000000 loops in 5.860000 seconds = 5.860000 nanoseconds/loops
$ gcc -v
Reading specs from /usr/lib/gcc-lib/i386-redhat-linux/3.2.2/specs
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man 
--infodir=/usr/share/info --enable-shared --enable-threads=posix 
--disable-checking --with-system-zlib --enable-__cxa_atexit 
--host=i386-redhat-linux
Thread model: posix
gcc version 3.2.2 20030222 (Red Hat Linux 3.2.2-5)
$ ocamlopt -v
The Objective Caml native-code compiler, version 3.07
Standard library directory: /usr/local/lib/ocaml
$ cat /proc/cpuinfo
processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 6
model           : 8
model name      : AMD Athlon(tm) XP 2200+
stepping        : 1
cpu MHz         : 1800.109
cache size      : 256 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca 
cmov
pat pse36 mmx fxsr sse syscall mmxext 3dnowext 3dnow
bogomips        : 3591.37
$

My conclusions:

- Despite my best efforts, ocaml inlined (and removed) the call to foo.  
In general this is good, but it did defeat my efforts to time how long a 
direct function call took.  For some reason it didn't also eliminate the 
for loops.  If I had to guess, Ocaml's call to a known function is about 
the same speed as C's.

- Ocaml loops faster than C.  Go figure.

- Ocaml's call to an unknown function (passed in via argument) is a little 
slower than C's call via function pointer, by maybe 2 clocks.  Not 
signifigant.

- Ocaml's call to a member function is slower than a call to an unknown 
function by a factor of about 2.

- All of these times are small enough to be insignifigant.  By comparison, 
on this CPU a mispredicted branch costs about 8ns (~14 clocks).  Outside 
of a microbenchmark, they are unlikely to be noticable, as other factors 
are likely to overwhealm the minor costs shown here (garbage collection, 
cache performance, etc.).

- 4 loads at 2 clocks apeice (they hit cache) is about 4.4ns- the vast
majority of the difference between a call to an unknown function and a 
call to a member function.  As such, signifigant improvements are highly 
unlikely.  Even my design for a hash table uses three loads, and would 
only be 1.1ns or so faster (at best).

- Ocaml is surprisingly fast.

- This thread devolved into pendancy several posts ago :-).

> 
> I'm pretty sure that more modern processors have a bigger gap between
> the "call to known function" and "call to unknown function" cases.

Actually, no.  The 603 did a little bit of speculation, but not much.  
Modern CPUs speculate aggressively.  The biggest problem with call via 
pointer is the load to use penalty- in a direct call, the address it's 
jumping to is right there.  With a call through a pointer, it's got to 
load the pointer before it knows where to jump.  But even then, some CPUs 
do data speculation.

-- 
"Usenet is like a herd of performing elephants with diarrhea -- massive,
difficult to redirect, awe-inspiring, entertaining, and a source of
mind-boggling amounts of excrement when you least expect it."
                                - Gene Spafford 
Brian

--8323328-425715405-1070919437=:5009
Content-Type: APPLICATION/x-gzip; name="fcall.tar.gz"
Content-Transfer-Encoding: BASE64
Content-ID: <Pine.LNX.4.44.0312081537170.5009@localhost.localdomain>
Content-Description: 
Content-Disposition: attachment; filename="fcall.tar.gz"

H4sICEfu1D8CA2ZjYWxsLnRhcgDtmltPo0AUx/taPsWJxl3QijMDLYmoL0Zf
1kSzvm7S0ALKLjJNAXc3G7/7zgVo66X1wVKN5/dCmcsZPO3/XFrjcZCmNLDH
nfVBKCGDgdshAu/RVdL3SId43qBPB2Tg9MV6OqBOB0inBcq8CKYAndFtOS2W
rFs1/0HZTrJxWoYRbKWcT3L7dsswoj9FNM3gnichxJyb8oXlG4YaKJK7iOgh
+GeA4J6nQZGkEZRZntxkUQgpz24g8dVszKdgJnAMxIcEjuDi8vLq2oe9vaTe
/2A8zNmmb2pbr+Km5T9zFFNHgbkbl9lwYumD3/RkaXfh7HrGeC/vf6z0v1b5
r9I/o8Sp9S+U70j9uy5F/beq/yOpCPv2xJiN5EWYcDlkrIoSc0HBfzJBX5po
9FdLr44xTdARWhKiSbIC7oIkg4XQME75+NewAPn+FT1x4RMttJCXI6HbPBrz
LMx9LTq1SuhV7aolqR+7upEGHi+obIhh09RmLTDVwn1t0bIOZjOnF5en366H
V2ffh9dnp5WJyVQ8fmxuVWcdwk5agvIiiL9oJ547Q9xkQcargQO16Ee21Wui
hsBcCEeWDjy92kgPaJ1UiU1gtx6fe0i1Qbl6uV9oi36h78ovS93CzC/i07nE
NevwD3tP/pHWplFRCiUTX8qzg3xcVP5ngX2Xbq7+d2m/zv+u68r6nxGC9X8r
GGlUVGHleC4IgC8CoZxSaQtMC46Nrix6Zc1LoeDVnpAb3a5pGd2QZ9H8Hrpi
j1w3FOPn6vM3skVQlTtkkn/GGoN4mbH42U0O8GWb+HbIk2Jh56fV/2ij+ifM
rfXPGGGy/hc3qP/W9F+JT1Qr1ieVwSfP/2uV/8r+X9Dkf0oG6vs/h6H+W9H/
OA3yHEbCB8fARz+jsei1o+KWh1ClRxUWIMrCpiYYyrQqX9Rd0vXf3JYZV2dw
o3teFZVNf12tVt3S48VqqulldJNkV6bl/JVqhGzdD4Huo0U3FL6+GRK9kNGV
bUv9XHpnvcdc7H3sZuLABr1PNVBxyoNiyOOh/DpiwZLqGl/tELoGh9CP7BA2
XwS+uWvYR3aNA2YW/ZbqfPvPjPMeHSOqcczJm+n/k43V/4zROv87fdeV/T91
B5j/W8n/90Gqg8ChiAKFoe6rHFtmogDYP1HXZpy+MC7irDk/YT1ZIQLOkS4q
Zia+Bj7YNpw0i1H+m+r/N6d/wrxZ/089rX8P9d8GUpyy/X+kahTGZ+v/1yf/
1fonzf//OMRVv/97Hvb/CIIgCIIgCIIgCIIgCIIgCIIgCIIgCIIgCIIgCIIg
CPKE/zGc3wkAUAAA
--8323328-425715405-1070919437=:5009--

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners