caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed
From: Brian Hurt <bhurt@spnz.org>
To: Xavier Leroy <xavier.leroy@inria.fr>
Cc: Ocaml Mailing List <caml-list@inria.fr>
Subject: Re: [Caml-list] Object-oriented access bottleneck
Date: Mon, 8 Dec 2003 15:37:17 -0600 (CST)	[thread overview]
Message-ID: <Pine.LNX.4.44.0312081514270.5009-101000@localhost.localdomain> (raw)
In-Reply-To: <20031208200228.A26466@pauillac.inria.fr>

[-- Attachment #1: Type: TEXT/PLAIN, Size: 4711 bytes --]

On Mon, 8 Dec 2003, Xavier Leroy wrote:

> > How much slower is a member function call compared to a non-member 
> > function call?  I was given the impression that it was signifigantly 
> > slower, but you're making it sound like it isn't.
> 
> In the non-member case, it depends very much whether the function
> being called is statically known or is computed at run-time.  In the
> first case, a call to a static address is generated.  In the latter
> case, a call to a function pointer is generated.
> 
> Here are some figures I collected circa 1998 on a PowerPC 603 processor:
> 
>    type of call              cost of call
> 
> inlined function               0 cycles
> call to known function         4 cycles
> call to unknown function      11 cycles
> method invocation             18 cycles

Attached is some microbenchmarking code I just whomped together.  It gives 
both C and Ocaml timings on tight loops of functions calls.  The results:

$ ./fcall2
time0: 1000000000 loops in 1.680000 seconds = 1.680000 nanoseconds/loop
time1: 1000000000 loops in 1.680000 seconds = 1.680000 nanoseconds/loop
time2: 1000000000 loops in 5.700000 seconds = 5.700000 nanoseconds/loop
time3: 1000000000 loops in 10.110000 seconds = 10.110000 nanoseconds/loop
$ ./fcall
time0(): 1000000000 loops in 2.570000 seconds = 2.570000 nanoseconds/loops
time1(): 1000000000 loops in 5.170000 seconds = 5.170000 nanoseconds/loops
time2(): 1000000000 loops in 5.860000 seconds = 5.860000 nanoseconds/loops
$ gcc -v
Reading specs from /usr/lib/gcc-lib/i386-redhat-linux/3.2.2/specs
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man 
--infodir=/usr/share/info --enable-shared --enable-threads=posix 
--disable-checking --with-system-zlib --enable-__cxa_atexit 
--host=i386-redhat-linux
Thread model: posix
gcc version 3.2.2 20030222 (Red Hat Linux 3.2.2-5)
$ ocamlopt -v
The Objective Caml native-code compiler, version 3.07
Standard library directory: /usr/local/lib/ocaml
$ cat /proc/cpuinfo
processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 6
model           : 8
model name      : AMD Athlon(tm) XP 2200+
stepping        : 1
cpu MHz         : 1800.109
cache size      : 256 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca 
cmov
pat pse36 mmx fxsr sse syscall mmxext 3dnowext 3dnow
bogomips        : 3591.37
$

My conclusions:

- Despite my best efforts, ocaml inlined (and removed) the call to foo.  
In general this is good, but it did defeat my efforts to time how long a 
direct function call took.  For some reason it didn't also eliminate the 
for loops.  If I had to guess, Ocaml's call to a known function is about 
the same speed as C's.

- Ocaml loops faster than C.  Go figure.

- Ocaml's call to an unknown function (passed in via argument) is a little 
slower than C's call via function pointer, by maybe 2 clocks.  Not 
signifigant.

- Ocaml's call to a member function is slower than a call to an unknown 
function by a factor of about 2.

- All of these times are small enough to be insignifigant.  By comparison, 
on this CPU a mispredicted branch costs about 8ns (~14 clocks).  Outside 
of a microbenchmark, they are unlikely to be noticable, as other factors 
are likely to overwhealm the minor costs shown here (garbage collection, 
cache performance, etc.).

- 4 loads at 2 clocks apeice (they hit cache) is about 4.4ns- the vast
majority of the difference between a call to an unknown function and a 
call to a member function.  As such, signifigant improvements are highly 
unlikely.  Even my design for a hash table uses three loads, and would 
only be 1.1ns or so faster (at best).

- Ocaml is surprisingly fast.

- This thread devolved into pendancy several posts ago :-).

> 
> I'm pretty sure that more modern processors have a bigger gap between
> the "call to known function" and "call to unknown function" cases.

Actually, no.  The 603 did a little bit of speculation, but not much.  
Modern CPUs speculate aggressively.  The biggest problem with call via 
pointer is the load to use penalty- in a direct call, the address it's 
jumping to is right there.  With a call through a pointer, it's got to 
load the pointer before it knows where to jump.  But even then, some CPUs 
do data speculation.

-- 
"Usenet is like a herd of performing elephants with diarrhea -- massive,
difficult to redirect, awe-inspiring, entertaining, and a source of
mind-boggling amounts of excrement when you least expect it."
                                - Gene Spafford 
Brian

[-- Attachment #2: Type: APPLICATION/x-gzip, Size: 1002 bytes --]

  reply	other threads:[~2003-12-08 20:36 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2003-12-07  2:39 Nuutti Kotivuori
2003-12-07  2:59 ` Nicolas Cannasse
2003-12-07 11:22   ` Benjamin Geer
2003-12-07 14:12     ` Nicolas Cannasse
2003-12-07 18:04   ` Nuutti Kotivuori
2003-12-07 10:27 ` Jacques Garrigue
2003-12-07 19:46   ` Nuutti Kotivuori
2003-12-08  1:07     ` Jacques Garrigue
2003-12-08 15:08       ` Nuutti Kotivuori
2003-12-08 15:42         ` Richard Jones
2003-12-09  0:26           ` Nicolas Cannasse
2003-12-09 12:10             ` Nuutti Kotivuori
2003-12-09 13:17               ` Olivier Andrieu
2003-12-09 13:53                 ` Nuutti Kotivuori
2003-12-08 17:51       ` Brian Hurt
2003-12-08 18:19         ` brogoff
2003-12-08 20:09           ` Brian Hurt
2003-12-08 19:02         ` Xavier Leroy
2003-12-08 21:37           ` Brian Hurt [this message]
2003-12-08 21:06             ` Nuutti Kotivuori
2003-12-08 22:30             ` malc
2003-12-07 18:23 ` Brian Hurt
2003-12-07 18:14   ` Nuutti Kotivuori
2003-12-07 19:30     ` Brian Hurt
2003-12-07 23:50       ` Abdulaziz Ghuloum
2003-12-08 17:29         ` Brian Hurt
2003-12-08 18:48           ` Nuutti Kotivuori
2003-12-08 10:17       ` Nuutti Kotivuori
2003-12-08 19:51       ` skaller

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Pine.LNX.4.44.0312081514270.5009-101000@localhost.localdomain \
    --to=bhurt@spnz.org \
    --cc=caml-list@inria.fr \
    --cc=xavier.leroy@inria.fr \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).