From mboxrd@z Thu Jan 1 00:00:00 1970
Received: (from majordomo@localhost) by pauillac.inria.fr (8.7.6/8.7.3)
	id VAA31981; Mon, 8 Dec 2003 21:36:46 +0100 (MET)
X-Authentication-Warning: pauillac.inria.fr: majordomo set sender to
	owner-caml-list@pauillac.inria.fr using -f
Received: from nez-perce.inria.fr (nez-perce.inria.fr [192.93.2.78])
	by pauillac.inria.fr (8.7.6/8.7.3) with ESMTP id VAA31181
	for ; Mon, 8 Dec 2003 21:36:45 +0100 (MET)
Received: from herd.plethora.net (herd.plethora.net [205.166.146.1])
	by nez-perce.inria.fr (8.11.1/8.11.1) with ESMTP id hB8Kag121292;
	Mon, 8 Dec 2003 21:36:42 +0100 (MET)
Received: from bhurt.plethora.net (bhurt.plethora.net [205.166.146.49])
	by herd.plethora.net (8.11.6/8.10.1) with ESMTP id hB8KaaC02775;
	Mon, 8 Dec 2003 14:36:38 -0600 (CST)
Date: Mon, 8 Dec 2003 15:37:17 -0600 (CST)
From: Brian Hurt
X-X-Sender: bhurt@localhost.localdomain
To: Xavier Leroy
cc: Ocaml Mailing List
Subject: Re: [Caml-list] Object-oriented access bottleneck
In-Reply-To: <20031208200228.A26466@pauillac.inria.fr>
Message-ID:
MIME-Version: 1.0
Content-Type: MULTIPART/MIXED; BOUNDARY="8323328-425715405-1070919437=:5009"
X-Loop: caml-list@inria.fr
X-Spam: no; 0.00; caml-list:01 bottleneck:01 mime-aware:01 docserver:01
	slower:01 slower:01 statically:01 run-time:01 powerpc:01 603:99
	inlined:01 invocation:01 timings:01 gcc:01 usr:01
X-Attachments: type="APPLICATION/x-gzip" name="fcall.tar.gz" name="fcall.tar.gz"
Sender: owner-caml-list@pauillac.inria.fr
Precedence: bulk

  This message is in MIME format.  The first part should be readable text,
  while the remaining parts are likely unreadable without MIME-aware tools.
  Send mail to mime@docserver.cac.washington.edu for more info.

--8323328-425715405-1070919437=:5009
Content-Type: TEXT/PLAIN; charset=US-ASCII

On Mon, 8 Dec 2003, Xavier Leroy wrote:

> > How much slower is a member function call compared to a non-member
> > function call?
I was given the impression that it was significantly
> > slower, but you're making it sound like it isn't.
>
> In the non-member case, it depends very much whether the function
> being called is statically known or is computed at run-time.  In the
> first case, a call to a static address is generated.  In the latter
> case, a call to a function pointer is generated.
>
> Here are some figures I collected circa 1998 on a PowerPC 603 processor:
>
>         type of call               cost of call
>
>         inlined function            0 cycles
>         call to known function      4 cycles
>         call to unknown function   11 cycles
>         method invocation          18 cycles

Attached is some microbenchmarking code I just whomped together.  It gives
both C and Ocaml timings on tight loops of function calls.  The results:

$ ./fcall2
time0: 1000000000 loops in 1.680000 seconds = 1.680000 nanoseconds/loop
time1: 1000000000 loops in 1.680000 seconds = 1.680000 nanoseconds/loop
time2: 1000000000 loops in 5.700000 seconds = 5.700000 nanoseconds/loop
time3: 1000000000 loops in 10.110000 seconds = 10.110000 nanoseconds/loop
$ ./fcall
time0(): 1000000000 loops in 2.570000 seconds = 2.570000 nanoseconds/loops
time1(): 1000000000 loops in 5.170000 seconds = 5.170000 nanoseconds/loops
time2(): 1000000000 loops in 5.860000 seconds = 5.860000 nanoseconds/loops
$ gcc -v
Reading specs from /usr/lib/gcc-lib/i386-redhat-linux/3.2.2/specs
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man
--infodir=/usr/share/info --enable-shared --enable-threads=posix
--disable-checking --with-system-zlib --enable-__cxa_atexit
--host=i386-redhat-linux
Thread model: posix
gcc version 3.2.2 20030222 (Red Hat Linux 3.2.2-5)
$ ocamlopt -v
The Objective Caml native-code compiler, version 3.07
Standard library directory: /usr/local/lib/ocaml
$ cat /proc/cpuinfo
processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 6
model           : 8
model name      : AMD Athlon(tm) XP 2200+
stepping        : 1
cpu MHz         : 1800.109
cache size      : 256 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
cmov pat pse36 mmx fxsr sse syscall mmxext 3dnowext 3dnow
bogomips        : 3591.37
$

My conclusions:

- Despite my best efforts, ocaml inlined (and removed) the call to foo.
In general this is good, but it did defeat my efforts to time how long a
direct function call took.  For some reason it didn't also eliminate the
for loops.  If I had to guess, Ocaml's call to a known function is about
the same speed as C's.

- Ocaml loops faster than C.  Go figure.

- Ocaml's call to an unknown function (passed in via argument) is a little
slower than C's call via function pointer, by maybe 2 clocks.  Not
significant.

- Ocaml's call to a member function is slower than a call to an unknown
function by a factor of about 2.

- All of these times are small enough to be insignificant.  By comparison,
on this CPU a mispredicted branch costs about 8ns (~14 clocks).  Outside
of a microbenchmark, they are unlikely to be noticeable, as other factors
are likely to overwhelm the minor costs shown here (garbage collection,
cache performance, etc.).

- 4 loads at 2 clocks apiece (they hit cache) is about 4.4ns, which is the
vast majority of the difference between a call to an unknown function and
a call to a member function.  As such, significant improvements are highly
unlikely.  Even my design for a hash table uses three loads, and would
only be 1.1ns or so faster (at best).

- Ocaml is surprisingly fast.

- This thread devolved into pedantry several posts ago :-).

>
> I'm pretty sure that more modern processors have a bigger gap between
> the "call to known function" and "call to unknown function" cases.

Actually, no.  The 603 did a little bit of speculation, but not much.
Modern CPUs speculate aggressively.  The biggest problem with call via
pointer is the load-to-use penalty: in a direct call, the address it's
jumping to is right there.
With a call through a pointer, it's got to load the pointer before it
knows where to jump.  But even then, some CPUs do data speculation.

--
"Usenet is like a herd of performing elephants with diarrhea -- massive,
difficult to redirect, awe-inspiring, entertaining, and a source of
mind-boggling amounts of excrement when you least expect it."
                                - Gene Spafford
Brian

--8323328-425715405-1070919437=:5009
Content-Type: APPLICATION/x-gzip; name="fcall.tar.gz"
Content-Transfer-Encoding: BASE64
Content-ID:
Content-Description:
Content-Disposition: attachment; filename="fcall.tar.gz"

H4sICEfu1D8CA2ZjYWxsLnRhcgDtmltPo0AUx/taPsWJxl3QijMDLYmoL0Zf
1kSzvm7S0ALKLjJNAXc3G7/7zgVo66X1wVKN5/dCmcsZPO3/XFrjcZCmNLDH
nfVBKCGDgdshAu/RVdL3SId43qBPB2Tg9MV6OqBOB0inBcq8CKYAndFtOS2W
rFs1/0HZTrJxWoYRbKWcT3L7dsswoj9FNM3gnichxJyb8oXlG4YaKJK7iOgh
+GeA4J6nQZGkEZRZntxkUQgpz24g8dVszKdgJnAMxIcEjuDi8vLq2oe9vaTe
/2A8zNmmb2pbr+Km5T9zFFNHgbkbl9lwYumD3/RkaXfh7HrGeC/vf6z0v1b5
r9I/o8Sp9S+U70j9uy5F/beq/yOpCPv2xJiN5EWYcDlkrIoSc0HBfzJBX5po
9FdLr44xTdARWhKiSbIC7oIkg4XQME75+NewAPn+FT1x4RMttJCXI6HbPBrz
LMx9LTq1SuhV7aolqR+7upEGHi+obIhh09RmLTDVwn1t0bIOZjOnF5en366H
V2ffh9dnp5WJyVQ8fmxuVWcdwk5agvIiiL9oJ547Q9xkQcargQO16Ee21Wui
hsBcCEeWDjy92kgPaJ1UiU1gtx6fe0i1Qbl6uV9oi36h78ovS93CzC/i07nE
NevwD3tP/pHWplFRCiUTX8qzg3xcVP5ngX2Xbq7+d2m/zv+u68r6nxGC9X8r
GGlUVGHleC4IgC8CoZxSaQtMC46Nrix6Zc1LoeDVnpAb3a5pGd2QZ9H8Hrpi
j1w3FOPn6vM3skVQlTtkkn/GGoN4mbH42U0O8GWb+HbIk2Jh56fV/2ij+ifM
rfXPGGGy/hc3qP/W9F+JT1Qr1ieVwSfP/2uV/8r+X9Dkf0oG6vs/h6H+W9H/
OA3yHEbCB8fARz+jsei1o+KWh1ClRxUWIMrCpiYYyrQqX9Rd0vXf3JYZV2dw
o3teFZVNf12tVt3S48VqqulldJNkV6bl/JVqhGzdD4Huo0U3FL6+GRK9kNGV
bUv9XHpnvcdc7H3sZuLABr1PNVBxyoNiyOOh/DpiwZLqGl/tELoGh9CP7BA2
XwS+uWvYR3aNA2YW/ZbqfPvPjPMeHSOqcczJm+n/k43V/4zROv87fdeV/T91
B5j/W8n/90Gqg8ChiAKFoe6rHFtmogDYP1HXZpy+MC7irDk/YT1ZIQLOkS4q
Zia+Bj7YNpw0i1H+m+r/N6d/wrxZ/089rX8P9d8GUpyy/X+kahTGZ+v/1yf/
1fonzf//OMRVv/97Hvb/CIIgCIIgCIIgCIIgCIIgCIIgCIIgCIIgCIIgCIIg
CPKE/zGc3wkAUAAA
--8323328-425715405-1070919437=:5009--
-------------------
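For readers without MIME tools, the kinds of call compared in the message (loop only, known function, unknown function passed as an argument, member function) can be sketched in OCaml roughly as follows.  This is not the code from fcall.tar.gz, which is not decoded here: every name below (foo, counter, run_known, run_unknown, run_method, ns_to_clocks) is a hypothetical stand-in, and Sys.opaque_identity assumes OCaml 4.03 or later rather than the 3.07 compiler used in the message.  The last line redoes the message's arithmetic: the 4.41ns gap between the unknown-function and member-function timings, converted to clocks at the 1800.109 MHz reported by /proc/cpuinfo.

```ocaml
(* Hypothetical sketch, not the attachment's code. *)

(* A tiny function; called directly, ocamlopt may inline it away,
   which is the effect the message reports for "foo". *)
let foo x = x + 1

(* A class with one method, so that o#foo goes through method lookup. *)
class counter = object
  method foo x = x + 1
end

(* Baseline: the loop with the increment written inline. *)
let run_baseline n =
  let acc = ref 0 in
  for _i = 1 to n do acc := !acc + 1 done;
  !acc

(* Call to a statically known function. *)
let run_known n =
  let acc = ref 0 in
  for _i = 1 to n do acc := foo !acc done;
  !acc

(* Call to a function received as an argument (unknown at this site). *)
let run_unknown f n =
  let acc = ref 0 in
  for _i = 1 to n do acc := f !acc done;
  !acc

(* Call to a member function of an object. *)
let run_method (o : counter) n =
  let acc = ref 0 in
  for _i = 1 to n do acc := o#foo !acc done;
  !acc

(* Convert nanoseconds to clock cycles for a CPU running at [mhz] MHz. *)
let ns_to_clocks ~mhz ns = ns *. mhz /. 1000.0

(* Time [f n] with CPU time and print it in the style of the results above. *)
let time label n f =
  let t0 = Sys.time () in
  let result = f n in
  let dt = Sys.time () -. t0 in
  Printf.printf "%s: %d loops in %f seconds = %f nanoseconds/loop (sum %d)\n"
    label n dt (dt *. 1e9 /. float_of_int n) result

let () =
  let n = 10_000_000 in
  time "baseline" n run_baseline;
  time "known" n run_known;
  (* opaque_identity hides foo from the optimizer at this call site *)
  time "unknown" n (run_unknown (Sys.opaque_identity foo));
  time "method" n (run_method (new counter));
  (* 10.11ns - 5.70ns from the message, at the CPU's reported clock rate *)
  Printf.printf "method vs unknown gap: %.1f clocks\n"
    (ns_to_clocks ~mhz:1800.109 (10.11 -. 5.70))
```

Each loop returns its accumulator and the driver prints it, so the compiler cannot discard the work entirely.  Absolute timings will differ by compiler version and CPU.  The conversion on the last line comes to about 7.9 clocks, in line with the "4 loads at 2 clocks apiece" estimate in the conclusions.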