On Mon, 8 Dec 2003, Xavier Leroy wrote:
> > How much slower is a member function call compared to a non-member
> > function call?  I was given the impression that it was significantly
> > slower, but you're making it sound like it isn't.
>
> In the non-member case, it depends very much whether the function
> being called is statically known or is computed at run-time.  In the
> first case, a call to a static address is generated.  In the latter
> case, a call to a function pointer is generated.
>
> Here are some figures I collected circa 1998 on a PowerPC 603 processor:
>
> type of call                  cost of call
>
> inlined function               0 cycles
> call to known function         4 cycles
> call to unknown function      11 cycles
> method invocation             18 cycles

Attached is some microbenchmarking code I just whomped together.  It
gives both C and OCaml timings on tight loops of function calls.  The
results (fcall2 is the OCaml program, fcall the C one):

$ ./fcall2
time0: 1000000000 loops in 1.680000 seconds = 1.680000 nanoseconds/loop
time1: 1000000000 loops in 1.680000 seconds = 1.680000 nanoseconds/loop
time2: 1000000000 loops in 5.700000 seconds = 5.700000 nanoseconds/loop
time3: 1000000000 loops in 10.110000 seconds = 10.110000 nanoseconds/loop
$ ./fcall
time0(): 1000000000 loops in 2.570000 seconds = 2.570000 nanoseconds/loops
time1(): 1000000000 loops in 5.170000 seconds = 5.170000 nanoseconds/loops
time2(): 1000000000 loops in 5.860000 seconds = 5.860000 nanoseconds/loops
$ gcc -v
Reading specs from /usr/lib/gcc-lib/i386-redhat-linux/3.2.2/specs
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --enable-shared --enable-threads=posix --disable-checking --with-system-zlib --enable-__cxa_atexit --host=i386-redhat-linux
Thread model: posix
gcc version 3.2.2 20030222 (Red Hat Linux 3.2.2-5)
$ ocamlopt -v
The Objective Caml native-code compiler, version 3.07
Standard library directory: /usr/local/lib/ocaml
$ cat /proc/cpuinfo
processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 6
model           : 8
model name      : AMD Athlon(tm) XP 2200+
stepping        : 1
cpu MHz         : 1800.109
cache size      : 256 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx fxsr sse syscall mmxext 3dnowext 3dnow
bogomips        : 3591.37
$

My conclusions:

- Despite my best efforts, OCaml inlined (and removed) the call to
  foo.  In general this is good, but it did defeat my efforts to time
  how long a direct function call took.  For some reason it didn't
  also eliminate the for loops.  If I had to guess, OCaml's call to a
  known function is about the same speed as C's.

- OCaml loops faster than C.  Go figure.

- OCaml's call to an unknown function (passed in via argument) is a
  little slower than C's call via function pointer, by maybe 2 clocks.
  Not significant.

- OCaml's call to a member function is slower than a call to an
  unknown function by a factor of about 2.

- All of these times are small enough to be insignificant.  By
  comparison, on this CPU a mispredicted branch costs about 8ns (~14
  clocks).  Outside of a microbenchmark, they are unlikely to be
  noticeable, as other factors are likely to overwhelm the minor costs
  shown here (garbage collection, cache performance, etc.).

- 4 loads at 2 clocks apiece (they hit cache) is about 4.4ns, the vast
  majority of the difference between a call to an unknown function and
  a call to a member function.  As such, significant improvements are
  highly unlikely.  Even my design for a hash table uses three loads,
  and would only be 1.1ns or so faster (at best).

- OCaml is surprisingly fast.

- This thread devolved into pedantry several posts ago :-).

> I'm pretty sure that more modern processors have a bigger gap between
> the "call to known function" and "call to unknown function" cases.

Actually, no.  The 603 did a little bit of speculation, but not much.
Modern CPUs speculate aggressively.
The biggest problem with a call via pointer
is the load-to-use penalty: in a direct call, the address it's jumping
to is right there.  With a call through a pointer, it's got to load the
pointer before it knows where to jump.  But even then, some CPUs do
data speculation.

--
"Usenet is like a herd of performing elephants with diarrhea -- massive,
difficult to redirect, awe-inspiring, entertaining, and a source of
mind-boggling amounts of excrement when you least expect it."
                                - Gene Spafford
Brian
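
The attachment itself is not reproduced in this message. For readers who want to try something similar, here is a minimal sketch of what the C side of such a benchmark might look like. It is not the attached code: the names (foo, sink, loop_only, loop_direct, loop_indirect, time_loop, ns_per_loop, cycles_to_ns) are all made up for illustration, and __attribute__((noinline)) is a GCC extension assumed here to keep the compiler from removing the call the way ocamlopt did with foo.

```c
#include <assert.h>
#include <stdio.h>
#include <time.h>

typedef int (*fn_t)(int);

/* Written on every iteration so the compiler cannot delete the loops. */
volatile int sink;

/* Hypothetical callee. noinline (GCC extension) keeps the call in place. */
__attribute__((noinline)) int foo(int x) { return x + 1; }

/* Baseline: loop overhead only, no call. */
void loop_only(long n) {
    long i;
    for (i = 0; i < n; i++) sink = (int)i;
}

/* Direct call: the target address is part of the call instruction. */
void loop_direct(long n) {
    long i;
    for (i = 0; i < n; i++) sink = foo((int)i);
}

/* Indirect call: the volatile pointer is reloaded before every call,
   so the CPU must load the target before it knows where to jump. */
void loop_indirect(long n) {
    fn_t volatile fp = foo;
    long i;
    for (i = 0; i < n; i++) sink = fp((int)i);
}

/* CPU time in seconds spent running body(n), measured with clock(). */
double time_loop(void (*body)(long), long n) {
    clock_t start = clock();
    body(n);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* Convert a total time into nanoseconds per loop iteration. */
double ns_per_loop(double seconds, long loops) {
    assert(loops > 0);
    return seconds * 1e9 / (double)loops;
}

/* Convert a clock-cycle count into nanoseconds at a given clock rate. */
double cycles_to_ns(double cycles, double mhz) {
    assert(mhz > 0.0);
    return cycles * 1000.0 / mhz;
}

#ifdef BENCH_MAIN
int main(void) {
    const long n = 1000000000L;
    printf("loop only:     %f ns/loop\n", ns_per_loop(time_loop(loop_only, n), n));
    printf("direct call:   %f ns/loop\n", ns_per_loop(time_loop(loop_direct, n), n));
    printf("indirect call: %f ns/loop\n", ns_per_loop(time_loop(loop_indirect, n), n));
    return 0;
}
#endif
```

Building with something like "gcc -O2 -DBENCH_MAIN" gives a standalone program. Subtract the loop-only figure from the other two to estimate the cost of the call itself. cycles_to_ns is the conversion used in the conclusions above: 8 clocks (4 loads at 2 clocks each) at 1800.109 MHz comes to about 4.4ns.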