From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <3FD3BCB2.3090506@cs.indiana.edu>
Date: Sun, 07 Dec 2003 18:50:10 -0500
From: Abdulaziz Ghuloum
To: caml-list@inria.fr
Subject: Re: [Caml-list] Object-oriented access bottleneck

Brian Hurt wrote:

>I actually question the value of inlining as a performance improvement,
>unless it leads to other significant optimizations. Function calls simply
>aren't that expensive anymore, on today's OOO super-scalar
>speculative-execution CPUs. A direct call, i.e. one not through a
>function pointer, I benchmarked out at about 1.5 clocks on an AMD K6-3.
>Probably less on a more advanced CPU. Indirect calls, i.e. through a
>function pointer, are slower only due to the load-to-use penalty. If the
>pointer is in L1 cache, an indirect call is probably only 3-8 clocks.
>
>Cache misses are the big cost. Hitting L1 cache, the cheapest memory
>access, is generally 2-4 clocks. L2 cache is generally 6-30 clocks.
>Missing cache entirely and having to go to main memory is 100-300+ clocks.
>Inlining expands the code size, and thus means you're likely having more
>expensive cache misses. At 300 clocks/cache miss, it doesn't take all
>that many cache misses to totally overwhelm the small advantages gained
>by inlining functions.
>

Hello,

Do you happen to have a pointer to a document listing the (approximate)
timings of the various instructions on today's hardware? You have listed a
few, and I was wondering if you have a more comprehensive study.

You say "Inlining expands the code size and thus you're likely having more
expensive cache misses". I wonder how true that is. For example, consider a
simple expression such as {if p() e0 e1}. If the compiler decides to inline
p (in case p is small enough, a leaf node, etc.), then in addition to the
usual benefits of inlining (no need to save your live variables or restore
them later, constant folding, copy propagation, ...), you're also avoiding
the jump to p. Since p can be anywhere in memory, a cache miss is quite
probable. If p is inlined, its code is now nearby and less likely to cause
a cache miss. Not inlining causes the PC to jump all over the place,
causing more cache misses. Am I missing something?
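To make the {if p() e0 e1} example concrete, here is a minimal OCaml sketch
under assumed names: is_small, classify_call, classify_inlined, and the
threshold are all made up for illustration, and the "inlined" version is
written out by hand to show what inlining would effectively produce; it is
not actual ocamlopt output.

  (* A small leaf predicate: a natural candidate for inlining.
     All names and the threshold here are hypothetical. *)
  let is_small x = x < 16

  (* Out-of-line call: execution jumps to wherever the code of
     is_small happens to live, which may be far from the caller
     and can therefore miss the instruction cache. *)
  let classify_call x =
    if is_small x then "small" else "large"

  (* What inlining effectively produces: the test is emitted in
     place, control flow stays local, and constant folding / copy
     propagation can act on the condition directly. *)
  let classify_inlined x =
    if x < 16 then "small" else "large"

  let () =
    assert (classify_call 3 = classify_inlined 3);
    assert (classify_call 42 = classify_inlined 42)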
Aziz,,,

-------------------
To unsubscribe, mail caml-list-request@inria.fr
Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs
FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners