Date: Sun, 27 May 2001 10:47:06 +0200
From: Xavier Leroy
To: Mattias Waldau
Cc: caml-list@inria.fr
Subject: Re: [Caml-list] Obsessed by speed

> 1. If I define Array.iter + a function that uses it,
> will ocamlopt combine these two functions into one?
> (I looked at the assembler code, but it seemed as if
> ocamlopt didn't combine them.)

You're right: the function inlining pass in ocamlopt is rather
conservative and doesn't inline and beta-reduce function arguments to
higher-order functions.  Inlining is a delicate trade-off between too
little and too much (e.g. code size and compile-time explosion), and
OCaml errs on the conservative side.

> 2. Do we need special Pentium-4 adaptations to utilize
> its very good float performance?

I'm not familiar with the P4 micro-architecture, so I don't know.
ocamlopt uses the standard IA32 stack-based model for floating-point
code.  Apparently, the P4 can now do 64-bit float arithmetic between
true registers (the SSE2 model), and ocamlopt could generate better
(and simpler!) floating-point code for this model.  I don't know how
much of a performance difference that would make, though.  At any
rate, the ocamlopt port for AMD's x86-64 architecture will use the
SSE2 model.

> 3. Would ocamlopt benefit from a peephole optimizer on
> the assembler code?  Or is the assembler code already
> optimal?

No assembly code is ever optimal, especially if machine-generated :-)
A peephole optimizer could remove some cosmetic inefficiencies, but I
doubt it would make a significant speed difference.  Today's
processors have enough instruction-level parallelism and dynamic
instruction scheduling that a few redundant integer operations here
and there don't really hurt.  Other higher-level optimizations not
currently performed, e.g. common subexpression elimination on memory
loads, could improve performance more.

> 4. What is unboxed and what isn't?  I have noticed that
> there is continuous work to unbox more.

Very briefly:

Always unboxed: int, char, bool, constant constructors of datatypes.
Locally unboxed (in expressions and let...in): float, int32, nativeint
  (and int64 on 64-bit processors).
Unboxed inside arrays: float.
Unboxed inside big arrays: all numerical types.
Always boxed: everything else (records, non-constant datatype
  constructors, tuples, arrays, etc.).
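To make these rules concrete, here is a small sketch (the code and the
function names dot and mean_square are not from the original message;
they only restate the rules above):

  (* Illustrative sketch only, not from the original message. *)

  (* "Unboxed inside arrays: float": a float array stores flat 64-bit
     floats, so a.(i) reads a double directly. *)
  let dot (a : float array) (b : float array) =
    let acc = ref 0.0 in
    for i = 0 to Array.length a - 1 do
      (* The product a.(i) *. b.(i) is evaluated with locally unboxed
         floats inside the expression. *)
      acc := !acc +. a.(i) *. b.(i)
    done;
    !acc

  (* "Always boxed: ... tuples": building this pair allocates a block,
     and the floats stored in it are boxed. *)
  let mean_square (a : float array) =
    let n = float_of_int (Array.length a) in
    (dot a a /. n, n)

The first pattern is what tight numerical loops want; building the
tuple in the second allocates, as any boxed value does.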
> 5. How do you make sense of gprof output?  Any suggestions?

The "info" docs for gprof contain a tutorial.  The function names that
appear are of the form Modulename_functionname_NNN, where NNN is a
unique integer.  Be careful with the call graph reported by gprof: it
is totally inaccurate for higher-order functions.

Hope this answers your questions.

- Xavier Leroy

-------------------
To unsubscribe, mail caml-list-request@inria.fr.
Archives: http://caml.inria.fr