Date: Sun, 27 May 2001 10:47:06 +0200
From: Xavier Leroy
To: Mattias Waldau
Cc: caml-list@inria.fr
Subject: Re: [Caml-list] Obsessed by speed

> 1. If I define Array.iter + a function that uses it,
> will ocamlopt combine these two functions into one?
> (I looked at the assembler code, but it seemed as if
> ocamlopt didn't combine them.)

You're right: the function inlining pass in ocamlopt is rather
conservative and doesn't inline and beta-reduce function arguments to
higher-order functions.  Inlining is a delicate trade-off between too
little and too much (e.g. code size and compile-time explosion), and
OCaml errs on the conservative side.

> 2. Do we need special Pentium-4 adaptations to utilize
> its very good float performance?

I'm not familiar with the P4 micro-architecture, so I don't know.
ocamlopt uses the standard IA32 stack-based model for floating-point
code.  Apparently, the P4 can now do 64-bit float arithmetic between
true registers (the SSE2 model), and ocamlopt could generate better
(and simpler!) floating-point code for this model.  I don't know how
much of a performance difference that would make, though.  At any
rate, the ocamlopt port for AMD's x86-64 architecture will use the
SSE2 model.

> 3. Would ocamlopt benefit from a peephole optimizer on
> the assembler code?  Or is the assembler code already
> optimal?

No assembly code is ever optimal, especially if machine-generated :-)
A peephole optimizer could remove some cosmetic inefficiencies, but I
doubt it would make a significant speed difference.  Today's
processors have enough instruction-level parallelism and dynamic
instruction scheduling that a few redundant integer operations here
and there don't really hurt.  Other higher-level optimizations not
currently performed, e.g. common subexpression elimination on memory
loads, could improve performance more.

> 4. What is unboxed and what isn't?  I have noticed that
> there is continuous work to unbox more.

Very briefly:

Always unboxed: int, char, bool, constant constructors of datatypes.
Locally unboxed (in expressions and let...in): float, int32, nativeint
  (and int64 on 64-bit processors).
Unboxed inside arrays: float.
Unboxed inside big arrays: all numerical types.
Always boxed: everything else (records, non-constant datatype
  constructors, tuples, arrays, etc.).
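To make these rules concrete, here is a small sketch (the code and the
function names dot and mean_square are not from the original message;
they only restate the rules above):

  (* Illustrative sketch only, not from the original message. *)

  (* "Unboxed inside arrays: float": a float array stores flat 64-bit
     floats, so a.(i) reads a double directly. *)
  let dot (a : float array) (b : float array) =
    let acc = ref 0.0 in
    for i = 0 to Array.length a - 1 do
      (* The product a.(i) *. b.(i) is evaluated with locally unboxed
         floats inside the expression. *)
      acc := !acc +. a.(i) *. b.(i)
    done;
    !acc

  (* "Always boxed: ... tuples": building this pair allocates a block,
     and the floats stored in it are boxed. *)
  let mean_square (a : float array) =
    let n = float_of_int (Array.length a) in
    (dot a a /. n, n)

The first pattern is what tight numerical loops want; building the
tuple in the second allocates, as any boxed value does.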
> 5. How do you make sense of gprof output?  Any suggestions?

The "info" docs for gprof contain a tutorial.  The function names that
appear are of the form Modulename_functionname_NNN, where NNN is a
unique integer.  Be careful with the call graph reported by gprof: it
is totally inaccurate for higher-order functions.

Hope this answers your questions.

- Xavier Leroy

-------------------
To unsubscribe, mail caml-list-request@inria.fr.
Archives: http://caml.inria.fr