caml-list - the Caml user's mailing list
* [Caml-list] native code optimization priorities
@ 2001-10-31  3:08 Chris Hecker
  2001-10-31  7:50 ` Fabrice Le Fessant
  2001-11-06 14:06 ` [Caml-list] native code optimization priorities Xavier Leroy
  0 siblings, 2 replies; 7+ messages in thread
From: Chris Hecker @ 2001-10-31  3:08 UTC (permalink / raw)
  To: caml-list


Hi, this is just a general question about the caml development team's priorities with respect to the native code compiler's optimized code generation (and bytecode where appropriate), and some specific questions that go along with that.  

I think optimizations are far less important than new features since Moore's law works on the former but not the latter.  So, in some sense, I hope adding new features[*] is prioritized much higher than optimization.

However, I have a bunch of small things I'd like to implement (or see implemented) for making native numerical code faster.  This is primarily for my video game work, but the kinds of things I have in mind will also help any numerically intensive application.  So, here are my questions:

0.  How important is optimization to the team?

1.  Are there any new (big or small) optimizations planned or in the works?

2.  What's the relative priority of new features versus compiler optimizations?

3.  Is there some kind of standard suite of test applications the caml team runs to figure out whether an optimization is worth it to include?  

4.  Are numerical operations an important area for ocaml to succeed?  Put another way, if an optimization helps numerical code but does not help other code (or even slightly hurts it), how would that patch be received?  What about command line options for optimization (of which there are very few now) to offset this effect?

5.  How does the team feel about optimizations added to the x86 code generator that don't help other platforms?

Thanks,
Chris

* My personal favorites one more time: overloading, module recursion, generics!

-------------------
Bug reports: http://caml.inria.fr/bin/caml-bugs  FAQ: http://caml.inria.fr/FAQ/
To unsubscribe, mail caml-list-request@inria.fr  Archives: http://caml.inria.fr



* Re: [Caml-list] native code optimization priorities
  2001-10-31  3:08 [Caml-list] native code optimization priorities Chris Hecker
@ 2001-10-31  7:50 ` Fabrice Le Fessant
  2001-11-06 14:20   ` [Caml-list] compiler patches in the CDK Xavier Leroy
  2001-11-06 14:06 ` [Caml-list] native code optimization priorities Xavier Leroy
  1 sibling, 1 reply; 7+ messages in thread
From: Fabrice Le Fessant @ 2001-10-31  7:50 UTC (permalink / raw)
  To: Chris Hecker; +Cc: caml-list


I'm not part of the Ocaml devel team, but as an "old" ocaml user, I would
reply:

>  0.  How important is optimization to the team?
>  2.  What's the relative priority of new features versus compiler
>  optimizations?

Optimizations are welcome, as long as they don't complicate the compiler too much.

>  3.  Is there some kind of standard suite of test applications the
>  caml team runs to figure out whether an optimization is worth
>  it to include?

Look at the CVS version of OCaml; I think there are test directories.
Coq compilation is often used for evaluating optimizations.

>  4.  Are numerical operations an important area for ocaml to
>  succeed?  Put another way, if an optimization helps numerical
>  code but does not help other code (or even slightly hurts it),
>  how would that patch be received?  What about command line
>  options for optimization (of which there are very few now) to
>  offset this effect?

Most current users seem more interested in "symbolic" computations
than in "numerical" applications. However, this might change if you
add such an optimization patch. But if your patch degrades "symbolic"
performance, you MUST ADD AN OPTION to trigger it ONLY on numerical
applications.

Notice that, as discussed before on this mailing-list, I would welcome
such a patch in the CDK.
  
>  5.  How does the team feel about optimizations added to the x86
>  code generator that don't help other platforms?

x86 optimization is better than nothing.

Finally, I would say it might be interesting to have an optional pass
in the compiler, where user-contributed optimizations might be added.
Then, there would be some space for an independent project, something
like ocaml-opts.sourceforge.net that would develop this pass.

Regards,

--
Fabrice

* Re: [Caml-list] compiler patches in the CDK
  2001-11-06 14:20   ` [Caml-list] compiler patches in the CDK Xavier Leroy
@ 2001-11-06 13:49     ` Fabrice Le Fessant
  0 siblings, 0 replies; 7+ messages in thread
From: Fabrice Le Fessant @ 2001-11-06 13:49 UTC (permalink / raw)
  To: Xavier Leroy; +Cc: caml-list


Xavier wrote:
>  This is one thing I'm not sure I understand about the CDK.
>  
>  My initial view of the CDK is as a pre-packaged binary installation of
>  OCaml plus lots of user-contributed libraries and tools: a very
>  convenient thing indeed for users who want an OCaml development
>  environment that works and that is rich enough, without the hassle of
>  tracking down and installing all the bits themselves.  Excellent idea.
>  
>  But then we learn that the CDK also includes some experimental, not
>  much tested patches to the OCaml compilers, and that by doing this
>  Fabrice intends the CDK to serve also as a beta-test for these
>  experimental extensions and changes.
>  
>  So, is the CDK a stable, convenient distribution for users who
>  want something that works with no hassle, or an experimental
>  distribution for users who want to sit on the bleeding edge and
>  beta-test things?

I understand that the idea of untested patches being included in the
CDK can frighten users. Two replies:

 1) Until recently, most patches included in the CDK were very simple
    ones that only modify small, well-delimited parts of the
    compiler. Bugs in these patches are very unlikely. However, it
    is true that I've added some experimental
    patches very recently, with the idea that the CDK should also
    welcome contributed patches to the compiler as it welcomes
    contributed libraries, some of these patches being often asked for
    on the caml mailing-list. I've tried to read these patches
    carefully, before including them, to reduce the risk of
    introducing bugs. In particular, most of them require the use of
    special keywords or options to trigger them, and so, should not
    introduce bugs for users that don't use them. 

 2) As a result of your mail, and of the discussion of this morning, I
    will remove all experimental patches from the compiler distributed
    in the CDK. However, since I think some of the experimental
    patches can still be useful for some users, I will investigate if
    I can add a second compiler, something like ocamlc-patched and
    ocamlopt-patched, that will contain some of the patches and still
    be compatible with object files generated by ocamlc and ocamlopt.

Hope this answers your curiosity.

- Fabrice


* Re: [Caml-list] native code optimization priorities
  2001-10-31  3:08 [Caml-list] native code optimization priorities Chris Hecker
  2001-10-31  7:50 ` Fabrice Le Fessant
@ 2001-11-06 14:06 ` Xavier Leroy
       [not found]   ` <20011106154533.D27723@chopin.ai.univie.ac.at>
       [not found]   ` <Pine.SOL.4.20.0111061141330.10389-100000@godzilla.ics.uci.edu>
  1 sibling, 2 replies; 7+ messages in thread
From: Xavier Leroy @ 2001-11-06 14:06 UTC (permalink / raw)
  To: Chris Hecker; +Cc: caml-list

> However, I have a bunch of small things I'd like to implement (or
> see implemented) for making native numerical code faster.  This is
> primarily for my video game work, but the kinds of things I have in
> mind will also help any numerically intensive application.  So, here
> are my questions:
> 
> 0.  How important is optimization to the team?

Generating efficient machine code has always been an important aspect
of OCaml, and I spent quite a bit of work on this at the beginning of
the OCaml development (95-97).  Nowadays, we are largely satisfied
with the performance of the generated code, and get very few requests
for improving it, so this aspect of the OCaml implementation has
received little attention recently.

Also, I believe we've hit the point of diminishing returns: the major
optimizations (that lead to significant speedups on many programs) are
already in the ocamlopt compiler; further optimizations would (I
believe) result in tiny speedups (less than 5%) or be extremely
specific to a couple of test programs.

> 1.  Are there any new (big or small) optimizations planned or in the works?

Not really.  Like other members of the OCaml development team, I have
vague ideas about things that could be done, e.g. a Pentium-4 back-end
that would use SSE2 registers for floating-point, but this is all
low priority.

Of course, we are committed to track changes in dominant processor
architectures; for instance, if the IA64 becomes widespread (heavens
forbid), some effort will have to be invested in cross-basic-block
instruction scheduling, if-conversion, and perhaps exploitation of
advanced loads.  But the fact is that computer architectures viewed
from the compiler writer's standpoint haven't changed significantly in
the last 5 years: these hardware guys do such a good job of cranking
out better and faster processors that require no change in the compiler...

> 2.  What's the relative priority of new features versus compiler
> optimizations?

As I said above, the demand for more optimizations is low.  Moreover,
advanced compiler optimizations require a lot of implementation and
testing work.

> 3.  Is there some kind of standard suite of test applications the
> caml team runs to figure out whether an optimization is worth it to
> include?

I make intensive use of the small benchmark suite available at:
        http://camlcvs.inria.fr/cgi-bin/cvsweb.cgi/ocaml/test/
These are mostly small benchmarks, but some of them (KB, fft, nucleic)
predict the performance of bigger applications fairly well.
The ICFP programming contest entries of the last three years have also
been used as benchmarks several times.  Finally, the Coq theorem
prover stresses the compiler and runtime system quite well as far as
symbolic processing is concerned.

> 4.  Are numerical operations an important area for ocaml to succeed?

Although ML is historically rooted in symbolic processing, I did quite
a bit of work on the compiler to achieve decent floating-point performance.
Still, symbolic processing is OCaml's bread-and-butter, and takes
precedence over floating-point performance.

>  Put another way, if an optimization helps numerical code but does
>  not help other code (or even slightly hurts it), how would that patch
>  be received?

Does not help: OK.  Slightly hurts it: that might be a problem.  OCaml
contains one instance of this: float arrays are special-cased in a way
that tremendously improves the performance of floating-point code,
but slows down polymorphic code operating on arrays.  I still think
this was an acceptable trade-off, but not everyone agrees.
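
The float array trade-off can be sketched in a few lines of OCaml.  This
is only an illustration, assuming a default build in which float arrays
are stored flat; the function names are made up for the example, and the
unsafe Obj module is used solely to inspect the representation.

```ocaml
(* Sketch only: Obj is unsafe and is used here just to look at the
   run-time representation. Function names are illustrative. *)

(* True when the array is stored as a flat block of unboxed doubles,
   which is the case for float arrays on a default OCaml build. *)
let is_flat_float_array (a : 'a array) : bool =
  Obj.tag (Obj.repr a) = Obj.double_array_tag

(* Polymorphic code: the element type is unknown at compile time, so
   the access a.(0) has to check at run time whether the array is a
   flat float array. This is the cost paid by generic array code. *)
let first_or (default : 'a) (a : 'a array) : 'a =
  if Array.length a = 0 then default else a.(0)

(* Monomorphic float code: the element type is known, so the compiler
   can read the unboxed doubles directly. *)
let sum_floats (a : float array) : float =
  let s = ref 0.0 in
  for i = 0 to Array.length a - 1 do
    s := !s +. a.(i)
  done;
  !s
```

Here sum_floats benefits from the special case, while first_or is the
kind of polymorphic code that pays for it.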

Some of my earlier work on type-directed compilation (the Gallium
experimental compiler) was abandoned because while it improved the
performance of floating-point and integer computations, it slowed down
the garbage collector too much, causing pure symbolic processing to
take an unacceptable performance hit.

> What about command line options for optimization (of which there
> are very few now) to offset this effect?

Only if we absolutely must.  The problem with having lots of compiler
flags is that it makes testing the compiler much harder -- in
principle, all combinations of flags should be tested...
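
To make the combinatorics concrete, here is a small sketch that
enumerates every combination of a set of on/off flags.  The flag names
are hypothetical placeholders, not actual ocamlopt options; the point is
only that n independent flags give 2^n configurations to test.

```ocaml
(* Hypothetical flag names, for illustration only. *)
let flags = ["-opt-a"; "-opt-b"; "-opt-c"]

(* All subsets of a list of flags: each flag is either present or
   absent, so n flags yield 2^n combinations. *)
let rec combinations = function
  | [] -> [[]]
  | f :: rest ->
      let without = combinations rest in
      List.map (fun c -> f :: c) without @ without

(* Number of configurations a test run would have to cover. *)
let count = List.length (combinations flags)
```

With three flags count is 8; with ten flags it would already be 1024.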

> 5.  How does the team feel about optimizations added to the x86 code
> generator that don't help other platforms?

Fine with me.  Like all compiler writers, I hate the IA32
architecture, but that's what everyone uses these days.  The ocamlopt
back-end already contains quite a bit of IA32-specific code (in the
instruction selection phase, for instance).

Hope this answers your questions.

- Xavier Leroy

* Re: [Caml-list] compiler patches in the CDK
  2001-10-31  7:50 ` Fabrice Le Fessant
@ 2001-11-06 14:20   ` Xavier Leroy
  2001-11-06 13:49     ` Fabrice Le Fessant
  0 siblings, 1 reply; 7+ messages in thread
From: Xavier Leroy @ 2001-11-06 14:20 UTC (permalink / raw)
  To: Fabrice Le Fessant; +Cc: caml-list

> >  4.  Are numerical operations an important area for ocaml to
> >  succeed?  Put another way, if an optimization helps numerical
> >  code but does not help other code (or even slightly hurts it),
> >  how would that patch be received?
>
> Notice that, as discussed before on this mailing-list, I would welcome
> such a patch in the CDK.

This is one thing I'm not sure I understand about the CDK.

My initial view of the CDK is as a pre-packaged binary installation of
OCaml plus lots of user-contributed libraries and tools: a very
convenient thing indeed for users who want an OCaml development
environment that works and that is rich enough, without the hassle of
tracking down and installing all the bits themselves.  Excellent idea.

But then we learn that the CDK also includes some experimental, not
much tested patches to the OCaml compilers, and that by doing this
Fabrice intends the CDK to serve also as a beta-test for these
experimental extensions and changes.  

So, is the CDK a stable, convenient distribution for users who
want something that works with no hassle, or an experimental
distribution for users who want to sit on the bleeding edge and
beta-test things?

Just curious.

- Xavier Leroy

* Re: [Caml-list] native code optimization priorities
       [not found]   ` <20011106154533.D27723@chopin.ai.univie.ac.at>
@ 2001-11-08  9:45     ` Xavier Leroy
  0 siblings, 0 replies; 7+ messages in thread
From: Xavier Leroy @ 2001-11-08  9:45 UTC (permalink / raw)
  To: Markus Mottl; +Cc: caml-list

> Just out of curiosity: what do you as a compiler developer dislike
> about the IA64-architecture so much that you said "heavens forbid"? Not
> that I have any opinion on this - it's only interesting to learn about
> shortcomings of the new architecture. Do you think its features are not
> useful for getting even more efficient code out of ocamlopt? Is it just
> too complicated a design?

This is getting off-topic for this list, but briefly: the IA64
architecture is baroque.  It is very complex, provides lots of dubious
features (register windows, hardware support for software pipelining,
several kinds of load-store speculation), yet lacks some very basic
things (such as indirect addressing with immediate displacement).  We
are very far from the elegance and minimality of classic RISCs such as
the Alpha.  All these fancy features seem targeted to high-performance
Fortran; it is unclear how to exploit them for C, let alone for Caml.

Moreover, it relies on the compiler to make instruction parallelism
explicit.  I believe this is a bad idea compared with what everyone
else is doing these days, i.e. discovering instruction parallelism at
run-time, in the chip (out-of-order execution).

Finally, the first silicon implementation (Itanium) is very late, very
expensive, and slower than a $150 Pentium or Athlon for integer code
(floating-point performance is excellent, though).  Future
implementations will probably be better, but still this might indicate
something wrong in the design of the architecture.

As for using the IA64 features in ocamlopt-generated code, it might be
possible to make good use of predication (conditional instructions)
for short conditional sequences, and of load speculation (exploiting
the fact that a load from an immutable OCaml block cannot interfere
with any store).  However, both features need new optimization passes
that include quite sophisticated heuristics (neither predication nor
load speculation is always a win; both can also swamp processor
resources with useless instructions).

- Xavier Leroy

* Re: [Caml-list] native code optimization priorities
       [not found]   ` <Pine.SOL.4.20.0111061141330.10389-100000@godzilla.ics.uci.edu>
@ 2001-11-08  9:59     ` Xavier Leroy
  0 siblings, 0 replies; 7+ messages in thread
From: Xavier Leroy @ 2001-11-08  9:59 UTC (permalink / raw)
  To: Niall Dalton; +Cc: caml-list

> > I have
> > vague ideas about things that could be done, e.g. a Pentium-4 back-end
> > that would use SSE2 registers for floating-point, but this is all
> > low priority.
> 
> May I ask if you ever did implement this, would you limit it to some
> P4 specific technique? I've idly toyed with the idea of implementing
> something for Altivec on the G4.

I'm afraid I wasn't clear enough: the first step would be to use SSE2
registers as normal floating-point registers, storing only one float
per register, and performing single floating-point operations.  This
would already improve float performance quite a lot compared with the
current x86 float stack.  Other processors do not need this hack,
because they already have a sensible register-based float architecture.

The next step, of course, would be to actually use SIMD instructions
to operate on pairs or quadruples of floats.  The standard approach
would be to have special abstract types for these packed floats, with
operations corresponding to what the hardware SIMD unit provides.  The
problem here is that of portability: SSE2 and Altivec, for instance,
do not provide the same SIMD instructions...
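
As a rough sketch of the abstract-type approach, the interface below
hides a pair of floats behind an abstract type.  The module and function
names (PACKED2, Scalar2, make, add, mul, to_pair) are hypothetical and
do not correspond to any existing library; the implementation shown is a
plain scalar fallback, on the assumption that a SIMD-aware back-end
could map each operation to a single hardware instruction.

```ocaml
(* Hypothetical interface for packed pairs of floats; all names are
   illustrative and not part of any existing library. *)
module type PACKED2 = sig
  type t                          (* abstract: a pair of floats *)
  val make : float -> float -> t
  val add : t -> t -> t           (* element-wise addition *)
  val mul : t -> t -> t           (* element-wise multiplication *)
  val to_pair : t -> float * float
end

(* Portable fallback written with ordinary scalar operations. Only the
   operations exposed in the signature are available to clients, so a
   different implementation could be substituted per platform. *)
module Scalar2 : PACKED2 = struct
  type t = { lo : float; hi : float }
  let make lo hi = { lo; hi }
  let add a b = { lo = a.lo +. b.lo; hi = a.hi +. b.hi }
  let mul a b = { lo = a.lo *. b.lo; hi = a.hi *. b.hi }
  let to_pair v = (v.lo, v.hi)
end
```

The portability problem shows up in the signature: any operation listed
there must be implementable efficiently on every target, so the
interface tends to shrink to what all SIMD units have in common.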

> I wondered if it would be possible
> to integrate this into the type inference; if the compiler can infer
> that certain values will never require more than a certain number of
> bits they become candidates for use in a SIMD unit. This is along the
> lines of Bitwidth Analysis (PLDI'00 Stephenson et al, and Larsen and
> Amarasinghe's Exploiting Superword Level Parallelism with Multimedia
> Instruction Sets, same conference). Scott Ananian's SM thesis at MIT
> also included a predicated (forward and reverse) SSA variant that used
> a similar optimization to find narrow operations that could be executed in
> parallel. 

We're getting into really advanced stuff here!  It's a research topic
on its own, and I somewhat doubt that we can extract much parallelism
this way, but we'll see.

- Xavier Leroy
