Re: [Caml-list] matrix-matrix multiply - O'Caml is 6 times slower than C

caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed

From: Xavier Leroy <xavier.leroy@inria.fr>
To: Paul Stodghill <stodghil@CS.Cornell.EDU>
Cc: caml-list@inria.fr
Subject: Re: [Caml-list] matrix-matrix multiply - O'Caml is 6 times slower than C
Date: Sun, 20 Oct 2002 11:42:51 +0200	[thread overview]
Message-ID: <20021020114251.A7642@pauillac.inria.fr> (raw)
In-Reply-To: <r99u1jjh5k8.fsf@quimby-xp.cs.cornell.edu>; from stodghil@CS.Cornell.EDU on Fri, Oct 18, 2002 at 04:09:59PM -0400

> Looking at the assembly produced by O'Caml and GCC, it appears that GCC
> is performance loop unrolling (as requested with -funroll-loops) and
> strength reduction in the inner loops. I can easily see why these two
> optimizations can result in such a tremendous performance difference.
> 
> My question is this: I can obviously performance loop unrolling myself
> by hand - does ocamlopt perform strength reduction?

No, it doesn't.  It doesn't hoist loop-invariant computations either.
(See http://caml.inria.fr/ocaml/numerical.html)

Multiplications are pretty expensive on modern processors, so your
code would run much faster if you hoist the computation i*120 out of
the j and k loops, and strength-reduce k*120 in the innermost
loop.

Another option worth considering is to swap the j and k loops, thus
making k*120 and a.(i*120+k) two invariants of the inner loop.  In
this case, you might not even have to strength-reduce k*120.

Other loop optimizations such as loop unrolling would (I guess) make
much less of a difference -- modern processors are very good at branch
prediction, making loop overhead low.

Some more thoughts:

Bagley's code works on a "proper" matrix (an array of
arrays), while yours flatten the matrix into one single array.  There
are pros and cons to either approach, but notice that your approach
(without strength reduction) entails fewer loads but more
multiplications, which can be more expensive...

Tiling loops is good for really large matrices, but yours (in this
example) occupies "only" 115 Kb, thus fits comfortably in L2 cache.
Perhaps tiling isn't beneficial in this case.

- Xavier Leroy
-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners

next prev parent reply	other threads:[~2002-10-20  9:42 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2002-10-18 20:09 Paul Stodghill
2002-10-20  9:42 ` Xavier Leroy [this message]
2002-10-20 18:06   ` Paul Stodghill
2002-10-20 21:58     ` Issac Trotts
2002-10-23 10:16       ` [Caml-list] Re: float boxing (was: matrix-matrix multiply) Christophe TROESTLER
2002-10-23 12:08         ` malc
2002-10-24  0:00           ` malc
2002-10-21 12:53     ` [Caml-list] matrix-matrix multiply - O'Caml is 6 times slower than C Xavier Leroy

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20021020114251.A7642@pauillac.inria.fr \
    --to=xavier.leroy@inria.fr \
    --cc=caml-list@inria.fr \
    --cc=stodghil@CS.Cornell.EDU \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).