Re: [Caml-list] Ocaml sums the harmonic series -- four ways, four benchmarks: floating point performance

caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed

From: Xavier Leroy <Xavier.Leroy@inria.fr>
To: "Will M. Farr" <farr@MIT.EDU>
Cc: shootout-list@lists.alioth.debian.org, caml-list@yquem.inria.fr
Subject: Re: [Caml-list] Ocaml sums the harmonic series -- four ways, four benchmarks: floating point performance
Date: Sat, 15 Jan 2005 12:55:19 +0100	[thread overview]
Message-ID: <20050115115519.GA11037@yquem.inria.fr> (raw)
In-Reply-To: <3D3A6BF5-657B-11D9-A551-000393A34E82@mit.edu>

> Here's an implementation (really 4) along with timing  
> information of the "harmonic" benchmark (essentially summing the  
> harmonic series) [...]
> Here's the code for the fastest implementation:

The following slight modification of your code generates asm code that
is closest to what a C compiler would produce:

let sum_harmonic5 n =
  let sum = ref 1.0 in
  let ifloat = ref 2.0 in
    for i = 2 to n do
      sum := !sum +. 1.0/.(!ifloat);
      ifloat := !ifloat +. 1.0
    done;
    !sum +. 0.0;;

The + 0.0 at the end is ugly but convinces ocamlopt that !sum is best
kept unboxed during the loop.

> (note that I have replaced an int->float conversion for  
> each term with a single floating point operation: ifloat := ifloat +.  
> 1.0).  Since int->float conversions are slow on my machine (PowerBook  
> G4)

Right, the PowerPC does not have an int -> float instruction and that
conversion must be performed with a rather expensive sequence of
instructions (for the gory details, see e.g.
 http://the.wall.riscom.net/books/proc/ppc/cwg/code3.html#303610).

64-bit PPCs have a dedicated instruction to do this conversion,
showing that the IBM/Motorola people learn from their past mistakes...

For Intel processors, it's the reverse conversion (float -> int) that
is slow.  Clearly, the SPEC benchmark doesn't contain much conversions
between floats and ints, otherwise hardware designers would pay more
attention :-)

> this is a big win (about a factor of 2 in time for the C program).  

As others have mentioned, this strongly depends on the processor
instruction set and even on the processor model.  My own benchmarks
(with your Caml code) give the following results:

PPC G4 (Cube)   1 < 2 < 3 < 4 < 5   speed ratio = 1.5
Xeon 2.8        3 < 4 < 1 = 2 < 5   speed ratio = 1.02
Pentium 4 2.0   3 < 1 < 2 < 4 < 5   speed ratio = 1.2
Athlon XP 1.4   4 < 5 < 3 < 1 < 2   speed ratio = 2.2

where 1, 2, 3, 4, 5 refer to the 5 different functions,
1 < 2 means "1 is slower than 2",
and "speed ratio" is the speed difference between fastest and slowest.

The Xeon case is what I was expecting: the running time is dominated by
the time it takes to do the float divisions, everything else is done in
parallel or takes negligible time, so it doesn't matter much how you
write the code.

The Athlon figures are *very* surprising.  It could be the case that
this benchmark falls into a quirk of that (otherwise excellent :-)
processor.  

Actually, this often happens with micro-benchmarks: they are so small
and their mix of operations is so unbalanced that they can easily run
into weird processor behaviors.  So, don't draw conclusions hastily.

John Prevost asks:

> Is the PowerPC ocamlopt back-end less optimized than the x86?

No, not really.  The x86 back-end works harder to work around oddities
in the x86 instruction set (e.g. the lack of floating-point
registers), but that is hardly an optimization, just compensating for
brain damage in the instruction set.  Conversely, the PPC back-end
performs basic-block instruction scheduling while the x86 back-end doesn't.
Instruction scheduling helped with early PPC chips (601, 603) but is
largely irrelevant with modern out-of-order PPC implementations.

> Are you sure that there isn't just a built-in 
> instruction on the x86 that adds an int to a float?

I think there exists one such instruction, but ocamlopt doesn't use
it, and the Intel optimization manuals recommend to do int->float
conversion followed by float addition instead.

- Xavier Leroy

next prev parent reply	other threads:[~2005-01-15 11:55 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2005-01-13 15:53 Will M. Farr
2005-01-13 17:29 ` [Caml-list] " John Prevost
2005-01-13 19:01   ` Will M. Farr
2005-01-13 20:24     ` John Prevost
2005-01-13 20:50       ` Erik de Castro Lopo
2005-01-13 21:32         ` Erik de Castro Lopo
2005-01-15 11:55 ` Xavier Leroy [this message]
2005-01-15 15:49   ` Michal Moskal
2005-01-15 17:01   ` [Caml-list] [FP performance] Ocaml sums the harmonic series Christophe TROESTLER
2005-01-15 17:13   ` [Caml-list] Ocaml sums the harmonic series -- four ways, four benchmarks: floating point performance Yaron Minsky
2005-01-23  2:27 ` Oliver Bandel
2005-01-23  6:07   ` Will M. Farr
2005-01-23 15:18     ` Oliver Bandel
2005-01-16  9:57 Philippe Lelédy

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20050115115519.GA11037@yquem.inria.fr \
    --to=xavier.leroy@inria.fr \
    --cc=caml-list@yquem.inria.fr \
    --cc=farr@MIT.EDU \
    --cc=shootout-list@lists.alioth.debian.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).