caml-list - the Caml user's mailing list
* Ocaml sums the harmonic series -- four ways, four benchmarks: floating point performance
From: Will M. Farr @ 2005-01-13 15:53 UTC
  To: shootout-list; +Cc: caml-list

I've been looking at using OCaml to implement a gravitational n-body
code, and therefore have quite a bit of interest in its floating-point
performance.  Also, I'm learning the language by playing around with
simple programs.  Here's an implementation (really four) along with
timing information for the "harmonic" benchmark (essentially summing
the harmonic series), which can be found here:

http://shootout.alioth.debian.org/sandbox/benchmark.php?test=harmonic&lang=all&sort=cpu

After testing different ways of implementing the OCaml harmonic
benchmark, I have settled on the following program.  For 1 000 000 000
terms, it takes about 25% longer than the corresponding algorithm in C.
Note that I have replaced the int->float conversion for each term with
a single floating-point addition: ifloat := !ifloat +. 1.0.  Since
int->float conversions are slow on my machine (PowerBook G4), this is a
big win (about a factor of 2 in time for the C program).  Alas, when
the number of terms approaches 16 digits, this method will lose
accuracy: a double has a 53-bit significand, so once the counter
reaches 2^53 (about 9.0e15), adding 1.0 is no longer exact and the
counter stops advancing.  However, for sizes like those the shootout
seems to be using, this algorithm works fine (and the usual int type
won't hold anything close to 16 digits anyway: on this 32-bit platform
max_int is 2^30 - 1, about 1.07e9).  I'm cc-ing this to the caml list
because people there may be interested in the floating-point
performance of Caml.
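A quick way to see the precision limit described above: an IEEE 754 double has a 53-bit significand, so every integer up to 2^53 is exactly representable, and a float counter advanced with +. 1.0 stays exact until it reaches 2^53, where the next addition rounds back down. The sketch below checks both facts; the helper names (`two_53`, `counter_stalls`, `counter_exact_up_to`) are illustrative and not part of the original code.

```ocaml
(* 2^53: the first point at which adding 1.0 to a double is no longer exact *)
let two_53 = 2.0 ** 53.0

(* 2^53 + 1 is not representable; round-to-nearest-even sends it back
   to 2^53, so a counter incremented by 1.0 stalls there. *)
let counter_stalls () = two_53 +. 1.0 = two_53

(* Below 2^53 a float counter tracks the integer count exactly.
   Check the first [n] steps, starting from 2.0 as in sum_harmonic4. *)
let counter_exact_up_to n =
  let ifloat = ref 2.0 in
  let ok = ref true in
  for i = 2 to n do
    if !ifloat <> float_of_int i then ok := false;
    ifloat := !ifloat +. 1.0
  done;
  !ok

let () =
  Printf.printf "2^53 +. 1.0 = 2^53: %b\n" (counter_stalls ());
  Printf.printf "exact for 1_000_000 terms: %b\n" (counter_exact_up_to 1_000_000)
```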

Here's the code for the fastest implementation:

let sum_harmonic4 n =
   let sum = ref 1.0 in
   let ifloat = ref 2.0 in
     for i = 2 to n do
       sum := !sum +. 1.0/.(!ifloat);
       ifloat := !ifloat +. 1.0
     done;
     !sum;;

let _ =
   let n = int_of_string (Sys.argv.(1)) in
     Printf.printf "%g\n" (sum_harmonic4 n);;

And here are all the implementations I tried (for those interested in
such things with OCaml):

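(* 1: tail-recursive loop, converting i with float_of_int for each term *)
let sum_harmonic n =
   let rec loop i sum =
     if i <= n then
       loop (i + 1) (sum +. 1.0/.(float_of_int i))
     else
       sum in
     loop 2 1.0;;

(* 2: for loop with a float ref accumulator, float_of_int for each term *)
let sum_harmonic2 n =
   let sum = ref 1.0 in
   for i = 2 to n do
     sum := !sum +. 1.0/.(float_of_int i)
   done;
     !sum;;

(* 3: tail-recursive loop that carries the denominator as a float
      and advances it with +. 1.0 instead of converting i *)
let sum_harmonic3 n =
   let rec loop i ifloat sum =
     if i <= n then
       loop (i + 1) (ifloat +. 1.0) (sum +. 1.0/.ifloat)
     else
       sum in
     loop 2 2.0 1.0;;

(* 4: for loop with float refs for both the sum and the denominator *)
let sum_harmonic4 n =
   let sum = ref 1.0 in
   let ifloat = ref 2.0 in
     for i = 2 to n do
       sum := !sum +. 1.0/.(!ifloat);
       ifloat := !ifloat +. 1.0
     done;
     !sum;;

(* Entry point: read n from the command line and print the sum *)
let _ =
   let n = int_of_string (Sys.argv.(1)) in
     Printf.printf "%g\n" (sum_harmonic4 n);;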

The timings for my machine (PowerBook G4, 800 MHz) are as follows:

time ./harmonic 1000000000:

harmonic:   user 2m1.590s    sys 0m0.790s
harmonic2:  user 2m0.340s    sys 0m0.440s
harmonic3:  user 1m44.350s   sys 0m0.740s
harmonic4:  user 1m12.680s   sys 0m0.430s

Each version was compiled with "ocamlopt -unsafe -noassert -o
harmonic harmonic.ml".  It looks like using references and loops is *by
far* the fastest (and also that my PowerBook is pretty slow to convert
int->float, but I don't think this is related to OCaml, since the C
version does the same thing).
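For anyone wanting to reproduce these numbers, here is a minimal, self-contained timing harness. It is a sketch, not the setup used for the timings above: it runs all four variants in one process, measures CPU time with `Sys.time`, takes n from the command line (defaulting to a deliberately small value), and prints the slowest/fastest ratio. As the replies in this thread show, the ranking depends heavily on the processor, so results from one machine should not be generalized.

```ocaml
(* The four variants above, reformatted but otherwise unchanged. *)
let sum_harmonic n =
  let rec loop i sum =
    if i <= n then loop (i + 1) (sum +. 1.0 /. float_of_int i) else sum
  in
  loop 2 1.0

let sum_harmonic2 n =
  let sum = ref 1.0 in
  for i = 2 to n do
    sum := !sum +. 1.0 /. float_of_int i
  done;
  !sum

let sum_harmonic3 n =
  let rec loop i ifloat sum =
    if i <= n then loop (i + 1) (ifloat +. 1.0) (sum +. 1.0 /. ifloat)
    else sum
  in
  loop 2 2.0 1.0

let sum_harmonic4 n =
  let sum = ref 1.0 in
  let ifloat = ref 2.0 in
  for _i = 2 to n do
    sum := !sum +. 1.0 /. !ifloat;
    ifloat := !ifloat +. 1.0
  done;
  !sum

let variants =
  [ ("harmonic", sum_harmonic); ("harmonic2", sum_harmonic2);
    ("harmonic3", sum_harmonic3); ("harmonic4", sum_harmonic4) ]

(* Run [f n] once; return the result and the CPU seconds used.
   Sys.time reports processor time consumed by this program. *)
let time_it f n =
  let t0 = Sys.time () in
  let r = f n in
  (r, Sys.time () -. t0)

let () =
  (* n from the command line, else a small default for a quick check *)
  let n = match Sys.argv with [| _; s |] -> int_of_string s | _ -> 1_000_000 in
  let results = List.map (fun (name, f) -> (name, time_it f n)) variants in
  List.iter
    (fun (name, (r, t)) -> Printf.printf "%-10s %g  %.3fs\n" name r t)
    results;
  let times = List.map (fun (_, (_, t)) -> t) results in
  let slowest = List.fold_left max 0.0 times in
  let fastest = List.fold_left min infinity times in
  (* guard against a zero reading from a coarse clock *)
  if fastest > 0.0 then
    Printf.printf "speed ratio (slowest/fastest): %.2f\n" (slowest /. fastest)
```

Compiled with the same flags as above (the file name, say harmonic_bench.ml, is hypothetical), it can be run as ./harmonic_bench 1000000000 to match the message's problem size.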

Hope you all find this interesting.

Will



* Re: [Caml-list] Ocaml sums the harmonic series -- four ways, four benchmarks: floating point performance
From: John Prevost @ 2005-01-13 17:29 UTC
  To: Will M. Farr; +Cc: shootout-list, caml-list

On Thu, 13 Jan 2005 10:53:16 -0500, Will M. Farr <farr@mit.edu> wrote:
> Each invocation was compiled with "ocamlopt -unsafe -noassert
> -o harmonic harmonic.ml".  It looks like using references and
> loops is *by far* the fastest (and also that my PowerBook is
> pretty slow to convert int->float, but I don't think this is
> related to ocaml, since the C version does the same thing).

Note that this is dependent on what CPU you're using.  On my test
system (700MHz AMD Athlon with 256MB of memory), I saw this behavior:

time ./harmonic 1000000000 (user .. sys):

             you                        me
harmonic:    2m01.590s .. 0m00.790s     0m30.811s .. 0m00.120s
harmonic2:   2m00.340s .. 0m00.440s     0m30.847s .. 0m00.140s
harmonic3:   1m44.350s .. 0m00.740s     0m38.002s .. 0m00.130s
harmonic4:   1m12.680s .. 0m00.430s     1m14.603s .. 0m00.220s

So on this system, harmonic4 is by far the slowest, and the fastest
version is the one that uses float_of_int and tail recursion.  It's
unclear to me how much of this is because ocamlopt's x86 back-end is
simply better optimized than its PPC back-end.

John.



* Re: [Caml-list] Ocaml sums the harmonic series -- four ways, four benchmarks: floating point performance
From: Will M. Farr @ 2005-01-13 19:01 UTC
  To: John Prevost; +Cc: caml-list, shootout-list

Is the PowerPC ocamlopt back-end less optimized than the x86?  I didn't 
realize that ocamlopt did enough optimizations that the backend would 
be substantially different on the different architectures (in the 
manual they say that it compiles the code essentially as written -- no 
loop unrolling, etc).  Are you sure that there isn't just a built-in 
instruction on the x86 that adds an int to a float?

Will



* Re: [Caml-list] Ocaml sums the harmonic series -- four ways, four benchmarks: floating point performance
From: John Prevost @ 2005-01-13 20:24 UTC
  To: Will M. Farr; +Cc: caml-list, shootout-list

There quite possibly is--I could look.  But I do believe that the
x86 back-end is the better optimized one for at least some set of
operations.  For example, looking through the generated assembly, you'll
notice that it sometimes abuses Intel addressing modes to reduce the
cost of "Caml ints are just like native ints with a 1 in the low bit".
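On the representation mentioned here: ocamlopt represents an OCaml int n as the machine word 2n + 1, the low bit 1 marking it as a non-pointer for the garbage collector. Adding two tagged words therefore needs a correction, tag(a + b) = tag(a) + tag(b) - 1, and on x86 the addition and the -1 fit into a single lea instruction, which is presumably the kind of addressing-mode trick meant (we have not checked exactly which instructions ocamlopt emits). The sketch below only simulates the encoding with ordinary OCaml ints to check the arithmetic; `tag`, `untag`, `add_tagged` and `sub_tagged` are hypothetical helpers, not part of any OCaml API.

```ocaml
(* Simulated tagging: an OCaml int n is represented as the word 2n + 1.
   Words are modelled with ordinary OCaml ints, so this is only valid
   for values small enough that 2n + 1 does not overflow. *)
let tag n = (2 * n) + 1

(* Arithmetic shift drops the tag bit and preserves the sign. *)
let untag w = w asr 1

(* (2a + 1) + (2b + 1) - 1 = 2(a + b) + 1: one add plus a constant,
   which an x86 lea with a -1 displacement can compute in one step. *)
let add_tagged wa wb = wa + wb - 1

(* (2a + 1) - (2b + 1) + 1 = 2(a - b) + 1 *)
let sub_tagged wa wb = wa - wb + 1

let () =
  let a = 20 and b = -7 in
  Printf.printf "a+b = %d, a-b = %d\n"
    (untag (add_tagged (tag a) (tag b)))
    (untag (sub_tagged (tag a) (tag b)))
```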

As for whether there's a quick "convert int to float" instruction on
Intel, I really have no idea.  The assembly for the simple function:

let test x = float_of_int x

isn't trivial, however.  I have to admit that I don't know the ins and
outs of Intel assembly any further than I have learned them while
trying to optimize specific O'Caml loops.  And since I rarely use
floating point, all of these opcodes are Greek to me.  :)  I *think*
it's allocating space in the heap for the float, then filling it in
with a non-normalized value (which is pretty easy, since doubles are
64 bits, and ints are 31 bits), and then saying "normalize this,
please."  But I can't say for sure.  And since I don't have a PPC
system to play with, I can't compare.

John.



* Re: [Caml-list] Ocaml sums the harmonic series -- four ways, four benchmarks: floating point performance
From: Erik de Castro Lopo @ 2005-01-13 20:50 UTC
  To: caml-list; +Cc: shootout-list

On Thu, 13 Jan 2005 15:24:19 -0500
John Prevost <j.prevost@gmail.com> wrote:

> As for whether there's a quick "convert int to float" call in Intel, I
> really have no idea.  The assembly for the simple function:
> 
> let test x = float_of_int x
> 
> isn't trivial, however.

Int to float should just work. Int to float is another matter. See
this:

    http://www.mega-nerd.com/FPcast/

Erik
-- 
+-----------------------------------------------------------+
  Erik de Castro Lopo  nospam@mega-nerd.com (Yes it's valid)
+-----------------------------------------------------------+
"Whenever the C++ language designers had two competing ideas as to 
how they should solve some problem, they said, "OK, we'll do them 
both". So the language is too baroque for my taste." -- Donald E Knuth



* Re: [Caml-list] Ocaml sums the harmonic series -- four ways, four benchmarks: floating point performance
From: Erik de Castro Lopo @ 2005-01-13 21:32 UTC
  To: caml-list

On Fri, 14 Jan 2005 07:50:57 +1100
Erik de Castro Lopo <ocaml-erikd@mega-nerd.com> wrote:

> Int to float should just work. 

Yes.

> Int to float is another matter. See

I meant float to int of course.

> this:
> 
>     http://www.mega-nerd.com/FPcast/

Erik



* Re: [Caml-list] Ocaml sums the harmonic series -- four ways, four benchmarks: floating point performance
From: Xavier Leroy @ 2005-01-15 11:55 UTC
  To: Will M. Farr; +Cc: shootout-list, caml-list

> Here's an implementation (really 4) along with timing  
> information of the "harmonic" benchmark (essentially summing the  
> harmonic series) [...]
> Here's the code for the fastest implementation:

The following slight modification of your code generates asm code that
is closest to what a C compiler would produce:

let sum_harmonic5 n =
  let sum = ref 1.0 in
  let ifloat = ref 2.0 in
    for i = 2 to n do
      sum := !sum +. 1.0/.(!ifloat);
      ifloat := !ifloat +. 1.0
    done;
    !sum +. 0.0;;

The +. 0.0 at the end is ugly but convinces ocamlopt that !sum is best
kept unboxed during the loop.

> (note that I have replaced an int->float conversion for  
> each term with a single floating point operation: ifloat := ifloat +.  
> 1.0).  Since int->float conversions are slow on my machine (PowerBook  
> G4)

Right, the PowerPC does not have an int -> float instruction and that
conversion must be performed with a rather expensive sequence of
instructions (for the gory details, see e.g.
 http://the.wall.riscom.net/books/proc/ppc/cwg/code3.html#303610).
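To make "a rather expensive sequence of instructions" concrete, here is the classic bit trick for converting a signed 32-bit integer to a double using only a floating-point subtraction: flip the sign bit, splice the result into the low word of a double whose high word is 0x43300000 (the value 2^52), and subtract 2^52 + 2^31. This is a sketch written in OCaml with `Int64.float_of_bits`; we assume, without having checked the linked page line by line, that it matches the approach described there. On a 32-bit PowerPC the integer also has to travel through memory, since there is no direct move between integer and floating-point registers, and that store/load round trip adds to the cost. The names `magic` and `float_of_int32_via_bits` are illustrative.

```ocaml
(* 2^52 + 2^31 as a double: high word 0x43300000, low word 0x80000000 *)
let magic = Int64.float_of_bits 0x4330000080000000L

(* Convert a signed 32-bit integer to a float without an int->float
   instruction.  Flipping the sign bit maps x to u = x + 2^31, an
   unsigned value in [0, 2^32).  Placing u in the low 32 bits of a
   double whose high word is 0x43300000 yields exactly 2^52 + u;
   subtracting 2^52 + 2^31 (exact, both are integers below 2^53)
   leaves x. *)
let float_of_int32_via_bits (x : int32) : float =
  let u =
    Int64.logand (Int64.of_int32 (Int32.logxor x Int32.min_int)) 0xFFFF_FFFFL
  in
  Int64.float_of_bits (Int64.logor 0x4330_0000_0000_0000L u) -. magic

let () =
  List.iter
    (fun x -> Printf.printf "%ld -> %g\n" x (float_of_int32_via_bits x))
    [ 0l; 42l; -7l; Int32.max_int; Int32.min_int ]
```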

64-bit PPCs have a dedicated instruction to do this conversion,
showing that the IBM/Motorola people learn from their past mistakes...

For Intel processors, it's the reverse conversion (float -> int) that
is slow.  Clearly, the SPEC benchmark doesn't contain many conversions
between floats and ints, otherwise hardware designers would pay more
attention :-)

> this is a big win (about a factor of 2 in time for the C program).  

As others have mentioned, this strongly depends on the processor
instruction set and even on the processor model.  My own benchmarks
(with your Caml code) give the following results:

PPC G4 (Cube)   1 < 2 < 3 < 4 < 5   speed ratio = 1.5
Xeon 2.8        3 < 4 < 1 = 2 < 5   speed ratio = 1.02
Pentium 4 2.0   3 < 1 < 2 < 4 < 5   speed ratio = 1.2
Athlon XP 1.4   4 < 5 < 3 < 1 < 2   speed ratio = 2.2

where 1, 2, 3, 4, 5 refer to the 5 different functions,
1 < 2 means "1 is slower than 2",
and "speed ratio" is the speed difference between fastest and slowest.

The Xeon case is what I was expecting: the running time is dominated by
the time it takes to do the float divisions, everything else is done in
parallel or takes negligible time, so it doesn't matter much how you
write the code.

The Athlon figures are *very* surprising.  It could be the case that
this benchmark falls into a quirk of that (otherwise excellent :-)
processor.  

Actually, this often happens with micro-benchmarks: they are so small
and their mix of operations is so unbalanced that they can easily run
into weird processor behaviors.  So, don't draw conclusions hastily.

Will Farr asks:

> Is the PowerPC ocamlopt back-end less optimized than the x86?

No, not really.  The x86 back-end goes to more trouble to work around
oddities in the x86 instruction set (e.g. the lack of ordinary
floating-point registers: the x87 unit uses a register stack), but that
is hardly an optimization, just compensating for brain damage in the
instruction set.  Conversely, the PPC back-end performs basic-block
instruction scheduling while the x86 back-end doesn't.
Instruction scheduling helped with early PPC chips (601, 603) but is
largely irrelevant with modern out-of-order PPC implementations.

> Are you sure that there isn't just a built-in 
> instruction on the x86 that adds an int to a float?

I think there exists one such instruction, but ocamlopt doesn't use
it, and the Intel optimization manuals recommend to do int->float
conversion followed by float addition instead.

- Xavier Leroy



* Re: [Caml-list] Ocaml sums the harmonic series -- four ways, four benchmarks: floating point performance
From: Michal Moskal @ 2005-01-15 15:49 UTC
  To: Xavier Leroy; +Cc: Will M. Farr, shootout-list, caml-list

On Sat, 15 Jan 2005 12:55:19 +0100, Xavier Leroy <Xavier.Leroy@inria.fr> wrote:
> As others have mentioned, this strongly depends on the processor
> instruction set and even on the processor model.  My own benchmarks
> (with your Caml code) give the following results:
> 
> PPC G4 (Cube)   1 < 2 < 3 < 4 < 5   speed ratio = 1.5
> Xeon 2.8        3 < 4 < 1 = 2 < 5   speed ratio = 1.02
> Pentium 4 2.0   3 < 1 < 2 < 4 < 5   speed ratio = 1.2
> Athlon XP 1.4   4 < 5 < 3 < 1 < 2   speed ratio = 2.2

I tested it on Athlon 64 3000+ using both 32bit and 64bit compilers,
the results:

32bit: 4 = 5 < 3 < 1 = 2, speed ratio 2.2
64bit: 3 < 1 = 2 = 4 < 5, speed ratio 1.15

The best 64-bit version is 1.30 times as fast as the best 32-bit version.

All tests were performed using OCaml 3.07.

> The Athlon figures are *very* surprising.  It could be the case that
> this benchmark falls into a quirk of that (otherwise excellent :-)
> processor.

So I guess that in 32-bit mode the quirk remains on newer Athlons.

-- 
: Michal Moskal :: http://nemerle.org/~malekith/ :: GCS !tv h e>+++ b++
: No, I will *not* fix your computer............ :: UL++++$ C++ E--- a?



* Re: [Caml-list] [FP performance] Ocaml sums the harmonic series
From: Christophe TROESTLER @ 2005-01-15 17:01 UTC
  To: Xavier Leroy; +Cc: farr, shootout-list, caml-list

On Sat, 15 Jan 2005, Xavier Leroy <Xavier.Leroy@inria.fr> wrote:
> 
>     !sum +. 0.0;;
> 
> The + 0.0 at the end is ugly but convinces ocamlopt that !sum is
> best kept unboxed during the loop.

Since it always has a positive impact (at least w.r.t. the program
without it), would it be possible for ocamlopt to be convinced that it
is a good thing without having to write such a hack?  Or are there
reasons why it is difficult to do?

Best regards,
ChriS



* Re: [Caml-list] Ocaml sums the harmonic series -- four ways, four benchmarks: floating point performance
From: Yaron Minsky @ 2005-01-15 17:13 UTC
  To: Xavier Leroy; +Cc: Will M. Farr, shootout-list, caml-list

On Sat, 15 Jan 2005 12:55:19 +0100, Xavier Leroy <Xavier.Leroy@inria.fr> wrote:

> The following slight modification of your code generates asm code that
> is closest to what a C compiler would produce:
> 
> let sum_harmonic5 n =
>   let sum = ref 1.0 in
>   let ifloat = ref 2.0 in
>     for i = 2 to n do
>       sum := !sum +. 1.0/.(!ifloat);
>       ifloat := !ifloat +. 1.0
>     done;
>     !sum +. 0.0;;
> 
> The + 0.0 at the end is ugly but convinces ocamlopt that !sum is best
> kept unboxed during the loop.

That last comment is very interesting and surprising to me.  I've
looked over the optimization suggestions for the compiler, and I don't
understand why that last +. convinces the compiler to unbox sum.  Can
you explain why that is?  Floating point performance is important to
me, and I'd like to get a better grasp on it.

(As a general matter, it would be nice to have some tools to
understand things like unboxing and inlining a little better.  For
example, it would be great to have something akin to -dtypes that
outputs information with which one could check whether a certain
function call is inlined, or whether a certain float is unboxed.)

y



* Re: [Caml-list] Ocaml sums the harmonic series -- four ways, four benchmarks: floating point performance
From: Oliver Bandel @ 2005-01-23  2:27 UTC
  To: caml-list

On Thu, Jan 13, 2005 at 10:53:16AM -0500, Will M. Farr wrote:
[...]
> Here's the code for the fastest implementation:
> 
> let sum_harmonic4 n =
>   let sum = ref 1.0 in
>   let ifloat = ref 2.0 in
>     for i = 2 to n do
>       sum := !sum +. 1.0/.(!ifloat);
>       ifloat := !ifloat +. 1.0
>     done;
>     !sum;;
> 
> let _ =
>   let n = int_of_string (Sys.argv.(1)) in
>     Printf.printf "%g\n" (sum_harmonic4 n);;

I tried harmonic4 on a PowerBook G4, 400 MHz, and the
native code needs about 1 min 50 s.

The bytecode for harmonic4 runs in about 1 min 53 s.

It seems that there is no real distinction between
bytecode and native code, at least on that system,
or at least on that task.


I use Panther OS. It seems that it's more than twice as fast as your OS
(look at the processor frequency: 400 MHz on my PB G4, 800 MHz on yours...).

Which OS are you running? An older version of Mac OS X? Or Linux? (Which one?)

Maybe you can speed up your calculations a lot by installing a different
operating system on your computer.

I didn't try the other implementations.
IMHO you can gain performance more easily by changing your OS
than by looking at code optimizations...?!
(which you can nevertheless do too)


Ciao,
   Oliver



* Re: [Caml-list] Ocaml sums the harmonic series -- four ways, four benchmarks: floating point performance
From: Will M. Farr @ 2005-01-23  6:07 UTC
  To: Oliver Bandel; +Cc: caml-list

I'm running 10.3.7 -- I don't think there's any newer version.  When I 
run harmonic4 as follows:

time ./harmonic 1000000000
21.3005

real    1m3.764s
user    1m0.590s
sys     0m0.130s

the above is what I get.  I'm not sure why I'm not exactly 2x faster 
than you, but there's plenty of things which could affect that.

Running the bytecode on my system gives:

time ./harmonic.bc 1000000000
21.3005

real    11m51.239s
user    11m11.600s
sys     0m0.940s

I would be pretty surprised to see the bytecode come even close to the 
native code version --- are you sure about the numbers on your system?

Will



* Re: [Caml-list] Ocaml sums the harmonic series -- four ways, four benchmarks: floating point performance
From: Oliver Bandel @ 2005-01-23 15:18 UTC
  To: Will M. Farr; +Cc: caml-list

On Sun, Jan 23, 2005 at 01:07:30AM -0500, Will M. Farr wrote:
> I'm running 10.3.7 -- I don't think there's any newer version.  When I 
> run harmonic4 as follows:
> 
> time ./harmonic 1000000000
> 21.3005
> 
> real    1m3.764s
> user    1m0.590s
> sys     0m0.130s
> 
> the above is what I get.  I'm not sure why I'm not exactly 2x faster 
> than you, but there's plenty of things which could affect that.
> 
> Running the bytecode on my system gives:
> 
> time ./harmonic.bc 1000000000
> 21.3005
> 
> real    11m51.239s
> user    11m11.600s
> sys     0m0.940s
> 
> I would be pretty surprised to see the bytecode come even close to the 
> native code version --- are you sure about the numbers on your system?

No, not anymore!

I used the wrong binary! :(

I thought the executables had the same names after recompiling
them for the test, but the native code had a different name, so I ran
the same file twice! :(

Sorry, I'm really chaotic these days! :(

Ciao,
   Oliver



* Re: [Caml-list] Ocaml sums the harmonic series -- four ways, four benchmarks: floating point performance
From: Philippe Lelédy @ 2005-01-16  9:57 UTC
  To: Caml-list

Xavier Leroy wrote:

>   done;
>    !sum +. 0.0;;
>
> The + 0.0 at the end is ugly but convinces ocamlopt that !sum is best
> kept unboxed during the loop.

Here are my times, which show little difference with or without this hack:

On 1.8 GHz PowerPC G5 (MacOS X 10.3.7, Objective Caml version 3.08.0)

./sumH4 1000000000  17.65s user 0.16s system 91% cpu 19.461 total
./sumH5 1000000000  16.17s user 0.11s system 91% cpu 17.702 total

On Intel(R) Pentium(R) 4 CPU 3.00GHz (Debian GNU/Linux, Objective Caml version 3.08.2)

./sumH4 1000000000  15.57s user 0.00s system 99% cpu 15.646 total
./sumH5 1000000000  15.45s user 0.00s system 99% cpu 15.480 total

Ph. L.

