> Because, as Jacques told you already, your C code is wrong.  "add" and
> "mult" invoke undefined behaviors of C and therefore gcc feels free to
> optimize these functions as no-ops at optimization levels 1 and above.
> That's a major speedup, for sure.  Why don't you check your code for
> correctness first before drawing conclusions on performance?

Sorry for the inconvenience and this stupid error: I am a very bad C 
programmer.

However, I cannot reproduce the performance that Jacques Garrigue 
reported :-( I am trying to benchmark a parallel matrix multiplication 
algorithm and to measure the difference between C+MPI and OCaml+MPI. (I 
am trying to show that OCaml is efficient enough for high-performance 
computing; in that community, people largely prefer Fortran or C...)


a) with a "polymorphic" C program (using 
"multiply_complex_generic(i,complexe_add,complexe_mult,a,b,c);")

time ./cmult 600 2 602 1
real    0m18.402s
user    0m17.333s
sys     0m0.044s

b) with a monomorphic C program (using "multiply_complex(i,a,b,c);")

time ./cmult 600 2 602 1
real    0m5.604s
user    0m5.556s
sys     0m0.036s
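
For readers who do not have the sources at hand, the difference between 
(a) and (b) can be sketched roughly as below. This is not the benchmark 
code: the type complex_t and the functions complex_add, complex_mul, 
mult_generic and mult_mono are hypothetical names, and the matrices are 
assumed to be square and stored in row-major order. The generic version 
pays for an indirect call on every addition and multiplication, while the 
monomorphic one lets the compiler work on plain floating-point arithmetic.

```c
#include <stddef.h>

/* Hypothetical complex type; the type used in the benchmark is not shown. */
typedef struct { double re, im; } complex_t;

complex_t complex_add(complex_t x, complex_t y) {
    complex_t r = { x.re + y.re, x.im + y.im };
    return r;
}

complex_t complex_mul(complex_t x, complex_t y) {
    complex_t r = { x.re * y.re - x.im * y.im,
                    x.re * y.im + x.im * y.re };
    return r;
}

/* Type of a binary operation passed as a function pointer. */
typedef complex_t (*binop_t)(complex_t, complex_t);

/* "Polymorphic" variant: c = a * b for n x n row-major matrices,
   with addition and multiplication supplied by the caller. */
void mult_generic(size_t n, binop_t add, binop_t mul,
                  const complex_t *a, const complex_t *b, complex_t *c) {
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            complex_t acc = { 0.0, 0.0 };
            for (size_t k = 0; k < n; k++)
                acc = add(acc, mul(a[i * n + k], b[k * n + j]));
            c[i * n + j] = acc;
        }
    }
}

/* Monomorphic variant: same product, complex arithmetic written inline. */
void mult_mono(size_t n,
               const complex_t *a, const complex_t *b, complex_t *c) {
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            double re = 0.0, im = 0.0;
            for (size_t k = 0; k < n; k++) {
                complex_t x = a[i * n + k], y = b[k * n + j];
                re += x.re * y.re - x.im * y.im;
                im += x.re * y.im + x.im * y.re;
            }
            c[i * n + j].re = re;
            c[i * n + j].im = im;
        }
    }
}
```

Both functions compute the same product; a call such as 
mult_generic(n, complex_add, complex_mul, a, b, c) corresponds to case 
(a) and mult_mono(n, a, b, c) to case (b).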


c) for a polymorphic OCaml program  (using 
"ignore(multiplication_polymorphic (!i) Complex.zero Complex.add 
Complex.mul a b c);")

time ./ocamlmult 600 2 602 1

real    0m16.433s
user    0m16.125s
sys     0m0.068s

d) for a monomorphic OCaml program  (using 
"ignore(multiplication_monomorphic (!i) a b c);")

time ./ocamlmult 600 2 602 1

real    0m15.460s
user    0m15.345s
sys     0m0.072s

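I gain about 1 second if I compile with "-inline 100". So I do not get 
the "at most twice as slow" ratio: the monomorphic OCaml program takes 
about 15.5s against 5.6s for the monomorphic C program.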

Thanks,
Frédéric Gava

PS: this time I have tested the programs ;-)