> Because, as Jacques told you already, your C code is wrong. "add" and
> "mult" invoke undefined behaviors of C and therefore gcc feels free to
> optimize these functions as no-ops at optimization levels 1 and above.
> That's a major speedup, for sure. Why don't you check your code for
> correctness first before drawing conclusions on performance?

Sorry for the inconvenience and for this stupid error: I am a very bad C programmer. But I still do not obtain the performance that Jacques Garrigue reports :-(

I am trying to benchmark a parallel matrix multiplication algorithm and to measure the difference between C+MPI and OCaml+MPI. (I am trying to show that OCaml is efficient enough for high-performance computing; in that community, they largely prefer Fortran or C.)

a) With a "polymorphic" C program (using "multiply_complex_generic(i,complexe_add,complexe_mult,a,b,c);"):

time ./cmult 600 2 602 1
real 0m18.402s
user 0m17.333s
sys 0m0.044s

b) With a monomorphic C program (using "multiply_complex(i,a,b,c);"):

time ./cmult 600 2 602 1
real 0m5.604s
user 0m5.556s
sys 0m0.036s

c) With a polymorphic OCaml program (using "ignore(multiplication_polymorphic (!i) Complex.zero Complex.add Complex.mul a b c);"):

time ./ocamlmult 600 2 602 1
real 0m16.433s
user 0m16.125s
sys 0m0.068s

d) With a monomorphic OCaml program (using "ignore(multiplication_monomorphic (!i) a b c);"):

time ./ocamlmult 600 2 602 1
real 0m15.460s
user 0m15.345s
sys 0m0.072s

I gain about 1 second if I compile with "-inline 100". So I do not get the "twice slower at most": the monomorphic OCaml program (15.5s) is roughly 2.8 times slower than the monomorphic C program (5.6s).

Thanks,
Frédéric Gava

PS: this time I have tested the programs ;-)
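
For reference, the difference between cases a) and b) can be sketched in C as follows. This is only a minimal sketch under assumptions, not the benchmarked program: the function names are taken from the calls quoted above, but the `complexe` struct, the `binop` typedef, the row-major n x n layout and all signatures are hypothetical. Every helper returns its result by value, so each call has a well-defined return value.

```c
#include <stddef.h>

/* Hypothetical complex type; the benchmarked program's definition is not shown. */
typedef struct { double re, im; } complexe;

/* Returned by value: every path yields a fully initialised result. */
static complexe complexe_add(complexe x, complexe y) {
    complexe r = { x.re + y.re, x.im + y.im };
    return r;
}

static complexe complexe_mult(complexe x, complexe y) {
    complexe r = { x.re * y.re - x.im * y.im,
                   x.re * y.im + x.im * y.re };
    return r;
}

/* Assumed type of the operations passed to the "polymorphic" version. */
typedef complexe (*binop)(complexe, complexe);

/* "Polymorphic" version: c = a * b for n x n row-major matrices,
   with the arithmetic supplied through function pointers. */
static void multiply_complex_generic(size_t n, binop add, binop mult,
                                     const complexe *a, const complexe *b,
                                     complexe *c) {
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++) {
            complexe acc = { 0.0, 0.0 };
            for (size_t k = 0; k < n; k++)
                acc = add(acc, mult(a[i * n + k], b[k * n + j]));
            c[i * n + j] = acc;
        }
}

/* Monomorphic version: same loop nest, complex arithmetic written inline. */
static void multiply_complex(size_t n, const complexe *a, const complexe *b,
                             complexe *c) {
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++) {
            double re = 0.0, im = 0.0;
            for (size_t k = 0; k < n; k++) {
                complexe x = a[i * n + k], y = b[k * n + j];
                re += x.re * y.re - x.im * y.im;
                im += x.re * y.im + x.im * y.re;
            }
            c[i * n + j].re = re;
            c[i * n + j].im = im;
        }
}
```

In the generic version each inner-loop step goes through two indirect calls, which the compiler may or may not be able to inline; that is one plausible source of the gap between a) and b), though the sketch alone does not establish it.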