From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.1.3 (2006-06-01) on yquem.inria.fr X-Spam-Level: * X-Spam-Status: No, score=1.0 required=5.0 tests=AWL,SPF_FAIL autolearn=disabled version=3.1.3 X-Original-To: caml-list@yquem.inria.fr Delivered-To: caml-list@yquem.inria.fr Received: from concorde.inria.fr (concorde.inria.fr [192.93.2.39]) by yquem.inria.fr (Postfix) with ESMTP id 94440BC6A for ; Thu, 8 Feb 2007 16:16:33 +0100 (CET) Received: from smtp23.orange.fr (smtp23.orange.fr [193.252.22.30]) by concorde.inria.fr (8.13.6/8.13.6) with ESMTP id l18FGXbj006393 for ; Thu, 8 Feb 2007 16:16:33 +0100 Received: from me-wanadoo.net (localhost [127.0.0.1]) by mwinf2341.orange.fr (SMTP Server) with ESMTP id F16011C000AE; Thu, 8 Feb 2007 16:16:32 +0100 (CET) Received: from [192.168.0.9] (ARouen-205-1-4-25.w80-11.abo.wanadoo.fr [80.11.106.25]) by mwinf2341.orange.fr (SMTP Server) with ESMTP id 335C01C0008C; Thu, 8 Feb 2007 16:16:32 +0100 (CET) X-ME-UUID: 20070208151632210.335C01C0008C@mwinf2341.orange.fr Message-ID: <45CB3ED0.9040200@univ-paris12.fr> Date: Thu, 08 Feb 2007 16:16:32 +0100 From: =?ISO-8859-1?Q?Fr=E9d=E9ric_Gava?= Reply-To: gava@univ-paris12.fr Organization: =?ISO-8859-1?Q?Universit=E9_de_Paris_12=2C_LACL?= User-Agent: Thunderbird 1.5.0.9 (X11/20070104) MIME-Version: 1.0 To: Xavier Leroy Cc: Jacques Garrigue , caml-list@yquem.inria.fr Subject: Re: [Caml-list] Multiplication of matrix in C and OCaml References: <45CA63F5.6020301@univ-paris12.fr> <20070208.111401.55511744.garrigue@math.nagoya-u.ac.jp> <45CAF3E2.7020807@univ-paris12.fr> <45CAFF5A.2020607@inria.fr> In-Reply-To: <45CAFF5A.2020607@inria.fr> Content-Type: multipart/mixed; boundary="------------050007050100010505030506" X-Miltered: at concorde with ID 45CB3ED1.001 by Joe's j-chkmail (http://j-chkmail . ensmp . fr)! X-Spam: no; 0.00; gava:01 gava:01 ocaml:01 mult:01 gcc:01 speedup:01 ocaml:01 mult:01 -inline:01 matrices:01 matrices:01 iteration:01 ocamlopt:01 -unsafe:01 argv:01 X-Attachments: type="text/x-csrc" name="mult.c" name="mult.c" name="mult.ml" name="mult.ml" This is a multi-part message in MIME format. --------------050007050100010505030506 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 8bit > Because, as Jacques told you already, your C code is wrong. "add" and > "mult" invoke undefined behaviors of C and therefore gcc feels free to > optimize these functions as no-ops at optimization levels 1 and above. > That's a major speedup, for sure. Why don't you check your code for > correctness first before drawing conclusions on performance? Sorry for the inconvenience and this stupid error: I am a very bad C programmer. But, I do not obtain the performance of Jacques Garrigue :-( I try to bench a parallel matrix multiplication algorithm and test the difference between C+MPI and OCaml+MPI (I try to prove that OCaml is efficient enought for high-performance, in this community, they largely prefer Fortran or C...)) a) with a "polymorphic" C program (using "multiply_complex_generic(i,complexe_add,complexe_mult,a,b,c);") time ./cmult 600 2 602 1 real 0m18.402s user 0m17.333s sys 0m0.044s b) for a monomorphic C programs (using "multiply_complex(i,a,b,c);"); time ./cmult 600 2 602 1 real 0m5.604s user 0m5.556s sys 0m0.036s c) for a polymorphic OCaml program (using "ignore(multiplication_polymorphic (!i) Complex.zero Complex.add Complex.mul a b c);") time ./ocamlmult 600 2 602 1 real 0m16.433s user 0m16.125s sys 0m0.068s d) for a monomorphic OCaml program (using "ignore(multiplication_monomorphic (!i) a b c);") time ./ocamlmult 600 2 602 1 real 0m15.460s user 0m15.345s sys 0m0.072s I win 1 second if I compile with "-inline 100". So I have not the "twice slower at most". thanks, Frédéric Gava ps: here I have test the programs ;-) --------------050007050100010505030506 Content-Type: text/x-csrc; name="mult.c" Content-Transfer-Encoding: 7bit Content-Disposition: inline; filename="mult.c" /* param 1 = size of the matrices at the beginning param 2 = incrementation param 3 = size of the matrices at the end param 4 = number of iteration with gcc -O3 -o cmult mult.c */ #include #include #include typedef struct Complexe {double re; double img;} complexe; complexe zero; __inline__ complexe complexe_add(complexe a, complexe b) { return (complexe){a.re+b.re, a.img+b.img}; } __inline__ complexe complexe_mult(complexe x, complexe y) { return (complexe) {x.re*y.re-x.img*y.img, x.re*y.img+x.img*y.re}; } complexe random_complexe(void) { complexe c = {c.re=(double)(rand()%1000), c.img=(double)(rand()%1000)}; return c; } void multiply_complex(int n, complexe *a, complexe *b, complexe *c) { register int i,j,k; complexe tmp; for (i=0; i random_complex 1000.) let _ = Random.self_init(); let i = ref (int_of_string (Sys.argv).(1)) and pas=(int_of_string (Sys.argv).(2)) and maxi=(int_of_string (Sys.argv).(3)) and iter=(int_of_string (Sys.argv).(4)) in while !i