From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.1.3 (2006-06-01) on yquem.inria.fr X-Spam-Level: X-Spam-Status: No, score=0.8 required=5.0 tests=AWL,NO_REAL_NAME autolearn=disabled version=3.1.3 X-Original-To: caml-list@yquem.inria.fr Delivered-To: caml-list@yquem.inria.fr Received: from discorde.inria.fr (discorde.inria.fr [192.93.2.38]) by yquem.inria.fr (Postfix) with ESMTP id 2840CBC69 for ; Fri, 9 Feb 2007 10:01:10 +0100 (CET) Received: from server2.thinkcrime.de (server2.thinkcrime.de [213.133.110.149]) by discorde.inria.fr (8.13.6/8.13.6) with ESMTP id l199197W022839 for ; Fri, 9 Feb 2007 10:01:09 +0100 Received: from hod-sarge-2005-10.lan.m-e-leypold.de (dslb-088-072-219-206.pools.arcor-ip.net [88.72.219.206]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by server2.thinkcrime.de (Postfix) with ESMTP id DEF55488032 for ; Fri, 9 Feb 2007 10:01:09 +0100 (CET) Received: by hod-sarge-2005-10.lan.m-e-leypold.de (Postfix, from userid 1003) id 877A9376C1; Fri, 9 Feb 2007 10:06:18 +0100 (CET) To: caml-list@yquem.inria.fr Subject: Re: [Caml-list] Multiplication of matrix in C and OCaml References: <45CAF3E2.7020807@univ-paris12.fr> <45CAFF5A.2020607@inria.fr> <45CB3ED0.9040200@univ-paris12.fr> <20070209.115842.106265091.garrigue@math.nagoya-u.ac.jp> Organization: Leypold, Software-Dienstleistungen und -Beratung From: ls-ocaml-developer-2006@m-e-leypold.de Date: Fri, 09 Feb 2007 10:06:18 +0100 In-Reply-To: <20070209.115842.106265091.garrigue@math.nagoya-u.ac.jp> (Jacques Garrigue's message of "Fri, 09 Feb 2007 11:58:42 +0900 (JST)") Message-ID: User-Agent: Some cool user agent (SCUG) MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Transfer-Encoding: quoted-printable X-Miltered: at discorde with ID 45CC3855.001 by Joe's j-chkmail (http://j-chkmail . ensmp . fr)! X-Spam: no; 0.00; ocaml:01 gava:01 gava:01 ocaml:01 mult:01 compiler:01 gcc:01 gcc:01 compiler:01 matrices:01 markus:01 12.:98 4.1:98 4.1:98 ditch:98 Jacques Garrigue writes: > From: Fr=E9d=E9ric Gava > >> Sorry for the inconvenience and this stupid error: I am a very bad C=20 >> programmer. >>=20 >> But, I do not obtain the performance of Jacques Garrigue :-( I try to=20 >> bench a parallel matrix multiplication algorithm and test the difference= =20 >> between C+MPI and OCaml+MPI (I try to prove that OCaml is efficient=20 >> enought for high-performance, in this community, they largely prefer=20 >> Fortran or C...)) >>=20 >>=20 >> a) with a "polymorphic" C program (using=20 >> "multiply_complex_generic(i,complexe_add,complexe_mult,a,b,c);") >>=20 >> time ./cmult 600 2 602 1 >> real 0m18.402s >> user 0m17.333s >> sys 0m0.044s >>=20 >> b) for a monomorphic C programs (using "multiply_complex(i,a,b,c);"); >>=20 >> time ./cmult 600 2 602 1 >> real 0m5.604s >> user 0m5.556s >> sys 0m0.036s > > Interesting. It all depends on the compiler. > With gcc 3.4, as provided in FreeBSD, I get almost no difference > between your polymorphic and monomorphic versions. But if I switch to > gcc 4.1, the monomorphic version is indeed much faster. Actually, what > I get is: > gcc 3.4 polymorphic: 15s > gcc 4.1 polymorphic: 20s > gcc 3.4 monomorphic: 15s > gcc 4.1 monomorphic: 7s > So it looks like gcc 4.1 is better for monomorphic code, but worse for > function calls... > Note that in my case, this is still within a factor 2 of ocaml (which > is about the same as gcc 3.4). > But your C compiler may be doing some other platform specific > optimizations. The only way to know what is happening is to look at the > generated assembler. I'm just wondering about that. All the data produced during the matrix-multiplication is AFAICS not used for anything. So I wouldn't exclude that in the case of monomorphic code gcc 4.1 is just deciding to ditch most of the actual work after doing some data flow analysis. Gcc 4.1 is (I think) known to do some aggressive optimizations. I'd feel better if the code is benchmarked in a way that the result of the multiplication is output to a file and to subtract the constant contribution of that to the run time that the time is measured for various problem sizes (number of matrices). One would get a linear dependency t(n) =3D C + K*n and fitting a straight line against the data points could obtain K to compare the efficiency of C (under various compiler versions and options) and Ocaml against each other without having to worry about no-op optimizations or constant startup costs (like load times and run time initialization: which might be a bit higher with Ocaml, though certainly not in the order of seconds). Comparing the output to expected results would also help to ensure that the code is correct which I still find difficult to assert at first glance. Regard -- Markus