From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.1.3 (2006-06-01) on yquem.inria.fr X-Spam-Level: X-Spam-Status: No, score=0.4 required=5.0 tests=AWL,DNS_FROM_RFC_ABUSE autolearn=disabled version=3.1.3 X-Original-To: caml-list@yquem.inria.fr Delivered-To: caml-list@yquem.inria.fr Received: from discorde.inria.fr (discorde.inria.fr [192.93.2.38]) by yquem.inria.fr (Postfix) with ESMTP id 48378BC69 for ; Fri, 9 Feb 2007 03:59:04 +0100 (CET) Received: from kurims.kurims.kyoto-u.ac.jp (kurims.kurims.kyoto-u.ac.jp [130.54.16.1]) by discorde.inria.fr (8.13.6/8.13.6) with ESMTP id l192x10V025017 for ; Fri, 9 Feb 2007 03:59:03 +0100 Received: from localhost (orion [130.54.16.5]) by kurims.kurims.kyoto-u.ac.jp (8.13.7/8.13.7) with ESMTP id l192wmTh017339; Fri, 9 Feb 2007 11:58:49 +0900 (JST) Date: Fri, 09 Feb 2007 11:58:42 +0900 (JST) Message-Id: <20070209.115842.106265091.garrigue@math.nagoya-u.ac.jp> To: gava@univ-paris12.fr Cc: caml-list@yquem.inria.fr Subject: Re: [Caml-list] Multiplication of matrix in C and OCaml From: Jacques Garrigue In-Reply-To: <45CB3ED0.9040200@univ-paris12.fr> References: <45CAF3E2.7020807@univ-paris12.fr> <45CAFF5A.2020607@inria.fr> <45CB3ED0.9040200@univ-paris12.fr> X-Mailer: Mew version 4.2 on Emacs 21.3 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=iso-8859-1 Content-Transfer-Encoding: quoted-printable X-Miltered: at discorde with ID 45CBE375.001 by Joe's j-chkmail (http://j-chkmail . ensmp . fr)! X-Spam: no; 0.00; ocaml:01 gava:01 gava:01 ocaml:01 mult:01 compiler:01 gcc:01 gcc:01 compiler:01 12.:98 4.1:98 4.1:98 polymorphic:01 polymorphic:01 caml-list:01 From: Fr=E9d=E9ric Gava > Sorry for the inconvenience and this stupid error: I am a very bad C = > programmer. > = > But, I do not obtain the performance of Jacques Garrigue :-( I try to= = > bench a parallel matrix multiplication algorithm and test the differe= nce = > between C+MPI and OCaml+MPI (I try to prove that OCaml is efficient = > enought for high-performance, in this community, they largely prefer = > Fortran or C...)) > = > = > a) with a "polymorphic" C program (using = > "multiply_complex_generic(i,complexe_add,complexe_mult,a,b,c);") > = > time ./cmult 600 2 602 1 > real 0m18.402s > user 0m17.333s > sys 0m0.044s > = > b) for a monomorphic C programs (using "multiply_complex(i,a,b,c);");= > = > time ./cmult 600 2 602 1 > real 0m5.604s > user 0m5.556s > sys 0m0.036s Interesting. It all depends on the compiler. With gcc 3.4, as provided in FreeBSD, I get almost no difference between your polymorphic and monomorphic versions. But if I switch to gcc 4.1, the monomorphic version is indeed much faster. Actually, what I get is: gcc 3.4 polymorphic: 15s gcc 4.1 polymorphic: 20s gcc 3.4 monomorphic: 15s gcc 4.1 monomorphic: 7s So it looks like gcc 4.1 is better for monomorphic code, but worse for function calls... Note that in my case, this is still within a factor 2 of ocaml (which is about the same as gcc 3.4). But your C compiler may be doing some other platform specific optimizations. The only way to know what is happening is to look at the= generated assembler. Jacques Garrigue