Date: Thu, 08 Feb 2007 11:14:01 +0900 (JST)
Message-Id: <20070208.111401.55511744.garrigue@math.nagoya-u.ac.jp>
From: Jacques Garrigue
To: gava@univ-paris12.fr
Cc: caml-list@yquem.inria.fr
Subject: Re: [Caml-list] Multiplication of matrix in C and OCaml
In-Reply-To: <45CA63F5.6020301@univ-paris12.fr>

From: Frédéric Gava
> I compared multiplication of matrix in C and OCaml and I was a little
> surprise to see that the following C code (using -O2) is 8 time faster
> than the OCaml one (even with -unsafe).
>
> Anybody have an idea to optimize my OCaml code or know why is there this
> "big" difference ?
>
> ps: the difference is the same even when I use a non-polymorphic
> multiplication

This is not what I see:
Multiplying 1000 times two 50x50 matrices on a Pentium M 1.8G, using
your code.

  ocamlopt -unsafe polymorphic:              5.4s
  ocamlopt -unsafe -inline 100 polymorphic:  5.0s
  ocamlopt -unsafe monomorphic:              4.4s
  gcc polymorphic:                           6.5s
  gcc monomorphic:                           5.8s
  gcc -O3 monomorphic:                       4.7s

So, actually it seems that ocamlopt is even faster than gcc in many
cases.

Now, I also found a strange result:

  gcc -O3 monomorphic:                       1.4s

So I looked at the generated assembler:

        .globl  add
        .type   add, @function
add:
        pushl   %ebp
        movl    %esp, %ebp
        subl    $56, %esp
        leal    -56(%ebp), %eax
        leave
        ret
        .size   add, .-add
        .p2align 2,,3
        .globl  mult
        .type   mult, @function
mult:
        pushl   %ebp
        movl    %esp, %ebp
        subl    $56, %esp
        leal    -56(%ebp), %eax
        leave
        ret

It seems that returning (the address of) a local value is not
well-defined, so gcc simply decides that, since this value is never
used inside the function, there is no need to do any computation at
all! Both add and mult are reduced to returning a stack address, which
explains the 1.4s timing.

---------------------------------------------------------------------------
Jacques Garrigue      Nagoya University     garrigue at math.nagoya-u.ac.jp
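
For illustration, here is a minimal C sketch of a well-defined variant. The original C source is not reproduced in this message, so I am assuming it returned a pointer to a local array. The names mat_mult and mat_sum and the flat row-major layout are hypothetical, not taken from the original code. The idea is that the caller owns the result buffer and that the benchmark actually consumes the result, so the compiler cannot legally drop the computation.

```c
#include <assert.h>
#include <stddef.h>

/* Problematic pattern (assumed, shown only as a comment):

       double *mult_bad(const double *a, const double *b) {
           double c[4];
           return c;    // address of a local: its lifetime ends here
       }

   Using the returned pointer is undefined behaviour, so an optimizer
   may remove the work that fills c. */

/* Hypothetical well-defined variant: the caller supplies `out`.
   Matrices are n x n, stored row-major in flat arrays. */
void mat_mult(size_t n, const double *a, const double *b, double *out)
{
    assert(a != NULL && b != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            double sum = 0.0;
            for (size_t k = 0; k < n; k++)
                sum += a[i * n + k] * b[k * n + j];
            out[i * n + j] = sum;
        }
    }
}

/* Hypothetical helper: sum all entries, so that a benchmark has an
   observable value depending on every element of the result. */
double mat_sum(size_t n, const double *m)
{
    double total = 0.0;
    for (size_t i = 0; i < n * n; i++)
        total += m[i];
    return total;
}
```

In a timing loop, one would call mat_mult into a caller-allocated buffer and print mat_sum of the final result; printing the value is what keeps the multiplication observable to the optimizer.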