To: Xavier Leroy
Cc: Dmitry Bely, Caml List
Subject: Re: [Caml-list] Ocamlopt x86-32 and SSE2
From: Matteo Frigo
Date: Sun, 10 May 2009 19:12:49 -0400
In-Reply-To: <4A0407A9.4000009@inria.fr> (Xavier Leroy's message of "Fri, 08 May 2009 12:21:29 +0200")
Message-ID: <87pregtuvi.fsf@fftw.org>

Do you
guys have any sort of empirical evidence that scalar SSE2 math is faster than plain old x87?

I ask because every time I have tried compiling FFTW with gcc -m32 -mfpmath=sse, the result has been invariably slower than the vanilla x87 compilation. (I am talking about scalar arithmetic here. FFTW also supports SSE2 2-way vector arithmetic, which is of course faster.) I also remember trying similar experiments with other numerical code in the Pentium 4 dark ages, with similar results.

I don't see any reason why this should be the case, and maybe this is just a problem with gcc, but I don't think you should automatically assume that SSE2 math is faster without running a few experiments first.

Regards,
Matteo Frigo