From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <athena@fftw.org>
X-Spam-Checker-Version: SpamAssassin 3.1.3 (2006-06-01) on yquem.inria.fr
X-Spam-Level: *
X-Spam-Status: No, score=1.4 required=5.0 tests=SPF_NEUTRAL 
	autolearn=disabled version=3.1.3
X-Original-To: caml-list@yquem.inria.fr
Delivered-To: caml-list@yquem.inria.fr
Received: from mail1-relais-roc.national.inria.fr (mail1-relais-roc.national.inria.fr [192.134.164.82])
	by yquem.inria.fr (Postfix) with ESMTP id 38E50BC37;
	Mon, 11 May 2009 01:13:01 +0200 (CEST)
X-IronPort-Anti-Spam-Filtered: true
X-IronPort-Anti-Spam-Result: AiEFAFv8BkpFPT6XgWdsb2JhbACBUJVBAQEWIrVeB4N3BQ
X-IronPort-AV: E=Sophos;i="4.38,431,1233529200"; 
   d="scan'208";a="29039601"
Received: from fftw.vpsland.com (HELO fftw.org) ([69.61.62.151])
  by mail1-smtp-roc.national.inria.fr with ESMTP/TLS/AES256-SHA; 11 May 2009 01:13:00 +0200
Received: from pool-96-237-2-46.bstnma.east.verizon.net ([96.237.2.46] helo=amd)
	by fftw.org with esmtpsa (TLS1.0:RSA_AES_256_CBC_SHA1:32)
	(Exim 4.69)
	(envelope-from <athena@fftw.org>)
	id 1M3ICW-0006Vl-E4; Sun, 10 May 2009 19:12:56 -0400
Received: from athena by amd with local (Exim 4.69)
	(envelope-from <athena@fftw.org>)
	id 1M3ICP-00054l-BL; Sun, 10 May 2009 19:12:49 -0400
To: Xavier Leroy <Xavier.Leroy@inria.fr>
Cc: Dmitry Bely <dmitry.bely@gmail.com>,
	Caml List <caml-list@inria.fr>
Subject: Re: [Caml-list] Ocamlopt x86-32 and SSE2
References: <90823c940904281236x61204451nac149ee15b5df73a@mail.gmail.com>
	<4A0005C8.8010609@inria.fr>
	<90823c940905050241y11f012e5xee8316e3e4337ff9@mail.gmail.com>
	<4A0407A9.4000009@inria.fr>
From: Matteo Frigo <athena@fftw.org>
Date: Sun, 10 May 2009 19:12:49 -0400
In-Reply-To: <4A0407A9.4000009@inria.fr> (Xavier Leroy's message of "Fri\, 08 May 2009 12\:21\:29 +0200")
Message-ID: <87pregtuvi.fsf@fftw.org>
User-Agent: Gnus/5.11 (Gnus v5.11) Emacs/22.3 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Spam: no; 0.00; ocamlopt:01 gcc:01 compilation:01 gcc:01 athena:98 caml-list:01 arithmetic:01 arithmetic:01 compiling:02 fftw:02 fftw:02 slower:02 numerical:03 vanilla:03 experiments:03 

Do you guys have any sort of empirical evidence that scalar SSE2 math is
faster than plain old x87?

I ask because every time I tried compiling FFTW with gcc -m32
-mfpmath=sse, the result has been invariably slower than the vanilla x87
compilation.  (I am talking about scalar arithmetic here.  FFTW also
supports SSE2 2-way vector arithmetic, which is of course faster.)

I also remember trying similar experiments with other numerical code in
the Pentium 4 dark ages, with similar results.  I don't see any reason
why this should be the case, and maybe this is just a problem of gcc,
but I don't think you should automatically assume that SSE2 math is
faster without running a few experiments first.

Regards,
Matteo Frigo