From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.1.3 (2006-06-01) on yquem.inria.fr X-Spam-Level: X-Spam-Status: No, score=0.1 required=5.0 tests=AWL autolearn=disabled version=3.1.3 X-Original-To: caml-list@yquem.inria.fr Delivered-To: caml-list@yquem.inria.fr Received: from mail2-relais-roc.national.inria.fr (mail2-relais-roc.national.inria.fr [192.134.164.83]) by yquem.inria.fr (Postfix) with ESMTP id E03D4BC37 for ; Mon, 11 May 2009 03:38:06 +0200 (CEST) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: Ai0CAJ0eB0rUnwckdWdsb2JhbACCH5RwAQwKCQkRBLVag34F X-IronPort-AV: E=Sophos;i="4.40,325,1238968800"; d="scan'208";a="25908310" Received: from relay.ptn-ipout02.plus.net ([212.159.7.36]) by mail2-smtp-roc.national.inria.fr with ESMTP/TLS/RC4-SHA; 11 May 2009 03:38:06 +0200 X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AlsFABoeB0rUnw4R/2dsb2JhbACCH8sIg34F Received: from pih-relay04.plus.net ([212.159.14.17]) by relay.ptn-ipout02.plus.net with ESMTP; 11 May 2009 02:38:06 +0100 Received: from [87.112.229.53] (helo=leper.local) by pih-relay04.plus.net with esmtp (Exim) id 1M3KSz-0000Lw-Sy; Mon, 11 May 2009 02:38:06 +0100 From: Jon Harrop Organization: Flying Frog Consultancy Ltd. To: caml-list@yquem.inria.fr, Matteo Frigo Subject: Re: [Caml-list] Ocamlopt x86-32 and SSE2 Date: Mon, 11 May 2009 03:45:20 +0100 User-Agent: KMail/1.9.9 References: <90823c940904281236x61204451nac149ee15b5df73a@mail.gmail.com> <4A0407A9.4000009@inria.fr> <87pregtuvi.fsf@fftw.org> In-Reply-To: <87pregtuvi.fsf@fftw.org> MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Content-Disposition: inline Message-Id: <200905110345.20272.jon@ffconsultancy.com> X-Plusnet-Relay: 69ee3623acada60c2069414834e1e4cb X-Spam: no; 0.00; ocamlopt:01 ocamlopt:01 gcc:01 compilation:01 gcc:01 2009:98 frog:98 wrote:01 caml-list:01 arithmetic:01 arithmetic:01 compiling:02 motivation:02 fftw:02 fftw:02 On Monday 11 May 2009 00:12:49 Matteo Frigo wrote: > Do you guys have any sort of empirical evidence that scalar SSE2 math is > faster than plain old x87? I believe the motivation is to make good performance tractible in ocamlopt so it is more about the ease of code generation rather than the inherent performance characteristics of the two approaches. > I ask because every time I tried compiling FFTW with gcc -m32 > -mfpmath=sse, the result has been invariably slower than the vanilla x87 > compilation. (I am talking about scalar arithmetic here. FFTW also > supports SSE2 2-way vector arithmetic, which is of course faster.) > > I also remember trying similar experiments with other numerical code in > the Pentium 4 dark ages, with similar results. I don't see any reason > why this should be the case, and maybe this is just a problem of gcc, > but I don't think you should automatically assume that SSE2 math is > faster without running a few experiments first. As I understand it, this is very much a problem with ocamlopt and not with gcc. Specifically, floating point code compiled by ocamlopt on x86 gives mediocre performance for unknown reasons. Hence there is a desire to use more modern solutions that simplify the generation of performant code. -- Dr Jon Harrop, Flying Frog Consultancy Ltd. http://www.ffconsultancy.com/?e