From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.1.3 (2006-06-01) on yquem.inria.fr X-Spam-Level: * X-Spam-Status: No, score=1.8 required=5.0 tests=AWL,DNS_FROM_RFC_POST, SPF_NEUTRAL autolearn=disabled version=3.1.3 X-Original-To: caml-list@yquem.inria.fr Delivered-To: caml-list@yquem.inria.fr Received: from mail1-relais-roc.national.inria.fr (mail1-relais-roc.national.inria.fr [192.134.164.82]) by yquem.inria.fr (Postfix) with ESMTP id 72E89BC37; Mon, 11 May 2009 09:55:43 +0200 (CEST) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AmQCAGd3B0rRVcipimdsb2JhbACWUT8BAQEKCQwHDwWlb4ERjVwBAwEDg3sF X-IronPort-AV: E=Sophos;i="4.38,431,1233529200"; d="scan'208";a="29054173" Received: from wf-out-1314.google.com ([209.85.200.169]) by mail1-smtp-roc.national.inria.fr with ESMTP; 11 May 2009 09:55:42 +0200 Received: by wf-out-1314.google.com with SMTP id 26so2568767wfd.0 for ; Mon, 11 May 2009 00:55:41 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:received:in-reply-to:references :date:message-id:subject:from:to:content-type :content-transfer-encoding; bh=bDmu4QRom9NtWdpEBRvdxJM1S3GZ5IOXUaMjad1Q3BM=; b=JvvNZtX3CtToSQHu2gixs6yL3+sJ4GG8yBEYNR/8L2+G/JuxsOjbBYXnCMm08XEV1s Ba62qC8xZeeakHJVT8oUIGYlwH2Yrm6o6EU/Y/J6N5GAB8MtBO2pUg4iTvZimQHSzNF0 e24teDMYIK12COW8snrL1IZtvQcs9p/rn7QrU= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :content-type:content-transfer-encoding; b=cy/7QduRCTvmf53+5zJx7nhPrzyo4N1RStAPQ/FIz77+2BhyLitYIT0ZIv6lgze25W c7WXNm/DXQSQRBJA0dKrp2AIr0Vie/lxC64KSOMP6bez4V4RIUghFtnkXQjl/Adz6Piz NtQ014iS4E11ejrA50C/WuZGdlUIteXnOMrHE= MIME-Version: 1.0 Received: by 10.142.245.17 with SMTP id s17mr2237703wfh.206.1242028541092; Mon, 11 May 2009 00:55:41 -0700 (PDT) In-Reply-To: <4A0407A9.4000009@inria.fr> References: <90823c940904281236x61204451nac149ee15b5df73a@mail.gmail.com> <4A0005C8.8010609@inria.fr> <90823c940905050241y11f012e5xee8316e3e4337ff9@mail.gmail.com> <4A0407A9.4000009@inria.fr> Date: Mon, 11 May 2009 11:55:41 +0400 Message-ID: <90823c940905110055y24e3589et89edd4779b41edb0@mail.gmail.com> Subject: Re: [Caml-list] Ocamlopt x86-32 and SSE2 From: Dmitry Bely To: Xavier Leroy , Caml List Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable X-Spam: no; 0.00; ocamlopt:01 ocamlopt:01 sqrt:01 ocaml:01 compiler:01 compilation:01 model:01 command-line:01 gcc:01 variants:01 packagers:01 hypothetical:01 implements:01 wikipedia:01 wiki:01 On Fri, May 8, 2009 at 2:21 PM, Xavier Leroy wrote: >> I see. Why I asked this: trying to improve floating-point performance >> on 32-bit x86 platform I have merged floating-point SSE2 code >> generator from amd64 ocamlopt back end to i386 one, making ia32sse2 >> architecture. It also inlines sqrt() via -ffast-math flag and slightly >> optimizes emit_float_test (usually eliminates an extra jump) - >> features that are missed in the original amd64 code generator. > > You just passed black belt in OCaml compiler hacking :-) Thank you, sensei :-) >> Is this of any interest to anybody? > > I'm definitely interested in the potential improvements to the amd64 > code generator. > > Concerning the i386 code generator (x86 in 32-bit mode), SSE2 float > arithmetic does improve performance and fit ocamlopt's compilation > model much better than the current x87 float arithmetic, which is a > bit of a hack. =A0Several options can be considered: > > 1- Have an additional "ia32sse2" port of ocamlopt in parallel with the > =A0 current "i386" port. > > 2- Declare pre-SSE2 processors obsolete and convert the current > =A0 "i386" port to always use SSE2 float arithmetic. > > 3- Support both x87 and SSE2 float arithmetic within the same i386 > =A0 port, with a command-line option to activate SSE2, like gcc does. > > I'm really not keen on approach 1. =A0We have too many ports (and > their variants for Windows/MSVC) already. =A0Moreover, I suspect > packagers would stick to the i386 port for compatibility with old > hardware, and most casual users would, too, out of lazyness, so this > hypothetical "ia32sse2" port would receive little testing. > > Approach 2 is tempting for me because it would simplify the x86-32 > code generator and remove some historical cruft. =A0The issue is that it > demands a processor that implements SSE2. =A0For a list of processors, se= e > =A0http://en.wikipedia.org/wiki/SSE2 > As a rule of thumb, almost all desktop PC bought since 2004 has SSE2, > as well as almost all notebooks since 2006. =A0That should be OK for > professional users (it's nearly impossible to purchase maintenance > beyond 3 years, anyway) and serious hobbyists. =A0However, packagers are > going to be very unhappy: Debian still lists i486 as its bottom line; > for Fedora, it's Pentium or Pentium II; for Windows, it's "a 1GHz > processor", meaning Pentium III. =A0All these processors lack SSE2 > support. =A0Only MacOS X is SSE2-compatible from scratch. > > Approach 3 is probably the best from a user's point of view. =A0But it's > going to complicate the code generator: the x87 cruft would still be > there, and new cruft would need to be added to support SSE2. =A0Code > compiled with the SSE2 flag could link with code compiled without, > provided the SSE2 registers are not used for parameter and result > passing. =A0But as Dmitry observed, this is already the case in the > current ocamlopt compiler. I am curious if passing unboxed floats is possible in the current Ocaml data model? As for proposed options - I tend to vote for #3 (and implement it if there is a consensus). Still there is a plenty of low-power/embedded x86 hardware that does not support SSE2. And one will be able to compare x87 and SSE2 backends performance to convince him/herself that the play really worths the candle :-) - Dmitry Bely