caml-list - the Caml user's mailing list
From: Xavier Leroy <Xavier.Leroy@inria.fr>
To: Dmitry Bely <dmitry.bely@gmail.com>
Cc: Caml List <caml-list@inria.fr>
Subject: Re: [Caml-list] Ocamlopt x86-32 and SSE2
Date: Fri, 08 May 2009 12:21:29 +0200
Message-ID: <4A0407A9.4000009@inria.fr>
In-Reply-To: <90823c940905050241y11f012e5xee8316e3e4337ff9@mail.gmail.com>

Dmitry Bely wrote:

> I see. Why I asked this: trying to improve floating-point performance
> on 32-bit x86 platform I have merged floating-point SSE2 code
> generator from amd64 ocamlopt back end to i386 one, making ia32sse2
> architecture. It also inlines sqrt() via -ffast-math flag and slightly
> optimizes emit_float_test (usually eliminates an extra jump) -
> features that are missed in the original amd64 code generator.

You just passed black belt in OCaml compiler hacking :-)

> Is this of any interest to anybody?

I'm definitely interested in the potential improvements to the amd64
code generator.

Concerning the i386 code generator (x86 in 32-bit mode), SSE2 float
arithmetic does improve performance and fit ocamlopt's compilation
model much better than the current x87 float arithmetic, which is a
bit of a hack.  Several options can be considered:

1- Have an additional "ia32sse2" port of ocamlopt in parallel with the
   current "i386" port.

2- Declare pre-SSE2 processors obsolete and convert the current
   "i386" port to always use SSE2 float arithmetic.

3- Support both x87 and SSE2 float arithmetic within the same i386
   port, with a command-line option to activate SSE2, like gcc does.

I'm really not keen on approach 1.  We have too many ports (and
their variants for Windows/MSVC) already.  Moreover, I suspect
packagers would stick to the i386 port for compatibility with old
hardware, and most casual users would, too, out of laziness, so this
hypothetical "ia32sse2" port would receive little testing.

Approach 2 is tempting for me because it would simplify the x86-32
code generator and remove some historical cruft.  The issue is that it
demands a processor that implements SSE2.  For a list of processors, see
  http://en.wikipedia.org/wiki/SSE2
As a rule of thumb, almost all desktop PCs bought since 2004 have SSE2,
as well as almost all notebooks since 2006.  That should be OK for
professional users (it's nearly impossible to purchase maintenance
beyond 3 years, anyway) and serious hobbyists.  However, packagers are
going to be very unhappy: Debian still lists i486 as its bottom line;
for Fedora, it's Pentium or Pentium II; for Windows, it's "a 1GHz
processor", meaning Pentium III.  All these processors lack SSE2
support.  Only MacOS X has been SSE2-compatible from the start.

Approach 3 is probably the best from a user's point of view.  But it's
going to complicate the code generator: the x87 cruft would still be
there, and new cruft would need to be added to support SSE2.  Code
compiled with the SSE2 flag could link with code compiled without,
provided the SSE2 registers are not used for parameter and result
passing.  But as Dmitry observed, this is already the case in the
current ocamlopt compiler.

Jean-Marc Eber:
>> But again, having better floating point performance (and
>> predictable behaviour, compared to the bytecode version) would be a
>> big plus for some applications.

Dmitry Bely:
> Don't quite understand what is "predictable behavior" - any generator
> should conform to specs. In my tests x87 and SSE2 backends show the
> same results (otherwise it would be called a bug).

You haven't tested enough :-).  The x87 backend keeps some intermediate
results in 80-bit float format, while the SSE2 backend (as well as all
other backends and the bytecode interpreter) computes everything in
64-bit format.  See David Monniaux's excellent tutorial:
  http://hal.archives-ouvertes.fr/hal-00128124/en/
Computing intermediate results in extended precision has pros and
cons, but my understanding is that the cons slightly outweigh the pros.
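
To make the difference concrete, here is a minimal sketch in C (chosen
because OCaml has no extended-precision float type).  It is only an
illustration, not code from ocamlopt: the function names are made up,
and it assumes a platform where long double is the 80-bit x87 format,
as with gcc on x86.  The first function rounds every intermediate
result to 64 bits, as SSE2 scalar arithmetic does; the second keeps
the intermediate product in extended precision, as the x87 stack can.

```c
#include <float.h>
#include <math.h>

/* Hypothetical helper: every intermediate result is rounded to a
   64-bit double, which is what SSE2 scalar arithmetic does.  The
   volatile store forces that rounding even if the compiler targets
   the x87 unit. */
double roundtrip_double(double a, double b)
{
    volatile double product = a * b;   /* may overflow to +infinity */
    return product / b;
}

/* Hypothetical helper: the intermediate product is kept in long
   double.  ASSUMPTION: long double is the 80-bit x87 format, which
   has a wider exponent range than double.  Where long double is the
   same as double, both functions behave identically. */
double roundtrip_extended(double a, double b)
{
    long double product = (long double)a * (long double)b;
    return (double)(product / (long double)b);
}
```

With a = 1e308 and b = 10.0, the product exceeds the largest finite
double, so roundtrip_double returns infinity.  Under the assumption
above, roundtrip_extended returns a finite value close to 1e308,
because the 80-bit format can still represent the product.  The same
source expression can therefore give two different answers depending
on where intermediate results are held.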

- Xavier Leroy