caml-list - the Caml user's mailing list
* Ocamlopt x86-32 and SSE2
       [not found] <20090511043120.976EBBC67@yquem.inria.fr>
@ 2009-05-11  7:10 ` Pascal Cuoq
  2009-05-12  9:37   ` [Caml-list] " Xavier Leroy
  0 siblings, 1 reply; 9+ messages in thread
From: Pascal Cuoq @ 2009-05-11  7:10 UTC
  To: caml-list

Here's an idea. I don't know if it is relevant, but it looks like
it could be a good compromise (option 2.5, if you will): how about
implementing floating-point operations as function calls
(the functions could be written in C and be part of the runtime library)
when the SSE2 instructions are not available? Is that simpler than
option 3?
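
A minimal sketch of what such a runtime helper might look like. The names
`float_op` and `soft_float_binop` are hypothetical and are not part of the
actual OCaml runtime; code generated for a pre-SSE2 target would call the
helper where an SSE2 build emits a single inline instruction.

```c
/* Hypothetical runtime helper, for illustration only: these names are
   assumptions and do not exist in the OCaml runtime. */
enum float_op { FLOAT_ADD, FLOAT_SUB, FLOAT_MUL, FLOAT_DIV };

/* One out-of-line entry point for all binary float operations.  The C
   compiler that builds the runtime decides how the arithmetic is done
   (x87 on a pre-SSE2 machine), so the native-code generator would not
   have to emit any x87 instructions itself. */
double soft_float_binop(enum float_op op, double a, double b)
{
    switch (op) {
    case FLOAT_ADD: return a + b;
    case FLOAT_SUB: return a - b;
    case FLOAT_MUL: return a * b;
    case FLOAT_DIV: return a / b;
    }
    return 0.0; /* not reached for valid op values */
}
```

The price is a call and a return for every float operation on machines
that take this path.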

Matteo Frigo <athena@fftw.org> wrote:
> Do you guys have any sort of empirical evidence that scalar SSE2 math is
> faster than plain old x87?

It's not speed I am after personally, but a correct implementation
of IEEE 754's round-to-nearest mode for doubles.
Also, the satisfying knowledge that the code of the compiler I use
is as tight as it can be and that I could understand it if I had to
some day.

Jon Harrop <jon@ffconsultancy.com> wrote:
> Note that you can use the same argument to justify not optimizing the x86
> backend because power users should be using the (much more performant) x64
> code gen.

I don't know where you get "much more performant" from.
For what I do, speed of floating-point operations is irrelevant, but
not the speed of the whole application. The whole application is
slightly slower (~10%) with the larger data words despite the improved
instruction set. Plus, memory is also a concern, and for users who
have less than 6GiB of memory, there are actually more addressable
data words in x86 mode.
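
One way to arrive at a figure like 6GiB, as a sketch only. It assumes
(this is not stated above) that a 32-bit process can address about 3GiB of
user space, as with the usual 3G/1G split on 32-bit Linux, and that a data
word is 4 bytes in x86 mode and 8 bytes in x64 mode. `data_words` and `GIB`
are names made up for the illustration.

```c
#include <stdint.h>

/* Assumed unit: one gibibyte in bytes. */
#define GIB (UINT64_C(1) << 30)

/* Number of whole data words that fit in `bytes` bytes of memory. */
uint64_t data_words(uint64_t bytes, uint64_t word_size)
{
    return bytes / word_size;
}
```

Under these assumptions, `data_words(3 * GIB, 4)` and `data_words(6 * GIB, 8)`
are equal, so a 64-bit process needs more than 6GiB before it can hold more
words than a 32-bit one.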

Pascal



* Re: [Caml-list] Ocamlopt x86-32 and SSE2
  2009-05-11  7:10 ` Ocamlopt x86-32 and SSE2 Pascal Cuoq
@ 2009-05-12  9:37   ` Xavier Leroy
  2009-05-12 10:04     ` Sylvain Le Gall
                       ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Xavier Leroy @ 2009-05-12  9:37 UTC
  To: caml-list

This is an interesting discussion with many relevant points being
made.  Some comments:

Matteo Frigo:
> Do you guys have any sort of empirical evidence that scalar SSE2 math is
> faster than plain old x87?
> I ask because every time I tried compiling FFTW with gcc -m32
> -mfpmath=sse, the result has been invariably slower than the vanilla x87
> compilation.  (I am talking about scalar arithmetic here.  FFTW also
> supports SSE2 2-way vector arithmetic, which is of course faster.)

gcc does rather clever tricks with the x87 float stack and the fxch
instruction, making it look almost like a flat register set and
managing to expose some instruction-level parallelism despite the
dependencies on the top of the stack.  In contrast, ocamlopt uses the
x87 stack in a pedestrian, reverse-Polish-notation way, so the
benefits of having "real" float registers are bigger.

Using the experimental x86-sse2 port that I did in 2003 on a Core2
processor, I see speedups of 10 to 15% on my few standard float
benchmarks.  However, these benchmarks were written in such a way that
the generated x87 code isn't too awful.  It is easy to construct
examples where the SSE2 code is twice as fast as x87.

More generally, the SSE2 code generator is much more forgiving towards
changes in program style, and its performance characteristics are more
predictable than the x87 code generator.  For instance, manual
elimination of common subexpressions is almost always a win with SSE2
but quite often a loss with x87 ...

Pascal Cuoq:
> According to http://en.wikipedia.org/wiki/SSE2, someone using a Via C7
> should be fine.

Richard Jones:
> AMD Geode then ...

Apparently, recent versions of the Geode support SSE2 as well.
Low-power people love vector instruction sets, because it lets them do
common tasks like audio and video decoding more efficiently, ergo with
less energy.

Sylvain Le Gall:
> If INRIA choose to switch to SSE2 there should be at least still a way
> to compile on older architecture. Doesn't mean that INRIA need to keep
> the old code generator, but should provide a simple emulation for it. In
> this case, we will have good performance on new arch for float and we
> will still be able to compile on old arch. 

The least complicated way to preserve backward compatibility with
pre-SSE2 hardware is to keep the existing x87 code generator and bolt
the SSE2 generator on top of it, Frankenstein-style.  Well, either
that, or rely on the kernel to trap unimplemented SSE2 instructions
and emulate them in software.  This is theoretically possible but I'm
pretty sure neither Linux nor Windows implement it.

David Mentre:
> Regarding option 2, I assume that byte-code would still work on i386
> pre-SSE2 machines? So OCaml programs would still work on those machines.

You're correct, provided the bytecode interpreter isn't compiled in
SSE2 mode itself (see below for one reason one might want to do this).
However, packagers would still be unhappy about this: packaged OCaml
applications like Unison or Coq are usually compiled to native-code
(the additional speed is most welcome in the case of Coq...).
Therefore, packagers would have to choose between making these
applications SSE2-only or making them slower by compiling them to bytecode.

Dmitry Bely:
> [Reproducibility of results between bytecode and native]
> I wouldn't be so sure. Bytecode runtime is C compiler-dependent (that
> does use x87 for floating-point calculations), so rounding errors can
> lead to different results.

That's right: even though it stores all intermediate float results in
64-bit format, a bytecode interpreter compiled in default x87 mode still
exhibits double rounding anomalies.  One would have to compile it with
gcc in SSE2 mode (like MacOS X does by default) to have complete
reproducibility between bytecode and native.
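
The anomaly can be reproduced in C without an x87 code generator by
rounding explicitly through `long double`. This is a sketch under the
assumption that `long double` is the 80-bit x87 extended format (64-bit
significand), as with gcc on x86 Linux; the function names are made up for
the illustration.

```c
#include <float.h>

/* Nonzero when long double has a 64-bit significand (x87 extended). */
int long_double_is_x87_extended(void)
{
    return LDBL_MANT_DIG == 64;
}

/* Round once: the exact sum is rounded straight to a 53-bit significand.
   This is what happens when double arithmetic is evaluated in double
   precision (FLT_EVAL_METHOD == 0, e.g. SSE2 code).  Under x87
   evaluation this function would itself round twice. */
double add_rounded_once(double a, double b)
{
    volatile double r = a + b;
    return r;
}

/* Round twice: first to long double, then to double, mimicking an x87
   computation whose extended result is later stored in 64-bit format. */
double add_rounded_twice(double a, double b)
{
    volatile long double wide = (long double)a + (long double)b;
    return (double)wide;
}
```

With a = 1.0 and b = 0x1p-53 + 0x1p-64, the exact sum lies just above the
midpoint between 1 and the next double. Rounding once gives 1 + 2^-52.
Rounding to extended precision first lands exactly on that midpoint, and
the second rounding then ties to even and gives 1.0.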

> Floating point is always approximate...

I used to believe strongly in this viewpoint, but after discussion
with people who do static analysis or program proof over float
programs, I'm not so sure: static analysis and program proof are
difficult enough that one doesn't want to complicate them even further
to take extended-precision intermediate results and double rounding
into account...

To finish: I'm still very interested in hearing from packagers.  Does
Debian, for example, already have some packages that are SSE2-only?
Are these packages specially tagged so that the installer will refuse
to install them on pre-SSE2 hardware?  What's the party line?

- Xavier Leroy



* Re: Ocamlopt x86-32 and SSE2
  2009-05-12  9:37   ` [Caml-list] " Xavier Leroy
@ 2009-05-12 10:04     ` Sylvain Le Gall
  2009-05-25  8:23       ` Sylvain Le Gall
  2009-05-12 12:40     ` [Caml-list] " Richard Jones
  2009-05-13 22:30     ` Florian Weimer
  2 siblings, 1 reply; 9+ messages in thread
From: Sylvain Le Gall @ 2009-05-12 10:04 UTC
  To: caml-list

On 12-05-2009, Xavier Leroy <Xavier.Leroy@inria.fr> wrote:
>
> Sylvain Le Gall:
>> If INRIA choose to switch to SSE2 there should be at least still a way
>> to compile on older architecture. Doesn't mean that INRIA need to keep
>> the old code generator, but should provide a simple emulation for it. In
>> this case, we will have good performance on new arch for float and we
>> will still be able to compile on old arch. 
>
> The least complicated way to preserve backward compatibility with
> pre-SSE2 hardware is to keep the existing x87 code generator and bolt
> the SSE2 generator on top of it, Frankenstein-style.  Well, either
> that, or rely on the kernel to trap unimplemented SSE2 instructions
> and emulate them in software.  This is theoretically possible but I'm
> pretty sure neither Linux nor Windows implement it.
>

I was thinking (if it is possible) of using simple "function calls" for
float operations. This would be very inefficient, but would provide a
very simple compatibility layer.

>
> To finish: I'm still very interested in hearing from packagers.  Does
> Debian, for example, already have some packages that are SSE2-only?
> Are these packages specially tagged so that the installer will refuse
> to install them on pre-SSE2 hardware?  What's the party line?
>

The most obvious packages I see are the Linux kernel and libc6:
http://packages.debian.org/lenny/linux-image-2.6.26-2-486
http://packages.debian.org/lenny/linux-image-2.6.26-1-686-bigmem
http://packages.debian.org/lenny/libc6
http://packages.debian.org/lenny/libc6-i686

AFAIK, there is no way for the package manager to tell the difference
(there is no tag). However, the installer has some clues about which one
to choose and installs the best one for linux and libc6. Once installed,
it is always upgraded correctly, because the arch is embedded in the
package name.

I think linux and libc6 should be considered as exceptions, because they
really provide an important benefit for overall optimization.

For other packages, if an optimization is possible, versions with and
without the optimization are embedded in the package and chosen at runtime.
For example, libavcodec provides i686 and i486 versions:
http://packages.debian.org/sid/i386/libavcodec52/filelist
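
Run-time selection of this kind can be sketched inside a single program as
follows, assuming GCC or Clang on x86 (`__builtin_cpu_init` and
`__builtin_cpu_supports` are compiler builtins there). The names
`cpu_has_sse2`, `dot_generic`, `dot_sse2` and `select_dot` are hypothetical,
and both variants share one body here; in a real package the second would be
built with SSE2 enabled. The packages mentioned here may instead rely on
separate shared objects picked by the dynamic loader, so this is only one
way to get a similar effect.

```c
#include <stdbool.h>
#include <stddef.h>

/* Returns true when the CPU reports SSE2.  Assumes GCC or Clang on x86;
   on any other compiler or architecture it conservatively reports false. */
bool cpu_has_sse2(void)
{
#if (defined(__i386__) || defined(__x86_64__)) && defined(__GNUC__)
    __builtin_cpu_init();
    return __builtin_cpu_supports("sse2") != 0;
#else
    return false;
#endif
}

typedef double (*dot_fn)(const double *, const double *, size_t);

/* Baseline variant, safe on any CPU. */
static double dot_generic(const double *x, const double *y, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += x[i] * y[i];
    return sum;
}

/* Placeholder for the optimized variant: it simply reuses the generic
   body so that the sketch stays portable. */
static double dot_sse2(const double *x, const double *y, size_t n)
{
    return dot_generic(x, y, n);
}

/* Pick the implementation once, e.g. at program start-up. */
dot_fn select_dot(bool has_sse2)
{
    return has_sse2 ? dot_sse2 : dot_generic;
}
```

A caller would do `dot_fn dot = select_dot(cpu_has_sse2());` once and use
`dot` afterwards, so the feature test is not repeated on every call.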

So in conclusion, there is always a "default" non-SSE2 alternative for
packages that can provide an optimized version. I don't know of any package
that is SSE2-only.

In my opinion, Debian will probably refuse to ship a package that
provides only an SSE2 version (but I am speaking only for myself).

Regards
Sylvain Le Gall



* Re: [Caml-list] Ocamlopt x86-32 and SSE2
  2009-05-12  9:37   ` [Caml-list] " Xavier Leroy
  2009-05-12 10:04     ` Sylvain Le Gall
@ 2009-05-12 12:40     ` Richard Jones
  2009-05-13 22:30     ` Florian Weimer
  2 siblings, 0 replies; 9+ messages in thread
From: Richard Jones @ 2009-05-12 12:40 UTC
  Cc: caml-list

On Tue, May 12, 2009 at 11:37:17AM +0200, Xavier Leroy wrote:
> Richard Jones:
> >AMD Geode then ...
> 
> Apparently, recent versions of the Geode support SSE2 as well.
> Low-power people love vector instruction sets, because it lets them do
> common tasks like audio and video decoding more efficiently, ergo with
> less energy.

I was mostly joking about this - don't worry :-)

> Well, either
> that, or rely on the kernel to trap unimplemented SSE2 instructions
> and emulate them in software.  This is theoretically possible but I'm
> pretty sure neither Linux nor Windows implement it.

<aside>
Even VMWare aren't doing this.  However, it's now relatively common to
have the CPU lie about the true capabilities of its instruction set
(by faking the return from CPUID, which in Linux means that the
/proc/cpuinfo flags don't give the true picture).  This is done so
that guests can be migrated across machines in a cluster which have
different capabilities.  VMWare call this 'EVC clustering'.
</aside>

> To finish: I'm still very interested in hearing from packagers.  Does
> Debian, for example, already have some packages that are SSE2-only?
> Are these packages specially tagged so that the installer will refuse
> to install them on pre-SSE2 hardware?  What's the party line?

From the Fedora p.o.v., there's no problem.  We'll just deprecate
OCaml on ancient pre-SSE2 hardware (for new distributions - they can
keep using RHEL 5 on older hardware).

Rich.

-- 
Richard Jones
Red Hat



* Re: [Caml-list] Ocamlopt x86-32 and SSE2
  2009-05-12  9:37   ` [Caml-list] " Xavier Leroy
  2009-05-12 10:04     ` Sylvain Le Gall
  2009-05-12 12:40     ` [Caml-list] " Richard Jones
@ 2009-05-13 22:30     ` Florian Weimer
  2 siblings, 0 replies; 9+ messages in thread
From: Florian Weimer @ 2009-05-13 22:30 UTC
  To: Xavier Leroy; +Cc: caml-list

* Xavier Leroy:

> To finish: I'm still very interested in hearing from packagers.  Does
> Debian, for example, already have some packages that are SSE2-only?

Not to my knowledge (it would be a bug).  Some packages use JITting or
dynamic shared objects to provide optimized code.



* Re: Ocamlopt x86-32 and SSE2
  2009-05-12 10:04     ` Sylvain Le Gall
@ 2009-05-25  8:23       ` Sylvain Le Gall
  0 siblings, 0 replies; 9+ messages in thread
From: Sylvain Le Gall @ 2009-05-25  8:23 UTC
  To: caml-list

On 12-05-2009, Sylvain Le Gall <sylvain@le-gall.net> wrote:
> On 12-05-2009, Xavier Leroy <Xavier.Leroy@inria.fr> wrote:
>>
>> Sylvain Le Gall:
>>
>> To finish: I'm still very interested in hearing from packagers.  Does
>> Debian, for example, already have some packages that are SSE2-only?
>> Are these packages specially tagged so that the installer will refuse
>> to install them on pre-SSE2 hardware?  What's the party line?
>>
>
> In my opinion, Debian will probably refuse to ship a package that
> provides only an SSE2 version (but I am speaking only for myself).
>

For those who are interested, a discussion has just started about dropping
pre-i686 architectures from Debian:
http://permalink.gmane.org/gmane.linux.debian.devel.kernel/47844

The first round of posts seems clearly against this decision. The main
argument is that many schools are using old pre-i686 hardware.

Regards,
Sylvain Le Gall



* Re: Ocamlopt x86-32 and SSE2
  2009-05-10 11:04       ` David MENTRE
@ 2009-05-11  3:43         ` Stefan Monnier
  0 siblings, 0 replies; 9+ messages in thread
From: Stefan Monnier @ 2009-05-11  3:43 UTC
  To: caml-list

> As far as I know, one is using ocamlopt to improve performance.
> I can't think of any case where one would need native code running on
> pre-SSE2 machines which are so outdated performance-wise.

You mean we should make slow machines even slower?


        Stefan



* Ocamlopt x86-32 and SSE2
  2009-05-10  1:52   ` [Caml-list] " Goswin von Brederlow
@ 2009-05-10  8:56     ` CUOQ Pascal
  0 siblings, 0 replies; 9+ messages in thread
From: CUOQ Pascal @ 2009-05-10  8:56 UTC
  To: Goswin von Brederlow; +Cc: caml-list

>That does not
>really mean i486 at 25MHz will be used but it is the common bottom
>line that can easily be supported.

My point is that you're not looking at the whole set of
requirements for OCaml and other existing Debian packages
when you look only at the processor's instruction set.

The way to keep old hardware running is to keep
it running old software. Or, if you give me a second
to switch to my Bogart voice, "we will always have 3.11".

>Having ocaml require SSE2 is quite unacceptable for someone with a Via
>C7 cpu (they don't have SSE2, right?) 

According to http://en.wikipedia.org/wiki/SSE2, someone using a Via C7
should be fine.

Pascal



* Ocamlopt x86-32 and SSE2
       [not found] <20090509100004.353ADBC5C@yquem.inria.fr>
@ 2009-05-09 11:38 ` CUOQ Pascal
  2009-05-10  1:52   ` [Caml-list] " Goswin von Brederlow
  0 siblings, 1 reply; 9+ messages in thread
From: CUOQ Pascal @ 2009-05-09 11:38 UTC
  To: caml-list

Xavier Leroy <Xavier.Leroy@inria.fr> wrote:
>2- Declare pre-SSE2 processors obsolete and convert the current
>   "i386" port to always use SSE2 float arithmetic.
>
>3- Support both x87 and SSE2 float arithmetic within the same i386
>   port, with a command-line option to activate SSE2, like gcc does.

As someone with something of an obsession for keeping
obsolete computers running as long as they are not broken,
I have to interject something.

I still have a functional Pentium 90 (granted, that's not
the newest computer that does not support SSE2, but
please hear me). I gave up the idea of bootstrapping
OCaml on it years ago because it has 16MB of memory,
and that became insufficient around the time Camlp4 became
part of the distribution. I would have had either to modify
the compilation flow or cross-compile, both of which were
too much work for the meagre resulting cool factor.
Now, both the old and the new Camlp4 are
fine pieces of software that make use of
resources available nowadays to make things possible
that weren't before. I am not complaining. I am saying that
you have to be consistent in your requirements.

My father was using Debian on a 500MHz K6-3D that I had
somehow been able to upgrade with enough memory
to run one of the two popular desktops. He finally
upgraded to a new computer because he could
see the characters being displayed one by one in the
e-mail client. That, or the motherboard died. I can't
remember. It was serendipitous, anyway.

There are plenty of embedded processors with an x86
instruction set and no SSE2 around, but these are not in
the cool toys that we want to run OCaml on. The cool
toys have ARM processors.

My message is: I am one of the people who have the peculiar
mental illness that leads one to suggest a compatible option.

Well, I am not.

Take option 2 and run with it!

>However, packagers are
>going to be very unhappy: Debian still lists i486 as its bottom line;
>for Fedora, it's Pentium or Pentium II; for Windows, it's "a 1GHz
>processor", meaning Pentium III.  All these processors lack SSE2
>support.  Only MacOS X is SSE2-compatible from scratch.

Only Linux distributions are a problem, if OCaml packages
are at risk of being rejected.

Just because Windows still works on old computers doesn't force
every program to do the same (flame bait: and I would add that
Windows' support for old computers is mostly unintentional).

In Linux distributions, is it completely forbidden to have packages
that will not work on the bottom line?
This is (I assume) OCaml 3.12 that we are talking about, which
would land sometime in 2010 and arrive in binary distributions
that are scheduled to be released in 2011. Will Debian maintain
its delusion of supporting the i486 by that time?

Pascal



end of thread, other threads:[~2009-05-25  8:24 UTC | newest]

Thread overview: 9+ messages
     [not found] <20090511043120.976EBBC67@yquem.inria.fr>
2009-05-11  7:10 ` Ocamlopt x86-32 and SSE2 Pascal Cuoq
2009-05-12  9:37   ` [Caml-list] " Xavier Leroy
2009-05-12 10:04     ` Sylvain Le Gall
2009-05-25  8:23       ` Sylvain Le Gall
2009-05-12 12:40     ` [Caml-list] " Richard Jones
2009-05-13 22:30     ` Florian Weimer
     [not found] <20090509100004.353ADBC5C@yquem.inria.fr>
2009-05-09 11:38 ` CUOQ Pascal
2009-05-10  1:52   ` [Caml-list] " Goswin von Brederlow
2009-05-10  8:56     ` CUOQ Pascal
2009-04-28 19:36 Ocamlopt code generator question Dmitry Bely
2009-05-05  9:24 ` [Caml-list] " Xavier Leroy
2009-05-05  9:41   ` Dmitry Bely
2009-05-08 10:21     ` [Caml-list] Ocamlopt x86-32 and SSE2 Xavier Leroy
2009-05-10 11:04       ` David MENTRE
2009-05-11  3:43         ` Stefan Monnier
