caml-list - the Caml user's mailing list
* Ocamlopt x86-32 and SSE2
       [not found] <20090509100004.353ADBC5C@yquem.inria.fr>
@ 2009-05-09 11:38 ` CUOQ Pascal
  2009-05-10  1:52   ` [Caml-list] " Goswin von Brederlow
  0 siblings, 1 reply; 16+ messages in thread
From: CUOQ Pascal @ 2009-05-09 11:38 UTC (permalink / raw)
  To: caml-list, caml-list

Xavier Leroy <Xavier.Leroy@inria.fr> wrote:
>2- Declare pre-SSE2 processors obsolete and convert the current
>   "i386" port to always use SSE2 float arithmetic.
>
>3- Support both x87 and SSE2 float arithmetic within the same i386
>   port, with a command-line option to activate SSE2, like gcc does.
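
For readers unfamiliar with the gcc behaviour referred to in option 3:
gcc selects the float unit with -mfpmath=387 or -mfpmath=sse (the latter
together with -msse2), and C99's FLT_EVAL_METHOD macro in <float.h>
reports the consequence for expression evaluation. The following is
only a minimal sketch; the hyp_* names are made up for illustration and
do not belong to gcc or to OCaml.

```c
#include <assert.h>
#include <float.h>
#include <string.h>

/* Describe a C99 FLT_EVAL_METHOD value in words. */
const char *hyp_eval_method_name(int method)
{
    switch (method) {
    case 0:  return "each operation rounded to its own type";
    case 1:  return "float and double evaluated as double";
    case 2:  return "everything evaluated as long double";
    default: return "indeterminable or implementation-defined";
    }
}

/* Describe the mode this translation unit was compiled with. */
const char *hyp_current_eval_method(void)
{
#ifdef FLT_EVAL_METHOD
    return hyp_eval_method_name(FLT_EVAL_METHOD);
#else
    return hyp_eval_method_name(-1);  /* pre-C99 headers: unknown */
#endif
}

/* Usage example: the two modes discussed here read differently. */
void hyp_eval_method_demo(void)
{
    assert(strstr(hyp_eval_method_name(2), "long double") != NULL);
    assert(strstr(hyp_eval_method_name(0), "own type") != NULL);
}
```

On a 32-bit x86 build that uses x87 arithmetic, gcc typically reports
method 2; with SSE2 arithmetic it typically reports method 0.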

As someone with something of an obsession for keeping
obsolete computers running as long as they are not broken,
I have to interject something.

I still have a functional Pentium 90 (granted, that's not
the newest computer that does not support SSE2, but
please hear me). I gave up the idea of bootstrapping
OCaml on it years ago because it has 16MB of memory,
and that became insufficient around the time Camlp4 became
part of the distribution. I would have had either to modify
the compilation flow or cross-compile, both of which were
too much work for the meagre resulting cool factor.
Now, both the old and the new Camlp4 are
fine pieces of software that make use of
resources available nowadays to make things possible
that weren't before. I am not complaining. I am saying that
you have to be consistent in your requirements.

My father was using Debian on a 500MHz K6-3D that I had
somehow been able to upgrade with enough memory
to run one of the two popular desktops. He finally
upgraded to a new computer because he could
see the characters being displayed one by one in the
e-mail client. That, or the motherboard died. I can't
remember. It was serendipitous, anyway.

There are plenty of embedded processors with an x86
instruction set and no SSE2 around, but these are not in
the cool toys that we want to run OCaml on. The cool
toys have ARM processors.

My message is: I am one of the people who have the peculiar
mental illness that leads one to suggest a compatible option.

Well, I am not.

Take option 2 and run with it!

>However, packagers are
>going to be very unhappy: Debian still lists i486 as its bottom line;
>for Fedora, it's Pentium or Pentium II; for Windows, it's "a 1GHz
>processor", meaning Pentium III.  All these processors lack SSE2
>support.  Only MacOS X is SSE2-compatible from scratch.

Only Linux distributions are a problem, if OCaml packages
are at risk of being rejected.

Just because Windows still works on old computers doesn't force
every program to do the same (flame bait: and I would add that
Windows' support for old computers is mostly unintentional).

In Linux distributions, is it completely forbidden to have packages
that will not work on the bottom line?
This is (I assume) Ocaml 3.12 that we are talking about, which
would land sometime in 2010 and arrive in binary distributions
that are scheduled to be released in 2011. Will Debian maintain
its delusion of supporting the i486 by that time?

Pascal


* Re: [Caml-list] Ocamlopt x86-32 and SSE2
  2009-05-09 11:38 ` Ocamlopt x86-32 and SSE2 CUOQ Pascal
@ 2009-05-10  1:52   ` Goswin von Brederlow
  2009-05-10  2:16     ` Seo Sanghyeon
                       ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Goswin von Brederlow @ 2009-05-10  1:52 UTC (permalink / raw)
  To: CUOQ Pascal; +Cc: caml-list

"CUOQ Pascal" <Pascal.CUOQ@cea.fr> writes:

> Xavier Leroy <Xavier.Leroy@inria.fr> wrote:
>>2- Declare pre-SSE2 processors obsolete and convert the current
>>   "i386" port to always use SSE2 float arithmetic.
>>
>>3- Support both x87 and SSE2 float arithmetic within the same i386
>>   port, with a command-line option to activate SSE2, like gcc does.
>...
> In Linux distributions, is it completely forbidden to have packages
> that will not work on the bottom line?
> This is (I assume) Ocaml 3.12 that we are talking about, which
> would land sometime in 2010 and arrive in binary distributions
> that are scheduled to be released in 2011. Will Debian maintain
> its delusion of supporting the i486 by that time?
>
> Pascal

As you said (in the deleted part) there are plenty of cpus without
SSE2 around and Debian will continue to support them. That does not
really mean i486 at 25MHz will be used but it is the common bottom
line that can easily be supported.

Having ocaml require SSE2 is quite unacceptable for someone with a Via
C7 cpu (they don't have SSE2, right?) Is it really that much work for
ocaml to use option 3?

MfG
        Goswin


* Re: [Caml-list] Ocamlopt x86-32 and SSE2
  2009-05-10  1:52   ` [Caml-list] " Goswin von Brederlow
@ 2009-05-10  2:16     ` Seo Sanghyeon
  2009-05-10  3:50       ` Jon Harrop
  2009-05-10  8:56     ` CUOQ Pascal
  2009-05-10 19:25     ` Florian Weimer
  2 siblings, 1 reply; 16+ messages in thread
From: Seo Sanghyeon @ 2009-05-10  2:16 UTC (permalink / raw)
  To: Goswin von Brederlow; +Cc: CUOQ Pascal, caml-list

2009/5/10 Goswin von Brederlow <goswin-v-b@web.de>:
> Having ocaml require SSE2 is quite unacceptable for someone with a Via
> C7 cpu (they don't have SSE2, right?) Is it really that much work for
> ocaml to use option 3?

Maybe not, but don't underestimate tiny inconveniences! Even if
supporting x87 is only a tiny bit more work, it could be the difference
between doing it and not doing it.
http://lesswrong.com/lw/f1/beware_trivial_inconveniences/

-- 
Seo Sanghyeon


* Re: [Caml-list] Ocamlopt x86-32 and SSE2
  2009-05-10  2:16     ` Seo Sanghyeon
@ 2009-05-10  3:50       ` Jon Harrop
  2009-05-11  8:05         ` Dmitry Bely
  0 siblings, 1 reply; 16+ messages in thread
From: Jon Harrop @ 2009-05-10  3:50 UTC (permalink / raw)
  To: caml-list

On Sunday 10 May 2009 03:16:49 Seo Sanghyeon wrote:
> 2009/5/10 Goswin von Brederlow <goswin-v-b@web.de>:
> > Having ocaml require SSE2 is quite unacceptable for someone with a Via
> > C7 cpu (they don't have SSE2, right?) Is it really that much work for
> > ocaml to use option 3?
>
> Maybe not, but don't underestimate tiny inconveniences! Even if
> supporting x87 is only a tiny bit more work, it could be the difference
> between doing it and not doing it.
> http://lesswrong.com/lw/f1/beware_trivial_inconveniences/

If you want to avoid inconvenience, why not use LLVM to replace several of the 
existing backends?

-- 
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e


* Ocamlopt x86-32 and SSE2
  2009-05-10  1:52   ` [Caml-list] " Goswin von Brederlow
  2009-05-10  2:16     ` Seo Sanghyeon
@ 2009-05-10  8:56     ` CUOQ Pascal
  2009-05-10 14:47       ` [Caml-list] " Richard Jones
  2009-05-10 19:25     ` Florian Weimer
  2 siblings, 1 reply; 16+ messages in thread
From: CUOQ Pascal @ 2009-05-10  8:56 UTC (permalink / raw)
  To: Goswin von Brederlow; +Cc: caml-list

>That does not
>really mean i486 at 25MHz will be used but it is the common bottom
>line that can easily be supported.

My point is that you're not looking at the whole set of
requirements for OCaml and other existing Debian packages
when you look only at the processor's instruction set.

The way to keep old hardware running is to keep
it running old software. Or, if you give me a second
to switch to my Bogart voice, "we will always have 3.11".

>Having ocaml require SSE2 is quite unacceptable for someone with a Via
>C7 cpu (they don't have SSE2, right?) 

According to http://en.wikipedia.org/wiki/SSE2, someone using a Via C7
should be fine.

Pascal


* Re: [Caml-list] Ocamlopt x86-32 and SSE2
  2009-05-10  8:56     ` CUOQ Pascal
@ 2009-05-10 14:47       ` Richard Jones
  0 siblings, 0 replies; 16+ messages in thread
From: Richard Jones @ 2009-05-10 14:47 UTC (permalink / raw)
  To: CUOQ Pascal; +Cc: Goswin von Brederlow, caml-list

On Sun, May 10, 2009 at 10:56:37AM +0200, CUOQ Pascal wrote:
> According to http://en.wikipedia.org/wiki/SSE2, someone using a Via C7
> should be fine.

AMD Geode then ...

$ grep -i flags /proc/cpuinfo 
flags		: fpu de pse tsc msr cx8 pge cmov mmx mmxext 3dnowext 3dnow up
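
The same check can be made from a program. Below is a minimal C sketch
that looks for a flag as a whole word in a "flags" line such as the one
above; hyp_has_cpu_flag is a made-up name, and reading the line out of
/proc/cpuinfo is left to the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return 1 if `flag` occurs as a whole whitespace-delimited word in
   `line`, 0 otherwise. Matching whole words matters: "sse" must not
   match inside "sse2", nor "3dnow" inside "3dnowext". */
int hyp_has_cpu_flag(const char *line, const char *flag)
{
    assert(line != NULL && flag != NULL);
    size_t len = strlen(flag);
    if (len == 0)
        return 0;
    const char *p = line;
    while ((p = strstr(p, flag)) != NULL) {
        int starts_word = (p == line) || p[-1] == ' ' || p[-1] == '\t';
        int ends_word = p[len] == '\0' || p[len] == ' ' ||
                        p[len] == '\t' || p[len] == '\n';
        if (starts_word && ends_word)
            return 1;
        p += 1;  /* partial match only: keep scanning */
    }
    return 0;
}
```

Called with the Geode line above and "sse2" it returns 0, while "mmx"
and "3dnow" both return 1.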

Rich.

-- 
Richard Jones
Red Hat


* Re: [Caml-list] Ocamlopt x86-32 and SSE2
  2009-05-10  1:52   ` [Caml-list] " Goswin von Brederlow
  2009-05-10  2:16     ` Seo Sanghyeon
  2009-05-10  8:56     ` CUOQ Pascal
@ 2009-05-10 19:25     ` Florian Weimer
  2 siblings, 0 replies; 16+ messages in thread
From: Florian Weimer @ 2009-05-10 19:25 UTC (permalink / raw)
  To: Goswin von Brederlow; +Cc: CUOQ Pascal, caml-list

* Goswin von Brederlow:

> Having ocaml require SSE2 is quite unacceptable for someone with a Via
> C7 cpu (they don't have SSE2, right?)

More problematic are AMD's K7 and some of their Sempron processors, I
think.  AMD introduced SSE2-less CPUs as late as 2004.


* Re: [Caml-list] Ocamlopt x86-32 and SSE2
  2009-05-10  3:50       ` Jon Harrop
@ 2009-05-11  8:05         ` Dmitry Bely
  2009-05-11  9:26           ` Jon Harrop
  0 siblings, 1 reply; 16+ messages in thread
From: Dmitry Bely @ 2009-05-11  8:05 UTC (permalink / raw)
  To: Caml List

On Sun, May 10, 2009 at 7:50 AM, Jon Harrop <jon@ffconsultancy.com> wrote:
> On Sunday 10 May 2009 03:16:49 Seo Sanghyeon wrote:
>> 2009/5/10 Goswin von Brederlow <goswin-v-b@web.de>:
>> > Having ocaml require SSE2 is quite unacceptable for someone with a Via
>> > C7 cpu (they don't have SSE2, right?) Is it really that much work for
>> > ocaml to use option 3?
>>
>> Maybe not, but don't underestimate tiny inconveniences! Even if
>> supporting x87 is only a tiny bit more work, it could be the difference
>> between doing it and not doing it.
>> http://lesswrong.com/lw/f1/beware_trivial_inconveniences/
>
> If you want to avoid inconvenience, why not use LLVM to replace several of the
> existing backends?

I think it would be a major code rewrite (if possible at all). Merging
SSE2 from amd64 into the i386 code generator took about a day of my
effort. How much time would LLVM integration require? If it is that
simple, can you provide a proof-of-concept implementation?

- Dmitry Bely


* Re: [Caml-list] Ocamlopt x86-32 and SSE2
  2009-05-11  9:26           ` Jon Harrop
@ 2009-05-11  8:43             ` Dmitry Bely
  2009-05-11 13:47               ` Jon Harrop
  2009-05-11  9:12             ` Andrey Riabushenko
  1 sibling, 1 reply; 16+ messages in thread
From: Dmitry Bely @ 2009-05-11  8:43 UTC (permalink / raw)
  To: Caml List

On Mon, May 11, 2009 at 1:26 PM, Jon Harrop <jon@ffconsultancy.com> wrote:
> On Monday 11 May 2009 09:05:08 Dmitry Bely wrote:
>> I think it would be a major code rewrite (if possible at all). Merging
>> SSE2 from amd64 into the i386 code generator took about a day of my
>> effort. How much time would LLVM integration require? If it is that
>> simple, can you provide a proof-of-concept implementation?
>
> Well, I can provide a complete garbage collected VM. :-)
>
>  http://hlvm.forge.ocamlcore.org/

We are talking about a new backend for the OCaml compiler, aren't we?

> The hard part of writing an LLVM backend for ocamlopt is probably getting LLVM
> to generate code that is compatible with OCaml's GC, particularly the stack.
> However, I believe Gordon Henriksen already did this:
>
>  "Included in the pending LLVM garbage collection code generation
> changeset is an Ocaml frametable emitter." -
>  http://lists.cs.uiuc.edu/pipermail/llvmdev/2007-November/011527.html

So it's just pie in the sky. No working implementation has been
demonstrated since then. The answer to your "why not use LLVM to
replace several of the existing backends?" question is quite obvious.

- Dmitry Bely


* Re: [Caml-list] Ocamlopt x86-32 and SSE2
  2009-05-11  9:26           ` Jon Harrop
  2009-05-11  8:43             ` Dmitry Bely
@ 2009-05-11  9:12             ` Andrey Riabushenko
  1 sibling, 0 replies; 16+ messages in thread
From: Andrey Riabushenko @ 2009-05-11  9:12 UTC (permalink / raw)
  To: caml-list

> Did any of the OCaml+LLVM student projects get funded in the end?

No, unfortunately. Not this time...


* Re: [Caml-list] Ocamlopt x86-32 and SSE2
  2009-05-11  8:05         ` Dmitry Bely
@ 2009-05-11  9:26           ` Jon Harrop
  2009-05-11  8:43             ` Dmitry Bely
  2009-05-11  9:12             ` Andrey Riabushenko
  0 siblings, 2 replies; 16+ messages in thread
From: Jon Harrop @ 2009-05-11  9:26 UTC (permalink / raw)
  To: caml-list

On Monday 11 May 2009 09:05:08 Dmitry Bely wrote:
> I think it would be a major code rewrite (if possible at all). Merging
> SSE2 from amd64 into the i386 code generator took about a day of my
> effort. How much time would LLVM integration require? If it is that
> simple, can you provide a proof-of-concept implementation?

Well, I can provide a complete garbage collected VM. :-)

  http://hlvm.forge.ocamlcore.org/

The hard part of writing an LLVM backend for ocamlopt is probably getting LLVM 
to generate code that is compatible with OCaml's GC, particularly the stack. 
However, I believe Gordon Henriksen already did this:

  "Included in the pending LLVM garbage collection code generation  
changeset is an Ocaml frametable emitter." -
  http://lists.cs.uiuc.edu/pipermail/llvmdev/2007-November/011527.html

Unfortunately, I will not have any spare time until my next book is out...

Did any of the OCaml+LLVM student projects get funded in the end?

-- 
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e


* Re: [Caml-list] Ocamlopt x86-32 and SSE2
  2009-05-11  8:43             ` Dmitry Bely
@ 2009-05-11 13:47               ` Jon Harrop
  0 siblings, 0 replies; 16+ messages in thread
From: Jon Harrop @ 2009-05-11 13:47 UTC (permalink / raw)
  To: caml-list

On Monday 11 May 2009 09:43:59 Dmitry Bely wrote:
> So it's just pie in the sky. No working implementation has been
> demonstrated since then.

The file "test/CodeGen/Generic/GC/simple_ocaml.ll" in the LLVM 2.5 source 
distribution contains the following test code for the OCaml-compatible 
frametable emitter:

  %struct.obj = type { i8*, %struct.obj* }
  
  define %struct.obj* @fun(%struct.obj* %head) gc "ocaml" {
  entry:
          %gcroot.0 = alloca i8*
          %gcroot.1 = alloca i8*
          
          call void @llvm.gcroot(i8** %gcroot.0, i8* null)
          call void @llvm.gcroot(i8** %gcroot.1, i8* null)
          
          %local.0 = bitcast i8** %gcroot.0 to %struct.obj**
          %local.1 = bitcast i8** %gcroot.1 to %struct.obj**
  
          store %struct.obj* %head, %struct.obj** %local.0
          br label %bb.loop
  bb.loop:
          %t0 = load %struct.obj** %local.0
          %t1 = getelementptr %struct.obj* %t0, i32 0, i32 1
          %t2 = bitcast %struct.obj* %t0 to i8*
          %t3 = bitcast %struct.obj** %t1 to i8**
          %t4 = call i8* @llvm.gcread(i8* %t2, i8** %t3)
          %t5 = bitcast i8* %t4 to %struct.obj*
          %t6 = icmp eq %struct.obj* %t5, null
          br i1 %t6, label %bb.loop, label %bb.end
  bb.end:
          %t7 = malloc %struct.obj
          store %struct.obj* %t7, %struct.obj** %local.1
          %t8 = bitcast %struct.obj* %t7 to i8*
          %t9 = load %struct.obj** %local.0
          %t10 = getelementptr %struct.obj* %t9, i32 0, i32 1
          %t11 = bitcast %struct.obj* %t9 to i8*
          %t12 = bitcast %struct.obj** %t10 to i8**
          call void @llvm.gcwrite(i8* %t8, i8* %t11, i8** %t12)
          ret %struct.obj* %t7
  }
  
  declare void @llvm.gcroot(i8** %value, i8* %tag)
  declare void @llvm.gcwrite(i8* %value, i8* %obj, i8** %field)
  declare i8* @llvm.gcread(i8* %obj, i8** %field)

Compiling this with:

  llvm-as <simple_ocaml.ll | llc

gives:

          .file	"<stdin>"
          .text
          .globl	caml<stdin>__code_begin
  caml<stdin>__code_begin:
          .data
          .globl	caml<stdin>__data_begin
  caml<stdin>__data_begin:
  
          .text
          .align	16
          .globl	fun
          .type	fun,@function
  fun:
  .Leh_func_begin1:
  .Llabel1:
          subl	$12, %esp
          movl	$0, 8(%esp)
          movl	$0, 4(%esp)
          movl	16(%esp), %eax
          movl	%eax, 8(%esp)
          .align	16
  .LBB1_1:	# bb.loop
          movl	8(%esp), %eax
          cmpl	$0, 4(%eax)
          je	.LBB1_1	# bb.loop
  .LBB1_2:	# bb.end
          movl	$8, (%esp)
          call	malloc
  .Llabel2:
          movl	%eax, 4(%esp)
          movl	8(%esp), %ecx
          movl	%eax, 4(%ecx)
          addl	$12, %esp
          ret
          .size	fun, .-fun
  .Leh_func_end1:
          .section	.eh_frame,"aw",@progbits
  .LEH_frame0:
  .Lsection_eh_frame:
  .Leh_frame_common:
          .long	.Leh_frame_common_end-.Leh_frame_common_begin
  .Leh_frame_common_begin:
          .long	0x0
          .byte	0x1
          .asciz	"zR"
          .uleb128	1
          .sleb128	-4
          .byte	0x8
          .uleb128	1
          .byte	0x1B
          .byte	0xC
          .uleb128	4
          .uleb128	4
          .byte	0x88
          .uleb128	1
          .align	4
  .Leh_frame_common_end:
  
  .Lfun.eh:
          .long	.Leh_frame_end1-.Leh_frame_begin1
  .Leh_frame_begin1:
          .long	.Leh_frame_begin1-.Leh_frame_common
          .long	.Leh_func_begin1-.
          .long	.Leh_func_end1-.Leh_func_begin1
          .uleb128	0
          .byte	0xE
          .uleb128	16
          .byte	0x4
          .long	.Llabel1-.Leh_func_begin1
          .byte	0xD
          .uleb128	4
          .align	4
  .Leh_frame_end1:
  
          .text
          .globl	caml<stdin>__code_end
  caml<stdin>__code_end:
          .data
          .globl	caml<stdin>__data_end
  caml<stdin>__data_end:
          .long	0
          .globl	caml<stdin>__frametable
  caml<stdin>__frametable:
          # live roots for fun
          .long	.Llabel2
          .short	0xC
          .short	0x2
          .word	8
          .word	4
          .align	4
          .section	.note.GNU-stack,"",@progbits
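
To make the last few directives easier to read, here is a minimal C
sketch of the record that follows "# live roots for fun". The struct
and function names are made up; the field layout simply mirrors the
directives in the listing (.long, two .short, two .word) and is not
copied from the OCaml runtime headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct hyp_frame_descr {
    uint32_t retaddr;      /* .long  .Llabel2 : address after "call malloc" */
    uint16_t frame_size;   /* .short 0xC      : matches "subl $12, %esp"    */
    uint16_t num_live;     /* .short 0x2      : the two llvm.gcroot slots   */
    uint16_t live_ofs[2];  /* .word 8, .word 4: root offsets from %esp      */
};

/* Return 1 if the stack slot at offset `ofs` is recorded as a GC root. */
int hyp_slot_is_live(const struct hyp_frame_descr *d, uint16_t ofs)
{
    assert(d != NULL && d->num_live <= 2);
    for (uint16_t i = 0; i < d->num_live; i++)
        if (d->live_ofs[i] == ofs)
            return 1;
    return 0;
}

/* The record from the listing. The return address is a placeholder (0)
   because the real value is only known after linking. */
struct hyp_frame_descr hyp_fun_descr(void)
{
    struct hyp_frame_descr d = { 0, 0xC, 2, { 8, 4 } };
    return d;
}
```

In the assembly above, 8(%esp) holds %head and 4(%esp) holds the result
of malloc, so both are listed as roots; 0(%esp) only carries the
argument to malloc and is not.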

So perhaps it is worth a look.

-- 
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e


* Re: Ocamlopt x86-32 and SSE2
  2009-05-12 10:04     ` Sylvain Le Gall
@ 2009-05-25  8:23       ` Sylvain Le Gall
  0 siblings, 0 replies; 16+ messages in thread
From: Sylvain Le Gall @ 2009-05-25  8:23 UTC (permalink / raw)
  To: caml-list

On 12-05-2009, Sylvain Le Gall <sylvain@le-gall.net> wrote:
> On 12-05-2009, Xavier Leroy <Xavier.Leroy@inria.fr> wrote:
>>
>> Sylvain Le Gall:
>>
>> To finish: I'm still very interested in hearing from packagers.  Does
>> Debian, for example, already have some packages that are SSE2-only?
>> Are these packages specially tagged so that the installer will refuse
>> to install them on pre-SSE2 hardware?  What's the party line?
>>
>
> In my opinion, Debian will probably refuse to ship a package that
> provides only an SSE2 version (but I am speaking only for myself).
>

For those who are interested, a discussion has just started about dropping
pre-i686 architectures in Debian:
http://permalink.gmane.org/gmane.linux.debian.devel.kernel/47844

The first round of posts seems clearly against this decision. The main
argument is that many schools are using old pre-i686 hardware.

Regards,
Sylvain Le Gall


* Re: Ocamlopt x86-32 and SSE2
  2009-05-12  9:37   ` [Caml-list] " Xavier Leroy
@ 2009-05-12 10:04     ` Sylvain Le Gall
  2009-05-25  8:23       ` Sylvain Le Gall
  0 siblings, 1 reply; 16+ messages in thread
From: Sylvain Le Gall @ 2009-05-12 10:04 UTC (permalink / raw)
  To: caml-list

On 12-05-2009, Xavier Leroy <Xavier.Leroy@inria.fr> wrote:
>
> Sylvain Le Gall:
>> If INRIA chooses to switch to SSE2, there should at least still be a
>> way to compile on older architectures. That doesn't mean INRIA needs to
>> keep the old code generator, but it should provide a simple emulation
>> for it. In that case, we would have good float performance on new
>> architectures and still be able to compile on old ones.
>
> The least complicated way to preserve backward compatibility with
> pre-SSE2 hardware is to keep the existing x87 code generator and bolt
> the SSE2 generator on top of it, Frankenstein-style.  Well, either
> that, or rely on the kernel to trap unimplemented SSE2 instructions
> and emulate them in software.  This is theoretically possible but I'm
> pretty sure neither Linux nor Windows implement it.
>

I was thinking (if it is possible) of using simple function calls for
float operations. This would be very inefficient, but it would provide
a very simple compatibility layer.

>
> To finish: I'm still very interested in hearing from packagers.  Does
> Debian, for example, already have some packages that are SSE2-only?
> Are these packages specially tagged so that the installer will refuse
> to install them on pre-SSE2 hardware?  What's the party line?
>

The most obvious packages I can see are the Linux kernel and libc6:
http://packages.debian.org/lenny/linux-image-2.6.26-2-486
http://packages.debian.org/lenny/linux-image-2.6.26-1-686-bigmem
http://packages.debian.org/lenny/libc6
http://packages.debian.org/lenny/libc6-i686

AFAIK, there is no way for the package manager to tell the variants
apart (there is no tag). However, the installer has some clue about which
one to choose and installs the best one for linux and libc6. Once
installed, the package is always upgraded correctly, because the arch is
embedded in the package name.

I think linux and libc6 should be considered as exceptions, because they
really provide an important benefit for overall optimization.

For other packages, if an optimization is possible, versions with and
without it are embedded in the package and one is chosen at runtime.
For example, libavcodec provides i686 and i486 versions:
http://packages.debian.org/sid/i386/libavcodec52/filelist

So in conclusion, there is always a "default" non-SSE2 alternative for
packages that can provide an optimized version. I don't know of any
package that is SSE2-only.

In my opinion, Debian will probably refuse to ship a package that
provides only an SSE2 version (but I am speaking only for myself).

Regards
Sylvain Le Gall


* Ocamlopt x86-32 and SSE2
       [not found] <20090511043120.976EBBC67@yquem.inria.fr>
@ 2009-05-11  7:10 ` Pascal Cuoq
  2009-05-12  9:37   ` [Caml-list] " Xavier Leroy
  0 siblings, 1 reply; 16+ messages in thread
From: Pascal Cuoq @ 2009-05-11  7:10 UTC (permalink / raw)
  To: caml-list

Here's an idea; I don't know if it is relevant, but it looks like
it could be a good compromise (option 2.5, if you will): how about
implementing floating-point operations as function calls
(the functions could be written in C and be part of the runtime library)
when the SSE2 instructions are not available? Is that simpler than
option 3?
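
A minimal sketch of what this could look like on the C side, assuming
one out-of-line function per operation. All hyp_* names are
hypothetical and nothing like them is claimed to exist in the OCaml
runtime; the C compiler is free to implement the bodies with x87
instructions.

```c
#include <assert.h>

/* Hypothetical runtime entry points, one per float operation. */
double hyp_caml_add_float(double a, double b) { return a + b; }
double hyp_caml_mul_float(double a, double b) { return a * b; }
double hyp_caml_div_float(double a, double b) { return a / b; }

/* What compiled code for the OCaml expression  (a *. b) +. c  would
   reduce to: one call per operation instead of inline instructions. */
double hyp_mul_add(double a, double b, double c)
{
    return hyp_caml_add_float(hyp_caml_mul_float(a, b), c);
}

/* Usage example with values that are exact in binary floating point. */
void hyp_float_calls_demo(void)
{
    assert(hyp_mul_add(1.5, 2.0, 0.25) == 3.25);
    assert(hyp_caml_div_float(1.0, 4.0) == 0.25);
}
```

The price is a call and return around every float operation, which is
why this only makes sense as a fallback for machines without SSE2.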

Matteo Frigo <athena@fftw.org> wrote:
> Do you guys have any sort of empirical evidence that scalar SSE2  
> math is
> faster than plain old x87?

It's not speed I am after personally, but a correct implementation
of IEEE 754's round-to-nearest mode for doubles.
Also, the satisfying knowledge that the code of the compiler I use
is as tight as it can be and that I could understand it if I had to
some day.
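
The rounding issue can be shown in miniature. When the x87 unit computes
with its 64-bit extended significand and the result is then stored as a
double with a 53-bit significand, the value is rounded twice, and the
second rounding can land on a different result than a single rounding
would. The C sketch below models this on small integers (8 bits rounded
to 4, either directly or via 6); it is a scaled-down model with made-up
hyp_* names, not a reproduction of actual x87 behaviour.

```c
#include <assert.h>
#include <stdint.h>

/* Drop the `drop` low bits of m, rounding to nearest, ties to even. */
uint64_t hyp_round_nearest_even(uint64_t m, unsigned drop)
{
    assert(drop >= 1 && drop < 64);
    uint64_t kept = m >> drop;
    uint64_t rem  = m & ((UINT64_C(1) << drop) - 1);
    uint64_t half = UINT64_C(1) << (drop - 1);
    if (rem > half || (rem == half && (kept & 1)))
        kept += 1;
    return kept;
}

/* Round in two steps: drop `first` bits, then `second` more bits. */
uint64_t hyp_round_twice(uint64_t m, unsigned first, unsigned second)
{
    return hyp_round_nearest_even(hyp_round_nearest_even(m, first), second);
}

/* Usage example: 151 is 10010111 in binary.
   Directly to 4 bits: 1001|0111 is below the halfway point, giving 9.
   Via 6 bits: 100101|11 rounds up to 100110, then 1001|10 is an exact
   tie and rounds to even, giving 10. */
void hyp_double_rounding_demo(void)
{
    assert(hyp_round_nearest_even(151, 4) == 9);
    assert(hyp_round_twice(151, 2, 2) == 10);
}
```

SSE2 arithmetic rounds each double operation once, straight to 53 bits,
which corresponds to the single-step case here.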

Jon Harrop <jon@ffconsultancy.com> wrote:
> Note that you can use the same argument to justify not optimizing  
> the x86
> backend because power users should be using the (much more  
> performant) x64
> code gen.

I don't know where you get "much more performant" from.
For what I do, speed of floating-point operations is irrelevant, but
not the speed of the whole application. The whole application is
slightly slower (~10%) with the larger data words despite the improved
instruction set. Plus, memory is also a concern, and for users who
have less than 6GiB of memory, there are actually more addressable
data words in x86 mode.
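
The arithmetic behind the 6GiB figure can be sketched as follows. The
assumption, which is not stated above, is that a 32-bit process can use
about 3GiB of address space; words are 4 bytes in x86 mode and 8 bytes
in x64 mode, and memory used by the OS is ignored. The hyp_* names are
made up.

```c
#include <assert.h>
#include <stdint.h>

#define HYP_GIB (UINT64_C(1) << 30)

/* Number of machine words that fit in `bytes` of memory. */
uint64_t hyp_words(uint64_t bytes, unsigned word_size)
{
    assert(word_size == 4 || word_size == 8);
    return bytes / word_size;
}

/* Words available to a 64-bit process, limited only by RAM. */
uint64_t hyp_words_x64(uint64_t ram_bytes)
{
    return hyp_words(ram_bytes, 8);
}

/* Words available to a 32-bit process, limited by RAM and by the
   assumed 3GiB of user address space. */
uint64_t hyp_words_x86(uint64_t ram_bytes)
{
    uint64_t limit = 3 * HYP_GIB;  /* assumption, see above */
    uint64_t usable = ram_bytes < limit ? ram_bytes : limit;
    return hyp_words(usable, 4);
}
```

Under these assumptions a machine with 4GiB gives more words in x86
mode, 6GiB is the break-even point, and beyond that x64 mode wins.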

Pascal


* Re: Ocamlopt x86-32 and SSE2
  2009-05-10 11:04       ` David MENTRE
@ 2009-05-11  3:43         ` Stefan Monnier
  0 siblings, 0 replies; 16+ messages in thread
From: Stefan Monnier @ 2009-05-11  3:43 UTC (permalink / raw)
  To: caml-list

> As far as I know, one is using ocamlopt to improve performance.
> I can't think of any case where one would need native code running on
> pre-SSE2 machines which are so outdated performance-wise.

You mean we should make slow machines even slower?


        Stefan


end of thread

Thread overview: 16+ messages
     [not found] <20090509100004.353ADBC5C@yquem.inria.fr>
2009-05-09 11:38 ` Ocamlopt x86-32 and SSE2 CUOQ Pascal
2009-05-10  1:52   ` [Caml-list] " Goswin von Brederlow
2009-05-10  2:16     ` Seo Sanghyeon
2009-05-10  3:50       ` Jon Harrop
2009-05-11  8:05         ` Dmitry Bely
2009-05-11  9:26           ` Jon Harrop
2009-05-11  8:43             ` Dmitry Bely
2009-05-11 13:47               ` Jon Harrop
2009-05-11  9:12             ` Andrey Riabushenko
2009-05-10  8:56     ` CUOQ Pascal
2009-05-10 14:47       ` [Caml-list] " Richard Jones
2009-05-10 19:25     ` Florian Weimer
     [not found] <20090511043120.976EBBC67@yquem.inria.fr>
2009-05-11  7:10 ` Pascal Cuoq
2009-05-12  9:37   ` [Caml-list] " Xavier Leroy
2009-05-12 10:04     ` Sylvain Le Gall
2009-05-25  8:23       ` Sylvain Le Gall
2009-04-28 19:36 Ocamlopt code generator question Dmitry Bely
2009-05-05  9:24 ` [Caml-list] " Xavier Leroy
2009-05-05  9:41   ` Dmitry Bely
2009-05-08 10:21     ` [Caml-list] Ocamlopt x86-32 and SSE2 Xavier Leroy
2009-05-10 11:04       ` David MENTRE
2009-05-11  3:43         ` Stefan Monnier
