From: Dmitry Bely <dmitry.bely@gmail.com>
To: caml-list@inria.fr
Subject: Re: [Caml-list] testers wanted for experimental SSE2 back-end
Date: Tue, 23 Mar 2010 11:58:09 +0300
Message-ID: <90823c941003230158n5080b46apc7c7462aead3372d@mail.gmail.com>
In-Reply-To: <4B967857.3070303@inria.fr>
On Tue, Mar 9, 2010 at 7:33 PM, Xavier Leroy <Xavier.Leroy@inria.fr> wrote:
> Hello list,
>
> This is a call for testers concerning an experimental OCaml compiler
> back-end that uses SSE2 instructions for floating-point arithmetic.
> This code generation strategy was discussed before on this list, and I
> include below a summary in Q&A style.
>
> The new back-end is being considered for inclusion in the next major
> release (3.12), but performance testing done so far at INRIA and by
> Caml Consortium members is not conclusive. Additional results
> from members of this list would therefore be very welcome.
>
> We're not terribly interested in small (< 50 LOC), Shootout-style
> benchmarks, since their performance is very sensitive to code and data
> placement. However, if some of you have a sizeable (> 500 LOC) body
> of float-intensive Caml code, we'd be very interested to hear about
> the compared speed of the SSE2 back-end and the old back-end on your
> code.
I cannot provide any benchmarks yet, but even leaving aside the better
register organization, there are at least two areas where SSE2 can
significantly outperform x87.
1. Float-to-integer conversion
This is quite inefficient on x87 because you have to explicitly set and
restore the rounding mode. A typical
    let round x = truncate (x +. 0.5)
translates to
_camlT__round_58:
    sub esp, 8
L100:
    fld L101
    fadd REAL8 PTR [eax]
    sub esp, 8
    fnstcw [esp+4]
    mov ax, [esp+4]
    mov ah, 12
    mov [esp], ax
    fldcw [esp]
    fistp DWORD PTR [esp]
    mov eax, [esp]
    fldcw [esp+4]
    add esp, 8
    lea eax, DWORD PTR [eax+eax+1]
    add esp, 8
    ret
but just to
_camlT__round_58:
L100:
    movlpd xmm0, L101
    addsd xmm0, REAL8 PTR [eax]
    cvttsd2si eax, xmm0
    lea eax, DWORD PTR [eax+eax+1]
    ret
with SSE2.
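To make the semantics of this example concrete, here is a minimal OCaml sketch; the input values are illustrative only, not benchmark data. truncate rounds toward zero, which is what both the x87 sequence (fistp with the control word temporarily changed) and cvttsd2si compute, so round gives round-to-nearest only for non-negative inputs. The final lea in both listings computes 2n+1, OCaml's tagged representation of the integer n.

```ocaml
(* Same definition as in the message: add 0.5, then truncate toward zero. *)
let round x = truncate (x +. 0.5)

let () =
  (* Non-negative inputs are rounded to the nearest integer, halves going up. *)
  assert (round 2.4 = 2);
  assert (round 2.5 = 3);
  assert (round 0.0 = 0);
  (* truncate rounds toward zero, so negative inputs are not rounded to
     nearest: -2.6 +. 0.5 is about -2.1, which truncates to -2. *)
  assert (round (-2.6) = -2);
  print_endline "round: ok"
```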
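2. Float comparison
The x87 compare used here (fcompp) does not set the CPU flags; the FPU
status word has to be fetched with fnstsw and tested by hand. So
    let fmin (x:float) y = if x < y then x else y
ends up as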
_camlT__fmin_58:
    sub esp, 8
L101:
    mov ecx, eax
    fld REAL8 PTR [ebx]
    fld REAL8 PTR [ecx]
    fcompp
    fnstsw ax
    and ah, 69
    cmp ah, 1
    jne L100
    mov eax, ecx
    add esp, 8
    ret
L100:
    mov eax, ebx
    add esp, 8
    ret
whereas on SSE2 you just have
_camlT__fmin_58:
L101:
    movlpd xmm1, REAL8 PTR [ebx]
    movlpd xmm0, REAL8 PTR [eax]
    comisd xmm1, xmm0
    jbe L100
    ret
L100:
    mov eax, ebx
    ret
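The comparison semantics matter when NaN is involved: x < y is false whenever either operand is NaN, so fmin as written returns y in that case. Both listings appear to agree with this: the x87 code only returns x when the masked status word is exactly 1, and in the SSE2 code jbe is also taken on an unordered result. A minimal sketch with illustrative values:

```ocaml
(* Same definition as in the message. *)
let fmin (x : float) y = if x < y then x else y

let () =
  assert (fmin 1.0 2.0 = 1.0);
  assert (fmin 2.0 1.0 = 1.0);
  (* Any ordered comparison with nan is false, so the else branch
     is taken and the second argument is returned. *)
  assert (classify_float (fmin 1.0 nan) = FP_nan);
  assert (fmin nan 1.0 = 1.0);
  print_endline "fmin: ok"
```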
As for the SSE2 back-end as presented, I have some thoughts regarding the
code (the fast math functions via x87 are questionable, floating-point
compares could be optimized, etc.). Where should I discuss that: here on
the list, or is there an entry in Mantis?
- Dmitry Bely