From: Dmitry Bely <dmitry.bely@gmail.com>
To: caml-list@inria.fr
Subject: Re: [Caml-list] testers wanted for experimental SSE2 back-end
Date: Tue, 23 Mar 2010 11:58:09 +0300
Message-ID: <90823c941003230158n5080b46apc7c7462aead3372d@mail.gmail.com>
In-Reply-To: <4B967857.3070303@inria.fr>

On Tue, Mar 9, 2010 at 7:33 PM, Xavier Leroy <Xavier.Leroy@inria.fr> wrote:
> Hello list,
>
> This is a call for testers concerning an experimental OCaml compiler
> back-end that uses SSE2 instructions for floating-point arithmetic.
> This code generation strategy was discussed before on this list, and I
> include below a summary in Q&A style.
>
> The new back-end is being considered for inclusion in the next major
> release (3.12), but performance testing done so far at INRIA and by
> Caml Consortium members is not conclusive.  Additional results
> from members of this list would therefore be very welcome.
>
> We're not terribly interested in small (< 50 LOC), Shootout-style
> benchmarks, since their performance is very sensitive to code and data
> placement.  However, if some of you have a sizeable (> 500 LOC) body
> of float-intensive Caml code, we'd be very interested to hear about
> the compared speed of the SSE2 back-end and the old back-end on your
> code.

I cannot provide any benchmarks yet, but even leaving aside the better
register organization, there are at least two areas where SSE2 can
significantly outperform x87.

1. Float to integer conversion
This is quite inefficient on x87 because you have to explicitly set and
restore the rounding mode. A typical

let round x = truncate (x +. 0.5)

translates to

_camlT__round_58:
	sub	esp, 8
L100:
	fld	L101
	fadd	REAL8 PTR [eax]
	sub	esp, 8
	fnstcw	[esp+4]
	mov	ax, [esp+4]
	mov	ah, 12
	mov	[esp], ax
	fldcw	[esp]
	fistp	DWORD PTR [esp]
	mov	eax, [esp]
	fldcw	[esp+4]
	add	esp, 8
	lea	eax, DWORD PTR [eax+eax+1]
	add	esp, 8
	ret

but just to

_camlT__round_58:
L100:
	movlpd	xmm0, L101
	addsd	xmm0, REAL8 PTR [eax]
	cvttsd2si	eax, xmm0
	lea	eax, DWORD PTR [eax+eax+1]
	ret

with SSE2.
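
For reference, here is a minimal OCaml sketch of what both listings compute at the source level. The x87 version has to force round-toward-zero because `truncate` discards the fractional part regardless of the current FPU rounding mode, and the final `lea eax, [eax+eax+1]` builds OCaml's tagged integer representation (2n + 1). The helper `tag_int` is hypothetical and only illustrates that tagging; it is not part of the original message or of the compiler.

```ocaml
(* The function from the message: add 0.5, then truncate toward zero. *)
let round x = truncate (x +. 0.5)

(* Hypothetical helper, for illustration only: an OCaml int n is stored
   as the machine word 2n + 1, which is what
   lea eax, [eax+eax+1] computes from the raw conversion result. *)
let tag_int n = (n lsl 1) lor 1

let () =
  (* truncate rounds toward zero, so this "round" is only a true
     round-to-nearest for non-negative inputs. *)
  Printf.printf "%d %d %d\n" (round 2.3) (round 2.5) (round (-2.7));
  Printf.printf "%d\n" (tag_int 3)
```

Note that `round (-2.7)` yields -2 rather than -3, which is exactly the truncating behaviour that `cvttsd2si` provides in a single instruction.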

2. Float compare
The x87 comparison (fcompp) does not set the CPU flags, so

let fmin (x:float) y = if x < y then x else y

ends up with

_camlT__fmin_58:
	sub	esp, 8
L101:
	mov	ecx, eax
	fld	REAL8 PTR [ebx]
	fld	REAL8 PTR [ecx]
	fcompp
	fnstsw	ax
	and	ah, 69
	cmp	ah, 1
	jne	L100
	mov	eax, ecx
	add	esp, 8
	ret
L100:
	mov	eax, ebx
	add	esp, 8
	ret

whereas with SSE2 you just have

_camlT__fmin_58:
L101:
	movlpd	xmm1, REAL8 PTR [ebx]
	movlpd	xmm0, REAL8 PTR [eax]
	comisd	xmm1, xmm0
	jbe	L100
	ret
L100:
	mov	eax, ebx
	ret
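
As a sanity check on both listings, the following OCaml sketch spells out the source-level behaviour they must preserve, including the unordered (NaN) case. When either operand is NaN, `x < y` is false, so the else branch returns `y`; in the SSE2 code this corresponds to `jbe` being taken on an unordered result. The helper `is_nan` is an assumption added for the example, not something from the original message.

```ocaml
(* The function from the message. *)
let fmin (x : float) y = if x < y then x else y

(* Hypothetical helper: NaN is the only float that differs from itself. *)
let is_nan (x : float) = x <> x

let () =
  assert (fmin 1.0 2.0 = 1.0);      (* x < y: returns x *)
  assert (fmin 2.0 1.0 = 1.0);      (* x >= y: returns y *)
  assert (fmin nan 1.0 = 1.0);      (* unordered: falls to else, returns y *)
  assert (is_nan (fmin 1.0 nan))    (* unordered: returns y, which is NaN *)
```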

As for the SSE2 back-end as presented, I have some thoughts regarding the
code (fast math functions via x87 are questionable, optimization of
floating-point compares, etc.). Where should these be discussed: here on
the list, or is there an entry in Mantis?

- Dmitry Bely

