Re: [Caml-list] Need for a built in round_to_int function

caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed

From: Robert Roessler <roessler@rftp.com>
To: Erik de Castro Lopo <ocaml-erikd@mega-nerd.com>
Cc: caml-list@inria.fr
Subject: Re: [Caml-list] Need for a built in round_to_int function
Date: Sun, 20 Feb 2005 16:00:53 -0800	[thread overview]
Message-ID: <421924B5.6030108@rftp.com> (raw)
In-Reply-To: <20050221072255.29055ee4.ocaml-erikd@mega-nerd.com>

Erik de Castro Lopo wrote:

> I am about to port some code from C to O'caml. This code uses the 
> C99 function :
> 
>     long int lrint (double d) ;
> 
> which performs rounding on the double and then converts that to
> a long int.
> 
> In O'caml the only option seems to be:
> 
>     let round_to_int f = int_of_float (f +. 0.5) ;;
> 
> The problem is that this code on i386 produces really slow code:
> 
>     804b385:    dd 44 98 fc        fldl   0xfffffffc(%eax,%ebx,4)
>     804b389:    de c1              faddp  %st,%st(1)
>     804b38b:    83 ec 08           sub    $0x8,%esp
>     804b38e:    d9 7c 24 04        fnstcw 0x4(%esp)
>     804b392:    66 8b 44 24 04     mov    0x4(%esp),%ax
>     804b397:    b4 0c              mov    $0xc,%ah
>     804b399:    66 89 44 24 00     mov    %ax,0x0(%esp)
>     804b39e:    d9 6c 24 00        fldcw  0x0(%esp)
>     804b3a2:    db 1c 24           fistpl (%esp)
>     804b3a5:    8b 04 24           mov    (%esp),%eax
>     804b3a8:    d9 6c 24 04        fldcw  0x4(%esp)
>     804b3ac:    83 c4 08           add    $0x8,%esp
> 
> The killer here is the two fldcw (floating point load control word)
> instructions, around the fistpl (which actually does the float to int 
> conversion). Loading the FP control work causes a flush of the FPU
> pipeline. In code with a lot of floating point code interspersed
> with a round to int, there can be a significant slow down due to
> the fldcw instructions.

I will preface this by a Slashdot-like "IANANA" (I Am Not A Numerical 
Analyst).

The above approach is more or less what you expect if you (as a 
compiler code generator) a) want to do rounding following C/C++ 
standards ("Truncate (toward 0)"), and b) make no assumption regarding 
the state of the IEEE hardware rounding setting...

> The lrint function in C, replaces all the above with one fistpl
> and a single mov instruction and leaves the floating point
> control word intact. In C code that moved from:
> 
>     (int) floor (f + 0.5)
> 
> to
>     lrintf (f)
> 
> I have seen an up to 4 fold increase in speed.

You, on the other hand, are willing to make an assumption regarding 
the hardware rounding mode - [presumably] that it is set to the 
power-on default of "Round to nearest, or to even if equidistant", 
which may not be unreasonable - it just needs to be explicit that this 
*is* the assumption, and that you have a way of verifying (or at least 
reason to believe) that other software components in your app's 
environment are not invalidating this assumption.

The fact that the default hardware rounding mode does NOT match "(int) 
floor (f + 0.5)" should also be mentioned... the "+ 0.5" attempts to 
do what the hardware would call "Round up (toward +infinity)" while 
the "floor" would match the "Round down (toward -infinity)" mode. 
Combining them does not equate to "Round to nearest, or to even if 
equidistant". :)

In case it isn't obvious, the IEEE hardware default rounding behavior 
is chosen to minimize the effects of accumulated rounding errors in a 
series of calculations involving rounding.

> I've looked at the code for the O'Caml compiler and I think I 
> know how to implement this, at least for x86 and PowerPC, the two
> architectures I have access to. If I was to supply a patch would
> it be accepted?
> 
> 
> I know other suggestions like this one :
> 
>     http://sardes.inrialpes.fr/~aschmitt/cwn/2003.11.18.html#1
> 
> were not viewed favourably, but the addition of a single function
> with an explicit behaviour is a far neater solution.

This could take the form of a compiler switch exactly like "/QIfist", 
which was added to VC7 (and VC6 with the "Processor Pack").  Using 
this switch means you are aware of (or should be) and happy with the 
above detailed assumption.

Of course, if something like this were to added to ocamlopt (for 
target architectures using IEEE floating point), code (an additional 
bytecode op?) emulating the same behavior could be added to the 
runtime to maintain consistency across the interpreted and native 
operating environments - or not.

Robert Roessler
roessler@rftp.com
http://www.rftp.com

next prev parent reply	other threads:[~2005-02-21  0:00 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2005-02-20 20:22 Erik de Castro Lopo
2005-02-21  0:00 ` Robert Roessler [this message]
2005-02-21  1:51   ` [Caml-list] " Erik de Castro Lopo
2005-02-21  4:34     ` Robert Roessler
2005-02-21  6:50       ` Erik de Castro Lopo
2005-02-21 11:54 ` Erik de Castro Lopo
2005-02-21 16:00   ` Xavier Leroy
2005-02-22  7:23     ` Erik de Castro Lopo
2005-02-21 16:01   ` Hendrik Tews

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=421924B5.6030108@rftp.com \
    --to=roessler@rftp.com \
    --cc=caml-list@inria.fr \
    --cc=ocaml-erikd@mega-nerd.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).