From: Robert Roessler
Organization: Robert's High-performance Software
Date: Sun, 20 Feb 2005 16:00:53 -0800
To: Erik de Castro Lopo
Cc: caml-list@inria.fr
Subject: Re: [Caml-list] Need for a built in round_to_int function
Message-ID: <421924B5.6030108@rftp.com>
In-Reply-To: <20050221072255.29055ee4.ocaml-erikd@mega-nerd.com>
References: <20050221072255.29055ee4.ocaml-erikd@mega-nerd.com>

Erik de Castro Lopo wrote:

> I am about to port some code from C to O'caml. This code uses the
> C99 function :
>
>     long int lrint (double d) ;
>
> which performs rounding on the double and then converts that to
> a long int.
>
> In O'caml the only option seems to be:
>
>     let round_to_int f = int_of_float (f +. 0.5) ;;
>
> The problem is that this code on i386 produces really slow code:
>
>  804b385:  dd 44 98 fc       fldl   0xfffffffc(%eax,%ebx,4)
>  804b389:  de c1             faddp  %st,%st(1)
>  804b38b:  83 ec 08          sub    $0x8,%esp
>  804b38e:  d9 7c 24 04       fnstcw 0x4(%esp)
>  804b392:  66 8b 44 24 04    mov    0x4(%esp),%ax
>  804b397:  b4 0c             mov    $0xc,%ah
>  804b399:  66 89 44 24 00    mov    %ax,0x0(%esp)
>  804b39e:  d9 6c 24 00       fldcw  0x0(%esp)
>  804b3a2:  db 1c 24          fistpl (%esp)
>  804b3a5:  8b 04 24          mov    (%esp),%eax
>  804b3a8:  d9 6c 24 04       fldcw  0x4(%esp)
>  804b3ac:  83 c4 08          add    $0x8,%esp
>
> The killer here is the two fldcw (floating point load control word)
> instructions, around the fistpl (which actually does the float to int
> conversion). Loading the FP control word causes a flush of the FPU
> pipeline. In code with a lot of floating point code interspersed
> with a round to int, there can be a significant slow down due to
> the fldcw instructions.

I will preface this with a Slashdot-like "IANANA" (I Am Not A Numerical
Analyst).

The above approach is more or less what you would expect if you (as a
compiler code generator) a) want to do rounding following C/C++
standards ("Truncate (toward 0)"), and b) make no assumption regarding
the state of the IEEE hardware rounding setting... hence the save,
set-to-truncate, and restore of the control word around the fistpl.

> The lrint function in C, replaces all the above with one fistpl
> and a single mov instruction and leaves the floating point
> control word intact. In C code that moved from:
>
>     (int) floor (f + 0.5)
>
> to
>
>     lrintf (f)
>
> I have seen an up to 4 fold increase in speed.

You, on the other hand, are willing to make an assumption regarding
the hardware rounding mode - [presumably] that it is set to the
power-on default of "Round to nearest, or to even if equidistant",
which may not be unreasonable. It just needs to be explicit that this
*is* the assumption, and that you have a way of verifying (or at least
reason to believe) that other software components in your app's
environment are not invalidating this assumption.

The fact that the default hardware rounding mode does NOT match
"(int) floor (f + 0.5)" should also be mentioned... the "+ 0.5"
attempts to do what the hardware would call "Round up (toward
+infinity)" while the "floor" would match the "Round down (toward
-infinity)" mode. Combining them does not equate to "Round to nearest,
or to even if equidistant" (an exact tie such as 2.5 goes to 3 with the
floor idiom but to 2 under round-to-even). :)

In case it isn't obvious, the IEEE hardware default rounding behavior
is chosen to minimize the effects of accumulated rounding errors in a
series of calculations involving rounding.

> I've looked at the code for the O'Caml compiler and I think I
> know how to implement this, at least for x86 and PowerPC, the two
> architectures I have access to. If I was to supply a patch would
> it be accepted?
>
> I know other suggestions like this one :
>
>     http://sardes.inrialpes.fr/~aschmitt/cwn/2003.11.18.html#1
>
> were not viewed favourably, but the addition of a single function
> with an explicit behaviour is a far neater solution.

This could take the form of a compiler switch exactly like "/QIfist",
which was added to VC7 (and VC6 with the "Processor Pack"). Using this
switch means you are (or should be) aware of, and happy with, the
assumption detailed above.

Of course, if something like this were to be added to ocamlopt (for
target architectures using IEEE floating point), code (an additional
bytecode op?) emulating the same behavior could be added to the
runtime to maintain consistency across the interpreted and native
operating environments - or not.

Robert Roessler
roessler@rftp.com
http://www.rftp.com