From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Delivered-To: caml-list@yquem.inria.fr Received: by yquem.inria.fr (Postfix, from userid 18180) id 7E8E8BC75; Mon, 21 Feb 2005 17:00:23 +0100 (CET) Date: Mon, 21 Feb 2005 17:00:23 +0100 From: Xavier Leroy To: Erik de Castro Lopo Cc: caml-list@yquem.inria.fr Subject: Re: [Caml-list] Need for a built in round_to_int function Message-ID: <20050221160023.GA4759@yquem.inria.fr> References: <20050221072255.29055ee4.ocaml-erikd@mega-nerd.com> <20050221225432.4f15c5e5.ocaml-erikd@mega-nerd.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20050221225432.4f15c5e5.ocaml-erikd@mega-nerd.com> User-Agent: Mutt/1.3.28i X-Spam: no; 0.00; caml-list:01 hacked:01 ocamlopt:01 compiler:01 speedup:01 ocaml:01 ocamlopt:01 instr:01 ...:98 hacks:01 maintainers:01 floats:01 floats:01 essentially:01 int:01 X-Spam-Checker-Version: SpamAssassin 3.0.2 (2004-11-16) on yquem.inria.fr X-Spam-Status: No, score=-2.8 required=5.0 tests=ALL_TRUSTED autolearn=disabled version=3.0.2 X-Spam-Level: > [ "round float to int" primitive ] > with the hacked ocamlopt compiler I'm getting about a 3 times speed > improvement using round_to_int on a Pentium III. The speedup on P4 > is supposed to be even more because of the P4's deeper pipeline. On the other hand, according to the P4 optimization manuals, the P4 is supposed to special-case this particular use of fnstcw / fldcw, so perhaps the situation is no worse than on the P3. At any rate: > I'd be interested in any comments from the official ocaml maintainers. > What are the chances of this going into the official distribution? Essentially zero :-( Basically, this is a case where additional stuff is introduced in the machine-independent parts of ocamlopt and in every code generator just to work around the brain-dead x87 floating-point instruction set. Every other processor (as well as the SSE2 instr.set on the P4) has a fast "truncate float to int" instruction, from which an efficient "round to nearest int" is easy to obtain (truncate (x +. 0.5)). I spent a lot of time in the past trying to extract decent float performance out of the x87 instruction set, under the strict constraint that these performance hacks should be confined to the x86-specific parts of ocamlopt. Nowadays, I no longer care about performance for x87: users who want good float performance should simply use the x86-64 architecture (with SSE2 floats), it's cheaply available from both AMD and Intel and works so much better for floats... At the extreme, I'd rather do a x86+SSE2 port (for P4 and P3M-class processors) than spend more time on x86+x87. - Xavier Leroy