From mboxrd@z Thu Jan 1 00:00:00 1970 Received: (from majordomo@localhost) by pauillac.inria.fr (8.7.6/8.7.3) id VAA21307; Thu, 27 Mar 2003 21:48:58 +0100 (MET) X-Authentication-Warning: pauillac.inria.fr: majordomo set sender to owner-caml-list@pauillac.inria.fr using -f Received: from concorde.inria.fr (concorde.inria.fr [192.93.2.39]) by pauillac.inria.fr (8.7.6/8.7.3) with ESMTP id VAA21332 for ; Thu, 27 Mar 2003 21:48:57 +0100 (MET) Received: from marble.he.net (marble.he.net [216.218.230.2]) by concorde.inria.fr (8.11.1/8.11.1) with ESMTP id h2RKmu508543 for ; Thu, 27 Mar 2003 21:48:56 +0100 (MET) Received: from ucdavis.edu ([208.227.214.58] (may be forged)) by marble.he.net (8.8.6/8.8.2) with ESMTP id MAA11395 for ; Thu, 27 Mar 2003 12:48:50 -0800 Message-ID: <3E836522.8000905@ucdavis.edu> Date: Thu, 27 Mar 2003 12:54:58 -0800 From: Issac Trotts User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.1) Gecko/20020913 Debian/1.1-1 MIME-Version: 1.0 To: OCaml List Subject: Re: [Caml-list] DFT in OCaml vs. C References: <3E82A960.2070707@ucdavis.edu> <20030327153247.A5015@pauillac.inria.fr> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-Spam: no; 0.00; issac:01 trotts:01 ijtrotts:01 caml-list:01 fourier:01 timings:01 achieves:01 notoriously:01 gcc:01 feats:01 ocamlopt:01 camlcvs:01 cvsweb:01 delivers:99 ocaml:01 Sender: owner-caml-list@pauillac.inria.fr Precedence: bulk Xavier Leroy wrote: >>Here's a numerical mini-benchmark comparing C to OCaml >>on a simple implementation of the Discrete Fourier Transform: >>[...] >>So the C version was about five times as fast. >>I'd be interested if anyone on this list knows of a way >>to make it perform as well as the C version (without using the FFT.) >> >> > >It can be done, but not on a Pentium 3. Here are my timings: > > Pentium 4 Pentium 4 SSE2 Alpha 21264 > (2 GHz) (2 GHz) (500 MHz) > >C 20 20 36 >OCaml (your code) 113 40 52 >OCaml (variant 1) 90 26 40 >OCaml (variant 2) 72 38 100 > >Variants 1 and 2 differ on the complex multiply step: > >Your code: > let a2=c*. !a -. s*. !b > and b2=c*. !b +. s*. !a in > a := a2; > b := b2; >Variant 1: > let x = s *. !a in > a := c*. !a -. s*. !b; > b := c*. !b +. x >Variant 2: > let olda = !a and oldb = !b in > a := c *. olda -. s *. oldb; > b := c *. oldb +. s *. olda > > >The "Pentium 4 SSE2" column is an experimental code generator for the >Pentium 4 that uses SSE2 instructions and registers for floating-point >computations. (Before you ask: no, it's not publically available, >but will be the basis for the x86_64 code generator as soon as the >hardware becomes available.) > >As you can see above, variant 1 achieves almost the performance of C >on platforms that have a regular register-based FP arithmetic unit. > >However, the x86 floating-point stack (what OCaml uses for >compatibility with Pentium 3 and earlier processors) is notoriously >cranky and hard to generate efficient code for. gcc manages to >exploit instruction-level parallelism between the "re" and "im" >computations via amazing feats (fxch instructions, etc), but the >ocamlopt x86 code generator just generates very sequential code... > >So, unless you have an Alpha at hand, you'd better consider FFT. >There's an FFT implementation that I use as a benchmark here: > > http://camlcvs.inria.fr/cgi-bin/cvsweb.cgi/ocaml/test/fft.ml > >and it delivers about 2/3 of the performances of C, even on the Pentium. > >- Xavier Leroy > Thanks for a very informative and helpful message. Issac ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners