From mboxrd@z Thu Jan 1 00:00:00 1970 Received: (from majordomo@localhost) by pauillac.inria.fr (8.7.6/8.7.3) id IAA12965; Fri, 13 Jun 2003 08:43:43 +0200 (MET DST) X-Authentication-Warning: pauillac.inria.fr: majordomo set sender to owner-caml-list@pauillac.inria.fr using -f Received: from concorde.inria.fr (concorde.inria.fr [192.93.2.39]) by pauillac.inria.fr (8.7.6/8.7.3) with ESMTP id IAA12560 for ; Fri, 13 Jun 2003 08:43:41 +0200 (MET DST) Received: from bluejay.mail.pas.earthlink.net (bluejay.mail.pas.earthlink.net [207.217.120.218]) by concorde.inria.fr (8.11.1/8.11.1) with ESMTP id h5D6heH17213 for ; Fri, 13 Jun 2003 08:43:40 +0200 (MET DST) Received: from tcsndslgw9poolf147.tcsn.uswest.net ([67.41.20.147] helo=dylan) by bluejay.mail.pas.earthlink.net with asmtp (Exim 3.33 #1) id 19QiHi-0002Cb-00 for caml-list@inria.fr; Thu, 12 Jun 2003 23:43:39 -0700 Message-ID: <003601c33177$324ecc40$0201a8c0@dylan> From: "David McClain" To: Subject: [Caml-list] FP's and HyperThreading Processors Date: Thu, 12 Jun 2003 23:44:09 -0700 MIME-Version: 1.0 Content-Type: text/plain; charset="Windows-1252" Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2800.1158 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2800.1165 X-ELNK-Trace: 7a0ab3eafc8cf994b22988ad1c62733440683398e744b8a45f7d571eba6d042f849926a10e0263af350badd9bab72f9c350badd9bab72f9c350badd9bab72f9c X-Spam: no; 0.00; mcclain:01 dmcclain:01 retrieval:99 optical:99 closures:01 banks:99 smalltalk:01 raytheon:01 ocaml:01 lisp:01 350:97 optimized:02 tucson:02 overhead:03 data:03 Sender: owner-caml-list@pauillac.inria.fr Precedence: bulk Hi, I have a massive application that performs nonlinear fitting to 170+ parameters; a phase retrieval problem for discerning the aberrations of an optical system. This program is written largely in compiled OCaml closures, along with a multithreaded vendor supplied FFT routine (presumably optimized for their processor). On an old P2 single processor machine at 350 MHz, I am seeing almost 95%+ CPU utilization. But on a new 3 GHz P4 with HyperThreading enabled (dual register banks for fast context switching and minimum cache coherence overhead), this same program provides much less than 5% CPU utilization. The net result is that this program runs only twice as fast on the new 3 GHz P4 as it runs on the old 350 MHz P2. I suspect, but have yet to prove, that the low utilization is due to a low CPU to memory bandwidth and to the failure of the L1 and L2 caches to supply needed operations and data to the CPU. This, I would hypothesize, is going to be demonstrated by any language that prefers fresh memory allocation for results, e.g., OCaml, ML, Lisp, Smalltalk, etc. If I am correct, then it implies that our hardware friends are moving rapidly in the opposite direction to our advanced software systems. I mention this in order to tickle the imaginations of both camps. Cheers, - David McClain, Sr. Scientist, Raytheon, Tucson, AZ ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners