From: "David McClain"
Subject: Re: [Caml-list] FP's and HyperThreading Processors
Date: Fri, 13 Jun 2003 14:23:51 -0700
Message-ID: <00ad01c331f2$1bbacee0$0201a8c0@dylan>

> If it was optimized for the P2 it will by definition not be optimized for
> the P4,

Yes, you are correct about this, but I have the Intel numerical libs for the P4 linked into the program. On the old P2 system, this code spent roughly 60% of its time inside that vendor code for FFTs.
This is no longer true, as other tests show very high performance of just those vendor routines. Whereas on the P2 at 350 MHz I have been able to process audio through 5 stereo pairs of heavy-duty (1-4 K) FFTs per data block at rates of around 200 MSamples/sec, this new computer is faster by a huge amount. I haven't any numerical rates to give on this just yet, but I can certainly produce them. The P4 is fast enough to do serious DSP coding directly from compiled OCaml.

What differs between the audio processing code and my optical phase retrieval code is that, in the audio code, I took care to reuse audio memory buffers. I know this violates the spirit of FP to some extent, but it was needed to gain this kind of speedy throughput in OCaml. I did no such optimizations in the optical analysis code. So clearly, data locality is an important performance parameter.

By the way, I did not mean to indict any of the high-level languages: OCaml, Lisp, SML, etc. I love these to death! But I also have some experience in designing memory subsystems aimed at making OO code more efficient. Specifically, we developed a hardware assist for garbage reclamation.

What I'm really asking is that hardware now become more aware of higher-level language needs. Hardware designers have shown that they can do a superb job of running hand-crafted, efficient low-level code. But they have turned their backs on our more forward-looking needs.

Cheers,

- DM