From mboxrd@z Thu Jan  1 00:00:00 1970
Message-ID: <00ad01c331f2$1bbacee0$0201a8c0@dylan>
From: "David McClain" <dmcclain1@mindspring.com>
To: <caml-list@inria.fr>
References: <Pine.LNX.4.21.0306131117030.5668-100000@walnut.he.net>
Subject: Re: [Caml-list] FP's and HyperThreading Processors
Date: Fri, 13 Jun 2003 14:23:51 -0700
MIME-Version: 1.0
Content-Type: text/plain;
	charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
Sender: owner-caml-list@pauillac.inria.fr
Precedence: bulk

> If it was optimized for the P2 it will by definition not be optimized for
> the P4,

Yes, you are correct about this, but I have the Intel numerical libs for the
P4 linked into the program. On the old P2 system, this code spent roughly
60% of its time inside that vendor code for FFTs. This is no longer true,
as other tests show very high performance of just those vendor routines.

On the P2 at 350 MHz, I have been able to process audio through 5 stereo
pairs of heavy-duty (1-4 K) FFTs per data block at rates of around
200 MSamples/sec. This new computer is faster still, by a huge amount. I
don't have measured rates for it just yet, but I can certainly produce
them. The new processor is fast enough to do serious DSP coding on the P4
directly from compiled OCaml.

What is different about the audio processing code versus my optical phase
retrieval code is that I took care to reuse audio memory buffers. I know
this violates the spirit of FP to some extent, but it was needed to gain
this kind of speedy throughput in OCaml. I did no such optimizations in the
optical analysis code. So clearly, data locality is an important performance
parameter.
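
To make the buffer reuse point concrete, here is a minimal sketch in OCaml.
It is not the actual audio code: the gain stage, the function names
(scale_alloc, scale_into, process_blocks) and the fill callback are all
hypothetical, chosen only to contrast allocating a fresh array per block with
writing into a buffer that is allocated once and reused.

```ocaml
(* Hypothetical example: every name below is illustrative, not taken
   from the audio or optical code discussed in this message. *)

(* Allocating style: builds a fresh output array for every block. *)
let scale_alloc (gain : float) (block : float array) : float array =
  Array.map (fun x -> gain *. x) block

(* Reusing style: writes the result into a caller-supplied buffer. *)
let scale_into (gain : float) (src : float array) (dst : float array) : unit =
  assert (Array.length src = Array.length dst);
  for i = 0 to Array.length src - 1 do
    dst.(i) <- gain *. src.(i)
  done

(* Process n_blocks blocks using two buffers allocated once up front.
   [fill b buf] is an assumed callback that loads block number [b]
   into [buf]. The sum of each block's first output sample is returned
   only so that the work has an observable result. *)
let process_blocks ~block_size ~n_blocks (gain : float)
    (fill : int -> float array -> unit) : float =
  let input = Array.make block_size 0.0 in
  let output = Array.make block_size 0.0 in
  let acc = ref 0.0 in
  for b = 0 to n_blocks - 1 do
    fill b input;
    scale_into gain input output;
    acc := !acc +. output.(0)
  done;
  !acc
```

Both styles compute the same values. The difference is that the reusing
version touches the same two buffers on every block, so the working set stays
small and the garbage collector has nothing new to reclaim inside the loop.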

By the way, I did not mean to indict any of the high-level languages, OCaml,
Lisp, SML, etc. I love these to death! But I also have some experience in
designing memory subsystems aimed at making OO code more efficient.
Specifically, we developed a hardware assist to garbage reclamation. What
I'm really asking is that hardware now become more aware of higher-level
language needs. Hardware designers have shown that they can do a superb job
of running hand-crafted, efficient low-level code. But they have turned
their backs on our more forward-looking needs.

Cheers,

- DM

