From mboxrd@z Thu Jan  1 00:00:00 1970
Received: (from majordomo@localhost) by pauillac.inria.fr (8.7.6/8.7.3) id JAA10661; Sun, 20 Jun 2004 09:13:47 +0200 (MET DST)
X-Authentication-Warning: pauillac.inria.fr: majordomo set sender to owner-caml-list@pauillac.inria.fr using -f
Received: from nez-perce.inria.fr (nez-perce.inria.fr [192.93.2.78]) by pauillac.inria.fr (8.7.6/8.7.3) with ESMTP id JAA10660 for <caml-list@pauillac.inria.fr>; Sun, 20 Jun 2004 09:13:46 +0200 (MET DST)
Received: from plato.rocketdogcreative.com (plato.rocketdogcreative.com [66.139.78.99])
	by nez-perce.inria.fr (8.12.10/8.12.10) with ESMTP id i5K7DiEV017539
	for <caml-list@inria.fr>; Sun, 20 Jun 2004 09:13:45 +0200
Received: from [192.168.0.8] (63-227-90-107.tcsn.qwest.net [63.227.90.107])
	by plato.rocketdogcreative.com (8.11.6/8.11.6) with ESMTP id i5K7Dhg05288
	for <caml-list@inria.fr>; Sun, 20 Jun 2004 00:13:43 -0700
Mime-Version: 1.0 (Apple Message framework v618)
Content-Transfer-Encoding: 7bit
Message-Id: <5EEC2156-C289-11D8-9C37-000A95C19BAA@Avisere.com>
Content-Type: text/plain; charset=US-ASCII; format=flowed
To: caml-list@inria.fr
From: David McClain <David.McClain@Avisere.com>
Subject: [Caml-list] 32-bit Floating Format
Date: Sun, 20 Jun 2004 00:13:46 -0700
X-Mailer: Apple Mail (2.618)
X-Miltered: at nez-perce with ID 40D53928.000 by Joe's j-chkmail (http://j-chkmail.ensmp.fr)!
X-Loop: caml-list@inria.fr
X-Spam: no; 0.00; mcclain:01 mcclain:01 virtues:01 blazing:99 4...:01 -bits:01 pursued:99 speedup:01 ad-hoc:01 terse:01 interfacing:01 provably:01 graphing:01 compounds:01 kernels:01 
Sender: owner-caml-list@pauillac.inria.fr
Precedence: bulk

Back during the gamer discussion about the virtues, and lack thereof, 
of 32-bit single precision floating point format, a comment was made 
casting aspersions at this format for computations but allowing its 
virtue as a storage format.

Prior tests of mine in the previous 5 years showed that double 
precision, and indeed, extended precision, formats were computationally 
more efficient on conventional FPU architectures. (e.g., X86 and 
Pentium archectures, 68K FPU's, SPARC, etc).

But now, we have arrived at the decision to perform computer image 
recognition on modern DSP's and the new G4 and G5 Power PC 
architectures. The Altivec engine, for example, only gives its blazing 
speed for 32-bit floating point formats. (at least on the G4... perhaps 
the G5 also allows 64-bits?)

All of a sudden, we are facing the need to support massive computations 
in this more meagre 32-bit format (er... actually only 24 bits). I 
still have questions in the back of my mind about just how far we can 
push these calculations before error buildup overwhelms us. But the 
Altivec is definitely being pursued, as are 32-bit floating point DSP 
architectures, based purely on the speedup that appears possible.

I am just beginning my investigations on the limitations of this 
format. I am using OCaml for this purpose -- actually a derived 
language called NML to support ad-hoc vectorized calculations from even 
more terse program statements. The beauty of OCaml (the underlying 
implementation language for NML) is its relative ease of interfacing to 
a C foreign function interface, in addition to its ease of use as a 
FPL.

NML retains the syntax and gross semantics of OCaml, but the typing is 
dynamic and loose. So NML does not qualify as a strongly typed and 
provably correct language. It is instead a language of extreme 
convenience along with extensive graphing capabilities for easy 
interpretation of aggregate numeric results.

But if anyone out there has prior experience using the Altivec for 
single precision computations, I would appreciate hearing from you 
about your perceptions and measurements of the accuracy of large 
vectorized calculations. If we are wasting our time it would be good to 
know sooner than later.

24 bit processing seems acceptable for audio work and one dimensional 
signal processing. But 2-D processing on images compounds the errors. 
This is mitigated to some extent by the fact that image kernels and 
dimensions are typically much smaller than audio stream chunks.

On the Pentium architectures, I have had considerable success using 
OCaml for audio DSP work in conjunction with Intel provided 
high-performance single and double precision Math Kernel Libraries and 
Signal Processing Libraries. However, under OCaml, most of that work 
was restricted to double precision floating point computations. Even 
so, I found that I could support enormous data rates, far exceeding 
even the highest practiced 192 KHz sampling rates, even on older 
Pentium II architectures, and all programmed in OCaml.

I am only now starting my work with the Power PC, having just ported my 
Ocaml based NML these past few days to a brand new PPC workstation.

David McClain
Senior Corporate Scientist
Avisere, Inc.

david.mcclain@avisere.com
+1.520.390.7738 (USA)

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners