From mboxrd@z Thu Jan 1 00:00:00 1970 Received: (from majordomo@localhost) by pauillac.inria.fr (8.7.6/8.7.3) id KAA09342; Thu, 8 Nov 2001 10:59:55 +0100 (MET) X-Authentication-Warning: pauillac.inria.fr: majordomo set sender to owner-caml-list@pauillac.inria.fr using -f Received: from concorde.inria.fr (concorde.inria.fr [192.93.2.39]) by pauillac.inria.fr (8.7.6/8.7.3) with ESMTP id KAA10177 for ; Thu, 8 Nov 2001 10:59:54 +0100 (MET) Received: from pauillac.inria.fr (pauillac.inria.fr [128.93.11.35]) by concorde.inria.fr (8.11.1/8.10.0) with ESMTP id fA89xqX09397; Thu, 8 Nov 2001 10:59:52 +0100 (MET) Received: (from xleroy@localhost) by pauillac.inria.fr (8.7.6/8.7.3) id KAA10266; Thu, 8 Nov 2001 10:59:52 +0100 (MET) Date: Thu, 8 Nov 2001 10:59:52 +0100 From: Xavier Leroy To: Niall Dalton Cc: caml-list@inria.fr Subject: Re: [Caml-list] native code optimization priorities Message-ID: <20011108105952.C9260@pauillac.inria.fr> References: <20011106150621.B3221@pauillac.inria.fr> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 1.0i In-Reply-To: ; from ndalton@ics.uci.edu on Tue, Nov 06, 2001 at 11:50:59AM -0800 Sender: owner-caml-list@pauillac.inria.fr Precedence: bulk > > I have > > vague ideas about things that could be done, e.g. a Pentium-4 back-end > > that would use SSE2 registers for floating-point, but this is all > > low priority. > > May I ask if you ever did implement this, would you limit it to some > P4 specific technique? I've idly toyed with the idea of implementing > something for Altivec on the G4. I'm afraid I wasn't clear enough: the first step would be to use SSE2 registers as normal floating-point registers, storing only one float per register, and performing single floating-point operations. This would already improve float performance quite a lot compared with the current x86 float stack. Other processors do not need this hack, because they already have a sensible register-based float architecture. The next step, of course, would be to actually use SIMD instructions to operate on pairs or quadruples of floats. The standard approach would be to have special abstract types for these packed floats, with operations corresponding to what the hardware SIMD unit provides. The problem here is that of portability: SSE2 and Altivec, for instance, do not provide the same SIMD instructions... > I wondered if it would be possible > to integrate this into the type inference; if the compiler can infer > that certain values will never require more than a certain number of > bits they become candidates for use in a SIMD unit. This is along the > lines of Bitwidth Analysis (PLDI'00 Stephenson et al, and Larsen and > Amarasinghe's Exploting Superword Level Parallelism with Multimedia > Instruction Sets, same conference). Scott Ananian's SM thesis at MIT > also included a predicated (forward and reverse) SSA variant that used > a similar optimization to find narrow operations that could be executed in > parallel. We're getting into really advanced stuff here! It's a research topic on its own, and I somewhat doubt that we can extract much parallelism this way, but we'll see. - Xavier Leroy ------------------- Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr