From mboxrd@z Thu Jan  1 00:00:00 1970
Received: (from majordomo@localhost) by pauillac.inria.fr (8.7.6/8.7.3) id KAA09342; Thu, 8 Nov 2001 10:59:55 +0100 (MET)
X-Authentication-Warning: pauillac.inria.fr: majordomo set sender to owner-caml-list@pauillac.inria.fr using -f
Received: from concorde.inria.fr (concorde.inria.fr [192.93.2.39]) by pauillac.inria.fr (8.7.6/8.7.3) with ESMTP id KAA10177 for <caml-list@pauillac.inria.fr>; Thu, 8 Nov 2001 10:59:54 +0100 (MET)
Received: from pauillac.inria.fr (pauillac.inria.fr [128.93.11.35])
	by concorde.inria.fr (8.11.1/8.10.0) with ESMTP id fA89xqX09397;
	Thu, 8 Nov 2001 10:59:52 +0100 (MET)
Received: (from xleroy@localhost) by pauillac.inria.fr (8.7.6/8.7.3) id KAA10266; Thu, 8 Nov 2001 10:59:52 +0100 (MET)
Date: Thu, 8 Nov 2001 10:59:52 +0100
From: Xavier Leroy <xavier.leroy@inria.fr>
To: Niall Dalton <ndalton@ics.uci.edu>
Cc: caml-list@inria.fr
Subject: Re: [Caml-list] native code optimization priorities
Message-ID: <20011108105952.C9260@pauillac.inria.fr>
References: <20011106150621.B3221@pauillac.inria.fr> <Pine.SOL.4.20.0111061141330.10389-100000@godzilla.ics.uci.edu>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Mailer: Mutt 1.0i
In-Reply-To: <Pine.SOL.4.20.0111061141330.10389-100000@godzilla.ics.uci.edu>; from ndalton@ics.uci.edu on Tue, Nov 06, 2001 at 11:50:59AM -0800
Sender: owner-caml-list@pauillac.inria.fr
Precedence: bulk

> > I have
> > vague ideas about things that could be done, e.g. a Pentium-4 back-end
> > that would use SSE2 registers for floating-point, but this is all
> > low priority.
> 
> May I ask if you ever did implement this, would you limit it to some
> P4 specific technique? I've idly toyed with the idea of implementing
> something for Altivec on the G4.

I'm afraid I wasn't clear enough: the first step would be to use SSE2
registers as normal floating-point registers, storing only one float
per register, and performing single floating-point operations.  This
would already improve float performance quite a lot compared with the
current x86 float stack.  Other processors do not need this hack,
because they already have a sensible register-based float architecture.

The next step, of course, would be to actually use SIMD instructions
to operate on pairs or quadruples of floats.  The standard approach
would be to have special abstract types for these packed floats, with
operations corresponding to what the hardware SIMD unit provides.  The
problem here is that of portability: SSE2 and Altivec, for instance,
do not provide the same SIMD instructions...

> I wondered if it would be possible
> to integrate this into the type inference; if the compiler can infer
> that certain values will never require more than a certain number of
> bits they become candidates for use in a SIMD unit. This is along the
> lines of Bitwidth Analysis (PLDI'00 Stephenson et al, and Larsen and
> Amarasinghe's Exploting Superword Level Parallelism with Multimedia
> Instruction Sets, same conference). Scott Ananian's SM thesis at MIT
> also included a predicated (forward and reverse) SSA variant that used
> a similar optimization to find narrow operations that could be executed in
> parallel. 

We're getting into really advanced stuff here!  It's a research topic
on its own, and I somewhat doubt that we can extract much parallelism
this way, but we'll see.

- Xavier Leroy
-------------------
Bug reports: http://caml.inria.fr/bin/caml-bugs  FAQ: http://caml.inria.fr/FAQ/
To unsubscribe, mail caml-list-request@inria.fr  Archives: http://caml.inria.fr