Re: [Caml-list] 32 bit floats, SSE instructions

From: Jon Harrop <jdh30@cam.ac.uk>
To: "Brandon J. Van Every" <vanevery@indiegamedesign.com>
Cc: "caml" <caml-list@inria.fr>
Subject: Re: [Caml-list] 32 bit floats, SSE instructions
Date: Wed, 9 Jun 2004 01:25:24 +0100
Message-ID: <200406090125.24112.jdh30@cam.ac.uk>
In-Reply-To: <OOEALCJCKEBJBIJHCNJDKEFJHDAB.vanevery@indiegamedesign.com>

On Tuesday 08 June 2004 20:24, Brandon J. Van Every wrote:
> ...
> What utter nonsense!

Yo Mama.

> You ever written a 3D device driver? 

Are you trying to write a device driver in OCaml?

> You do *not* 
> engineer your most basic data structures for infinite flexibility.

You appear to be approximating the two in "two transpose formats" with 
infinity. Are you an astrophysicist?

> Almost nobody has the luxury of defining things so abstractly that they
> can switch SoA for AoS whenever they like.  It's a highly invasive
> change of programming model.

I think you are exaggerating the cost of the "abstraction" of reordering the 
arguments of a function.
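
As a rough sketch of what I mean (the names Layout, Aos, Soa and Mesh are 
made up for illustration and come from no library; plain float arrays stand 
in for whatever storage you really use), the two transpose formats differ 
only in how the vertex index and the component index are combined:

```ocaml
(* Hypothetical interface: get v i c is component c (0..2) of vertex i. *)
module type Layout = sig
  type t
  val create : int -> t
  val length : t -> int
  val get : t -> int -> int -> float
  val set : t -> int -> int -> float -> unit
end

(* Array of structures: x0 y0 z0 x1 y1 z1 ... *)
module Aos : Layout = struct
  type t = { n : int; data : float array }
  let create n = { n; data = Array.make (3 * n) 0.0 }
  let length v = v.n
  let get v i c = v.data.(3 * i + c)
  let set v i c x = v.data.(3 * i + c) <- x
end

(* Structure of arrays: x0 x1 ... y0 y1 ... z0 z1 ... *)
module Soa : Layout = struct
  type t = { n : int; data : float array }
  let create n = { n; data = Array.make (3 * n) 0.0 }
  let length v = v.n
  let get v i c = v.data.(c * v.n + i)
  let set v i c x = v.data.(c * v.n + i) <- x
end

(* Code written against Layout does not care which format is underneath. *)
module Mesh (L : Layout) = struct
  let fill n f =
    let v = L.create n in
    for i = 0 to n - 1 do
      for c = 0 to 2 do L.set v i c (f i c) done
    done;
    v

  let centroid v =
    let n = L.length v in
    Array.init 3 (fun c ->
      let s = ref 0.0 in
      for i = 0 to n - 1 do s := !s +. L.get v i c done;
      !s /. float_of_int n)
end

module A = Mesh (Aos)
module S = Mesh (Soa)
```

Swapping A for S changes the memory layout and nothing else. Whether the 
functor costs you anything at run time is a separate question that you would 
have to measure.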

> The experiment of SoA has been tried in the 3D graphics industry and
> found wanting.  All the HW is AoS.

Apart from that "CPU" thing. ;-)

> The SoA methodology is possible with 
> the 3D APIs.  As one poster hinted, it's probably borne of the software
> rendering era mentality.  I distinctly recall when the DirectX 5.0 guys
> were implementing those features, shortly after the OpenGL guys did
> IIRC.  That would have been a 1997 timeframe.  They were thinking how
> "neat" it would all be.  Seen from the vantage of 2004, it all gave way
> to AoS and programmable shaders.  All your data for one vertex or one
> pixel at a time.

My point is that there are likely to be much more productive, higher-level 
optimisations that you could be doing.

> > The only algorithms which would be significantly affected are
> > those for which
> > accesses are to (x_i, y_i, z_i) for random "i" rather than to
>
> Oh, the 'only' algorithms.  Crikey.

Are you saying that most of your algorithms require random access of that 
form? Can these algorithms not be transformed so that they access more 
coherently?

> You ever written a 3D graphics pipeline???

I have dabbled in 3D graphics.

> You think it's all based on handing over some huge matrix 
> that could jolly well be in whatever order?

That is the objective, yes.

> Get real. 

Well, you're hardly going to be using complex numbers to represent vertex 
coordinates... ;-)

> The vast 
> majority of 3D graphics processing is accept / reject testing.  You want
> your data here, now, so you can decide what to do with it.  So you can
> retire it once you're done deciding, and not have it pollute your cache
> any further.

I'd like more, specific examples here. What determines the accept or reject? 
What is the consequence of an accept or reject?

> > > For example, transforming a large
> > > number of XYZW vectors by a 4x4 matrix is a 'pat' problem
> > > that occurs at some point in 3D graphics processing.
> >
> > If you want high performance, which you seem to want, the
> > hardware should be doing those for you.
>
> Well hand me a general purpose GPU with an incredible 2-way memory bus,
> smart guy.

The task of optimising 3D graphics software is to design your approach such 
that you don't need the results back, and all further computation occurs on 
the card. You make as much of the data available as possible and control data 
flow through the pipeline at the highest level with state changes.

If your programs are bottlenecked by lots of very low-level arithmetic over 
huge, flat data structures then you will almost certainly benefit from using 
more structured, hierarchical (ideally, multiresolution) representations. 
Derive the (possibly asymptotic) complexities of any suitable algorithms and 
make an educated decision on the basis of that quantification. This is likely 
to give you much better performance than very low-level optimisations such as 
fiddling with 32-bit floats.
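
To make that concrete, here is a toy sketch of accept / reject testing over a 
hierarchy. It is one-dimensional and every name in it (tree, build, count, 
count_flat) is invented for this example; a real renderer would use 3D 
bounding volumes, but the logic has the same shape. Each node stores the 
bounds and size of its subtree, so a query can reject or accept a whole 
subtree without looking at its points:

```ocaml
(* Leaf holds one point; Node holds (lo, hi, size, left, right). *)
type tree =
  | Leaf of float
  | Node of float * float * int * tree * tree

let bounds = function
  | Leaf x -> (x, x)
  | Node (lo, hi, _, _, _) -> (lo, hi)

(* Build a balanced tree over the sorted slice a.(i) .. a.(j-1), with j > i. *)
let rec build a i j =
  if j - i = 1 then Leaf a.(i)
  else
    let m = (i + j) / 2 in
    let l = build a i m and r = build a m j in
    Node (fst (bounds l), snd (bounds r), j - i, l, r)

(* Count the points in [qlo, qhi]; visited records how many nodes we touch. *)
let rec count visited qlo qhi t =
  incr visited;
  match t with
  | Leaf x -> if qlo <= x && x <= qhi then 1 else 0
  | Node (lo, hi, n, l, r) ->
    if hi < qlo || qhi < lo then 0            (* reject the whole subtree *)
    else if qlo <= lo && hi <= qhi then n     (* accept the whole subtree *)
    else count visited qlo qhi l + count visited qlo qhi r

(* The flat alternative: test every point, O(n) per query. *)
let count_flat qlo qhi a =
  Array.fold_left
    (fun acc x -> if qlo <= x && x <= qhi then acc + 1 else acc) 0 a
```

On a balanced tree the query touches O(log n) nodes where the flat scan 
touches all n points, and no amount of tuning the inner loop of count_flat 
will recover that difference.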

> Hint: commodity 3D graphics cards are fast when you write to 
> them, damn slow when you read them, by design.

In general, you can't read the results of T&L from the card (sorry for being 
so off topic, guys) so the driver resorts to software T&L in OGL feedback 
mode.

If you absolutely must use flat data structures then perhaps you should 
consider using the GPU as a CPU.

> OCaml has a somewhat practical focus, but maybe it's not sufficiently
> practical for me.  I do find myself re-evaluating languages in terms of
> 3 overriding problems:
>
> - the available C++ migration path and its efficiency

Why C++? Is your objective to always prototype in OCaml and convert to C++?

> - the support of basic 3D graphics types, i.e. 32 bit floats

Why not "ease of use of trees, graphs etc."? Algorithmic optimisations are so 
much more productive...

> - ability to work with imperative, object oriented designs and libraries

OO is overrated, IMHO. Imperative is excusable for UI level things but I'd 
prefer a functional style for everything but the simplest of algorithms. 
OCaml can play with libraries fairly well but, yes, it takes a lot of time to 
get some things working. That isn't the fault of the OCaml creators though, 
of course, it's the fault of those dim-wits at Bell Labs...

> OCaml currently has 1 out of 3.
> See for point of reference, "Why No One Uses Functional Languages."
> http://cm.bell-labs.com/cm/cs/who/wadler/papers/sigplan-why/sigplan-why.
> ps.gz

Speak of the devil.

> 1.5 out of 3 if one considers OCaml SWIG to be an available, slow,
> optimizable path.

Given the huge differences between C/C++ and OCaml (like safety), it would be 
overly optimistic to expect migration of code from OCaml to C to be much more 
efficient than it is now.

> ...Write my own little Python script to emit a lot of redundant,
> boring filename.i files with #include filename.h %include filename.h
> directives in them.

Yes, but the OCaml bit of SWIG is very alpha, AFAIK. SWIG wasn't even designed 
to deal with languages like OCaml.

Just out of curiosity, do you use the STL much? An interface to the STL might 
be interesting. I wouldn't use it any more though - I no longer have any need 
for C++.

> > > The reality is that 32 bit floats get used in the real
> > > world all over the place by 3D graphics guys.
> >
> > Don't read that, Xavier.
>
> Oh, is that about Xavier exploding if he hears 'real world' again?

Yes. Personally, I think the INRIA are doing a superb job with OCaml. It is an 
excellent implementation of an excellent language. I do all of my work in 
OCaml now. I'm getting offered jobs because I am so much more productive as a 
consequence. The programs I write whilst doing my 3D graphics research are 
more robust and faster than ever now that they have been converted entirely 
into OCaml from C++.

> > I don't believe Python was designed for doing 3D graphics.
>
> It wasn't.  It was designed to be flexible and easy to program, not
> fast.  Now that Python is growing in popularity, people are coming in
> post-hoc to try to make it fast.  Maybe in 3 years it'll be a good
> language in that regard.

I'm no expert, but I suspect there are numerous, rigorous theoretical reasons 
why it can't be made much more efficient (at least not by people who choose 
to program in Python ;-).

> > ...
> > Can you give an example of this?
>
> I'm feeling mentally challenged on specifics right now.  Generally
> speaking, 3D graphics problems are pipelines with N stages you might
> turn on or off.  This creates 2^N path possibilities.  Often you'd like
> to coalesce the operations at the various stages.

So you've got a large quantity of data in flat containers like arrays which 
you want to perform a sequence of algorithms on?

What sort of data is in the containers and what sorts of algorithms are you 
performing?

Is the problem that you would like to hoist the inner loops of all of the 
algorithms so that each datum has each algorithm applied to it rather than 
feeding all of the data through each algorithm in turn (i.e. deforestation)?
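
In case the question is unclear, this is the kind of thing I have in mind. 
The stages (scale, offset, clamp) are hypothetical stand-ins for your real 
per-vertex operations, and the function names are mine:

```ocaml
(* Hypothetical per-element stages, chosen only for illustration. *)
let scale x = 2.0 *. x
let offset x = x +. 1.0
let clamp x = max 0.0 (min 10.0 x)

(* Stage at a time: three traversals and two intermediate arrays. *)
let staged a = Array.map clamp (Array.map offset (Array.map scale a))

(* Fused by hand: one traversal, no intermediate arrays. *)
let fused a = Array.map (fun x -> clamp (offset (scale x))) a

(* Compose whichever stages are switched on, in order, into one function,
   so the 2^N combinations need not be written out by hand. *)
let pipeline stages =
  List.fold_left (fun f g x -> g (f x)) (fun x -> x) stages

(* One traversal for any subset of stages. *)
let run stages a = Array.map (pipeline stages) a
```

Here run [scale; offset; clamp] a gives the same result as staged a with a 
single pass over the data. Note that pipeline calls the stages through 
closures, so it is not as cheap as the hand-fused version.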

Cheers,
Jon.

