From: Jon Harrop
Organization: University of Cambridge
To: "Brandon J. Van Every"
Cc: "caml"
Subject: Re: [Caml-list] 32 bit floats, SSE instructions
Date: Wed, 9 Jun 2004 01:25:24 +0100
Message-Id: <200406090125.24112.jdh30@cam.ac.uk>

On Tuesday 08 June 2004 20:24, Brandon J. Van Every wrote:
> ...
> What utter nonsense!

Yo Mama.

> You ever written a 3D device driver?

Are you trying to write a device driver in OCaml?

> You do *not*
> engineer your most basic data structures for infinite flexibility.

You appear to be approximating the two in "two transpose formats" with
infinity. Are you an astrophysicist?

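For concreteness, the two transposed layouts at issue (array-of-structures,
AoS, and structure-of-arrays, SoA) can be hidden behind an accessor. What
follows is a minimal OCaml sketch, not code from this thread: the names
get_aos, get_soa and sum_sq_len are hypothetical, and it assumes a flat float
array holding n three-component vertices.

```ocaml
(* Hypothetical names throughout; nothing here comes from the original thread. *)

(* AoS: one flat array laid out as x0 y0 z0 x1 y1 z1 ... *)
let get_aos (a : float array) i c = a.(3 * i + c)

(* SoA: one flat array laid out as x0 x1 ... y0 y1 ... z0 z1 ...,
   where n is the number of vertices. *)
let get_soa n (a : float array) i c = a.(c * n + i)

(* Sum of squared vertex lengths, written once against an accessor
   [get : vertex index -> component index -> float]. *)
let sum_sq_len get n =
  let s = ref 0.0 in
  for i = 0 to n - 1 do
    for c = 0 to 2 do
      let v = get i c in
      s := !s +. v *. v
    done
  done;
  !s

(* The same two vertices, (1,2,3) and (4,5,6), in both layouts. *)
let aos = [| 1.0; 2.0; 3.0; 4.0; 5.0; 6.0 |]
let soa = [| 1.0; 4.0; 2.0; 5.0; 3.0; 6.0 |]

let total_aos = sum_sq_len (get_aos aos) 2
let total_soa = sum_sq_len (get_soa 2 soa) 2
```

Only the accessor passed to sum_sq_len changes between layouts. The closure
call may cost something unless the compiler inlines it, so this shows the
shape of the abstraction rather than making a performance claim.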
> Almost nobody has the luxury of defining things so abstractly that they
> can switch SoA for AoS whenever they like. It's a highly invasive
> change of programming model.

I think you are exaggerating the cost of the "abstraction" of reordering the
arguments of a function.

> The experiment of SoA has been tried in the 3D graphics industry and
> found wanting. All the HW is AoS.

Apart from that "CPU" thing. ;-)

> The SoA methodology is possible with
> the 3D APIs. As one poster hinted, it's probably born of the software
> rendering era mentality. I distinctly recall when the DirectX 5.0 guys
> were implementing those features, shortly after the OpenGL guys did
> IIRC. That would have been a 1997 timeframe. They were thinking how
> "neat" it would all be. Seen from the vantage of 2004, it all gave way
> to AoS and programmable shaders. All your data for one vertex or one
> pixel at a time.

My point is that there are likely to be much more productive, higher-level
optimisations that you could be doing.

> > The only algorithms which would be significantly affected are
> > those for which
> > accesses are to (x_i, y_i, z_i) for random "i" rather than to
>
> Oh, the 'only' algorithms. Crikey.

Are you saying that most of your algorithms require random access of that
form? Can these algorithms not be transformed so that they access more
coherently?

> You ever written a 3D graphics pipeline???

I have dabbled in 3D graphics.

> You think it's all based on handing over some huge matrix
> that could jolly well be in whatever order?

That is the objective, yes.

> Get real.

Well, you're hardly going to be using complex numbers to represent vertex
coordinates... ;-)

> The vast
> majority of 3D graphics processing is accept / reject testing. You want
> your data here, now, so you can decide what to do with it. So you can
> retire it once you're done deciding, and not have it pollute your cache
> any further.

I'd like more specific examples here. What determines the accept or reject?
What is the consequence of an accept or reject?

> > > For example, transforming a large
> > > number of XYZW vectors by a 4x4 matrix is a 'pat' problem
> > > that occurs at some point in 3D graphics processing.
> >
> > If you want high performance, which you seem to want, the
> > hardware should be doing those for you.
>
> Well hand me a general purpose GPU with an incredible 2-way memory bus,
> smart guy.

The task of optimising 3D graphics software is to design your approach such
that you don't need the results back, and all further computation occurs on
the card. You make as much of the data available as possible and control data
flow through the pipeline at the highest level with state changes.

If your programs are bottlenecked by lots of very low-level arithmetic over
huge, flat data structures then you will almost certainly benefit from using
more structured, hierarchical (ideally, multiresolution) representations.
Derive the (possibly asymptotic) complexities of any suitable algorithms and
make an educated decision on the basis of that quantification. This is likely
to give you much better performance than very low-level optimisations such as
fiddling with 32-bit floats.

> Hint: commodity 3D graphics cards are fast when you write to
> them, damn slow when you read them, by design.

In general, you can't read the results of T&L from the card (sorry for being
so off topic, guys) so the driver resorts to software T&L in OGL feedback
mode.

If you absolutely must use flat data structures then perhaps you should
consider using the GPU as a CPU.

> OCaml has a somewhat practical focus, but maybe it's not sufficiently
> practical for me. I do find myself re-evaluating languages in terms of
> 3 overriding problems:
>
> - the available C++ migration path and its efficiency

Why C++? Is your objective to always prototype in OCaml and convert to C++?

> - the support of basic 3D graphics types, i.e. 32 bit floats

Why not "ease of use of trees, graphs etc."?
Algorithmic optimisations are so much more productive...

> - ability to work with imperative, object oriented designs and libraries

OO is overrated, IMHO. Imperative is excusable for UI level things but I'd
prefer a functional style for everything but the simplest of algorithms. OCaml
can play with libraries fairly well but, yes, it takes a lot of time to get
some things working. That isn't the fault of the OCaml creators though, of
course, it's the fault of those dim-wits at Bell Labs...

> OCaml currently has 1 out of 3.
> See for point of reference, "Why No One Uses Functional Languages."
> http://cm.bell-labs.com/cm/cs/who/wadler/papers/sigplan-why/sigplan-why.ps.gz

Speak of the devil.

> 1.5 out of 3 if one considers OCaml SWIG to be an available, slow,
> optimizable path.

Given the huge differences between C/C++ and OCaml (like safety), it would be
overly optimistic to expect migration of code from OCaml to C to be much more
efficient than it is now.

> ...Write my own little Python script to emit a lot of redundant,
> boring filename.i files with #include filename.h %include filename.h
> directives in them.

Yes, but the OCaml bit of SWIG is very alpha, AFAIK. SWIG wasn't even designed
to deal with languages like OCaml.

Just out of curiosity, do you use the STL much? An interface to the STL might
be interesting. I wouldn't use it any more though - I no longer have any need
for C++.

> > > The reality is that 32 bit floats get used in the real
> > > world all over the place by 3D graphics guys.
> >
> > Don't read that, Xavier.
>
> Oh, is that about Xavier exploding if he hears 'real world' again?

Yes.

Personally, I think the INRIA are doing a superb job with OCaml. It is an
excellent implementation of an excellent language. I do all of my work in
OCaml now. I'm getting offered jobs because I am so much more productive as a
consequence.
The programs I write whilst doing my 3D graphics research are more robust and
faster than ever now that they have been converted entirely into OCaml from
C++.

> > I don't believe Python was designed for doing 3D graphics.
>
> It wasn't. It was designed to be flexible and easy to program, not
> fast. Now that Python is growing in popularity, people are coming in
> post-hoc to try to make it fast. Maybe in 3 years it'll be a good
> language in that regard.

I'm no expert, but I suspect there are numerous, rigorous theoretical reasons
why it can't be made much more efficient (at least not by people who choose to
program in Python ;-).

> > ...
> > Can you give an example of this?
>
> I'm feeling mentally challenged on specifics right now. Generally
> speaking, 3D graphics problems are pipelines with N stages you might
> turn on or off. This creates 2^N path possibilities. Often you'd like
> to coalesce the operations at the various stages.

So you've got a large quantity of data in flat containers like arrays which
you want to perform a sequence of algorithms on? What sort of data is in the
containers and what sorts of algorithms are you performing?

Is the problem that you would like to hoist the inner loops of all of the
algorithms so that each datum has each algorithm applied to it rather than
feeding all of the data through each algorithm in turn (i.e. deforestation)?

Cheers,
Jon.
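
The loop hoisting asked about in the last question can be illustrated with a
minimal OCaml sketch. It is not code from this thread: the stages scale,
offset and clamp01 are hypothetical stand-ins for real pipeline steps, and the
fusion is done by hand here, whereas deforestation proper is a transformation
a compiler would perform.

```ocaml
(* Hypothetical per-datum stages; stand-ins for real pipeline steps. *)
let scale k x = k *. x
let offset d x = x +. d
let clamp01 x = if x < 0.0 then 0.0 else if x > 1.0 then 1.0 else x

(* Unfused: every stage walks the whole array and allocates an
   intermediate array for the next stage. *)
let run_unfused data =
  data |> Array.map (scale 0.5) |> Array.map (offset 0.25) |> Array.map clamp01

(* Fused by hand: all stages are applied to each datum in a single pass,
   with no intermediate arrays. *)
let run_fused data =
  Array.map (fun x -> clamp01 (offset 0.25 (scale 0.5 x))) data

(* Stages that may be switched on or off: compose whichever are enabled,
   in order, then make one pass over the data. *)
let compose stages x = List.fold_left (fun acc f -> f acc) x stages
let run_stages stages data = Array.map (compose stages) data

let input = [| -2.0; 0.0; 1.0; 4.0 |]
let unfused = run_unfused input
let fused = run_fused input
let partial = run_stages [ scale 0.5; clamp01 ] input
```

Building the composed function from a list of enabled stages avoids writing
out each of the 2^N paths by hand, at the price of one indirect call per stage
per datum. Whether that price is acceptable would need measuring.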