From mboxrd@z Thu Jan 1 00:00:00 1970
From: Dave Eckhardt
Subject: Re: [9fans] speaking of kenc
To: Fans of the OS Plan 9 from Bell Labs <9fans@cse.psu.edu>
In-Reply-To: <70260999-40A3-4298-9276-DC03BB4B514E@telus.net>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-ID: <9958.1178315907.1@lunacy.ugrad.cs.cmu.edu>
Date: Fri, 4 May 2007 17:58:27 -0400
Message-ID: <9959.1178315907@lunacy.ugrad.cs.cmu.edu>
Topicbox-Message-UUID: 59da3e26-ead2-11e9-9d60-3106f5b1d025

> Looking at the SPUs on Cell, kencc won't let me make decent
> code for them: the vast space of vector instructions requires
> extensive language extensions to use well. The overhead of a
> function call defeats the careful interleaving of those
> instructions.

I've probably read just enough about this architecture to make
a fool of myself, but it's Friday afternoon, so here goes nothing.

One possible goal might be a language in which you could describe
high-level algorithms of a certain class which could then be
compiled to run well on a Cell (and, to be a cool result, on some
other thing). This would probably handle not just computation
but also the necessary DMA to get the data ready. As you point
out, that language probably wouldn't be C, and it may well be the
case that it doesn't exist yet.

Failing that, it seems like what people will be doing for a while
is writing code carefully tuned to run well on exactly one or two
particular models of Cell, which seems to me likely to look like
carefully optimized "inner loop" stuff wrapped by glue code which
matters less. I have to wonder whether it would be less painful
to learn the hardware and write the optimized code in assembly
language or to learn the hardware *and* learn how to cajole a
complicated compiler into emitting the assembly language you know
it should be emitting.

With respect to kencc, I wonder how far you could get if each
Cell vector instruction were a C-callable .s function of a few
instructions and the SPU linker routinely inlined all
small-instruction-count functions and had an optimizer explicitly
designed for the SPU.

Dave Eckhardt
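
A rough sketch of what the last paragraph might look like from the
C side. Everything here is an assumption made for illustration: the
names Vec4, vsplat, vmul, vadd and saxpy are hypothetical, and the
bodies are portable C stand-ins for what would really be .s routines
of one or two SPU instructions each. The point is only that the inner
loop is written as plain function calls, which an inlining linker
could then flatten into straight-line vector code.

```c
/*
 * Hypothetical sketch: these names are illustrative only and are
 * not part of kencc or of any SPU toolchain.
 */
typedef struct Vec4 Vec4;
struct Vec4 {
	float f[4];	/* stands in for one 128-bit vector register */
};

/* Stand-in for a .s routine that replicates a scalar into all lanes. */
Vec4
vsplat(float s)
{
	Vec4 r;
	int i;

	for(i = 0; i < 4; i++)
		r.f[i] = s;
	return r;
}

/* Stand-in for a .s routine wrapping a single vector multiply. */
Vec4
vmul(Vec4 a, Vec4 b)
{
	Vec4 r;
	int i;

	for(i = 0; i < 4; i++)
		r.f[i] = a.f[i] * b.f[i];
	return r;
}

/* Stand-in for a .s routine wrapping a single vector add. */
Vec4
vadd(Vec4 a, Vec4 b)
{
	Vec4 r;
	int i;

	for(i = 0; i < 4; i++)
		r.f[i] = a.f[i] + b.f[i];
	return r;
}

/*
 * Inner loop expressed only as calls: y[i] = a*x[i] + y[i].
 * With every callee inlined, no call overhead would remain here.
 */
void
saxpy(float a, Vec4 *x, Vec4 *y, int n)
{
	Vec4 va;
	int i;

	va = vsplat(a);
	for(i = 0; i < n; i++)
		y[i] = vadd(vmul(va, x[i]), y[i]);
}
```

Whether this gets anywhere near hand-scheduled code would depend
entirely on the inliner and the SPU-specific optimizer, which this
sketch does not attempt to show.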