From: Paul Lalonde
Subject: Re: [9fans] GCC/G++: some stress testing
Date: Sun, 2 Mar 2008 12:34:30 -0800
To: Fans of the OS Plan 9 from Bell Labs <9fans@cse.psu.edu>

CSP doesn't scale very well to hundreds of simultaneously executing threads (my claim, not, as far as I've found yet, anyone else's). It is very well suited to a small number of threads that need to communicate, and as a model of concurrency for tasks with few points of contact.

For performance, the channel locks become a bottleneck as the number of cores scales up. As for expressiveness, there are still issues with composability and correctness as the number of interacting threads increases. Yes, you at least get local stacks, but the work seems to get exponentially harder as the number of systems in the simulation (um, game engine) increases.

By shared cache I mean any number of caches that are kept coherent at the hardware level without serializing instructions.

Programming the memory hierarchy is really a specific instance of programming to mask latency. This goes well beyond inserting prefetches in an optimization stage; it shows up as problem decompositions that keep the current working set in cache (at L3, L2, or L1 granularity, depending on the problem) while simultaneously avoiding having multiple processors chewing on the same data, which generates vast amounts of cache-synchronization traffic on the bus. Successful algorithms in this space work on small bundles of data that are either flushed back to memory uncached (to leave more cache for streaming data in) or passed cheaply from compute kernel to compute kernel.
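As a rough illustration of passing small bundles from kernel to kernel, here is a minimal C++ sketch. It assumes two trivial element-wise kernels (`scale_kernel` and `offset_kernel`, hypothetical names) and a caller-chosen tile size; it is not taken from any real engine. The unfused pipeline makes two full passes, so a large array falls out of cache between them; the tiled pipeline pushes each bundle through both kernels while it is still cache-resident.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical compute kernel: scales a bundle of n floats in place.
inline void scale_kernel(float* data, std::size_t n, float k) {
    for (std::size_t i = 0; i < n; ++i) data[i] *= k;
}

// Hypothetical compute kernel: adds an offset to a bundle of n floats in place.
inline void offset_kernel(float* data, std::size_t n, float b) {
    for (std::size_t i = 0; i < n; ++i) data[i] += b;
}

// Unfused: each kernel streams over the whole array, so for inputs larger
// than the cache the data is evicted between the two passes.
void pipeline_unfused(std::vector<float>& v, float k, float b) {
    scale_kernel(v.data(), v.size(), k);
    offset_kernel(v.data(), v.size(), b);
}

// Tiled: the array is cut into bundles of at most `tile` elements, and each
// bundle goes through both kernels before moving on, so the second kernel
// reads data the first one just left in cache.
void pipeline_tiled(std::vector<float>& v, float k, float b, std::size_t tile) {
    assert(tile > 0);
    for (std::size_t start = 0; start < v.size(); start += tile) {
        const std::size_t n = std::min(tile, v.size() - start);  // last bundle may be short
        float* bundle = v.data() + start;
        scale_kernel(bundle, n, k);
        offset_kernel(bundle, n, b);
    }
}
```

Both versions compute the same result; the tile size is the tuning knob, typically chosen so a bundle fits comfortably in L1 or L2. Because the tiles are disjoint, they could also be handed to different cores without two processors touching the same data. The sketch makes no performance claims; whether fusion pays off depends on the kernels and the machine.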
Having language structures to help with these decompositions and caching decisions is a great help; that's one of the reasons functional programming keeps rearing its head in this space. Without aliasing and global (serializing) state, it's much easier to analyze the program and choose how to break up the computation into kernels that can be streamed, pipelined, or otherwise separated to allow better cache utilization and parallelism.

Currently, the best-performing programs I know of for exploiting the memory hierarchy are C and C++ programs written in a "kernel composition" style that the language supports poorly. You can do it, but it feels more like coding in assembly than like expressing your algorithms. Much of the template metaprogramming in these programs is about taking measures of cache sizes and data sets and turning out code (statically) tuned to those sizes. There's a huge opportunity for a JIT-compiled language to adapt better to changing data sets and numbers of active threads.

Paul

On 2-Mar-08, at 10:59 AM, erik quanstrom wrote:

>
>> Almost certainly. And so is C. Programming many-core shared-cache
>> machines in languages with global state and aliasing is just plain
>> wrong, in the same way that programming in assembly instead of C is
>> wrong. Add a highly heterogeneous real-time task mix on top of that,
>> and you're in for a world of poor cache performance and deadlocks,
>> which could be avoided by better choices of implementation language.
>
> i don't understand this argument. are you saying that csp doesn't work
> in c? or are you saying that csp has caching problems that some other
> languages solve?
>
> also, could you define what you mean by "shared cache" a bit more.
> would you consider an intel quad core cpu to be a "shared cache"
> machine, since the two l2 caches sit on the same fsb?
>
>> Programming for the memory hierarchy is way more important than
>> optimizing CPU clocks anymore (though that winds up still having a
>> place in some compute kernels). I wish our programming languages
>> reflected that change in perspective.
>
> what do you mean by "programming for the memory hierarchy"?
>
> - erik
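The point in the reply above about template metaprogramming turning cache and data-set sizes into statically tuned code can be sketched as follows. This is a hypothetical example, not code from any particular program: `BlockSize` derives a square tile edge at compile time from an assumed cache budget passed as a template argument (nothing here queries the hardware), and `transpose_blocked` uses it to transpose a row-major matrix tile by tile. It needs C++14 or later for the loop inside the `constexpr` function.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Largest square tile edge b such that two b*b tiles of elem_bytes-sized
// elements (one being read, one being written) fit within cache_bytes.
constexpr std::size_t largest_tile(std::size_t cache_bytes, std::size_t elem_bytes) {
    std::size_t b = 1;
    while (2 * (b + 1) * (b + 1) * elem_bytes <= cache_bytes) {
        ++b;
    }
    return b;
}

// Hypothetical compile-time tiling policy: CacheBytes is an assumed budget
// supplied by the programmer (e.g. a guess at the per-core L1 data cache).
template <std::size_t CacheBytes, typename T>
struct BlockSize {
    static_assert(2 * sizeof(T) <= CacheBytes, "cache budget smaller than two elements");
    static constexpr std::size_t value = largest_tile(CacheBytes, sizeof(T));
};

// Transpose a rows x cols row-major matrix into dst (cols x rows, row-major),
// visiting it in B x B tiles so the source and destination tiles stay cached.
template <std::size_t CacheBytes, typename T>
void transpose_blocked(const std::vector<T>& src, std::vector<T>& dst,
                       std::size_t rows, std::size_t cols) {
    constexpr std::size_t B = BlockSize<CacheBytes, T>::value;  // fixed at compile time
    assert(src.size() == rows * cols);
    dst.resize(rows * cols);
    for (std::size_t i0 = 0; i0 < rows; i0 += B) {
        const std::size_t i_end = std::min(i0 + B, rows);  // edge tiles may be partial
        for (std::size_t j0 = 0; j0 < cols; j0 += B) {
            const std::size_t j_end = std::min(j0 + B, cols);
            for (std::size_t i = i0; i < i_end; ++i) {
                for (std::size_t j = j0; j < j_end; ++j) {
                    dst[j * rows + i] = src[i * cols + j];
                }
            }
        }
    }
}
```

For example, `transpose_blocked<32 * 1024, double>(src, dst, rows, cols)` uses 45x45 tiles, since two such tiles of doubles (32400 bytes) fit in 32 KiB and two 46x46 tiles do not. The limitation is the one raised in the reply: the tile size is frozen at compile time, so a different cache or a different number of active threads means a different instantiation.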