From mboxrd@z Thu Jan 1 00:00:00 1970
Mime-Version: 1.0 (Apple Message framework v752.3)
In-Reply-To: <47CB236A.2020402@free.fr>
References: <47CB236A.2020402@free.fr>
Content-Type: text/plain; charset=US-ASCII; delsp=yes; format=flowed
Message-Id: <5A9D2697-A21E-4DF0-AB5E-759CD273BE49@telus.net>
Content-Transfer-Encoding: 7bit
From: Paul Lalonde
Subject: Re: [9fans] GCC/G++: some stress testing
Date: Sun, 2 Mar 2008 19:25:00 -0800
To: Fans of the OS Plan 9 from Bell Labs <9fans@cse.psu.edu>
Topicbox-Message-UUID: 6d1d1c78-ead3-11e9-9d60-3106f5b1d025

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Mar 2, 2008, at 2:00 PM, Philippe Anel wrote:

> I agree with you, taking care about memory hierarchy is becoming
> very important. Especially if you think about the upcoming NUMAcc
> systems (Opterons are already there though).
> But the fact it doesn't scale well is not about CSP itself, but the
> way it has been implemented.
> If CSP system itself takes care about memory hierarchy and uses no
> synchronisation (using IPI to send message to another core by
> example), CSP scales very well.
> Of course IPI mechanism requires a switch to kernel mode which
> costs a lot. But this is necessary only if the destination thread
> is running on another core, and I don't think latency is very
> important in algorithms requiring a lot of cpus.

Latency is quite important in the application domain I have to
target: the goal is to produce a new image every 60th of a second,
including all the simulation effort to get there. In addition, we
have user input which needs to be processed, and usually network
delays to worry about as well. Every bit of latency between user
input and display breaks the illusion of control. And though TVs are
getting better, it's not atypical to see 4-6 frames of latency
introduced by the display subsystem, once you've finished generating
a frame buffer.

I don't know what you mean by "CSP system itself takes care about
memory hierarchy".
Do you mean that the CSP implementation does something about it, or
do you mean that the code using the CSP approach takes care of it?

IPI isn't free either: apart from the OS switch, it generates bus
traffic that competes with the cache coherence protocols and memory
traffic. In a well-designed compute kernel that saturates both
compute and bandwidth, the latency hiccups so introduced can
propagate really badly.

Paul

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.3 (Darwin)

iD8DBQFHy2+MpJeHo/Fbu1wRAgyUAKDdSB8B1vKRt8dpNA0MoT+3jnV63wCdGtNP
6FVzjgBJIkvy37rVNlmbE7Q=
=RnvR
-----END PGP SIGNATURE-----
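
To put rough numbers on the frame budget discussed above, here is a
small sketch of the arithmetic. It assumes a fixed 60 Hz refresh and
takes the 4-6 frames of display latency at face value; the function
and constant names are made up for illustration and do not come from
any real codebase.

```python
# Assumption: a fixed 60 Hz refresh rate. All names here are
# hypothetical and exist only to illustrate the arithmetic.
FRAME_RATE_HZ = 60


def frame_budget_ms(rate_hz: float = FRAME_RATE_HZ) -> float:
    """Time available to simulate and render one frame, in milliseconds."""
    return 1000.0 / rate_hz


def display_latency_ms(frames: int, rate_hz: float = FRAME_RATE_HZ) -> float:
    """Latency added by a display subsystem that buffers `frames` frames."""
    return frames * frame_budget_ms(rate_hz)


if __name__ == "__main__":
    print(f"per-frame budget: {frame_budget_ms():.2f} ms")
    for frames in (4, 6):
        print(f"{frames} frames of display latency: "
              f"{display_latency_ms(frames):.1f} ms")
```

Under these assumptions the whole simulate-and-render budget is about
16.7 ms per frame, while the display alone can add roughly 67 to 100
ms, so any extra latency from message passing comes out of an already
tight budget.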