From mboxrd@z Thu Jan  1 00:00:00 1970
Mime-Version: 1.0 (Apple Message framework v752.3)
In-Reply-To: <47CB236A.2020402@free.fr>
References: <d5033e6962e97bb803b4f3feee886f55@quanstro.net>
	<E329B8ED-63E6-42C7-BB0B-81F4FAF79962@telus.net>
	<47CB236A.2020402@free.fr>
Content-Type: text/plain; charset=US-ASCII; delsp=yes; format=flowed
Message-Id: <5A9D2697-A21E-4DF0-AB5E-759CD273BE49@telus.net>
Content-Transfer-Encoding: 7bit
From: Paul Lalonde <plalonde@telus.net>
Subject: Re: [9fans] GCC/G++: some stress testing
Date: Sun,  2 Mar 2008 19:25:00 -0800
To: Fans of the OS Plan 9 from Bell Labs <9fans@cse.psu.edu>
Topicbox-Message-UUID: 6d1d1c78-ead3-11e9-9d60-3106f5b1d025

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


On Mar 2, 2008, at 2:00 PM, Philippe Anel wrote:
> I agree with you, taking care of the memory hierarchy is becoming
> very important, especially if you think about the upcoming ccNUMA
> systems (Opterons are already there, though).
> But the fact that it doesn't scale well is not about CSP itself, but
> about the way it has been implemented.
> If the CSP system itself takes care about memory hierarchy and uses no
> synchronisation (using IPIs to send messages to another core, for
> example), CSP scales very well.
> Of course the IPI mechanism requires a switch to kernel mode, which
> costs a lot. But this is necessary only if the destination thread
> is running on another core, and I don't think latency is very
> important in algorithms requiring a lot of CPUs.

Latency is quite important in the application domain I have to
target: the goal is to produce a new image every 1/60th of a second,
including all the simulation effort to get there.  In addition, we
have user input which needs to be processed, and usually network
delays to worry about as well.  Every bit of latency between user
input and display breaks the illusion of control.  And though TVs are
getting better, it's not atypical to see 4-6 frames of latency
introduced by the display subsystem, once you've finished generating
a frame buffer.
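
To put numbers on that (plain arithmetic, not measurements): at 60 Hz
each frame gets 1000/60, or about 16.7 ms, so 4-6 frames of display
latency add roughly 67-100 ms on top of the time spent generating the
frame.  A minimal C sketch of the arithmetic; the helper names are
hypothetical, chosen for illustration only:

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: time budget per frame, in milliseconds,
 * at a given refresh rate in frames per second. */
static double frame_budget_ms(double fps)
{
    assert(fps > 0.0);
    return 1000.0 / fps;
}

/* Hypothetical helper: latency (ms) added by a display pipeline that
 * holds `frames` whole frames after the frame buffer is finished. */
static double display_latency_ms(double fps, int frames)
{
    return frames * frame_budget_ms(fps);
}

/* Print the per-frame budget and the display lag for a range of
 * buffered-frame counts. */
static void print_latency_table(double fps, int min_frames, int max_frames)
{
    printf("frame budget at %.0f fps: %.2f ms\n", fps, frame_budget_ms(fps));
    for (int f = min_frames; f <= max_frames; f++)
        printf("%d frames of display lag: %.1f ms\n",
               f, display_latency_ms(fps, f));
}
```

Calling print_latency_table(60.0, 4, 6) reports a 16.67 ms budget and
66.7, 83.3 and 100.0 ms of display lag, i.e. the display alone can eat
several whole frame budgets before the user sees anything.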

I don't know what you mean by "CSP system itself takes care about
memory hierarchy".  Do you mean that the CSP implementation does
something about it, or do you mean that the code using the CSP
approach takes care of it?
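
For contrast, a conventional shared-memory CSP channel looks roughly
like the sketch below.  It is a minimal illustration in C with POSIX
threads, not Plan 9's libthread (whose API differs), and every name in
it is hypothetical.  Each send and receive goes through one mutex and
condition variable that both sides touch, so the cache line holding
the channel state bounces between cores; that is the synchronisation
the IPI proposal would try to avoid, and nothing in the channel itself
knows which core or memory node either party is running on.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical unbuffered (rendezvous) channel carrying ints:
 * chan_send() blocks until a receiver has taken the value, as in CSP. */
typedef struct {
    pthread_mutex_t mu;
    pthread_cond_t  cv;
    int value;
    int full;   /* a sender has posted a value */
    int taken;  /* a receiver has taken the posted value */
} chan_int;

static void chan_init(chan_int *c)
{
    pthread_mutex_init(&c->mu, NULL);
    pthread_cond_init(&c->cv, NULL);
    c->full = 0;
    c->taken = 0;
}

static void chan_destroy(chan_int *c)
{
    pthread_cond_destroy(&c->cv);
    pthread_mutex_destroy(&c->mu);
}

static void chan_send(chan_int *c, int v)
{
    pthread_mutex_lock(&c->mu);
    while (c->full)                 /* wait for any earlier sender to finish */
        pthread_cond_wait(&c->cv, &c->mu);
    c->value = v;
    c->full = 1;
    c->taken = 0;
    pthread_cond_broadcast(&c->cv); /* wake a waiting receiver */
    while (!c->taken)               /* rendezvous: wait until it is taken */
        pthread_cond_wait(&c->cv, &c->mu);
    c->full = 0;
    pthread_cond_broadcast(&c->cv); /* let the next sender in */
    pthread_mutex_unlock(&c->mu);
}

static int chan_recv(chan_int *c)
{
    pthread_mutex_lock(&c->mu);
    while (!c->full || c->taken)    /* wait for a fresh, untaken value */
        pthread_cond_wait(&c->cv, &c->mu);
    int v = c->value;
    c->taken = 1;
    pthread_cond_broadcast(&c->cv); /* release the blocked sender */
    pthread_mutex_unlock(&c->mu);
    return v;
}

/* Hypothetical demo: one thread sends 1..n, the caller receives and sums. */
struct demo_arg { chan_int *c; int n; };

static void *demo_sender(void *p)
{
    struct demo_arg *a = p;
    for (int i = 1; i <= a->n; i++)
        chan_send(a->c, i);
    return NULL;
}

static int chan_demo_sum(int n)
{
    chan_int c;
    chan_init(&c);
    struct demo_arg a = { &c, n };
    pthread_t t;
    pthread_create(&t, NULL, demo_sender, &a);
    int sum = 0;
    for (int i = 0; i < n; i++)
        sum += chan_recv(&c);
    pthread_join(t, NULL);
    chan_destroy(&c);
    return sum;
}
```

Whether the memory hierarchy is handled inside something like
chan_send()/chan_recv(), or by how the program places its processes
and channels, is exactly the question above.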

IPI isn't free either: apart from the OS switch, it generates bus
traffic that competes with the cache-coherence protocol and memory
traffic.  In a well-designed compute kernel that saturates both
compute and bandwidth, the latency hiccups so introduced can propagate
really badly.

Paul

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.3 (Darwin)

iD8DBQFHy2+MpJeHo/Fbu1wRAgyUAKDdSB8B1vKRt8dpNA0MoT+3jnV63wCdGtNP
6FVzjgBJIkvy37rVNlmbE7Q=
=RnvR
-----END PGP SIGNATURE-----