From: Philippe Anel
Date: Mon, 3 Mar 2008 10:12:29 +0100
To: Fans of the OS Plan 9 from Bell Labs <9fans@cse.psu.edu>
Subject: Re: [9fans] GCC/G++: some stress testing

Ron,

I thought Paul was talking about a cache-coherent system, on which a
high-contention lock can become a huge problem. Although the work done by
Jim Taft on the NASA project looks very interesting (and if you have
pointers to papers about locking primitives on such a system, I would
appreciate them), it seems that system is memory coherent rather than
cache coherent (coherency is maintained by the SGI NUMALink interconnect
fabric).

And I agree with you. I also think (global) shared memory for IPC is more
efficient than passing copied data across the nodes, and I suppose several
papers tend to confirm this is the case: today's interconnect fabrics are
a lot faster than memory-to-memory copying.

My conjecture (I only have access to a simple dual-core machine) concerns
the locking primitives used for CSP (and IPC): I mean libthread, which is
built on the rendezvous system call, and that call does use locking
primitives (see 9/proc.c:sysrendezvous()). I think this is the only reason
why CSP would not scale well.

Regarding my (other) conjecture about IPIs, please read my answer to Paul.

	Phil;

>> If the CSP system itself takes care of the memory hierarchy and uses no
>> synchronisation (using an IPI to send a message to another core, for
>> example), CSP scales very well.
>
> Is this something you have measured, or is this conjecture?
>
>> Of course the IPI mechanism requires a switch to kernel mode, which
>> costs a lot. But this is necessary only if the destination thread is
>> running on another core, and I don't think latency is very important in
>> algorithms requiring a lot of CPUs.
>
> Same question.
>
> For a look at an interesting library that scaled well on a 1024-node SMP
> at NASA Ames, see the work by Jim Taft.
> Short form: use shared memory for IPC, not data sharing.
>
> He's done very well this way.
>
> ron
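
P.S. To make the conjecture concrete, here is a minimal sketch of the kind
of rendezvous I have in mind: a single lock serializing every exchange.
It is only an illustration under that assumption, not the real
sysrendezvous() from 9/proc.c, and every name in it is made up.

/* Sketch only: a rendezvous table guarded by one global lock.
   Hypothetical code, not the actual Plan 9 implementation. */
#include <stdatomic.h>
#include <stdint.h>
#include <stddef.h>

typedef struct Waiter Waiter;
struct Waiter {
	uintptr_t   tag;    /* rendezvous tag */
	uintptr_t   value;  /* value handed to the partner */
	_Atomic int ready;  /* set once a partner has arrived */
	Waiter     *next;
};

static atomic_flag rendezlock = ATOMIC_FLAG_INIT; /* the single lock */
static Waiter *waiting;                           /* threads parked on a tag */

static void
lockrendez(void)
{
	/* every core doing channel I/O spins on this one cache line;
	   that line bouncing between caches is the contention problem */
	while (atomic_flag_test_and_set_explicit(&rendezlock, memory_order_acquire))
		;
}

static void
unlockrendez(void)
{
	atomic_flag_clear_explicit(&rendezlock, memory_order_release);
}

/* Exchange 'value' with whichever thread shows up with the same 'tag'.
   Sleeping and waking are elided: a real kernel would block the caller
   instead of busy-waiting on 'ready'. */
uintptr_t
rendezvous_sketch(Waiter *self, uintptr_t tag, uintptr_t value)
{
	lockrendez();
	for (Waiter **wp = &waiting; *wp != NULL; wp = &(*wp)->next) {
		Waiter *w = *wp;
		if (w->tag == tag) {              /* a partner is already parked */
			*wp = w->next;            /* unlink it */
			uintptr_t v = w->value;   /* take its value ... */
			w->value = value;         /* ... and leave ours behind */
			atomic_store_explicit(&w->ready, 1, memory_order_release);
			unlockrendez();
			return v;
		}
	}
	/* nobody here yet: park ourselves under the lock, then wait outside it */
	self->tag = tag;
	self->value = value;
	atomic_store_explicit(&self->ready, 0, memory_order_relaxed);
	self->next = waiting;
	waiting = self;
	unlockrendez();
	while (!atomic_load_explicit(&self->ready, memory_order_acquire))
		;                                 /* stand-in for sleep()/wakeup() */
	return self->value;
}

Whatever the real code looks like, the shape is the same: every
send/receive pair takes the same lock and touches the same list head, so
adding cores mostly adds coherence traffic on those few cache lines, and
that, not the copying of the data itself, is what I suspect limits
libthread-style CSP on a cache-coherent machine.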