From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <6c8a5500d1d2d2fc0c1d587d96bf3a26@hamnavoe.com>
To: 9fans@cse.psu.edu
Subject: Re: [9fans] parallel/distributed computation
From: Richard Miller <9fans@hamnavoe.com>
Date: Sun, 28 Oct 2007 15:21:52 +0000
In-Reply-To: <13426df10710262244o625804ebs78881f5eefc53549@mail.gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="US-ASCII"
Content-Transfer-Encoding: 7bit
Topicbox-Message-UUID: dcc147bc-ead2-11e9-9d60-3106f5b1d025

> BSP is essentially all about "the inner loop". In this loop, you do
> the work, and, at the bottom of the loop, you tell everyone what you
> have done.
>
> So you are either computing or communicating.

Not quite accurate. In BSP, communication is separated from
synchronisation. During the loop (BSP calls it a "superstep"), each
process does a mixture of computation and asynchronous communication,
in any order. The computation is only allowed to use data which was
local to that process at the beginning of the superstep. At the end of
the superstep, there's a global barrier synchronisation. After the
synchronisation, processes can use the new values which were
communicated during the previous superstep.

During a superstep, therefore, computation and communication can and
should be overlapped as much as the algorithm and platform allow.
It's only the barrier synch at the end which needs to be atomic.

> ... Which means, on your
> $100M computer, that you are using about $50M of it over time. Which
> is undesirable.

In the worst case of exactly equal amounts of computation and
communication, the cost of having no overlap at all would be a
constant factor of 2, but this does not limit scalability. (Academic
computer scientists don't care about constant factors, except perhaps
in their research grants ...)
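The superstep discipline described above (compute only on data that was
local at the start of the superstep, queue asynchronous writes, then hit
a global barrier, after which the delivered values become usable) can be
sketched in plain Python. This is a toy simulation with threads, not real
BSPlib; all names (bsp_put, inbox, staged, deliver) are made up for
illustration:

```python
import threading

P = 4                                 # number of "processes" (threads here)
inbox = [dict() for _ in range(P)]    # values visible to each process
staged = [dict() for _ in range(P)]   # writes queued during the superstep

def deliver():
    # Runs once each time the barrier trips: queued writes become visible,
    # modelling "you may use communicated values after the synchronisation".
    for p in range(P):
        inbox[p].update(staged[p])
        staged[p].clear()

barrier = threading.Barrier(P, action=deliver)

def bsp_put(dest, key, value):
    # Asynchronous communication: the write is queued during the superstep
    # and is not visible to `dest` until after the next barrier.
    staged[dest][key] = value

results = [None] * P

def worker(pid):
    # Superstep 1: compute on local data and communicate, in any order.
    local = pid + 1
    bsp_put((pid + 1) % P, "square", local * local)
    barrier.wait()        # global barrier synchronisation ends the superstep
    # Superstep 2: values communicated last superstep may now be used.
    results[pid] = inbox[pid]["square"]

threads = [threading.Thread(target=worker, args=(p,)) for p in range(P)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(results)   # each pid receives the square of its left neighbour's value
```

Note that within a superstep nothing forces compute-then-communicate
ordering: the bsp_put could equally come before, after, or interleaved with
the local computation, which is where the overlap comes from.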
> This problem with BSP is well known, which is why some folks have
> tried to time-share the nodes

Mapping multiple virtual processors onto each physical processor to
increase overlap and hide latency was always part of the BSP model.
Valiant's original CACM paper called it "parallel slackness".

Ron is right to say BSP doesn't fit all problems. I think it's worst
with loosely structured or adaptive algorithms where it's hard to
predict in advance how much work each process will do in a superstep;
then the global synch can be too restrictive. But where it does fit,
the simplicity of the approach gives the usual benefits of minimalism.
Hence not completely off topic for this list, I hope.

-- Richard