From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <6c8a5500d1d2d2fc0c1d587d96bf3a26@hamnavoe.com>
To: 9fans@cse.psu.edu
Subject: Re: [9fans] parallel/distributed computation
From: Richard Miller <9fans@hamnavoe.com>
Date: Sun, 28 Oct 2007 15:21:52 +0000
In-Reply-To: <13426df10710262244o625804ebs78881f5eefc53549@mail.gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="US-ASCII"
Content-Transfer-Encoding: 7bit
Topicbox-Message-UUID: dcc147bc-ead2-11e9-9d60-3106f5b1d025

> BSP is essentially all about "the inner loop". In this loop, you do
> the work, and, at the bottom of the loop, you tell everyone what you
> have done.
>
> So you are either computing or communicating.

Not quite accurate. In BSP, communication is separated from
synchronisation. During the loop (BSP calls it a "superstep"), each
process does a mixture of computation and asynchronous communication,
in any order. The computation is only allowed to use data which was
local to that process at the beginning of the superstep. At the end of
the superstep, there's a global barrier synchronisation. After the
synchronisation, processes can use the new values which were
communicated during the previous superstep.

During a superstep, therefore, computation and communication can and
should be overlapped as much as the algorithm and platform allow.
It's only the barrier synch at the end which needs to be atomic.

> ... Which means, on your
> $100M computer, that you are using about $50M of it over time. Which
> is undesirable.

In the worst case of exactly equal amounts of computation and
communication, the cost of having no overlap at all would be a
constant factor of 2, but this does not limit scalability. (Academic
computer scientists don't care about constant factors, except perhaps
in their research grants ...)
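The superstep discipline described above (compute only on data that was
local at the start of the superstep, queue asynchronous writes, then hit
a global barrier, after which the delivered values become usable) can be
sketched in plain Python. This is a toy simulation with threads, not real
BSPlib; all names (bsp_put, inbox, staged, deliver) are made up for
illustration:

```python
import threading

P = 4                                 # number of "processes" (threads here)
inbox = [dict() for _ in range(P)]    # values visible to each process
staged = [dict() for _ in range(P)]   # writes queued during the superstep

def deliver():
    # Runs once each time the barrier trips: queued writes become visible,
    # modelling "you may use communicated values after the synchronisation".
    for p in range(P):
        inbox[p].update(staged[p])
        staged[p].clear()

barrier = threading.Barrier(P, action=deliver)

def bsp_put(dest, key, value):
    # Asynchronous communication: the write is queued during the superstep
    # and is not visible to `dest` until after the next barrier.
    staged[dest][key] = value

results = [None] * P

def worker(pid):
    # Superstep 1: compute on local data and communicate, in any order.
    local = pid + 1
    bsp_put((pid + 1) % P, "square", local * local)
    barrier.wait()        # global barrier synchronisation ends the superstep
    # Superstep 2: values communicated last superstep may now be used.
    results[pid] = inbox[pid]["square"]

threads = [threading.Thread(target=worker, args=(p,)) for p in range(P)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(results)   # each pid receives the square of its left neighbour's value
```

Note that within a superstep nothing forces compute-then-communicate
ordering: the bsp_put could equally come before, after, or interleaved with
the local computation, which is where the overlap comes from.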
> This problem with BSP is well known, which is why some folks have
> tried to time-share the nodes

Mapping multiple virtual processors onto each physical processor to
increase overlap and hide latency was always part of the BSP model.
Valiant's original CACM paper called it "parallel slackness".

Ron is right to say BSP doesn't fit all problems. I think it's worst
with loosely structured or adaptive algorithms where it's hard to
predict in advance how much work each process will do in a superstep;
then the global synch can be too restrictive. But where it does fit,
the simplicity of the approach gives the usual benefits of minimalism.
Hence not completely off topic for this list, I hope.

-- Richard