From: "ron minnich"
Date: Fri, 26 Oct 2007 22:44:38 -0700
To: "Fans of the OS Plan 9 from Bell Labs" <9fans@cse.psu.edu>
Subject: Re: [9fans] parallel/distributed computation

On 10/26/07, erik quanstrom wrote:
> could you elaborate or give a pointer explaining why
> bsp is insufficient?

BSP is essentially all about "the inner loop". In that loop you do the
work, and at the bottom of the loop you tell everyone what you have done.
So at any moment you are either computing or communicating, which means
that on your $100M computer you are using only about $50M of it, averaged
over time. That is undesirable. Nowadays people work fairly hard to ensure
that while computation is happening, the network is busy moving data.

This problem with BSP is well known, which is why some folks have tried to
time-share the nodes in the following way
(www.ccs3.lanl.gov/pal/publications/papers/petrini01:feng.pdf): run N jobs
per node (N is usually 2), so that while N-1 jobs are using the network,
and hence not computing, one job is computing. Of course, matching this
all up is hard, and most compute jobs are sized to use all of memory, so
this approach has not seen much use; the nodes on the big machines are
typically not shared between jobs.

BSP was an interesting idea, but it is not commonly used any more, at
least on the systems I know about. Rather, people work hard to overlap
communication and computation.

ron

p.s. for more recent work see: www.cs.unm.edu/~fastos/06meeting/sft.pdf
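
For concreteness, here is a minimal sketch of the distinction the post
draws, written in C with MPI (the post itself names no library; compute(),
the outbox/inbox buffers, and the loop structure are hypothetical). The
BSP-style superstep computes and then stops to exchange; the overlapped
version posts nonblocking transfers first and computes while the data
moves.

/* Rough sketch: BSP-style superstep vs. overlapped compute/communicate.
 * Assumes MPI (build with mpicc, run with mpirun); compute(), outbox,
 * and inbox are placeholders, not anything from the original post. */
#include <mpi.h>
#include <stdlib.h>

#define N 1024

static void compute(double *local, int n)
{
    int i;

    for (i = 0; i < n; i++)             /* stand-in for the real work */
        local[i] = local[i] * 0.5 + 1.0;
}

/* BSP style: compute, then stop and exchange. The network idles while
 * compute() runs, and the CPUs idle during the exchange. */
static void bsp_step(double *local, double *outbox, double *inbox,
                     int left, int right)
{
    compute(local, N);
    MPI_Sendrecv(outbox, N, MPI_DOUBLE, right, 0,
                 inbox,  N, MPI_DOUBLE, left,  0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    MPI_Barrier(MPI_COMM_WORLD);        /* end of the superstep */
}

/* Overlapped style: post nonblocking transfers first, compute while the
 * network moves the data, then wait for completion. */
static void overlapped_step(double *local, double *outbox, double *inbox,
                            int left, int right)
{
    MPI_Request req[2];

    MPI_Irecv(inbox,  N, MPI_DOUBLE, left,  0, MPI_COMM_WORLD, &req[0]);
    MPI_Isend(outbox, N, MPI_DOUBLE, right, 0, MPI_COMM_WORLD, &req[1]);
    compute(local, N);                  /* local work overlaps the transfer */
    MPI_Waitall(2, req, MPI_STATUSES_IGNORE);
}

int main(int argc, char **argv)
{
    int rank, size, left, right, step;
    double *local, *outbox, *inbox;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    left = (rank - 1 + size) % size;
    right = (rank + 1) % size;

    local = calloc(N, sizeof(double));
    outbox = calloc(N, sizeof(double));
    inbox = calloc(N, sizeof(double));

    for (step = 0; step < 10; step++)   /* or bsp_step(...) for comparison */
        overlapped_step(local, outbox, inbox, left, right);

    free(local);
    free(outbox);
    free(inbox);
    MPI_Finalize();
    return 0;
}

In the BSP version the network sits idle during compute() and the CPUs sit
idle during the exchange; in the overlapped version the MPI_Isend/MPI_Irecv
proceed while compute() runs, which is the overlap of communication and
computation the post describes.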