To: 9fans@cse.psu.edu
Subject: Re: [9fans] parallel/distributed computation
From: erik quanstrom
Date: Sat, 27 Oct 2007 01:53:36 -0400
In-Reply-To: <13426df10710262244o625804ebs78881f5eefc53549@mail.gmail.com>

thanks.

- erik

> BSP is essentially all about "the inner loop". In this loop, you do
> the work, and, at the bottom of the loop, you tell everyone what you
> have done.
>
> So you are either computing or communicating. Which means, on your
> $100M computer, that you are using about $50M of it over time. Which
> is undesirable.
>
> Nowadays, people work fairly hard to ensure that while computation is
> happening, the network is busy moving data.
>
> This problem with BSP is well known, which is why some folks have
> tried to time-share the nodes in the following
> way (www.ccs3.lanl.gov/pal/publications/papers/petrini01:feng.pdf):
> have N jobs (N usually 2). While N-1 jobs are using the network, and
> hence not computing, have 1 job computing. Of course, matching this
> all up is hard, and most compute jobs typically are sized to use all
> of memory, so this approach has not been used much. The nodes on the
> big machines are typically not shared between jobs.
>
> BSP was an interesting idea but is not commonly used any more, at
> least on the systems I know about. Rather, people work hard to overlap
> communication and computation.
>
> ron
> p.s. for more recent work see: www.cs.unm.edu/~fastos/06meeting/sft.pdf
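A rough cost model of the arithmetic in the quoted text, as a Python sketch. The step times, the $100M machine cost, and every function name here are illustrative assumptions, not anything from the thread or the cited papers. "Overlapped" is idealized (communication fully hidden behind computation), and the time-sharing figure is only an upper bound that assumes perfect scheduling of the N jobs on one node.

```python
# Illustrative cost model of the BSP argument quoted above.
# All inputs and names are hypothetical; nothing here is measured.

def bsp_step_time(compute: float, comm: float) -> float:
    """BSP superstep: compute, then communicate; the phases never overlap."""
    return compute + comm


def overlapped_step_time(compute: float, comm: float) -> float:
    """Idealized overlap: while this step computes, the previous step's data
    is on the wire, so a steady-state step costs the longer of the two."""
    return max(compute, comm)


def compute_utilization(compute: float, step_time: float) -> float:
    """Fraction of wall-clock time the processors spend computing."""
    return compute / step_time


def idle_dollars(machine_cost: float, compute: float, comm: float) -> float:
    """Share of the machine's cost that, averaged over time, sits idle under BSP."""
    util = compute_utilization(compute, bsp_step_time(compute, comm))
    return machine_cost * (1.0 - util)


def timeshare_utilization_bound(n_jobs: int, compute: float, comm: float) -> float:
    """Upper bound on node compute utilization when n_jobs BSP jobs share a node
    and at most one computes at a time: each job computes at most once per
    (compute + comm), and the node cannot be busier than 100%."""
    return min(1.0, n_jobs * compute / (compute + comm))


if __name__ == "__main__":
    compute, comm = 1.0, 1.0  # assumed equal compute and communication time per step
    for name, t in (("BSP", bsp_step_time(compute, comm)),
                    ("overlapped", overlapped_step_time(compute, comm))):
        print(f"{name:10s} step={t:.1f}  utilization={compute_utilization(compute, t):.0%}")
    print(f"idle share of a $100M machine under BSP: "
          f"${idle_dollars(100e6, compute, comm) / 1e6:.0f}M")
    print(f"2 jobs time-sharing a node, utilization bound: "
          f"{timeshare_utilization_bound(2, compute, comm):.0%}")
```

With equal compute and communication time per step, BSP keeps the processors busy only half the time (the "$50M of $100M" in the quote), while perfect overlap, or two perfectly interleaved jobs per node, could in principle approach full use. As the quote notes, real jobs rarely line up that neatly.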