9fans - fans of the OS Plan 9 from Bell Labs
* Re: [9fans] Barrelfish
       [not found] <<20091016172030.GB3135@nipl.net>
@ 2009-10-16 18:34 ` erik quanstrom
  0 siblings, 0 replies; 90+ messages in thread
From: erik quanstrom @ 2009-10-16 18:34 UTC (permalink / raw)
  To: 9fans

> > > There is a vast range of applications that cannot
> > > be managed in real time using existing single-core technology.
> >
> > please name one.
>
> Your apparent lack of imagination surprises me.
>
> Surely you can see that a whole range of applications becomes possible when
> using a massively parallel system, when compared to a single-CPU system.  You
> could perhaps also achieve these applications using a large network of 1000
> normal computers, but that would be expensive and use a lot of space.
>
> I named two in another post: real-time animated raytracing, and instantaneous
> complex dsp over a long audio track.  I'll also mention instantaneous video
> encoding.  Instantaneous building of a complex project from source.
> (I'm defining instantaneous as less than 1 second for this.)

two points.

1.  by real time i mean this http://en.wikipedia.org/wiki/Real-time_computing
i'm not sure what your definition is.  i'm guessing you're using the
"can keep up most of the time" definition?

2.  i still can't think of any a priori reasons why one can't do any
particular task in real time with 1 processor that one can do with
more than one processor.  perhaps the hidden assumption is
that the total processing power of an mp setup will be greater?
if the processing power is equal, certainly one would go for the
uniprocessor.

but as long as i'm being lumped in with ken, i'll take it as
a grand compliment.  ☺

- erik



^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [9fans] Barrelfish
  2009-10-21 15:43         ` Sam Watkins
  2009-10-21 16:11           ` Russ Cox
  2009-10-21 18:01           ` ron minnich
@ 2009-10-28 15:37           ` matt
  2 siblings, 0 replies; 90+ messages in thread
From: matt @ 2009-10-28 15:37 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

Sorry to kick this rotting horse but I just got back

>
>
>>You've got to feed in 2 hours of source material - 820Gb per stream, how?
>>
>>
>
>I suppose some sort of parallel bus of wires or optic fibres.
>
we call that "hand waving"

>If I have
>massively parallel processing I would want massively parallel IO to go with it.
>I.e. something like "read data starting from here" -> "here it is streaming one
>megabit in parallel down the bus at 1Ghz over 1 million channels"
>
>
While riding a unicorn

>would take advantage of perhaps 720 "cores" to encode a two hour video in 10
>second chunks with barely any Amdahl effects,
>
>
This 720Gbit storage device sounds pretty good.

>People do this stuff every day.
>Have you heard of a render-farm?
>
>
Your sarcasm is cute. Have you used a render farm? You're right that
rendering on a few cores is CPU bound. But you've moved the goalposts by
100,000,000 orders of magnitude.

We have this comic on the wall with "programmer|compiling" replaced with
"animator|rendering"

http://xkcd.com/303/

And there's a standing order that you can't have sex unless you're
rendering.

>I'm not sure why I'm wasting time writing about this, it's obvious anyway.
>
>
Yeah, that must be why everyone is rendering Imax movies in a few seconds.

We can all imagine a place where computation is instant and we just say
"computer! run Sherlock Holmes on the Holodeck from where we left off,
but this time give it a Wild West theme".









* Re: [9fans] Barrelfish
  2009-10-21 15:43         ` Sam Watkins
  2009-10-21 16:11           ` Russ Cox
@ 2009-10-21 18:01           ` ron minnich
  2009-10-28 15:37           ` matt
  2 siblings, 0 replies; 90+ messages in thread
From: ron minnich @ 2009-10-21 18:01 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On Wed, Oct 21, 2009 at 8:43 AM, Sam Watkins <sam@nipl.net> wrote:

>   People do this stuff every day.
> Have you heard of a render-farm?

Yes, and some of them are on this list, and have actually done this
sort of work, as you clearly have not. Else you would understand where
the limits on parallelism are in parallel encoding of MPEG-2, and why,
in fact, one useful good thing to know about a two hour movie is that
the limit on parallelism might be, oh, say around 240.

Had you done it, or come close to doing it, or given some indication
that you have some approximation of a clue as to what is involved in
doing it, you might be getting a little less argument.

> I'm not sure why I'm wasting time writing about this, it's obvious anyway.

It is obvious. It's obviously wrong. It's obviously not informed by
experience. It's obvious you are enthusiastic about this type of thing
but need to learn more about it. The enthusiasm is admirable anyway
...

ron




* Re: [9fans] Barrelfish
  2009-10-21 16:11           ` Russ Cox
@ 2009-10-21 16:37             ` Sam Watkins
  0 siblings, 0 replies; 90+ messages in thread
From: Sam Watkins @ 2009-10-21 16:37 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On Wed, Oct 21, 2009 at 09:11:10AM -0700, Russ Cox wrote:
> > Can you give one example of a slow task that you think cannot benefit much
> > from parallel processing?
>
> Rebuilding a venti index is almost entirely I/O bound.

Perhaps I should have specified a processor-bound task.  I don't know much
about venti or its indexes, but "rebuilding" an index sounds like a bad idea
anyway. I suppose you could make an index that updates progressively?
or does this happen in the event of a crash or something?

If someone wants to use a massively parallel computer for IO-bound tasks,
they should have massively parallel IO and media to go with it.

Sam




* Re: [9fans] Barrelfish
  2009-10-21 15:43         ` Sam Watkins
@ 2009-10-21 16:11           ` Russ Cox
  2009-10-21 16:37             ` Sam Watkins
  2009-10-21 18:01           ` ron minnich
  2009-10-28 15:37           ` matt
  2 siblings, 1 reply; 90+ messages in thread
From: Russ Cox @ 2009-10-21 16:11 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

> Can you give one example of a slow task that you think cannot benefit much from
> parallel processing?

Rebuilding a venti index is almost entirely I/O bound.
You can have as many cores as you want and they
will all be sitting idle waiting for the disks.  Parallel
processing helps only to the extent that you can run
the disks in parallel, and they're not multiplying quite
as fast as processor cores.
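
A rough way to see the point above is Amdahl's law. The sketch below is only an illustration: the 5%/95% split between CPU work and disk waiting is a made-up figure, not a measurement of venti, and `amdahl_speedup` and `CPU_FRACTION` are names invented for this example.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Amdahl's law: overall speedup when only part of the work scales."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


# Hypothetical split (an assumption, not a venti measurement):
# 5% of the run time is CPU work that parallelizes across cores,
# 95% is waiting on disks and does not shrink when cores are added.
CPU_FRACTION = 0.05

for cores in (1, 4, 64, 1024):
    print(cores, "cores ->", round(amdahl_speedup(CPU_FRACTION, cores), 4))

# Upper bound as the core count grows without limit.
print("limit ->", round(1.0 / (1.0 - CPU_FRACTION), 4))
```

Under that assumed split the speedup never exceeds about 1.05 however many cores are added, which is the sense in which the cores "sit idle waiting for the disks".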

> Perhaps you have a couple of videos to recode?  Then you can achieve
> close to 100% utilization.

http://www.dilbert.com/strips/comic/2008-12-13/

Russ



* Re: [9fans] Barrelfish
  2009-10-20  2:16       ` matt
  2009-10-20  9:15         ` Steve Simon
@ 2009-10-21 15:43         ` Sam Watkins
  2009-10-21 16:11           ` Russ Cox
                             ` (2 more replies)
  1 sibling, 3 replies; 90+ messages in thread
From: Sam Watkins @ 2009-10-21 15:43 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

I wrote:
>I calculated roughly that encoding a 2-hour video could be parallelized by a
>factor of perhaps 20 trillion, using pipelining and divide-and-conquer

On Tue, Oct 20, 2009 at 03:16:22AM +0100, matt wrote:
> I know you are using video / audio encoding as an example and there are
> probably datasets that make sense but in this case, what use is it?

I was using it to work out the *maximum* extent to which a common task can be
parallelized.  20-trillion-fold is the answer I came up with.  Someone was
talking about Amdahl's Law and saying that having large numbers of processors
is not much use because Amdahl's Law limits their utilization.  I disagree.

In reality 10,000 processing units might be a more sensible number to have than
20 trillion.  If you have ever done H264 video encoding on a PC you would know
that it is very slow, even normal mpeg encoding is barely faster than real time
on a 1Ghz PC.  Few people like having to wait 2 hours for a task to complete.

This whole argument / discussion has come out of nowhere since it appears Ken's
original comment was criticising the normal sort of multi-core systems, and he
is more in favor of other approaches like FPGA.  I fully agree with that.

> You can't watch 2 hours of video per second and you can't write it to disk
> fast enough to empty the pipeline.

If I had a computer with 20 trillion processing units capable of recoding 2
billion hours of video per second, I would have superior storage media and IO
systems to go with it.  The system I described could encode 2 BILLION hours of
video per second, not 2 hours per second.

> You've got to feed in 2 hours of source material - 820Gb per stream, how?

I suppose some sort of parallel bus of wires or optic fibres.  If I have
massively parallel processing I would want massively parallel IO to go with it.
I.e. something like "read data starting from here" -> "here it is streaming one
megabit in parallel down the bus at 1Ghz over 1 million channels"

> Once you have your uncompressed stream, MPEG-2 encoding requires seeking
> through the time dimension with keyframes every n frames and out of order
> macro blocks, so we have to wait for n frames to be composited.  For the best
> quality the datarate is unconstrained on the first processing run and then
> macro blocks best-fitted and re-ordered on the second to match the desired
> output datarate, but again, this is n frames at a time.
>
> Amdahl is punching you in the face every time you say "see, it's easy".

I'm no expert on video encoding but it seems to me you are assuming I would
approach it the conventional stupid serial way.  With massively parallel
processing one could "seek" through the time dimension simply by comparing data
from all time offsets at once in parallel.

Can you give one example of a slow task that you think cannot benefit much from
parallel processing?  Video is an extremely obvious example of one that
certainly does benefit from just about as much parallel processing as you can
throw at it, so I'm surprised you would argue about it.  Probably my "20
trillion" upset you or something, it seems you didn't get my point.

It might have been better to consider a simpler example, such as frequency
analysis of audio data to perform pitch correction (for out of tune singers).

I can write a simple shell script using ffmpeg to do h264 video encoding which
would take advantage of perhaps 720 "cores" to encode a two hour video in 10
second chunks with barely any Amdahl effects, running the encoding over a LAN.
A server should be able to pipe the whole 800Mb input - I am assuming it is
already encoded in xvid or something - over the network in about 10 seconds on
a gigabit (or faster) network.  Each participating computer will receive the
chunk of data it needs.  The encoding would take perhaps 30 seconds for the 10
seconds of video on each of 720 1Ghz computers.  And another 10 seconds to pipe
the data back to the server.  Concatenating the video should take very little
time, although perhaps the mp4 format is not the best for that, I'm not sure.

The entire operation takes 50 seconds as opposed to 6 hours (21600 seconds).
With my 721 computers I achieve a 432 times speed up.  Amdahl is not sucking up
much there, only a little for transferring data around.  And each computer
could be doing something else while waiting for its chunk of data to arrive,
the total actual utilization can be over 99%.  People do this stuff every day.
Have you heard of a render-farm?
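
The arithmetic in the two paragraphs above can be reproduced as follows. This is only a sketch of the figures as stated in the message (720 chunks, 10 s to send, 30 s to encode, 10 s to return); the variable names are invented for the illustration and none of the timings have been measured.

```python
# Figures taken from the message; names are illustrative only.
WORKERS = 720          # encoding machines, one 10-second chunk each
CHUNK_SECONDS = 10     # seconds of video per chunk
SEND_SECONDS = 10      # time to distribute the input over the LAN
ENCODE_SECONDS = 30    # encode time for one chunk on one machine
RETURN_SECONDS = 10    # time to pipe the results back

video_seconds = WORKERS * CHUNK_SECONDS        # length of the film
serial_seconds = WORKERS * ENCODE_SECONDS      # one machine doing every chunk
parallel_seconds = SEND_SECONDS + ENCODE_SECONDS + RETURN_SECONDS
speedup = serial_seconds / parallel_seconds

machines = WORKERS + 1                         # the workers plus the server
efficiency = speedup / machines                # fraction of ideal speedup

print("video length:", video_seconds, "s")
print("serial encode:", serial_seconds, "s")
print("farm encode:", parallel_seconds, "s")
print("speedup:", speedup)
print("efficiency for this one job:", round(efficiency, 3))
```

For this single job the efficiency works out to roughly 0.6, since each worker encodes for 30 of the 50 seconds; the "over 99%" figure depends on the machines having other work to do during the transfers.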

This applies to all Amdahl arguments - if part of the system is idle due to
serial constraints in the algorithm, it could likely be working on something
else.  Perhaps you have a couple of videos to recode?  Then you can achieve
close to 100% utilization.  The time taken for a single task may be limited by
the method or the hardware, but a batch of several tasks can be achieved close
to N times faster if you have N processors/computers.

I'm not sure why I'm wasting time writing about this, it's obvious anyway.

Sam




* Re: [9fans] Barrelfish
  2009-10-20  2:16       ` matt
@ 2009-10-20  9:15         ` Steve Simon
  2009-10-21 15:43         ` Sam Watkins
  1 sibling, 0 replies; 90+ messages in thread
From: Steve Simon @ 2009-10-20  9:15 UTC (permalink / raw)
  To: 9fans

> Add into that the datarate of full 10 bit uncompressed 1920x1080/60i HD
> is 932Mbit so your 1Ghz clockspeed might not be fast enough to play it :)

Not sure I agree, I think it's worse than that:

1920pixels * 1080lines * 30 frames/sec * 20bits/sample in YUV
=> 1.244Gbps
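
The same sum as a quick check, plus the size of two hours at that rate. A minimal sketch using only the numbers in the line above; the variable names and the two-hour total are additions for illustration.

```python
WIDTH, HEIGHT = 1920, 1080
FRAMES_PER_SECOND = 30
BITS_PER_PIXEL = 20            # the "20bits/sample in YUV" figure above

bits_per_second = WIDTH * HEIGHT * FRAMES_PER_SECOND * BITS_PER_PIXEL
print(bits_per_second / 1e9, "Gbit/s")

# Two hours of material at that rate, in decimal gigabytes.
TWO_HOURS_SECONDS = 2 * 60 * 60
two_hours_bytes = bits_per_second * TWO_HOURS_SECONDS // 8
print(two_hours_bytes / 1e9, "GB for two hours")
```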

Also, if you want to encode live material you have bigger problems. Encoders have
pipeline delay but this must be limited, usually to a few hundred milliseconds.

This means you can only decompose the stream into a few frames which you can
run on separate cpus. Spatial decomposition of the frames helps too but this is
much more difficult to do well - i.e. to ensure you cannot see the joins.

-Steve




* Re: [9fans] Barrelfish
  2009-10-20  2:11 ` erik quanstrom
@ 2009-10-20  2:33   ` matt
  0 siblings, 0 replies; 90+ messages in thread
From: matt @ 2009-10-20  2:33 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs


>this is quite an astounding thread.  you brought
>up clock speed doubling and now refute yourself.
>
>i just noted that 48ghz is not possible with silicon
>non-quantum effect tech.
>
>- erik
>
>
>
I think I've been misunderstood, I wasn't asserting the clock speed
increase in the first place, I was hoping to demonstrate what would have
happened if Moore's law was the often misquoted "speed doubles every 2
years" when measured in Ghz (not flops as noted by Eris)






* Re: [9fans] Barrelfish
  2009-10-20  1:58     ` Eris Discordia
@ 2009-10-20  2:17       ` matt
  0 siblings, 0 replies; 90+ messages in thread
From: matt @ 2009-10-20  2:17 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

Eris Discordia wrote:

>> "Moore's law doesn't say anything about speed or power.
>
>
> But why'd you assume "people in the wrong" (w.r.t. their understanding
> of Moore's law) would measure "speed" in gigahertz rather than MIPS or
> FLOPS?
>
because that's what the discussion I was having was about




* Re: [9fans] Barrelfish
  2009-10-19 15:57     ` Sam Watkins
  2009-10-19 16:03       ` ron minnich
  2009-10-19 16:46       ` Russ Cox
@ 2009-10-20  2:16       ` matt
  2009-10-20  9:15         ` Steve Simon
  2009-10-21 15:43         ` Sam Watkins
  2 siblings, 2 replies; 90+ messages in thread
From: matt @ 2009-10-20  2:16 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

Sam Watkins wrote:

>I calculated roughly that encoding a 2-hour video could be parallelized by a
>factor of perhaps 20 trillion, using pipelining and divide-and-conquer, with a
>longest path length of 10000 operations in series.  Such a system running at
>1Ghz could encode a single 2-hour video in 1/100000 second (latency), or 2
>billion hours of video per second (throughput).
>
>
>
>
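
For reference, the latency and throughput figures quoted above follow from the stated clock rate and path length if one assumes a fully pipelined design that accepts a new video every clock cycle. That assumption is a reading that reproduces the quoted numbers, not something the quoted text spells out; the names below are illustrative.

```python
CLOCK_HZ = 1e9                 # "running at 1Ghz"
CRITICAL_PATH_OPS = 10_000     # "longest path length of 10000 operations"
VIDEO_HOURS = 2

# Latency: one video must traverse the whole critical path.
latency_seconds = CRITICAL_PATH_OPS / CLOCK_HZ

# Throughput: assumes (hypothetically) that the pipeline is full and a
# new two-hour video can enter on every clock tick.
videos_per_second = CLOCK_HZ
hours_per_second = videos_per_second * VIDEO_HOURS

print("latency:", latency_seconds, "s")
print("throughput:", hours_per_second, "hours of video per second")
```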
I know you are using video / audio encoding as an example and there are
probably datasets that make sense but in this case, what use is it?

You can't watch 2 hours of video per second and you can't write it to
disk fast enough to empty the pipeline. So you'll process all the video
and then sit there keeping it powered while you wait to do something
with it. I suppose you could keep filtering it.

Add into that the datarate of full 10 bit uncompressed 1920x1080/60i HD
is 932Mbit so your 1Ghz clockspeed might not be fast enough to play it :)

You've got to feed in 2 hours of source material - 820Gb per stream, how?

Once you have your uncompressed stream, MPEG-2 encoding requires seeking
through the time dimension with keyframes every n frames and out of
order macro blocks, so we have to wait for n frames to be composited.
For the best quality the datarate is unconstrained on the first
processing run and then macro blocks best-fitted and re-ordered on the
second to match the desired output datarate, but again, this is n frames
at a time.

Amdahl is punching you in the face every time you say "see, it's easy".







* Re: [9fans] Barrelfish
       [not found] <<4ADD147A.4090801@maht0x0r.net>
@ 2009-10-20  2:11 ` erik quanstrom
  2009-10-20  2:33   ` matt
  0 siblings, 1 reply; 90+ messages in thread
From: erik quanstrom @ 2009-10-20  2:11 UTC (permalink / raw)
  To: 9fans

> >you motivated me to find my copy of _high speed
> >semiconductor devices_, s.m. sze, ed., 1990.
> >
> >
> >
> which motivated me to dig out the post I made elsewhere :
>
> "Moore's law doesn't say anything about speed or power. It says
> manufacturing costs will lower from technological improvements such that
> the reasonably priced transistor count in an IC will double every 2 years.

this is quite an astounding thread.  you brought
up clock speed doubling and now refute yourself.

i just noted that 48ghz is not possible with silicon
non-quantum effect tech.

- erik




* Re: [9fans] Barrelfish
  2009-10-20  1:38   ` matt
@ 2009-10-20  1:58     ` Eris Discordia
  2009-10-20  2:17       ` matt
  0 siblings, 1 reply; 90+ messages in thread
From: Eris Discordia @ 2009-10-20  1:58 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

> "Moore's law doesn't say anything about speed or power.

But why'd you assume "people in the wrong" (w.r.t. their understanding of
Moore's law) would measure "speed" in gigahertz rather than MIPS or FLOPS?



--On Tuesday, October 20, 2009 02:38 +0100 matt <maht-9fans@maht0x0r.net>
wrote:

> erik quanstrom wrote:
>
>>
>> you motivated me to find my copy of _high speed
>> semiconductor devices_, s.m. sze, ed., 1990.
>>
>>
>>
> which motivated me to dig out the post I made elsewhere :
>
> "Moore's law doesn't say anything about speed or power. It says
> manufacturing costs will lower from technological improvements such that
> the reasonably priced transistor count in an IC will double every 2 years.
>
> And here's a pretty graph
> http://en.wikipedia.org/wiki/File:Transistor_Count_and_Moore%27s_Law_-_2008.svg
>
> The misunderstanding makes people say such twaddle as "Moore's Law,
> the founding axiom behind Intel, that chips get exponentially faster".
>
> If we pretend that 2 years = double speed then roughly :
> The 1993 66Mhz P1 would now be running at 16.9Ghz
> The 1995 200Mhz Pentium now would be 25.6Ghz
> The 1997 300Mhz Pentium now would be 19.2Ghz
> The 1999 500Mhz Pentium now would be 16Ghz
> The 2000 1.3Ghz Pentium now would be 20Ghz
> The 2002 2.2Ghz Pentium would now be 35Ghz
> The 2002 3.06Ghz Pentium would be going on 48Ghz by Xmas
>
> If you plot speed vs year for Pentiums you get two straight lines with a
> change in gradient in 1999 with the introduction of the P4"
>
>
>





* Re: [9fans] Barrelfish
  2009-10-19 16:13 ` erik quanstrom
  2009-10-19 18:23   ` tlaronde
@ 2009-10-20  1:38   ` matt
  2009-10-20  1:58     ` Eris Discordia
  1 sibling, 1 reply; 90+ messages in thread
From: matt @ 2009-10-20  1:38 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

erik quanstrom wrote:

>
>you motivated me to find my copy of _high speed
>semiconductor devices_, s.m. sze, ed., 1990.
>
>
>
which motivated me to dig out the post I made elsewhere :

"Moore's law doesn't say anything about speed or power. It says
manufacturing costs will lower from technological improvements such that
the reasonably priced transistor count in an IC will double every 2 years.

And here's a pretty graph
http://en.wikipedia.org/wiki/File:Transistor_Count_and_Moore%27s_Law_-_2008.svg

The misunderstanding makes people say such twaddle as "Moore's Law,
the founding axiom behind Intel, that chips get exponentially faster".

If we pretend that 2 years = double speed then roughly :
The 1993 66Mhz P1 would now be running at 16.9Ghz
The 1995 200Mhz Pentium now would be 25.6Ghz
The 1997 300Mhz Pentium now would be 19.2Ghz
The 1999 500Mhz Pentium now would be 16Ghz
The 2000 1.3Ghz Pentium now would be 20Ghz
The 2002 2.2Ghz Pentium would now be 35Ghz
The 2002 3.06Ghz Pentium would be going on 48Ghz by Xmas

If you plot speed vs year for Pentiums you get two straight lines with a
change in gradient in 1999 with the introduction of the P4"
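
The list above can be regenerated with a few lines. This is a sketch under the post's own pretend rule (clock speed doubles every 2 years) with 2009 taken as "now"; the function name and defaults are invented for the illustration. The odd-year rows come out as listed; the 2000 and 2002 rows in the post look like four whole doublings (a factor of 16) rather than a fractional count, which is an inference from the numbers.

```python
def projected_ghz(start_mhz: float, start_year: int,
                  now_year: int = 2009, years_per_doubling: float = 2.0) -> float:
    """Clock speed today if it had doubled every `years_per_doubling` years."""
    doublings = (now_year - start_year) / years_per_doubling
    return start_mhz * 2 ** doublings / 1000.0


# Rows from the post with a whole number of doublings up to 2009.
for mhz, year in [(66, 1993), (200, 1995), (300, 1997), (500, 1999)]:
    print(year, mhz, "MHz ->", round(projected_ghz(mhz, year), 1), "GHz")

# The remaining rows match a flat factor of 16 (four doublings).
for mhz, year in [(1300, 2000), (2200, 2002), (3060, 2002)]:
    print(year, mhz, "MHz x16 ->", round(mhz * 16 / 1000.0, 1), "GHz")
```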






* Re: [9fans] Barrelfish
       [not found] <<20091019182352.GA1688@polynum.com>
@ 2009-10-19 18:48 ` erik quanstrom
  0 siblings, 0 replies; 90+ messages in thread
From: erik quanstrom @ 2009-10-19 18:48 UTC (permalink / raw)
  To: 9fans

totally ot.  sorry.

> > 1.  p. 8. "the most promising devices are quantum effect
> > devices."  (none are currently in use in processors.)
>
> Since quantics means unpredictable, I think that we see more and more
> quantum effects in hardware and software. So, I beg to disagree ;)

you may not fully appreciate what is meant by
quantum effect.  example devices are: resonant-
tunneling transistors, quantum wires and dots.

they are definitely not unpredictable.  they are
probabilistic and one can build very useful devices
with them.

in my misguided youth i worked on building an
808nm laser out of quaternary semis and such
quantum structures.  weird stuff.

there is no fundamental reason one couldn't build
a computer with rt transistors.  here's a rt xor structure
http://www.hindawi.com/journals/vlsi/2009/803974.html

this stuff is insanely hard, and probably the mythical
"twenty-years out"; it's just not Si-friendly.  and nobody
wants (can afford) to deal with GaAs let alone the funky
quaternaries. but if we make any real breakthroughs in
computing, it'll likely be based on quantum effect.

- erik




* Re: [9fans] Barrelfish
  2009-10-19 16:13 ` erik quanstrom
@ 2009-10-19 18:23   ` tlaronde
  2009-10-20  1:38   ` matt
  1 sibling, 0 replies; 90+ messages in thread
From: tlaronde @ 2009-10-19 18:23 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On Mon, Oct 19, 2009 at 12:13:34PM -0400, erik quanstrom wrote:
>
> 1.  p. 8. "the most promising devices are quantum effect
> devices."  (none are currently in use in processors.)

Since quantics means unpredictable, I think that we see more and more
quantum effects in hardware and software. So, I beg to disagree ;)

--
Thierry Laronde (Alceste) <tlaronde +AT+ polynum +dot+ com>
                 http://www.kergis.com/
Key fingerprint = 0FF7 E906 FBAF FE95 FD89  250D 52B1 AE95 6006 F40C




* Re: [9fans] Barrelfish
  2009-10-19 17:30     ` ron minnich
  2009-10-19 17:57       ` W B Hacker
@ 2009-10-19 18:14       ` David Leimbach
  1 sibling, 0 replies; 90+ messages in thread
From: David Leimbach @ 2009-10-19 18:14 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs


On Mon, Oct 19, 2009 at 10:30 AM, ron minnich <rminnich@gmail.com> wrote:

> On Mon, Oct 19, 2009 at 9:34 AM, Sam Watkins <sam@nipl.net> wrote:
>
> > The "processors" (actually smaller processing units) would mostly be configured
> > at load time, much like an FPGA.  Most units would execute a single simple
> > operation repeatedly on streams of data, they would not read instructions and
> > execute them sequentially like a normal CPU.
> >
> > The data would travel through the system step by step, it would mostly not need
> > to be stored in RAM.  If some RAM was needed, it would be small amounts on
> > chip, at appropriate places in the pipeline.
> >
> > Some programs (not so much video encoding I think) do need a lot of RAM for
> > intermediate calculations, or IO for example to fetch stuff from a database.
> > Such systems can also be designed as networks of simple processing units
> > connected by data streams / pipelines.
>
> I think we could connect them with hyperbarrier technology. Basically
> we would use the Jeffreys tube, and exploit Bell's theorem and quantum
> entanglement. Then we could blitz the snarf with the babble, tie it
> all together with a blotz, and we're done.
>
> ron

As Sir Robin said in the Holy Grail just before getting tossed off The
Bridge of Death.
"that's EASY!!"



* Re: [9fans] Barrelfish
  2009-10-19 17:30     ` ron minnich
@ 2009-10-19 17:57       ` W B Hacker
  2009-10-19 18:14       ` David Leimbach
  1 sibling, 0 replies; 90+ messages in thread
From: W B Hacker @ 2009-10-19 17:57 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

ron minnich wrote:
> On Mon, Oct 19, 2009 at 9:34 AM, Sam Watkins <sam@nipl.net> wrote:
>
>> The "processors" (actually smaller processing units) would mostly be configured
>> at load time, much like an FPGA.  Most units would execute a single simple
>> operation repeatedly on streams of data, they would not read instructions and
>> execute them sequentially like a normal CPU.
>>
>> The data would travel through the system step by step, it would mostly not need
>> to be stored in RAM.  If some RAM was needed, it would be small amounts on
>> chip, at appropriate places in the pipeline.
>>
>> Some programs (not so much video encoding I think) do need a lot of RAM for
>> intermediate calculations, or IO for example to fetch stuff from a database.
>> Such systems can also be designed as networks of simple processing units
>> connected by data streams / pipelines.
>
> I think we could connect them with hyperbarrier technology. Basically
> we would use the Jeffreys tube, and exploit Bell's theorem and quantum
> entanglement. Then we could blitz the snarf with the babble, tie it
> all together with a blotz, and we're done.
>
> ron
>
>

Sounds magical.

Can any of that approach be used to address Plan9's shortage of drivers and such?

Bill

(Ducks and waddles away....)

;-)




* Re: [9fans] Barrelfish
  2009-10-19 16:34   ` Sam Watkins
@ 2009-10-19 17:30     ` ron minnich
  2009-10-19 17:57       ` W B Hacker
  2009-10-19 18:14       ` David Leimbach
  0 siblings, 2 replies; 90+ messages in thread
From: ron minnich @ 2009-10-19 17:30 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On Mon, Oct 19, 2009 at 9:34 AM, Sam Watkins <sam@nipl.net> wrote:

> The "processors" (actually smaller processing units) would mostly be configured
> at load time, much like an FPGA.  Most units would execute a single simple
> operation repeatedly on streams of data, they would not read instructions and
> execute them sequentially like a normal CPU.
>
> The data would travel through the system step by step, it would mostly not need
> to be stored in RAM.  If some RAM was needed, it would be small amounts on
> chip, at appropriate places in the pipeline.
>
> Some programs (not so much video encoding I think) do need a lot of RAM for
> intermediate calculations, or IO for example to fetch stuff from a database.
> Such systems can also be designed as networks of simple processing units
> connected by data streams / pipelines.

I think we could connect them with hyperbarrier technology. Basically
we would use the Jeffreys tube, and exploit Bell's theorem and quantum
entanglement. Then we could blitz the snarf with the babble, tie it
all together with a blotz, and we're done.

ron




* Re: [9fans] Barrelfish
  2009-10-19 15:57     ` Sam Watkins
  2009-10-19 16:03       ` ron minnich
@ 2009-10-19 16:46       ` Russ Cox
  2009-10-20  2:16       ` matt
  2 siblings, 0 replies; 90+ messages in thread
From: Russ Cox @ 2009-10-19 16:46 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

> My point is, one can design systems to solve practical problems that use almost
> arbitrarily large numbers of processing units running in parallel.

design != build

russ



* Re: [9fans] Barrelfish
  2009-10-19 16:05 ` erik quanstrom
@ 2009-10-19 16:34   ` Sam Watkins
  2009-10-19 17:30     ` ron minnich
  0 siblings, 1 reply; 90+ messages in thread
From: Sam Watkins @ 2009-10-19 16:34 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On Mon, Oct 19, 2009 at 12:05:19PM -0400, erik quanstrom wrote:
> > Details of the calculation: 7200 seconds * 30fps * 12*16 (50*50 pixel
> > chunks) * 500000 elementary arithmetic/logical operations in a pipeline
> > (unrolled).  7200*30*12*16*500000 = 20 trillion (20,000,000,000,000)
> > processing units.  This is only a very rough estimate and does not consider
> > all the issues.
>
> could you do a similar calculation for the memory bandwidth required to
> deliver said instructions to the processors?

The "processors" (actually smaller processing units) would mostly be configured
at load time, much like an FPGA.  Most units would execute a single simple
operation repeatedly on streams of data, they would not read instructions and
execute them sequentially like a normal CPU.

The data would travel through the system step by step, it would mostly not need
to be stored in RAM.  If some RAM was needed, it would be small amounts on
chip, at appropriate places in the pipeline.

Some programs (not so much video encoding I think) do need a lot of RAM for
intermediate calculations, or IO for example to fetch stuff from a database.
Such systems can also be designed as networks of simple processing units
connected by data streams / pipelines.

Sam




* Re: [9fans] Barrelfish
       [not found] <<4ADC7439.3060502@maht0x0r.net>
@ 2009-10-19 16:13 ` erik quanstrom
  2009-10-19 18:23   ` tlaronde
  2009-10-20  1:38   ` matt
  0 siblings, 2 replies; 90+ messages in thread
From: erik quanstrom @ 2009-10-19 16:13 UTC (permalink / raw)
  To: 9fans

> I ran the numbers the other day based on speed doubles every 2 years, a
> 60Mhz Pentium would be running 16Ghz by now
> I think it was the 1ghz that should be 35ghz

you motivated me to find my copy of _high speed
semiconductor devices_, s.m. sze, ed., 1990.

there might be one or two little problems with
chips that speed that have nothing to do with
power — make that cooling.

0.  frequency prop electron mobility prop 1/eff. bandgap.
unfortunately there's a lower limit on the band gap —
kT, thermal energy.

1.  p. 8. "the most promising devices are quantum effect
devices."  (none are currently in use in processors.)

2.  p. 192, "...device size will continue to be limited by
hot-electron damage."  oops.

that fills one with confidence, doesn't it?

- erik




* Re: [9fans] Barrelfish
       [not found] <<20091019155738.GB13857@nipl.net>
@ 2009-10-19 16:05 ` erik quanstrom
  2009-10-19 16:34   ` Sam Watkins
  0 siblings, 1 reply; 90+ messages in thread
From: erik quanstrom @ 2009-10-19 16:05 UTC (permalink / raw)
  To: 9fans

> Details of the calculation: 7200 seconds * 30fps * 12*16 (50*50 pixel chunks) *
> 500000 elementary arithmetic/logical operations in a pipeline (unrolled).
> 7200*30*12*16*500000 = 20 trillion (20,000,000,000,000) processing units.
> This is only a very rough estimate and does not consider all the issues.

could you do a similar calculation for the memory
bandwidth required to deliver said instructions to
the processors?

if you add that to the memory bandwidth required
to move the data around, what kind of memory architecture
do you propose to move this much data around?
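
A rough sketch of the data side of that question, reusing the figures
from the quoted estimate. The 3 bytes per raw pixel is an assumption
(the estimate gives no pixel format), and instruction delivery is not
modelled at all, so this is only a lower bound on input traffic.

```python
# Raw input volume for the quoted 2-hour video, and the input bandwidth
# needed to sustain a given encoding rate.
SECONDS = 7200               # 2-hour video, from the quoted estimate
FPS = 30                     # from the quoted estimate
CHUNKS_PER_FRAME = 12 * 16   # from the quoted estimate
PIXELS_PER_CHUNK = 50 * 50   # from the quoted estimate
BYTES_PER_PIXEL = 3          # assumed: 24-bit colour, not stated upthread


def raw_video_bytes():
    """Return the size of the uncompressed input in bytes."""
    frames = SECONDS * FPS
    return frames * CHUNKS_PER_FRAME * PIXELS_PER_CHUNK * BYTES_PER_PIXEL


def input_bandwidth(videos_per_second):
    """Return bytes per second of raw input needed at the given rate."""
    return raw_video_bytes() * videos_per_second


print(f"raw video: {raw_video_bytes() / 1e9:.0f} GB")
print(f"one video per second: {input_bandwidth(1) / 1e9:.0f} GB/s")
print(f"one video per 1 GHz clock tick: {input_bandwidth(1e9):.2e} bytes/s")
```

The last line corresponds to the throughput claimed upthread (one
2-hour video per clock cycle at 1 GHz).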

- erik



^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [9fans] Barrelfish
  2009-10-19 15:57     ` Sam Watkins
@ 2009-10-19 16:03       ` ron minnich
  2009-10-19 16:46       ` Russ Cox
  2009-10-20  2:16       ` matt
  2 siblings, 0 replies; 90+ messages in thread
From: ron minnich @ 2009-10-19 16:03 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On Mon, Oct 19, 2009 at 8:57 AM, Sam Watkins <sam@nipl.net> wrote:

> This is only a very rough estimate and does not consider all the issues.

well that part is right anyway.

ron



^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [9fans] Barrelfish
  2009-10-18  1:12         ` Roman Shaposhnik
  2009-10-19 14:14           ` matt
@ 2009-10-19 16:00           ` Sam Watkins
  1 sibling, 0 replies; 90+ messages in thread
From: Sam Watkins @ 2009-10-19 16:00 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On Sun, Oct 18, 2009 at 01:12:58AM +0000, Roman Shaposhnik wrote:
> I would appreciate if the folks who were in the room correct me, but if I'm
> not mistaken Ken was alluding to some FPGA work/ideas that he had done
> and my interpretation of his comments was that if we *really* want to
> make things parallel we have to bite the bullet, ditch multicore and rethink
> our strategy.

Certainly, I agree that normal multi-core is not the best approach; FPGA systems
or similar could run a lot faster.

Sam



^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [9fans] Barrelfish
  2009-10-17 18:45   ` Eris Discordia
  2009-10-17 21:07     ` Steve Simon
@ 2009-10-19 15:57     ` Sam Watkins
  2009-10-19 16:03       ` ron minnich
                         ` (2 more replies)
  1 sibling, 3 replies; 90+ messages in thread
From: Sam Watkins @ 2009-10-19 15:57 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On Sat, Oct 17, 2009 at 07:45:40PM +0100, Eris Discordia wrote:
> Another embarrassingly parallel problem, as Sam Watkins pointed out, arises
> in digital audio processing.

The pipelining + divide-and-conquer method which I would use for parallel
systems is much like a series of production lines in a large factory.

I calculated roughly that encoding a 2-hour video could be parallelized by a
factor of perhaps 20 trillion, using pipelining and divide-and-conquer, with a
longest path length of 10000 operations in series.  Such a system running at
1GHz could encode a single 2-hour video in 1/100000 second (latency), or 2
billion hours of video per second (throughput).

Details of the calculation: 7200 seconds * 30fps * 12*16 (50*50 pixel chunks) *
500000 elementary arithmetic/logical operations in a pipeline (unrolled).
7200*30*12*16*500000 = 20 trillion (20,000,000,000,000) processing units.
This is only a very rough estimate and does not consider all the issues.
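
The arithmetic above can be checked with a short sketch. All inputs are
the figures given in this message; the variable names are only for
illustration.

```python
# Reproduce the unit count, latency and throughput from the estimate.
SECONDS = 7200              # 2-hour video
FPS = 30
CHUNKS_PER_FRAME = 12 * 16  # 50*50 pixel chunks per frame
OPS_PER_CHUNK = 500_000     # unrolled elementary operations per chunk
CRITICAL_PATH = 10_000      # longest chain of operations in series
CLOCK_HZ = 1e9              # 1 GHz

# One processing unit per unrolled operation.
units = SECONDS * FPS * CHUNKS_PER_FRAME * OPS_PER_CHUNK

# Latency: the critical path, one operation per clock cycle.
latency_s = CRITICAL_PATH / CLOCK_HZ

# Throughput: one whole video leaves the pipeline per clock cycle.
hours_per_second = (SECONDS / 3600) * CLOCK_HZ

print(f"processing units: {units:,}")
print(f"latency: {latency_s:.0e} s")
print(f"throughput: {hours_per_second:.0e} hours of video per second")
```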

The "slow" latency of 1/100000 second to encode a video is due to Amdahl's Law,
assuming a longest path of 10000 operations.  The throughput of 2 billion hours
of video per second would be achieved by pipelining.  The throughput is not
limited by Amdahl's Law, as a longer pipeline/network holds more data.

Amdahl's Law gives us a lower limit for the time taken to perform a task with
some serial components, but does not limit the throughput of a pipelining
system; the throughput is simply one data unit per clock cycle.
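
The distinction between latency and throughput can be sketched with a
toy pipeline model. This assumes an idealised pipeline in which every
stage takes exactly one cycle and a new item enters every cycle; the
function names are made up for the example.

```python
def pipeline_cycles(depth, items):
    """Closed form: cycles to finish `items` in a `depth`-stage pipeline."""
    if items == 0:
        return 0
    return depth + items - 1


def simulate(depth, items):
    """Step the pipeline cycle by cycle and count cycles until all finish."""
    stages = [None] * depth
    next_item = 0
    done = 0
    cycles = 0
    while done < items:
        # A new item enters stage 0 if any remain; everything else shifts.
        incoming = next_item if next_item < items else None
        if incoming is not None:
            next_item += 1
        stages = [incoming] + stages[:-1]
        cycles += 1
        # Whatever sits in the last stage completes at the end of the cycle.
        if stages[-1] is not None:
            done += 1
    return cycles


depth, items = 10_000, 1_000_000
total = pipeline_cycles(depth, items)
print(f"first result after {depth} cycles, all results after {total} cycles")
print(f"average throughput: {items / total:.3f} items per cycle")
```

The first result is delayed by the full depth of the pipeline, but the
average rate approaches one item per cycle as the number of items grows.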

In reality, it would be hard to build such a system, and one would prefer a
system with much less parallelization.  However, the human brain does contain
100 billion neurons, and electronic units can be smaller than neurons.

My point is, one can design systems to solve practical problems that use almost
arbitrarily large numbers of processing units running in parallel.

Sam



^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [9fans] Barrelfish
  2009-10-19 15:26       ` Sam Watkins
  2009-10-19 15:33         ` andrey mirtchovski
@ 2009-10-19 15:50         ` ron minnich
  1 sibling, 0 replies; 90+ messages in thread
From: ron minnich @ 2009-10-19 15:50 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On Mon, Oct 19, 2009 at 8:26 AM, Sam Watkins <sam@nipl.net> wrote:
> On Fri, Oct 16, 2009 at 12:18:47PM -0600, Latchesar Ionkov wrote:
>> How do you plan to feed data to these 31 thousand processors so they
>> can be fully utilized? Have you done the calculations and checked what
>> memory bandwidth would you need for that?
>
> I would use a pipelining + divide-and-conquer approach, with some RAM on chip.
> Units would be smaller than a 6502, more like an adder.

I'm not convinced. Lucho just dropped a well known hard problem in
your lap (one he deals with every day) but your reply sounds like
handwaving.

This stuff is harder than it sounds. Unless you're ready to come up
with a simulation of your claim -- and it had better be a pretty good
one -- I don't think anybody is going to be buying.

If you're going to just have adders, for example, you're going to have
to explain where the instruction sequencing happens. If there's only
one sequencer, then you're going to have to explain why you have not
just reinvented the CM-2 or similar MPP.

Again, this stuff is quantifiable. A pipeline implies a clock rate.
Divide and conquer implies fanout. Where are the numbers?
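
As one example of the kind of number being asked for, here is a sketch
of how many levels a distribution tree needs to reach a given number of
units at a given fanout. The function name and the choice of fanouts
are assumptions for illustration; the unit counts are the ones that
have appeared in this thread.

```python
def tree_depth(units, fanout):
    """Return the fewest tree levels needed so fanout**levels >= units."""
    if units < 1 or fanout < 2:
        raise ValueError("need units >= 1 and fanout >= 2")
    depth = 0
    reach = 1
    while reach < units:
        reach *= fanout
        depth += 1
    return depth


for units in (31_000, 20_736_000_000_000):
    for fanout in (2, 4, 16):
        print(f"{units:>20,} units, fanout {fanout:>2}: "
              f"{tree_depth(units, fanout)} levels")
```

Each level adds at least one clock of latency on the way in and again
on the way out, which is where fanout starts to constrain the clock
rate and the pipeline depth.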

ron



^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [9fans] Barrelfish
  2009-10-19 15:26       ` Sam Watkins
@ 2009-10-19 15:33         ` andrey mirtchovski
  2009-10-19 15:50         ` ron minnich
  1 sibling, 0 replies; 90+ messages in thread
From: andrey mirtchovski @ 2009-10-19 15:33 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

> I would use a pipelining + divide-and-conquer approach, with some RAM on chip.
> Units would be smaller than a 6502, more like an adder.

you mean like the Thinking Machines CM-1 and CM-2?

it's not like it hasn't been done before :)



^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [9fans] Barrelfish
  2009-10-16 18:18     ` Latchesar Ionkov
@ 2009-10-19 15:26       ` Sam Watkins
  2009-10-19 15:33         ` andrey mirtchovski
  2009-10-19 15:50         ` ron minnich
  0 siblings, 2 replies; 90+ messages in thread
From: Sam Watkins @ 2009-10-19 15:26 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On Fri, Oct 16, 2009 at 12:18:47PM -0600, Latchesar Ionkov wrote:
> How do you plan to feed data to these 31 thousand processors so they
> can be fully utilized? Have you done the calculations and checked what
> memory bandwidth would you need for that?

I would use a pipelining + divide-and-conquer approach, with some RAM on chip.
Units would be smaller than a 6502, more like an adder.

Sam



^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [9fans] Barrelfish
  2009-10-19 13:44 ` erik quanstrom
@ 2009-10-19 14:36   ` David Leimbach
  0 siblings, 0 replies; 90+ messages in thread
From: David Leimbach @ 2009-10-19 14:36 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On Mon, Oct 19, 2009 at 6:44 AM, erik quanstrom <quanstro@quanstro.net> wrote:

> > At the hardware level we do have message passing between a
> > processor and the memory controller -- this is exactly the
> > same as talking to a shared server and has the same issues of
> > scaling etc. If you have very few clients, a single shared
> > server is indeed a cost effective solution.
>
> just to repeat myself in a context that hopefully makes things
> clearer:  sometimes we don't admit it's a network.  and that's
> not always a bad thing.
>
> - erik
>
Yes, we abstract things so it doesn't look like it is... so we can have a
programming model where we don't have to care about keeping all the
distributed bits in sync.
However, I get the feeling that those abstractions, at any level, suffer
from the same weaknesses.   Well I think that's why certain RISC instruction
sets have instructions like eieio  anyway :-)

Dave


^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [9fans] Barrelfish
  2009-10-18  1:12         ` Roman Shaposhnik
@ 2009-10-19 14:14           ` matt
  2009-10-19 16:00           ` Sam Watkins
  1 sibling, 0 replies; 90+ messages in thread
From: matt @ 2009-10-19 14:14 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs


>
>The misinterpretation of Moore's Law is to blame here, of course: Moore
>is a smart guy and he was talking about transistor density, but pop culture
>made it sound like he was talking about speedup. For some time the two were
>in lock-step. Not anymore.
>
>
I ran the numbers the other day based on speed doubling every 2 years: a
60MHz Pentium would be running at 16GHz by now.
I think it was the 1GHz that should be 35GHz.
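
A quick sketch of that arithmetic. The 1993 start year for the 60MHz
Pentium is an assumption added here; the 1GHz case is expressed as
"years needed" because no start year is given above.

```python
import math


def projected_mhz(start_mhz, start_year, end_year, doubling_years=2.0):
    """Clock speed if it doubled every `doubling_years` years."""
    doublings = (end_year - start_year) / doubling_years
    return start_mhz * 2 ** doublings


def years_to_reach(start_ghz, target_ghz, doubling_years=2.0):
    """Years of steady doubling needed to get from start to target."""
    return doubling_years * math.log2(target_ghz / start_ghz)


# Assumed: the 60MHz Pentium dates from 1993; "now" is 2009.
print(f"60MHz in 1993 -> {projected_mhz(60, 1993, 2009) / 1000:.1f}GHz in 2009")
print(f"1GHz -> 35GHz takes {years_to_reach(1, 35):.1f} years of doubling")
```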




^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [9fans] Barrelfish
       [not found] <<20091018031508.717CE5B30@mail.bitblocks.com>
@ 2009-10-19 13:44 ` erik quanstrom
  2009-10-19 14:36   ` David Leimbach
  0 siblings, 1 reply; 90+ messages in thread
From: erik quanstrom @ 2009-10-19 13:44 UTC (permalink / raw)
  To: 9fans

> At the hardware level we do have message passing between a
> processor and the memory controller -- this is exactly the
> same as talking to a shared server and has the same issues of
> scaling etc. If you have very few clients, a single shared
> server is indeed a cost effective solution.

just to repeat myself in a context that hopefully makes things
clearer:  sometimes we don't admit it's a network.  and that's
not always a bad thing.

- erik



^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [9fans] Barrelfish
  2009-10-18 19:18                 ` Bakul Shah
@ 2009-10-18 20:12                   ` ron minnich
  0 siblings, 0 replies; 90+ messages in thread
From: ron minnich @ 2009-10-18 20:12 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

Since we seem to be having the parallel programming discussion again
please look at this:
https://asc.llnl.gov/sequoia/benchmarks/

The summaries are interesting.

ron



^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [9fans] Barrelfish
  2009-10-18 13:22               ` Roman Shaposhnik
@ 2009-10-18 19:18                 ` Bakul Shah
  2009-10-18 20:12                   ` ron minnich
  0 siblings, 1 reply; 90+ messages in thread
From: Bakul Shah @ 2009-10-18 19:18 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On Sun, 18 Oct 2009 06:22:33 PDT Roman Shaposhnik <roman@shaposhnik.org>  wrote:
> On Sun, Oct 18, 2009 at 6:06 AM, Roman Shaposhnik <shaposhnik@gmail.com> wrote
> >> It is. But what's your proposal on code sharing? All those PC
> >> registers belonging to
> >> different cores have to point somewhere. If that somewhere is not shared
> >> memory
> >> the code has to be put there for every single core, right?
> >
> > At the hardware level we do have message passing between a
> > processor and the memory controller -- this is exactly the
> > same as talking to a shared server and has the same issues of
> > scaling etc. If you have very few clients, a single shared
> > server is indeed a cost effective solution.
>
> I guess I'm not following. My question to OP was strictly about
> code sharing. Basically where do the cores get instructions from
> if not from shared memory.

Sorry, I should've done a better job of editing.  I was
really responding to the OP's point that sharing memory
between processes is a stupid approach. My point was that
"sharing memory" is just a low level programming interface
(implemented by message passing in h/w) and it makes sense at
some scale.



^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [9fans] Barrelfish
  2009-10-18  2:09         ` Jason Catena
@ 2009-10-18 16:02           ` Dave Eckhardt
  0 siblings, 0 replies; 90+ messages in thread
From: Dave Eckhardt @ 2009-10-18 16:02 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

>> See cons/scons.
>
> Thanks for the suggestion.  In this project someone actually
> made that same suggestion, but rudely -- basically insulting the
> very thought that someone would be stupid enough to base a
> build system for commercial software on make.

The non-Plan 9 world suffers from several structural problems
which have undermined make's original model.  A big one is
file systems with routine large clock skew, and another is the
N-layers-deep (for large N) nature (build libraries to build
tools to build libraries to build applications) which is
considered reasonable, or at least unavoidable.

Combining that last one with the absence of namespaces makes
the problem truly painful in ways which (I think) stretch it
outside of the make model.  It's possible to "make it work"
with enough thrust, but I think people who have done that once
and then tried cons/scons think the change is worthwhile.  Cons
was written by somebody who was in charge of "strap enough thrust
onto make" twice and he wrote it to address exactly the problems
he saw so he could skip past that part at startup #3.

> Am I expected to complicate my project management tool with
> python, just to get it to rebuild if a file dependency's date
> changes at all, rather than only if the file dependency has
> a newer date?

Cons and scons get you more than that.  Few make systems notice
when your compiler has changed out from under you.  With gcc's
drift rate that could be particularly valuable :-)
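
To illustrate the idea (this is a sketch of the general technique, not
a description of how cons or scons work internally): derive a signature
from everything that affects the output, including the compiler, and
rebuild when the signature changes. The function name and the compiler
identifier strings are hypothetical.

```python
import hashlib


def build_signature(source_bytes, compiler_id, flags):
    """Hash the source, the compiler identity and the flags together."""
    digest = hashlib.sha256()
    parts = (source_bytes, compiler_id.encode(), " ".join(flags).encode())
    for part in parts:
        # Length-prefix each part so adjacent fields cannot run together.
        digest.update(len(part).to_bytes(8, "big"))
        digest.update(part)
    return digest.hexdigest()


source = b"int main(void) { return 0; }\n"
old = build_signature(source, "cc version A", ["-O2"])   # hypothetical id
new = build_signature(source, "cc version B", ["-O2"])   # hypothetical id
print("rebuild needed:", old != new)
```

A timestamp comparison alone would see an unchanged source file and
skip the rebuild.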

> What's wrong with a little language these days?

Personally I don't find make as typically "augmented" by m4 plus
3,000-line shell scripts to qualify as a "little language".  But
YMMV and this isn't a make-vs-cons list.

Dave Eckhardt



^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [9fans] Barrelfish
       [not found]             ` <e763acc10910180606q1312ff7cw9a465d6af39c0fbe@mail.gmail.com>
@ 2009-10-18 13:22               ` Roman Shaposhnik
  2009-10-18 19:18                 ` Bakul Shah
  0 siblings, 1 reply; 90+ messages in thread
From: Roman Shaposhnik @ 2009-10-18 13:22 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On Sun, Oct 18, 2009 at 6:06 AM, Roman Shaposhnik <shaposhnik@gmail.com> wrote
>> It is. But what's your proposal on code sharing? All those PC
>> registers belonging to
>> different cores have to point somewhere. If that somewhere is not shared
>> memory
>> the code has to be put there for every single core, right?
>
> At the hardware level we do have message passing between a
> processor and the memory controller -- this is exactly the
> same as talking to a shared server and has the same issues of
> scaling etc. If you have very few clients, a single shared
> server is indeed a cost effective solution.

I guess I'm not following. My question to OP was strictly about
code sharing. Basically where do the cores get instructions from
if not from shared memory.

Thanks,
Roman.



^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [9fans] Barrelfish
  2009-10-17 21:18       ` Eric Van Hensbergen
@ 2009-10-18  8:48         ` Eris Discordia
  0 siblings, 0 replies; 90+ messages in thread
From: Eris Discordia @ 2009-10-18  8:48 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

> Could be wrong, but I think he's referring to the SPURS Engine:
> http://en.wikipedia.org/wiki/SpursEngine

I had never seen that but I had encountered news on the Leadtek card based
on it.

--On Saturday, October 17, 2009 16:18 -0500 Eric Van Hensbergen
<ericvh@gmail.com> wrote:

> Could be wrong, but I think he's referring to the SPURS Engine:
> http://en.wikipedia.org/wiki/SpursEngine
>
>           -eric
>
> On Oct 17, 2009, at 4:07 PM, Steve Simon wrote:
>
>>> I'm a tiny fish, this is the ocean. Nevertheless, I venture: there
>>> are
>>> already Cell-based expansion cards out there for "real-time"
>>> H.264/VC-1/MPEG-4 AVC encoding. Meaning, 1080p video in, H.264
>>> stream out,
>>> "real-time."
>>
>> Interesting, 1080p? you have a link?
>>
>> -Steve
>>
>
>







^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [9fans] Barrelfish
  2009-10-17 21:07     ` Steve Simon
  2009-10-17 21:18       ` Eric Van Hensbergen
@ 2009-10-18  8:44       ` Eris Discordia
  1 sibling, 0 replies; 90+ messages in thread
From: Eris Discordia @ 2009-10-18  8:44 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

> Interesting, 1080p? you have a link?

The one I read long ago:
<http://www.anandtech.com/video/showdoc.aspx?i=3339>

First Google "sponsored link:"
<http://www.haivision.com/products/mako-hd>
(This one's an industrial rackmounted machine. No expansion card.)

BadaBoom is just software that uses CUDA:
<http://www.badaboomit.com/node/4>

"Real-time" performance with CUDA can be achieved on (not-so-)recent
NVIDIA GPUs.

BadaBoom did make a boom in the fansubbing community. Every group wants an
"encoding officer" with either an i7 or a highly performing GPU. Custom
builds of x264 (the most widely used software codec at the moment) already
can take advantage of multi-core in encoding.


--On Saturday, October 17, 2009 22:07 +0100 Steve Simon
<steve@quintile.net> wrote:

>> I'm a tiny fish, this is the ocean. Nevertheless, I venture: there are
>> already Cell-based expansion cards out there for "real-time"
>> H.264/VC-1/MPEG-4 AVC encoding. Meaning, 1080p video in, H.264 stream
>> out,  "real-time."
>
> Interesting, 1080p? you have a link?
>
> -Steve
>



^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [9fans] Barrelfish
  2009-10-18  1:15         ` Roman Shaposhnik
@ 2009-10-18  3:15           ` Bakul Shah
       [not found]             ` <e763acc10910180606q1312ff7cw9a465d6af39c0fbe@mail.gmail.com>
  0 siblings, 1 reply; 90+ messages in thread
From: Bakul Shah @ 2009-10-18  3:15 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On Sun, 18 Oct 2009 01:15:45 -0000 Roman Shaposhnik <roman@shaposhnik.org>  wrote:
> On Thu, Oct 15, 2009 at 10:53 AM, Sam Watkins <sam@nipl.net> wrote:
> > On Wed, Oct 14, 2009 at 06:50:28PM -0700, Roman Shaposhnik wrote:
> >> > The mention that "... the overhead of cache coherence restricts the
> >> > ability to scale up to even 80 cores" is also eye-opening. If we're at
> >> > approx 8 cores today, that's only 5 yrs away (if we double cores every
> >> > 1.5 yrs).
> >
> > Sharing the memory between processes is a stupid approach to
> > multi-processing / multi-threading.  Modern popular computer architecture
> > and software design is fairly much uniformly stupid.

> It is. But what's your proposal on code sharing? All those PC
> registers belonging to
> different cores have to point somewhere. If that somewhere is not shared
> memory
> the code has to be put there for every single core, right?

Different technologies/techniques make sense at different
levels of scaling and at different points in time so sharing
memory is not necessarily stupid -- unless one thinks that
any compromise (to produce usable solutions in a realistic
time frame) is stupid.

At the hardware level we do have message passing between a
processor and the memory controller -- this is exactly the
same as talking to a shared server and has the same issues of
scaling etc. If you have very few clients, a single shared
server is indeed a cost effective solution.

When you absolutely have to share state, somebody has to
mediate access to the shared state and you can't get around
the fact that it's going to cost you.  But if you know
something about the patterns of sharing, you can get away
from a single shared memory & increase concurrency.  A simple
example is a h/w fifo connecting a producer and a consumer (though
you also give up some flexibility).
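
A software analogue of the fifo, as a minimal sketch: a bounded queue
is the only state the producer and the consumer share. The squaring
step and the depth of 4 are arbitrary choices for the example.

```python
import queue
import threading


def run_pipeline(values, depth=4):
    """Pass values from a producer to a consumer through a bounded fifo."""
    fifo = queue.Queue(maxsize=depth)
    results = []
    done = object()  # sentinel marking the end of the stream

    def producer():
        for value in values:
            fifo.put(value)  # blocks while the fifo is full
        fifo.put(done)

    def consumer():
        while True:
            value = fifo.get()  # blocks while the fifo is empty
            if value is done:
                break
            results.append(value * value)

    threads = [threading.Thread(target=producer),
               threading.Thread(target=consumer)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


print(run_pipeline([1, 2, 3]))
```

The flexibility given up is visible here: data flows one way, in order,
and neither side can look at anything but the head or tail of the queue.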

As the number of processors increases on a device, sharing
state between neighbors will be increasingly cheaper compared
to any global sharing. Even if you use message passing, messages
between near neighbors will be far cheaper than between
processors in different neighborhoods. So switching to
message passing is not going to fix things; you have to worry
about placement as well (just like in h/w design).
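
A toy cost model for the placement point, assuming a 2D mesh where a
message costs one unit per hop (Manhattan distance). The task names,
the coordinates and the cost model are all made up for illustration.

```python
def hops(a, b):
    """Manhattan distance between two (row, col) positions on a mesh."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def total_hops(placement, messages):
    """Sum the hop counts over all (sender, receiver) message pairs."""
    return sum(hops(placement[src], placement[dst]) for src, dst in messages)


# Four tasks forming a chain: t0 -> t1 -> t2 -> t3.
messages = [("t0", "t1"), ("t1", "t2"), ("t2", "t3")]

# Same tasks, same messages, two placements on a 4x4 mesh.
adjacent = {"t0": (0, 0), "t1": (0, 1), "t2": (0, 2), "t3": (0, 3)}
scattered = {"t0": (0, 0), "t1": (3, 3), "t2": (0, 3), "t3": (3, 0)}

print("adjacent placement: ", total_hops(adjacent, messages), "hops")
print("scattered placement:", total_hops(scattered, messages), "hops")
```

The message-passing program is identical in both cases; only where the
tasks sit differs.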



^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [9fans] Barrelfish
  2009-10-17 20:58       ` Dave Eckhardt
@ 2009-10-18  2:09         ` Jason Catena
  2009-10-18 16:02           ` Dave Eckhardt
  0 siblings, 1 reply; 90+ messages in thread
From: Jason Catena @ 2009-10-18  2:09 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

>> One thing complicating this is that make and its common
>> variants aren't smart enough to handle the case where
>> version control systems regress a file and present an
>> earlier date not newer than the derived object.
>
> See cons/scons.

Thanks for the suggestion.  In this project someone actually made that
same suggestion, but rudely—basically insulting the very thought that
someone would be stupid enough to base a build system for commercial
software on make. (Right in line with gnu bias, I thought at the time:
forceful and disrespectful is no way to make change happen, even if
your target was previously inclined your way.)

In any event it's not compatible with the speedup tool we selected.
Which brings up the unnecessary additional complexity of embedding a
dependency analysis and shell-command tool in a general language.  Am
I expected to complicate my project management tool with python, just
to get it to rebuild if a file dependency's date changes at all,
rather than only if the file dependency has a newer date?  What's
wrong with a little language these days?  I don't think python needs a
file system dependency analysis engine any more than make needs a full
language around it.

I'd rather store the date of every leaf file on the dependency tree,
and in the next build delete any objects derived from a file with a
different date.  At least that's a consistent programming metaphor.
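
A minimal sketch of that approach, assuming the recorded dates are kept
as a mapping from path to modification time. The function names are
hypothetical, and the step that deletes derived objects is left out.

```python
import os


def snapshot(paths):
    """Record the modification time of each leaf file, in nanoseconds."""
    return {path: os.stat(path).st_mtime_ns for path in paths}


def changed_sources(recorded, current):
    """Return files whose date differs at all from the recorded one.

    A file counts as changed whether its date moved forward or backward,
    and also if it has no recorded date yet.
    """
    return sorted(path for path, mtime in current.items()
                  if recorded.get(path) != mtime)


# Dates from the previous build versus the dates seen now.
recorded = {"a.c": 200, "b.c": 100}
current = {"a.c": 150, "b.c": 100, "c.c": 5}
print(changed_sources(recorded, current))
```

Here a.c was regressed to an earlier date by version control, which a
newer-than comparison would miss, and c.c is new.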

Even the academic project managers out there don't try to mind-merge a
general language.  For example, Odin complicates make's syntax and
execution, almost introducing a type system for targets.  This makes
it very tricky to generate and edit makefiles dynamically in an
existing system, since (IIRC) you have to reload the whole ruleset if
you change something.

> Dave Eckhardt

Jason Catena



^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [9fans] Barrelfish
  2009-10-15 10:53       ` Sam Watkins
                           ` (2 preceding siblings ...)
  2009-10-15 13:11         ` hiro
@ 2009-10-18  1:15         ` Roman Shaposhnik
  2009-10-18  3:15           ` Bakul Shah
  3 siblings, 1 reply; 90+ messages in thread
From: Roman Shaposhnik @ 2009-10-18  1:15 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On Thu, Oct 15, 2009 at 10:53 AM, Sam Watkins <sam@nipl.net> wrote:
> On Wed, Oct 14, 2009 at 06:50:28PM -0700, Roman Shaposhnik wrote:
>> > The mention that "... the overhead of cache coherence restricts the ability
>> > to scale up to even 80 cores" is also eye openeing. If we're at aprox 8
>> > cores today, thats only 5 yrs away (if we double cores every
>> > 1.5 yrs).
>
> Sharing the memory between processes is a stupid approach to multi-processing /
> multi-threading.  Modern popular computer architecture and software design is
> fairly much uniformly stupid.

It is. But what's your proposal on code sharing? All those PC
registers belonging to
different cores have to point somewhere. If that somewhere is not shared memory
the code has to be put there for every single core, right?

Thanks,
Roman.



^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [9fans] Barrelfish
  2009-10-16 17:21       ` Sam Watkins
  2009-10-16 23:39         ` Nick LaForge
@ 2009-10-18  1:12         ` Roman Shaposhnik
  2009-10-19 14:14           ` matt
  2009-10-19 16:00           ` Sam Watkins
  1 sibling, 2 replies; 90+ messages in thread
From: Roman Shaposhnik @ 2009-10-18  1:12 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On Fri, Oct 16, 2009 at 5:21 PM, Sam Watkins <sam@nipl.net> wrote:
> On Thu, Oct 15, 2009 at 04:21:16PM +0100, roger peppe wrote:
>> BTW it seems the gates quote is false:
>>
>> http://en.wikiquote.org/wiki/Bill_Gates
>
> maybe the Ken quote is false too - hard to believe he's that out of touch

I think the reverse is true -- the fact that he was asking these questions
(and again -- he was asking them wrt. garden variety way of doing multicore
with a special emphasis on *desktops*) makes him very much in touch
with reality, unlike most folks who think that once they get 65535 cores they
would run 65535 times faster.

The misinterpretation of Moore's Law is to blame here, of course: Moore
is a smart guy and he was talking about transistor density, but pop culture
made it sound like he was talking about speedup. For some time the two were
in lock-step. Not anymore.

I would appreciate if the folks who were in the room correct me, but if I'm
not mistaken Ken was alluding to some FPGA work/ideas that he had done
and my interpretation of his comments was that if we *really* want to
make things parallel we have to bite the bullet, ditch multicore and rethink
our strategy.

Thanks,
Roman.



^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [9fans] Barrelfish
  2009-10-18  0:06     ` ron minnich
@ 2009-10-18  0:54       ` Roman Shaposhnik
  0 siblings, 0 replies; 90+ messages in thread
From: Roman Shaposhnik @ 2009-10-18  0:54 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On Sun, Oct 18, 2009 at 12:06 AM, ron minnich <rminnich@gmail.com> wrote:
> the use of qualitative terms such as "embarrassingly parallel" often
> leads to confusion.
>
> Scaling can be measured. It can be quantified. Nothing scales forever,
> because at some point you want to get an answer back to a person,
> and/or the components of the app need to talk to each other. It's
> these basic timing elements that can tell you a lot about scaling.
> Actually running the app tells you a bit more, of course.
>
> Even the really easy apps hit a wall sooner or later. I still remember
> the struggle I had to scale a simple app to a 16 node cluster in the
> early days (1992). That was a long time ago and we've gone a lot
> further than that, but you'd be surprised just how hard it can be,
> even with "easy" applications.

Can't agree more. I'd say the biggest problem I have with
"embarrassingly parallel"
is the fact that it conjures up images of a linear increase in speedup. Nobody
ever does the math or even experiments to see how quickly we reach the point
of diminishing returns.
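
One common way to do that math is Amdahl's law. A minimal sketch,
assuming a fixed fraction of the work parallelizes perfectly and the
rest stays serial; the fractions tried below are arbitrary examples.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Speedup on `cores` when only `parallel_fraction` of work scales."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


for fraction in (0.50, 0.95, 0.999):
    row = ", ".join(f"{cores} cores: {amdahl_speedup(fraction, cores):.1f}x"
                    for cores in (8, 80, 65535))
    print(f"parallel fraction {fraction}: {row}")
```

Even with 95% of the work parallel, the speedup can never exceed 20x,
no matter how many cores are added.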

Thanks,
Roman.



^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [9fans] Barrelfish
       [not found]   ` <A90043D02D52B2CBF2804FA4@192.168.1.2>
@ 2009-10-18  0:06     ` ron minnich
  2009-10-18  0:54       ` Roman Shaposhnik
  0 siblings, 1 reply; 90+ messages in thread
From: ron minnich @ 2009-10-18  0:06 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

the use of qualitative terms such as "embarrassingly parallel" often
leads to confusion.

Scaling can be measured. It can be quantified. Nothing scales forever,
because at some point you want to get an answer back to a person,
and/or the components of the app need to talk to each other. It's
these basic timing elements that can tell you a lot about scaling.
Actually running the app tells you a bit more, of course.

Even the really easy apps hit a wall sooner or later. I still remember
the struggle I had to scale a simple app to a 16 node cluster in the
early days (1992). That was a long time ago and we've gone a lot
further than that, but you'd be surprised just how hard it can be,
even with "easy" applications.

ron



^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [9fans] Barrelfish
  2009-10-17 21:07     ` Steve Simon
@ 2009-10-17 21:18       ` Eric Van Hensbergen
  2009-10-18  8:48         ` Eris Discordia
  2009-10-18  8:44       ` Eris Discordia
  1 sibling, 1 reply; 90+ messages in thread
From: Eric Van Hensbergen @ 2009-10-17 21:18 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

Could be wrong, but I think he's referring to the SPURS Engine:
http://en.wikipedia.org/wiki/SpursEngine

          -eric

On Oct 17, 2009, at 4:07 PM, Steve Simon wrote:

>> I'm a tiny fish, this is the ocean. Nevertheless, I venture: there
>> are
>> already Cell-based expansion cards out there for "real-time"
>> H.264/VC-1/MPEG-4 AVC encoding. Meaning, 1080p video in, H.264
>> stream out,
>> "real-time."
>
> Interesting, 1080p? you have a link?
>
> -Steve
>




^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [9fans] Barrelfish
  2009-10-17 18:45   ` Eris Discordia
@ 2009-10-17 21:07     ` Steve Simon
  2009-10-17 21:18       ` Eric Van Hensbergen
  2009-10-18  8:44       ` Eris Discordia
  2009-10-19 15:57     ` Sam Watkins
  1 sibling, 2 replies; 90+ messages in thread
From: Steve Simon @ 2009-10-17 21:07 UTC (permalink / raw)
  To: 9fans

> I'm a tiny fish, this is the ocean. Nevertheless, I venture: there are
> already Cell-based expansion cards out there for "real-time"
> H.264/VC-1/MPEG-4 AVC encoding. Meaning, 1080p video in, H.264 stream out,
> "real-time."

Interesting, 1080p? you have a link?

-Steve



^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [9fans] Barrelfish
  2009-10-16 21:17     ` Jason Catena
@ 2009-10-17 20:58       ` Dave Eckhardt
  2009-10-18  2:09         ` Jason Catena
  0 siblings, 1 reply; 90+ messages in thread
From: Dave Eckhardt @ 2009-10-17 20:58 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

> One thing complicating this is that make and its common
> variants aren't smart enough to handle the case where
> version control systems regress a file and present an
> earlier date not newer than the derived object.

See cons/scons.

Dave Eckhardt



^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [9fans] Barrelfish
  2009-10-15 13:27 ` erik quanstrom
  2009-10-15 13:40   ` Richard Miller
  2009-10-16 17:20   ` Sam Watkins
@ 2009-10-17 18:45   ` Eris Discordia
  2009-10-17 21:07     ` Steve Simon
  2009-10-19 15:57     ` Sam Watkins
       [not found]   ` <A90043D02D52B2CBF2804FA4@192.168.1.2>
  3 siblings, 2 replies; 90+ messages in thread
From: Eris Discordia @ 2009-10-17 18:45 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

>> There is a vast range of applications that cannot
>> be managed in real time using existing single-core technology.
>
> please name one.

I'm a tiny fish, this is the ocean. Nevertheless, I venture: there are
already Cell-based expansion cards out there for "real-time"
H.264/VC-1/MPEG-4 AVC encoding. Meaning, 1080p video in, H.264 stream out,
"real-time." I can imagine a large market for this in broadcasting,
netcasting, and simulcasting industries. Simulcasting in particular is a prime
application. Station X in Japan broadcasts a popular animated series in
1080i, while the US licensor of the same content simulcasts for subscribers
through its web interface. This applies all the more to live feeds.

What seems to go ignored here is the class of embarrassingly parallel
problems which--while they may or may not be important to CS people, I
don't know--appear in many areas of applied computing. I know one person
working at an institute of the Max Planck Society who regularly runs a few
hundred instances of the same program (doing some sort of matrix
calculation for a problem in physics) with different input. He certainly
could benefit from a hundred cores inside his desktop computing platform
_if_ fitting that many cores in there wouldn't cause latencies larger than
the network latencies he currently experiences (at the moment he uses a job
manager that controls a cluster). "INB4" criticism, his input matrices are
small and his work is compute-intensive rather than memory-intensive.
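
A minimal sketch of that pattern, assuming each job is a pure function
of its own input. The names solve and run_all are hypothetical, and the
trace of the squared matrix merely stands in for the real physics
calculation. Threads keep the sketch short; a compute-bound workload
would more likely use separate processes or cluster nodes.

```python
from concurrent.futures import ThreadPoolExecutor


def solve(matrix):
    """Hypothetical stand-in for one independent job: trace of matrix squared."""
    n = len(matrix)
    return sum(matrix[i][k] * matrix[k][i] for i in range(n) for k in range(n))


def run_all(inputs, workers=4):
    # No job reads another job's data, so they can be handed out in any
    # order; map() still returns the results in input order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(solve, inputs))


if __name__ == "__main__":
    print(run_all([[[1, 2], [3, 4]], [[1, 0], [0, 1]]]))
```

Because the jobs share nothing, the only coordination cost is handing
out inputs and collecting results.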

Another embarrassingly parallel problem, as Sam Watkins pointed out, arises
in digital audio processing. I might add to his example of applying a
filter to sections of one track the example of applying the same or
different filters to multiple tracks at once. Multitrack editing was/is a
killer application of digital audio. Multitrack video editing, too. I
believe video/audio processing software was among the first applications
for "workstation"-class desktops that were parallelized.

By the way, I learnt about embarrassingly parallel problems from that same
Max Planck research fellow who runs embarrassingly parallel matrix
calculations.



--On Thursday, October 15, 2009 09:27 -0400 erik quanstrom
<quanstro@quanstro.net> wrote:

> On Thu Oct 15 06:55:24 EDT 2009, sam@nipl.net wrote:
>> task.  With respect to Ken, Bill Gates said something along the lines of
>> "who would need more than 640K?".
>
> on the other hand, there were lots of people using computers with 4mb
> of memory when bill gates said this.  it was quite easy to see how to use
> more than 1mb at the time.  in fact, i believe i used an apple ][ around
> that time that had ~744k.  it was a weird amount of memory.
>
>> There is a vast range of applications that cannot
>> be managed in real time using existing single-core technology.
>
> please name one.
>
> - erik
>




^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [9fans] Barrelfish
  2009-10-16 17:03           ` Sam Watkins
  2009-10-16 18:17             ` ron minnich
@ 2009-10-17 12:42             ` Roman Shaposhnik
  1 sibling, 0 replies; 90+ messages in thread
From: Roman Shaposhnik @ 2009-10-17 12:42 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On Fri, Oct 16, 2009 at 10:03 AM, Sam Watkins <sam@nipl.net> wrote:
> On Thu, Oct 15, 2009 at 12:50:48PM +0100, Richard Miller wrote:
>> > It's easy to write good code that will take advantage of arbitrarily many
>> > processors to run faster / smoother, if you have a proper language for the
>> > task.
>>
>> ... and if you can find a way around Amdahl's law (qv).
>
> "The speedup of a program using multiple processors in parallel computing is
> limited by the time needed for the sequential fraction of the program."
>
> So it would only be a problem supposing that a significant part of the program
> is unparallelizable.  I can think of many many tasks where "Amdahl's law" is
> not going to be a problem at all, for a properly designed system.

Let's do a little math, shall we? Better yet, let's graph it:
   http://en.wikipedia.org/wiki/File:AmdahlsLaw.svg

Now, do you see what's on the right side of the X axis? That's
right, 65536 cores. Pause and appreciate the measliness
of the speedup...
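
For anyone who would rather compute than squint at the graph, here is
a small sketch of the formula, speedup = 1 / ((1 - P) + P / N). The
function name and the sample parallel fractions are illustrative
choices, not values taken from the graph.

```python
def amdahl_speedup(parallel_fraction, processors):
    """Upper bound on speedup when only parallel_fraction of the work scales."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)


if __name__ == "__main__":
    for fraction in (0.50, 0.90, 0.95):
        # Even with 65536 cores the bound stays below 1 / (1 - fraction).
        print(fraction, round(amdahl_speedup(fraction, 65536), 2))
```

With 95% of the program parallelizable the bound never exceeds 20,
however many cores are added.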

Thanks,
Roman.



^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [9fans] Barrelfish
  2009-10-16 17:21       ` Sam Watkins
@ 2009-10-16 23:39         ` Nick LaForge
  2009-10-18  1:12         ` Roman Shaposhnik
  1 sibling, 0 replies; 90+ messages in thread
From: Nick LaForge @ 2009-10-16 23:39 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

> maybe the Ken quote is false too - hard to believe he's that out of touch

The whole table was ganging up on Roman and his crazy idea, I believe
;).  The objection mostly was to Intel dumping the complexity of
another core on the programmer after it ran out of steam in containing
parallelism within the pipeline.

Even though Inferno / CSP / Erlang / etc. type people were clearly
anxious to make use of parallelism at the level of multiple processor
cores, I don't think the average Java programmer was.

(That's not to say that Java programmers hadn't been asking for a rude
awakening.  Perhaps someday, they will also learn what
'Object-Oriented' programming is. ☺)

Nick



^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [9fans] Barrelfish
       [not found] <<d50d7d460910161417w45b5c675p8740315aaf6861f@mail.gmail.com>
@ 2009-10-16 22:25 ` erik quanstrom
  0 siblings, 0 replies; 90+ messages in thread
From: erik quanstrom @ 2009-10-16 22:25 UTC (permalink / raw)
  To: 9fans

i missed this the first time

On Fri Oct 16 17:19:36 EDT 2009, jason.catena@gmail.com wrote:
> > Instantaneous building of a complex project from source.
> > (I'm defining instantaneous as less than 1 second for this.)
>
> Depends on how complex.

good story.  it's hard to know when to rewrite.

gcc itself has several files that take ~20s to compile on my
machine.  what is the plan for getting them to compile in < 1s?

also, suppose you have n source files.  and suppose you also
just happen to have n+1 processors.  what's the plan for coordinating
them in sub O(n) time?  what's the plan for a fs to keep up?
heck, linux boot time is bottlenecked not by processor speed
but by lowly random disk i/o.

- erik



^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [9fans] Barrelfish
  2009-10-16 17:20   ` Sam Watkins
  2009-10-16 18:18     ` Latchesar Ionkov
@ 2009-10-16 21:17     ` Jason Catena
  2009-10-17 20:58       ` Dave Eckhardt
  1 sibling, 1 reply; 90+ messages in thread
From: Jason Catena @ 2009-10-16 21:17 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

> Instantaneous building of a complex project from source.
> (I'm defining instantaneous as less than 1 second for this.)

Depends on how complex.  I spent two years retrofitting a commercial
parallel make (which only promises a 20x speedup, even with dedicated
hardware) into the build system of a telecommunications product.  In
retrospect, it would have taken less time to write a new build system
with parallelism designed into it, but it seemed less risky to be
incremental.

There are a lot of dependencies in a complex project.  Bundles wrap up
a set of files which include executable tasks composed of libraries
(linked from their own objects derived from source code) and their own
source code: some hand-coded, and some derived from object-oriented
models, lexical analyzers and compiler-compilers, and message-passing
code generators (it can take a surprisingly long time to generate
optimized C code with a functional language).

Compile some of this for an ordinary unixy platform, some for any
platform which supports java, some for systems without a filesystem
where all code runs in the same space as the kernel.  Each unit of
code wants its own options; all code is expected to honor any global
options; build system should not restrict porting code between
platforms with different build processes (or produce any delay in the
schedule at all;).

All of these factors influence the build time of a project, in a
complex web of dependencies, even after you write or modify all the
build tools to be reentrant so you can run them all at once.

The most effective build strategy I've found is avoidance: just don't
build what you don't have to, and make sure you only build something
once.  One thing complicating this is that make and its common
variants aren't smart enough to handle the case where version control
systems regress a file and present an earlier date not newer than the
derived object.
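
To make the failure mode concrete, here is a small sketch comparing
the timestamp rule with a content-digest rule. The helper names and
the file names a.c and a.o are hypothetical, and this is only an
illustration of the idea, not how any particular build tool is
implemented.

```python
import hashlib
import os
import tempfile


def digest(path):
    """Content digest of a file, recorded at build time."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()


def stale_by_mtime(source, target):
    # The make-style rule: rebuild only if the source is newer.
    return os.path.getmtime(source) > os.path.getmtime(target)


def stale_by_content(source, recorded_digest):
    # Rebuild whenever the source differs from what was last built.
    return digest(source) != recorded_digest


def demo():
    with tempfile.TemporaryDirectory() as d:
        src, obj = os.path.join(d, "a.c"), os.path.join(d, "a.o")
        with open(src, "w") as f:
            f.write("int x = 2;\n")
        with open(obj, "w") as f:
            f.write("object built from x = 2\n")
        recorded = digest(src)
        # Version control regresses the file and restores an old date.
        with open(src, "w") as f:
            f.write("int x = 1;\n")
        old = os.path.getmtime(obj) - 3600
        os.utime(src, (old, old))
        return stale_by_mtime(src, obj), stale_by_content(src, recorded)
```

In demo() the timestamp rule misses the regressed file while the
digest rule catches it, at the cost of reading every source file.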

In a nutshell, my experience is that unless developers abandon all the
fancy tools that supposedly make it easier for them to write mountains
of brittle, special-purpose, especially model-generated code, the tool
chain created by these dependencies will defeat efforts to make it run
faster in parallel.  So all your extra processors will only be useful
for running many of these heavy builds in parallel, as you try to have
each developer build and test before integration.

> Sam

Jason Catena



^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [9fans] Barrelfish
  2009-10-16 18:17             ` ron minnich
@ 2009-10-16 18:39               ` Wes Kussmaul
  0 siblings, 0 replies; 90+ messages in thread
From: Wes Kussmaul @ 2009-10-16 18:39 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

ron minnich wrote:

> Insignificant
> bits of code that were not even visible suddenly dominate the time.

Reminds me of some project development teams.

Maybe Marvin Minsky was on to something.



^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [9fans] Barrelfish
  2009-10-16 17:20   ` Sam Watkins
@ 2009-10-16 18:18     ` Latchesar Ionkov
  2009-10-19 15:26       ` Sam Watkins
  2009-10-16 21:17     ` Jason Catena
  1 sibling, 1 reply; 90+ messages in thread
From: Latchesar Ionkov @ 2009-10-16 18:18 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

How do you plan to feed data to these 31 thousand processors so they
can be fully utilized? Have you done the calculations and checked what
memory bandwidth you would need for that?
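
A back-of-envelope version of that calculation, as a sketch. It
assumes each core makes one byte-wide memory access per clock cycle
and that there is no on-chip cache or local memory; both are
assumptions for illustration, and the function name is hypothetical.

```python
def aggregate_bandwidth(cores, clock_hz, bytes_per_cycle=1):
    """Bytes per second needed to keep every core fed on every cycle."""
    return cores * clock_hz * bytes_per_cycle


if __name__ == "__main__":
    needed = aggregate_bandwidth(31_000, 1_000_000_000)
    print(f"{needed / 1e12:.0f} TB/s")
```

Under those assumptions the chip would need roughly 31 terabytes per
second of memory traffic to stay fully utilized.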

There are reasons Pentium 4 has the performance you mention, but these
reasons don't necessarily include the "great hulking piece of crap"
statement.

Thanks,
    Lucho

On Fri, Oct 16, 2009 at 11:20 AM, Sam Watkins <sam@nipl.net> wrote:
> I shouldn't have to explain how powerful something like this could be.  31000
> 8-bit 6502 processors running at 1 GHz, fully utilized, could achieve over 7
> trillion 32-bit integer operations per second.  That is over 7000 times more
> powerful than a pentium 4 having the same number of transistors.



^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [9fans] Barrelfish
  2009-10-16 17:03           ` Sam Watkins
@ 2009-10-16 18:17             ` ron minnich
  2009-10-16 18:39               ` Wes Kussmaul
  2009-10-17 12:42             ` Roman Shaposhnik
  1 sibling, 1 reply; 90+ messages in thread
From: ron minnich @ 2009-10-16 18:17 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On Fri, Oct 16, 2009 at 10:03 AM, Sam Watkins <sam@nipl.net> wrote:

> So it would only be a problem supposing that a significant part of the program
> is unparallelizable.  I can think of many many tasks where "Amdahl's law" is
> not going to be a problem at all, for a properly designed system.
>
> For example if I had a thousand processors I might raytrace complex scenes for
> an animated game at 100 fps, or do complex dsp over a 2 hour audio track in one
> millisecond.

Yes, if you had that few processors, it might seem easy.

It gets somewhat harder when you have, say, 128,000. Insignificant
bits of code that were not even visible suddenly dominate the time.

ron



^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [9fans] Barrelfish
  2009-10-15 15:21     ` roger peppe
@ 2009-10-16 17:21       ` Sam Watkins
  2009-10-16 23:39         ` Nick LaForge
  2009-10-18  1:12         ` Roman Shaposhnik
  0 siblings, 2 replies; 90+ messages in thread
From: Sam Watkins @ 2009-10-16 17:21 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On Thu, Oct 15, 2009 at 04:21:16PM +0100, roger peppe wrote:
> BTW it seems the gates quote is false:
>
> http://en.wikiquote.org/wiki/Bill_Gates

maybe the Ken quote is false too - hard to believe he's that out of touch



^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [9fans] Barrelfish
  2009-10-15 13:27 ` erik quanstrom
  2009-10-15 13:40   ` Richard Miller
@ 2009-10-16 17:20   ` Sam Watkins
  2009-10-16 18:18     ` Latchesar Ionkov
  2009-10-16 21:17     ` Jason Catena
  2009-10-17 18:45   ` Eris Discordia
       [not found]   ` <A90043D02D52B2CBF2804FA4@192.168.1.2>
  3 siblings, 2 replies; 90+ messages in thread
From: Sam Watkins @ 2009-10-16 17:20 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

> > There is a vast range of applications that cannot
> > be managed in real time using existing single-core technology.
>
> please name one.

Your apparent lack of imagination surprises me.

Surely you can see that a whole range of applications becomes possible when
using a massively parallel system, when compared to a single-CPU system.  You
could perhaps also achieve these applications using a large network of 1000
normal computers, but that would be expensive and use a lot of space.

I named two in another post: real-time animated raytracing, and instantaneous
complex dsp over a long audio track.  I'll also mention instantaneous video
encoding.  Instantaneous building of a complex project from source.
(I'm defining instantaneous as less than 1 second for this.)

There are also qualitatively different applications, such as effective
computer vision, which can be achieved with parallel systems.  The operation of
animal eyes and brains is obviously massively parallel.

A 6502 cpu could achieve a lot in its day with 4000 transistors at 2 MHz.
A pentium 4 has 125 million transistors.  So, with modern IC tech and excluding
the necessary networking and RAM etc on the chip, one could put 31000 6502
processors on a single chip using pentium 4 integration technology, and I
suppose you could also clock it up to perhaps 1 GHz.

I shouldn't have to explain how powerful something like this could be.  31000
8-bit 6502 processors running at 1 GHz, fully utilized, could achieve over 7
trillion 32-bit integer operations per second.  That is over 7000 times more
powerful than a pentium 4 having the same number of transistors.

We have 31000 times denser ICs today, and at least 500 times higher clock
speeds, but I do not see a 15.5 million times improvement in computer
performance when comparing a 6502 to a pentium 4!  That is because pentium 4 is
a great hulking piece of crap and a waste of transistors.
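
The ratios quoted above can be reproduced with a few lines of
arithmetic. This sketch uses only the figures given in this message;
the variable names are illustrative, and the cycles-per-operation
value is inferred from the 7 trillion figure rather than stated
anywhere in the message.

```python
P4_TRANSISTORS = 125_000_000
MOS6502_TRANSISTORS = 4_000

# How many 6502-sized cores fit in a pentium 4 transistor budget.
density_ratio = P4_TRANSISTORS / MOS6502_TRANSISTORS  # 31250.0

# 1 GHz versus the original 2 MHz clock.
clock_ratio = 1_000_000_000 / 2_000_000  # 500.0

# Rounded core count times the clock ratio gives the 15.5 million figure.
combined_ratio = 31_000 * clock_ratio  # 15500000.0

# Cycles per 32-bit operation implied by 7 trillion operations per
# second across 31000 cores at 1 GHz (an inference, not a stated number).
implied_cycles_per_op = 31_000 * 1e9 / 7e12
```

The last value comes out at a little over 4 cycles per 32-bit
operation, which is the assumption the 7 trillion figure rests on.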

I could easily think of another hundred applications for parallel systems, but
I'm sure that if you're intelligent enough to understand what I am saying you
can think of your own examples.

Sam



^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [9fans] Barrelfish
  2009-10-15 11:50         ` Richard Miller
  2009-10-15 12:00           ` W B Hacker
@ 2009-10-16 17:03           ` Sam Watkins
  2009-10-16 18:17             ` ron minnich
  2009-10-17 12:42             ` Roman Shaposhnik
  1 sibling, 2 replies; 90+ messages in thread
From: Sam Watkins @ 2009-10-16 17:03 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On Thu, Oct 15, 2009 at 12:50:48PM +0100, Richard Miller wrote:
> > It's easy to write good code that will take advantage of arbitrarily many
> > processors to run faster / smoother, if you have a proper language for the
> > task.
>
> ... and if you can find a way around Amdahl's law (qv).

"The speedup of a program using multiple processors in parallel computing is
limited by the time needed for the sequential fraction of the program."

So it would only be a problem supposing that a significant part of the program
is unparallelizable.  I can think of many many tasks where "Amdahl's law" is
not going to be a problem at all, for a properly designed system.

For example if I had a thousand processors I might raytrace complex scenes for
an animated game at 100 fps, or do complex dsp over a 2 hour audio track in one
millisecond.

I suppose most difficult/interesting tasks can be parallelized effectively.
Seems that Amdahl's law is a minor issue.  Of course if you are trying to run
old-fashioned sequential programs on a parallel machine you will not benefit.
You would need to rewrite them.

Sam



^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [9fans] Barrelfish
  2009-10-15 18:28 ` Christopher Nielsen
@ 2009-10-15 18:55   ` W B Hacker
  0 siblings, 0 replies; 90+ messages in thread
From: W B Hacker @ 2009-10-15 18:55 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

Christopher Nielsen wrote:
> I think this is an interesting approach.
>
> There are several interesting ideas being pursued here. The focus of
> the discussion has been on the multikernel approach, which I think has
> merit.
>
> Something that has not been discussed here is the wide use of DSLs for
> systems programming, and using haskell to write a framework for
> rapidly developing and proving correctness of DSLs. This is just as
> significant as the multikernel ideas.
>
> I downloaded the source, built the system, and will be playing with it.
>
> Thoughts?

Their 'plan' for security needs a recce as well.

Message-passing-based 'creatures' - kernel-level or otherwise - have their own
challenges in this regard (Windows, to name one bad example). Likewise, though
I've only just started looking at it, if 'Haiku' even *has* a security model, I
am (still, yet) blissfully unaware of it...

Bill Hacker

>
> On Wed, Oct 14, 2009 at 12:09, Tim Newsham <newsham@lava.net> wrote:
>> Rethinking multi-core systems as distributed heterogeneous
>> systems.  Thoughts?
>>
>> http://www.sigops.org/sosp/sosp09/papers/baumann-sosp09.pdf
>>
>> Tim Newsham
>> http://www.thenewsh.com/~newsham/
>>
>>
>
>
>




^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [9fans] Barrelfish
  2009-10-14 19:09 Tim Newsham
  2009-10-14 19:54 ` Roman Shaposhnik
@ 2009-10-15 18:28 ` Christopher Nielsen
  2009-10-15 18:55   ` W B Hacker
  1 sibling, 1 reply; 90+ messages in thread
From: Christopher Nielsen @ 2009-10-15 18:28 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

I think this is an interesting approach.

There are several interesting ideas being pursued here. The focus of
the discussion has been on the multikernel approach, which I think has
merit.

Something that has not been discussed here is the wide use of DSLs for
systems programming, and using haskell to write a framework for
rapidly developing and proving correctness of DSLs. This is just as
significant as the multikernel ideas.

I downloaded the source, built the system, and will be playing with it.

Thoughts?

On Wed, Oct 14, 2009 at 12:09, Tim Newsham <newsham@lava.net> wrote:
> Rethinking multi-core systems as distributed heterogeneous
> systems.  Thoughts?
>
> http://www.sigops.org/sosp/sosp09/papers/baumann-sosp09.pdf
>
> Tim Newsham
> http://www.thenewsh.com/~newsham/
>
>



-- 
Christopher Nielsen
"They who can give up essential liberty for temporary
safety, deserve neither liberty nor safety." --Benjamin Franklin



^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [9fans] Barrelfish
  2009-10-15  3:59           ` Eric Van Hensbergen
@ 2009-10-15 17:39             ` Tim Newsham
  0 siblings, 0 replies; 90+ messages in thread
From: Tim Newsham @ 2009-10-15 17:39 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

>> it sounds like the kernel (L4-like, supposedly tuned to the specific
>> hardware) and the "monitor" (userland, portable) are shared, from
>> the paper.
>
> I'm confused what you mean by "shared".

ugh, I completely botched that.. I meant "replicated" not "shared".

>     -eric

Tim Newsham
http://www.thenewsh.com/~newsham/



^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [9fans] Barrelfish
       [not found] <<3e1162e60910150805q2ea3f682w688299a39274051c@mail.gmail.com>
@ 2009-10-15 15:28 ` erik quanstrom
  0 siblings, 0 replies; 90+ messages in thread
From: erik quanstrom @ 2009-10-15 15:28 UTC (permalink / raw)
  To: 9fans

On Thu Oct 15 11:06:41 EDT 2009, leimy2k@gmail.com wrote:

> On Thu, Oct 15, 2009 at 6:11 AM, hiro <23hiro@googlemail.com> wrote:
>
> > > There is a vast range of applications that cannot
> > > be managed in real time using existing single-core technology.
> >
> > I'm sorry to interrupt your discussion, but what is real time?
> >

that's a sly one for the fortune file.

- erik



^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [9fans] Barrelfish
  2009-10-15 15:07   ` David Leimbach
@ 2009-10-15 15:21     ` roger peppe
  2009-10-16 17:21       ` Sam Watkins
  0 siblings, 1 reply; 90+ messages in thread
From: roger peppe @ 2009-10-15 15:21 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

BTW it seems the gates quote is false:

http://en.wikiquote.org/wiki/Bill_Gates



^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [9fans] Barrelfish
  2009-10-15 13:52 ` erik quanstrom
@ 2009-10-15 15:07   ` David Leimbach
  2009-10-15 15:21     ` roger peppe
  0 siblings, 1 reply; 90+ messages in thread
From: David Leimbach @ 2009-10-15 15:07 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On Thu, Oct 15, 2009 at 6:52 AM, erik quanstrom <quanstro@quanstro.net> wrote:

> On Thu Oct 15 09:41:29 EDT 2009, 9fans@hamnavoe.com wrote:
> > > in fact, i believe i used an apple ][ around
> > > that time that had ~744k.
> >
> > Are you sure that was an apple II? When I bought mine I remember
> > wrestling with the decision over whether to get the standard 48k of
> > RAM or upgrade to the full 64k.  This was long before the IBM PC.
>
> iirc, it had an odd add in card that accounted for almost all the
> memory in the system.  it wasn't enabled by default.
>
> - erik
>
>
Was this an Apple ][ GS?  I had one with an add on board with I think 1 MB
of RAM.

People still have those things on the internet... there's ethernet adapters
for em and a TCP/IP stack.

http://www.apple2.org/marinetti/index.html

Dave

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [9fans] Barrelfish
  2009-10-15 13:11         ` hiro
@ 2009-10-15 15:05           ` David Leimbach
  0 siblings, 0 replies; 90+ messages in thread
From: David Leimbach @ 2009-10-15 15:05 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On Thu, Oct 15, 2009 at 6:11 AM, hiro <23hiro@googlemail.com> wrote:

> > There is a vast range of applications that cannot
> > be managed in real time using existing single-core technology.
>
> I'm sorry to interrupt your discussion, but what is real time?
>
>
Real time just means "fast enough to work properly".  You can throw all
kinds of other crap on top of that and say things about scheduling
requirements and timeslices within which a process must complete, and duty
cycles, but those are just things to look at to figure out if your system is
"fast enough to work properly"

Dave

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [9fans] Barrelfish
       [not found] <<4AD70EE9.1010208@conducive.org>
@ 2009-10-15 13:52 ` erik quanstrom
  0 siblings, 0 replies; 90+ messages in thread
From: erik quanstrom @ 2009-10-15 13:52 UTC (permalink / raw)
  To: 9fans

On Thu Oct 15 08:01:29 EDT 2009, wbh@conducive.org wrote:
> Richard Miller wrote:
> >> It's easy to write good code that will take advantage of arbitrarily many
> >> processors to run faster / smoother, if you have a proper language for the
> >> task.
> >
> > ... and if you can find a way around Amdahl's law (qv).
> >
> >
> >
>
> http://www.cis.temple.edu/~shi/docs/amdahl/amdahl.html

the author is hard to search for.  http://en.wikipedia.org/wiki/Yuanshi_Era

perhaps i misread the paper,  but i think it boils down to
chopping up an O(n^2) algorithm can give you super-linear speedups --
no big surprise.  and there might be better algorithms.
(no examples given.)  but you're still going to fall off a cliff when
you run out of processors.  and you don't get rid of the
sequential part.

the problem i see with this approach is (a) you need a lot
of processors to do this.  if p is the number of processors then
the speedup for a quadratic algorithm would be

	 n^2 / (sum_{1 .. p} (n/p)^2)
		= n^2 / p(n/p)^2
		= p

so if you want a order of magnitude speedup *for a given n*
you need 10 processors.  but of course the number of processors
needed goes as O(n^2).  bummer.  oh, and we haven't considered
communication overhead.
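
a quick numeric check of that identity, as a sketch.  it follows the
formula as written (work on the whole input divided by the summed
work on p chunks of size n/p) and, like the formula, ignores
communication overhead.  the function names are illustrative.

```python
def quadratic_work(n):
    """Cost model for an O(n^2) algorithm on an input of size n."""
    return n * n


def speedup(n, p):
    # Work on the whole input divided by the total work on p equal chunks.
    chunk = n / p
    return quadratic_work(n) / sum(quadratic_work(chunk) for _ in range(p))


if __name__ == "__main__":
    print(speedup(1000, 10))
```

the ratio equals p for any n, which is the point: for a given n the
gain is bounded by the number of processors.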

and (b) the paper claims that all network communication is worst
case O(n) §3¶8.  we know this to be false.  consider n+1 computers with
an ethernet connection connected by a switch.  suppose that
computers 0-n are ready to send a result back to n and each
blast away at the network's limit.  it's going to take more time
to get the data back in an n -> 1 configuration than it would to
get the same data back in a 1:1 configuration because the
switch will drop packets.  even assuming pause frames, the
switching time can't be zero.

- erik



^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [9fans] Barrelfish
       [not found] <<207092dc429fe476c2046d537aeaa400@hamnavoe.com>
@ 2009-10-15 13:52 ` erik quanstrom
  2009-10-15 15:07   ` David Leimbach
  0 siblings, 1 reply; 90+ messages in thread
From: erik quanstrom @ 2009-10-15 13:52 UTC (permalink / raw)
  To: 9fans

On Thu Oct 15 09:41:29 EDT 2009, 9fans@hamnavoe.com wrote:
> > in fact, i believe i used an apple ][ around
> > that time that had ~744k.
>
> Are you sure that was an apple II? When I bought mine I remember
> wrestling with the decision over whether to get the standard 48k of
> RAM or upgrade to the full 64k.  This was long before the IBM PC.

iirc, it had an odd add in card that accounted for almost all the
memory in the system.  it wasn't enabled by default.

- erik



^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [9fans] Barrelfish
  2009-10-15 13:27 ` erik quanstrom
@ 2009-10-15 13:40   ` Richard Miller
  2009-10-16 17:20   ` Sam Watkins
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 90+ messages in thread
From: Richard Miller @ 2009-10-15 13:40 UTC (permalink / raw)
  To: 9fans

> in fact, i believe i used an apple ][ around
> that time that had ~744k.

Are you sure that was an apple II? When I bought mine I remember
wrestling with the decision over whether to get the standard 48k of
RAM or upgrade to the full 64k.  This was long before the IBM PC.




^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [9fans] Barrelfish
       [not found] <<20091015105328.GA18947@nipl.net>
@ 2009-10-15 13:27 ` erik quanstrom
  2009-10-15 13:40   ` Richard Miller
                     ` (3 more replies)
  0 siblings, 4 replies; 90+ messages in thread
From: erik quanstrom @ 2009-10-15 13:27 UTC (permalink / raw)
  To: 9fans

On Thu Oct 15 06:55:24 EDT 2009, sam@nipl.net wrote:
> task.  With respect to Ken, Bill Gates said something along the lines of "who
> would need more than 640K?".

on the other hand, there were lots of people using computers with 4mb
of memory when bill gates said this.  it was quite easy to see how to use
more than 1mb at the time.  in fact, i believe i used an apple ][ around
that time that had ~744k.  it was a weird amount of memory.

> There is a vast range of applications that cannot
> be managed in real time using existing single-core technology.

please name one.

- erik



^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [9fans] Barrelfish
  2009-10-15 10:53       ` Sam Watkins
  2009-10-15 11:50         ` Richard Miller
  2009-10-15 11:56         ` Josh Wood
@ 2009-10-15 13:11         ` hiro
  2009-10-15 15:05           ` David Leimbach
  2009-10-18  1:15         ` Roman Shaposhnik
  3 siblings, 1 reply; 90+ messages in thread
From: hiro @ 2009-10-15 13:11 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

> There is a vast range of applications that cannot
> be managed in real time using existing single-core technology.

I'm sorry to interrupt your discussion, but what is real time?



^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [9fans] Barrelfish
  2009-10-15 11:50         ` Richard Miller
@ 2009-10-15 12:00           ` W B Hacker
  2009-10-16 17:03           ` Sam Watkins
  1 sibling, 0 replies; 90+ messages in thread
From: W B Hacker @ 2009-10-15 12:00 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

Richard Miller wrote:
>> It's easy to write good code that will take advantage of arbitrarily many
>> processors to run faster / smoother, if you have a proper language for the
>> task.
>
> ... and if you can find a way around Amdahl's law (qv).
>
>
>

http://www.cis.temple.edu/~shi/docs/amdahl/amdahl.html





^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [9fans] Barrelfish
  2009-10-15 10:53       ` Sam Watkins
  2009-10-15 11:50         ` Richard Miller
@ 2009-10-15 11:56         ` Josh Wood
  2009-10-15 13:11         ` hiro
  2009-10-18  1:15         ` Roman Shaposhnik
  3 siblings, 0 replies; 90+ messages in thread
From: Josh Wood @ 2009-10-15 11:56 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs


On Oct 15, 2009, at 3:53 AM, Sam Watkins wrote:

> With respect to Ken, Bill Gates said something along the lines of "who
> would need more than 640K?"

With respect to Ken, from Roman's report, you only know that he asked
a question. Roman was the one without an answer, and no one echoed
Gates's arbitrary limit.

-Josh




^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [9fans] Barrelfish
  2009-10-15 10:53       ` Sam Watkins
@ 2009-10-15 11:50         ` Richard Miller
  2009-10-15 12:00           ` W B Hacker
  2009-10-16 17:03           ` Sam Watkins
  2009-10-15 11:56         ` Josh Wood
                           ` (2 subsequent siblings)
  3 siblings, 2 replies; 90+ messages in thread
From: Richard Miller @ 2009-10-15 11:50 UTC (permalink / raw)
  To: 9fans

> It's easy to write good code that will take advantage of arbitrarily many
> processors to run faster / smoother, if you have a proper language for the
> task.

... and if you can find a way around Amdahl's law (qv).




^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [9fans] Barrelfish
  2009-10-15  1:50     ` Roman Shaposhnik
  2009-10-15  2:12       ` Eric Van Hensbergen
@ 2009-10-15 10:53       ` Sam Watkins
  2009-10-15 11:50         ` Richard Miller
                           ` (3 more replies)
  1 sibling, 4 replies; 90+ messages in thread
From: Sam Watkins @ 2009-10-15 10:53 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On Wed, Oct 14, 2009 at 06:50:28PM -0700, Roman Shaposhnik wrote:
> > The mention that "... the overhead of cache coherence restricts the ability
> > to scale up to even 80 cores" is also eye opening. If we're at approx 8
> > cores today, that's only 5 yrs away (if we double cores every
> > 1.5 yrs).

Sharing the memory between processes is a stupid approach to multi-processing /
multi-threading.  Modern popular computer architecture and software design is
fairly much uniformly stupid.

> A couple of years ago we had a Plan9 summit @Google campus and Ken was
> there. I still remember the question he asked me: what exactly would you make
> all those cores do on your desktop?

It's easy to write good code that will take advantage of arbitrarily many
processors to run faster / smoother, if you have a proper language for the
task.  With respect to Ken, Bill Gates said something along the lines of "who
would need more than 640K?".  There is a vast range of applications that cannot
be managed in real time using existing single-core technology.

Sam



^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [9fans] Barrelfish
  2009-10-15  3:32         ` Tim Newsham
@ 2009-10-15  3:59           ` Eric Van Hensbergen
  2009-10-15 17:39             ` Tim Newsham
  0 siblings, 1 reply; 90+ messages in thread
From: Eric Van Hensbergen @ 2009-10-15  3:59 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs


On Oct 14, 2009, at 9:32 PM, Tim Newsham wrote:

>> If you look at the core of Barrelfish, you'll see that this is
>> essentially what they are doing -- essentially using an extremely
>> small microkernel (like L4) that's very
>> efficient at various forms of message passing.  That's the only
>> thing that is duplicated on the various cores.  The services
>> themselves can be distributed
>> and/or replicated as appropriate (although their approach favors
>> replication) -- it all depends on the characteristics of the
>> workload.
>
> it sounds like the kernel (L4-like, supposedly tuned to the specific
> hardware) and the "monitor" (userland, portable) are shared, from
> the paper.

I'm confused about what you mean by "shared".
The monitor is replicated on every core as it is responsible for
coordination amongst the cores - some things are replicated while
others are coordinated.  They do choose to replicate most things as
part of their core scalability argument, in an effort to reduce lock
contention to centralized resources.

(from section 4.4): On each core, replicated data structures, such as
memory allocation tables and address space mappings, are kept
globally consistent by means of an agreement protocol run by the
monitors. Application requests that access global state are handled by
the monitors, which mediate access to remote copies of state.
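To make the quoted passage concrete, here is a toy sketch of per-core
replicas kept consistent purely by messages.  It is not Barrelfish's
agreement protocol (the paper's protocol is not reproduced here); the
Monitor class, the broadcast helper and the sequence numbers are all
hypothetical names chosen for illustration, and the "cores" are simulated
in a single thread.

```python
from collections import deque


class Monitor:
    """One per simulated core: owns a private replica, shares no memory."""

    def __init__(self, core_id):
        self.core_id = core_id
        self.replica = {}     # this core's local copy of the global state
        self.inbox = deque()  # messages delivered by other monitors
        self.applied = 0      # sequence number of the last update applied

    def handle_pending(self):
        # Apply queued updates strictly in sequence order, so every
        # replica passes through the same series of states.
        while self.inbox:
            seq, key, value = self.inbox.popleft()
            if seq != self.applied + 1:
                raise RuntimeError("update arrived out of order")
            self.replica[key] = value
            self.applied = seq


def broadcast(monitors, seq, key, value):
    # Hypothetical helper: deliver one ordered update to every monitor.
    # Nothing is written into another core's replica directly.
    for m in monitors:
        m.inbox.append((seq, key, value))


monitors = [Monitor(i) for i in range(4)]
broadcast(monitors, 1, "page 0x1000", "mapped")
broadcast(monitors, 2, "page 0x2000", "mapped")
broadcast(monitors, 3, "page 0x1000", "unmapped")

for m in monitors:
    m.handle_pending()

consistent = all(m.replica == monitors[0].replica for m in monitors)
print(consistent)
```

The only point of the sketch is that each replica is touched by exactly
one owner, so no lock is needed around it; the cost moves into ordering
and delivering the messages.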

       -eric




* Re: [9fans] Barrelfish
  2009-10-15  2:17       ` Eric Van Hensbergen
@ 2009-10-15  3:32         ` Tim Newsham
  2009-10-15  3:59           ` Eric Van Hensbergen
  0 siblings, 1 reply; 90+ messages in thread
From: Tim Newsham @ 2009-10-15  3:32 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

> If you look at the core of Barrelfish, you'll see that this is essentially
> what they are doing -- essentially using an extremely small microkernel (like
> L4) that's very
> efficient at various forms of message passing.  That's the only thing that is
> duplicated on the various cores.  The services themselves can be distributed
> and/or replicated as appropriate (although their approach favors replication)
> -- it all depends on the characteristics of the workload.

it sounds like the kernel (L4-like, supposedly tuned to the specific
hardware) and the "monitor" (userland, portable) are shared, from
the paper.  Btw, they have the source code up for free
(http://www.barrelfish.org/release_20090914.html) which I suppose
could be used to more definitively answer these questions with
some effort...

>    -eric

Tim Newsham
http://www.thenewsh.com/~newsham/




* Re: [9fans] Barrelfish
  2009-10-15  2:05     ` Roman Shaposhnik
@ 2009-10-15  2:17       ` Eric Van Hensbergen
  2009-10-15  3:32         ` Tim Newsham
  0 siblings, 1 reply; 90+ messages in thread
From: Eric Van Hensbergen @ 2009-10-15  2:17 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs


On Oct 14, 2009, at 8:05 PM, Roman Shaposhnik wrote:

>> And how does one deal with heterogeneous cores and complex on chip
>> interconnect topologies?
>
> Good question. Do they have to be heterogeneous? My opinion is that
> the
> future of big multicore will be more Cell-like.
>

They don't have to be, but that is part of both the multikernel and
satellite kernel vision.

>> There's no real evidence that single kernels do well with hundreds
>> of real
>> cores (as opposed to hw threads) - in fact most of the data I've
>> seen is to
>> the contrary.
>
> Agreed. But then, again, you don't really want a kernel for anything
> but message
> passing in such an architecture (the other function of the kernel --
> multiplexing
> I/O is only needed on selected few cores) at which point it really
> becomes a
> misnomer to even call it a kernel -- a thin hypervisor perhaps...
>

If you look at the core of Barrelfish, you'll see that this is
essentially what they are doing -- essentially using an extremely
small microkernel (like L4) that's very
efficient at various forms of message passing.  That's the only thing
that is duplicated on the various cores.  The services themselves can
be distributed
and/or replicated as appropriate (although their approach favors
replication) -- it all depends on the characteristics of the workload.

      -eric




* Re: [9fans] Barrelfish
  2009-10-15  1:50     ` Roman Shaposhnik
@ 2009-10-15  2:12       ` Eric Van Hensbergen
  2009-10-15 10:53       ` Sam Watkins
  1 sibling, 0 replies; 90+ messages in thread
From: Eric Van Hensbergen @ 2009-10-15  2:12 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs


On Oct 14, 2009, at 7:50 PM, Roman Shaposhnik wrote:

> On Wed, Oct 14, 2009 at 2:21 PM, Tim Newsham <newsham@lava.net> wrote:
>> I'm not familiar with the berkeley work.
>
> Sorry I can't readily find the paper (the URL is somewhere on IMAP
> @Sun :-()
> But it got presented at the Berkeley ParLab overview given to us by
> Dave Patterson.
> They were talking thin hypervisors and that sort of stuff. More
> details here:
>   http://www.eecs.berkeley.edu/Pubs/TechRpts/2008/EECS-2008-23.pdf
> (page 10) but still no original paper in sight...

You may be thinking about the Tessellation work from Berkeley ParLab
(http://parlab.eecs.berkeley.edu/publication/221)

        -eric





* Re: [9fans] Barrelfish
  2009-10-14 21:36   ` Eric Van Hensbergen
@ 2009-10-15  2:05     ` Roman Shaposhnik
  2009-10-15  2:17       ` Eric Van Hensbergen
  0 siblings, 1 reply; 90+ messages in thread
From: Roman Shaposhnik @ 2009-10-15  2:05 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

> And how does one deal with heterogeneous cores and complex on chip
> interconnect topologies?

Good question. Do they have to be heterogeneous? My opinion is that the
future of big multicore will be more Cell-like.

> There's no real evidence that single kernels do well with hundreds of real
> cores (as opposed to hw threads) - in fact most of the data I've seen is to
> the contrary.

Agreed. But then, again, you don't really want a kernel for anything but message
passing in such an architecture (the other function of the kernel --
multiplexing
I/O is only needed on selected few cores) at which point it really becomes a
misnomer to even call it a kernel -- a thin hypervisor perhaps...

Thanks,
Roman.




* Re: [9fans] Barrelfish
  2009-10-14 21:21   ` Tim Newsham
  2009-10-14 21:33     ` Lyndon Nerenberg (VE6BBM/VE7TFX)
  2009-10-15  1:03     ` David Leimbach
@ 2009-10-15  1:50     ` Roman Shaposhnik
  2009-10-15  2:12       ` Eric Van Hensbergen
  2009-10-15 10:53       ` Sam Watkins
  2 siblings, 2 replies; 90+ messages in thread
From: Roman Shaposhnik @ 2009-10-15  1:50 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On Wed, Oct 14, 2009 at 2:21 PM, Tim Newsham <newsham@lava.net> wrote:
> I'm not familiar with the berkeley work.

Sorry I can't readily find the paper (the URL is somewhere on IMAP @Sun :-()
But it got presented at the Berkeley ParLab overview given to us by
Dave Patterson.
They were talking thin hypervisors and that sort of stuff. More details here:
   http://www.eecs.berkeley.edu/Pubs/TechRpts/2008/EECS-2008-23.pdf
(page 10) but still no original paper in sight...

> I'm still digesting it.  My first thoughts were that if my pc is a
> distributed heterogeneous computer

It may very well be that, but why would you want to manage that complexity?
Your GPU is already heavily "multicore", yet you don't see it (and you really
don't want to see it!)

> The mention that "... the overhead of cache coherence restricts the ability
> to scale up to even 80 cores" is also eye-opening. If we're at approx. 8
> cores today, that's only 5 yrs away (if we double cores every
> 1.5 yrs).

A couple of years ago we had a Plan9 summit @Google campus and Ken was
there. I still remember the question he asked me: what exactly would you make
all those cores do on your desktop?

Frankly, I still don't have a good answer.

Thanks,
Roman.




* Re: [9fans] Barrelfish
  2009-10-14 21:21   ` Tim Newsham
  2009-10-14 21:33     ` Lyndon Nerenberg (VE6BBM/VE7TFX)
@ 2009-10-15  1:03     ` David Leimbach
  2009-10-15  1:50     ` Roman Shaposhnik
  2 siblings, 0 replies; 90+ messages in thread
From: David Leimbach @ 2009-10-15  1:03 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs


>
>
>
>  Did you find any ideas there particularly engaging?
>>
>
> I'm still digesting it.  My first thoughts were that if my pc is a
> distributed heterogeneous computer, what lessons can it borrow from earlier
> work on distributed heterogeneous computing (i.e. Plan 9).
>
> I found the discussion on cache coherency, message passing and optimization
> to be enlightening.  The fact that you may want to
> organize your core OS quite a bit differently depending on which
> model cpus in the same family you use is kind of scary.
>
> The mention that "... the overhead of cache coherence restricts the ability
> to scale up to even 80 cores" is also eye-opening. If we're at approx. 8
> cores today, that's only 5 yrs away (if we double cores every
> 1.5 yrs).
>


I personally thought the use of DSLs built on Haskell was rather clever, but
the other discoveries are the sort of feedback I suspect our CPU vendors
aren't going to think about on their own somehow :-)


>
>  Roman.
>>
>
> Tim Newsham
> http://www.thenewsh.com/~newsham/
>
>



* Re: [9fans] Barrelfish
  2009-10-14 22:10         ` Eric Van Hensbergen
@ 2009-10-14 22:21           ` Noah Evans
  0 siblings, 0 replies; 90+ messages in thread
From: Noah Evans @ 2009-10-14 22:21 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

Do want.

On Thu, Oct 15, 2009 at 12:10 AM, Eric Van Hensbergen <ericvh@gmail.com> wrote:
> On Oct 14, 2009, at 3:42 PM, Noah Evans wrote:
>
>> http://ramp.eecs.berkeley.edu/
>>
>> Tim: Andrew Baumann is aware of Plan 9 but their approach is quite a
>> bit different. They are consciously avoiding the networking issue as
>> well (they've been asked to extend their messaging model to the network
>> and have actively said they're not interested).
>>
>
> While they may not be interested in implementing a network messaging model,
> they don't oppose it.  Jonathan and I talked with Andrew about porting
> Barrelfish to Blue Gene yesterday to test some of their scalability claims
> at large scale.
>
>       -eric
>
>
>




* Re: [9fans] Barrelfish
  2009-10-14 21:42       ` Noah Evans
  2009-10-14 21:45         ` erik quanstrom
@ 2009-10-14 22:10         ` Eric Van Hensbergen
  2009-10-14 22:21           ` Noah Evans
  1 sibling, 1 reply; 90+ messages in thread
From: Eric Van Hensbergen @ 2009-10-14 22:10 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On Oct 14, 2009, at 3:42 PM, Noah Evans wrote:

> http://ramp.eecs.berkeley.edu/
>
> Tim: Andrew Baumann is aware of Plan 9 but their approach is quite a
> bit different. They are consciously avoiding the networking issue as
> well (they've been asked to extend their messaging model to the network
> and have actively said they're not interested).
>

While they may not be interested in implementing a network messaging
model, they don't oppose it.  Jonathan and I talked with Andrew about
porting Barrelfish to Blue Gene yesterday to test some of their
scalability claims at large scale.

        -eric





* Re: [9fans] Barrelfish
  2009-10-14 21:45         ` erik quanstrom
@ 2009-10-14 21:57           ` Noah Evans
  0 siblings, 0 replies; 90+ messages in thread
From: Noah Evans @ 2009-10-14 21:57 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

Have you read the paper? I don't think you understand the difference
in scope or goals here.

On Wed, Oct 14, 2009 at 11:45 PM, erik quanstrom <quanstro@coraid.com> wrote:
>> http://ramp.eecs.berkeley.edu/
>>
>> Tim: Andrew Baumann is aware of Plan 9 but their approach is quite a
>> bit different. They are consciously avoiding the networking issue as
>> well (they've been asked to extend their messaging model to the network
>> and have actively said they're not interested).
>
> every interconnect is a network.  sometimes we don't admit it.
>
> - erik
>
>




* Re: [9fans] Barrelfish
  2009-10-14 21:42       ` Noah Evans
@ 2009-10-14 21:45         ` erik quanstrom
  2009-10-14 21:57           ` Noah Evans
  2009-10-14 22:10         ` Eric Van Hensbergen
  1 sibling, 1 reply; 90+ messages in thread
From: erik quanstrom @ 2009-10-14 21:45 UTC (permalink / raw)
  To: 9fans

> http://ramp.eecs.berkeley.edu/
>
> Tim: Andrew Baumann is aware of Plan 9 but their approach is quite a
> bit different. They are consciously avoiding the networking issue as
> well (they've been asked to extend their messaging model to the network
> and have actively said they're not interested).

every interconnect is a network.  sometimes we don't admit it.

- erik




* Re: [9fans] Barrelfish
  2009-10-14 21:33     ` Lyndon Nerenberg (VE6BBM/VE7TFX)
@ 2009-10-14 21:42       ` Noah Evans
  2009-10-14 21:45         ` erik quanstrom
  2009-10-14 22:10         ` Eric Van Hensbergen
  0 siblings, 2 replies; 90+ messages in thread
From: Noah Evans @ 2009-10-14 21:42 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

http://ramp.eecs.berkeley.edu/

Tim: Andrew Baumann is aware of Plan 9 but their approach is quite a
bit different. They are consciously avoiding the networking issue as
well (they've been asked to extend their messaging model to the network
and have actively said they're not interested).

On Wed, Oct 14, 2009 at 11:33 PM, Lyndon Nerenberg (VE6BBM/VE7TFX)
<lyndon@orthanc.ca> wrote:
>> I'm not familiar with the berkeley work.
>
> Me either. Any chance of some references to this?
>
>
>




* Re: [9fans] Barrelfish
  2009-10-14 19:54 ` Roman Shaposhnik
  2009-10-14 21:21   ` Tim Newsham
@ 2009-10-14 21:36   ` Eric Van Hensbergen
  2009-10-15  2:05     ` Roman Shaposhnik
  1 sibling, 1 reply; 90+ messages in thread
From: Eric Van Hensbergen @ 2009-10-14 21:36 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

And how does one deal with heterogeneous cores and complex on chip
interconnect topologies?  Barrelfish also has a nice benefit in that
it could span coherence domains.

There's no real evidence that single kernels do well with hundreds of
real cores (as opposed to hw threads) - in fact most of the data I've
seen is to the contrary.

Sent from my iPhone

On Oct 14, 2009, at 1:54 PM, Roman Shaposhnik <roman@shaposhnik.org>
wrote:

> On Wed, Oct 14, 2009 at 12:09 PM, Tim Newsham <newsham@lava.net>
> wrote:
>> Rethinking multi-core systems as distributed heterogeneous
>> systems.  Thoughts?
>
> Somehow this feels related to the work that came out of Berkeley a
> year or so ago. I'm still not convinced of the benefits of multiple
> kernels. If you are managing a couple of hundred cores, a single kernel
> would do just fine; once the industry is ready for a couple dozen
> thousand PUs -- the kernel is most likely to be dispensed with
> anyway.
>
> Did you find any ideas there particularly engaging?
>
> Thanks,
> Roman.
>




* Re: [9fans] Barrelfish
  2009-10-14 21:21   ` Tim Newsham
@ 2009-10-14 21:33     ` Lyndon Nerenberg (VE6BBM/VE7TFX)
  2009-10-14 21:42       ` Noah Evans
  2009-10-15  1:03     ` David Leimbach
  2009-10-15  1:50     ` Roman Shaposhnik
  2 siblings, 1 reply; 90+ messages in thread
From: Lyndon Nerenberg (VE6BBM/VE7TFX) @ 2009-10-14 21:33 UTC (permalink / raw)
  To: 9fans

> I'm not familiar with the berkeley work.

Me either. Any chance of some references to this?





* Re: [9fans] Barrelfish
  2009-10-14 19:54 ` Roman Shaposhnik
@ 2009-10-14 21:21   ` Tim Newsham
  2009-10-14 21:33     ` Lyndon Nerenberg (VE6BBM/VE7TFX)
                       ` (2 more replies)
  2009-10-14 21:36   ` Eric Van Hensbergen
  1 sibling, 3 replies; 90+ messages in thread
From: Tim Newsham @ 2009-10-14 21:21 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

> Somehow this feels related to the work that came out of Berkeley a year
> or so ago. I'm still not convinced of the benefits of multiple
> kernels. If you are managing a couple of hundred cores, a single kernel
> would do just fine; once the industry is ready for a couple dozen
> thousand PUs -- the kernel is most likely to be dispensed with anyway.

I'm not familiar with the berkeley work.

> Did you find any ideas there particularly engaging?

I'm still digesting it.  My first thought was: if my PC is a
distributed heterogeneous computer, what lessons can it borrow from
earlier work on distributed heterogeneous computing (i.e. Plan 9)?

I found the discussion on cache coherency, message passing and
optimization to be enlightening.  The fact that you may want to
organize your core OS quite a bit differently depending on which
model cpus in the same family you use is kind of scary.

The mention that "... the overhead of cache coherence restricts the
ability to scale up to even 80 cores" is also eye-opening. If we're at
approx. 8 cores today, that's only 5 yrs away (if we double cores every
1.5 yrs).
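
The back-of-the-envelope figure can be checked with a few lines.  This is
only a sketch under the assumptions stated above (about 8 cores today,
a doubling every 1.5 years); the function names are made up for
illustration.

```python
import math


def years_to_reach(target_cores, current_cores=8, doubling_period_years=1.5):
    """Years until the core count reaches target_cores, assuming steady doubling."""
    doublings = math.log2(target_cores / current_cores)
    return doublings * doubling_period_years


def cores_after(years, current_cores=8, doubling_period_years=1.5):
    """Projected core count after the given number of years."""
    return current_cores * 2 ** (years / doubling_period_years)


print(round(years_to_reach(80), 2))  # about 4.98 years
print(round(cores_after(5)))         # about 81 cores
```

Going from 8 to 80 cores takes log2(10), roughly 3.3 doublings, hence
just under 5 years at that rate.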

> Roman.

Tim Newsham
http://www.thenewsh.com/~newsham/




* Re: [9fans] Barrelfish
  2009-10-14 19:09 Tim Newsham
@ 2009-10-14 19:54 ` Roman Shaposhnik
  2009-10-14 21:21   ` Tim Newsham
  2009-10-14 21:36   ` Eric Van Hensbergen
  2009-10-15 18:28 ` Christopher Nielsen
  1 sibling, 2 replies; 90+ messages in thread
From: Roman Shaposhnik @ 2009-10-14 19:54 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On Wed, Oct 14, 2009 at 12:09 PM, Tim Newsham <newsham@lava.net> wrote:
> Rethinking multi-core systems as distributed heterogeneous
> systems.  Thoughts?

Somehow this feels related to the work that came out of Berkeley a year
or so ago. I'm still not convinced of the benefits of multiple
kernels. If you are managing a couple of hundred cores, a single kernel
would do just fine; once the industry is ready for a couple dozen
thousand PUs -- the kernel is most likely to be dispensed with anyway.

Did you find any ideas there particularly engaging?

Thanks,
Roman.




* [9fans] Barrelfish
@ 2009-10-14 19:09 Tim Newsham
  2009-10-14 19:54 ` Roman Shaposhnik
  2009-10-15 18:28 ` Christopher Nielsen
  0 siblings, 2 replies; 90+ messages in thread
From: Tim Newsham @ 2009-10-14 19:09 UTC (permalink / raw)
  To: 9fans

Rethinking multi-core systems as distributed heterogeneous
systems.  Thoughts?

http://www.sigops.org/sosp/sosp09/papers/baumann-sosp09.pdf

Tim Newsham
http://www.thenewsh.com/~newsham/



