* Re: [9fans] Plan 9 and multicores/parallelism/concurrency?
[not found] <f1209aefaab5eece7465c3d0df545ddd@quanstro.net>
@ 2008-07-14 20:33 ` Roman V. Shaposhnik
2008-07-15 1:37 ` Joel C. Salomon
2008-07-15 8:01 ` Bakul Shah
0 siblings, 2 replies; 22+ messages in thread
From: Roman V. Shaposhnik @ 2008-07-14 20:33 UTC (permalink / raw)
To: erik quanstrom; +Cc: 9fans
On Mon, 2008-07-14 at 12:35 -0400, erik quanstrom wrote:
> > > Plan 9 makes it easy via 9p, its file system/resource sharing
> > > protocol. In plan 9, things like graphics and network drivers export a
> > > 9p interface (a filetree). Furthermore, 9p is network transparent
> > > which means accesses to remote resources look exactly like accesses to
> > > local resources, and this is the main trick - processes do not care
> > > whether the file they are interested in is being served by the kernel,
> > > a userspace process, or a machine half way across the world.
> >
> > All very true. And it sure does provide enormous benefits on distributed
> > memory architectures. But do you know of any part that would be
> > beneficial for highly-SMP systems?
>
> do you have some reason to believe that 9p (or just read and write)
> is not effective on such a machine?
I have some (not a whole lot, since I haven't looked at the source code
for a while) reason to believe that the current 9P implementation
doesn't exploit the opportunity when both ends happen to run
on the same shared memory. I would love to be proved wrong. Although,
the higher level issue that I have with 9P on shared memory
architectures is the fact that file and communication abstractions
might not be the best way to represent the shared memory resources
to begin with. IOW, mmap()-like things might be a closer match.
> since scheduling would be the main shared resource, do you think
> it would be the limiting factor?
Yes. And that's where the comment in my first email came from:
scheduling is a tricky thing on shared memory NUMA-like systems.
Solaris's scheduler is not shy when it comes to big iron (100+ CPU SMP
boxes) but even it had to be heavily tuned when a Batoka box first
came to the labs. When you have physical threads (CPUs), virtual
threads and a non-trivial memory hierarchy -- the decision of what
is the best place (hardware-wise) for a given thread to run becomes
a non-trivial one. Kernels that can track affinity properly rule
the day. I don't think that the Plan9 scheduler has had an
opportunity to be tuned for such an environment. Same goes for
virtual memory page related algorithms.
Here's a decent (albeit brief) overview of what a kernel has to
face these days in order to be reasonably savvy on shared memory,
multicore architectures with NUMA-like memory hierarchy:
http://www.redhat.com/promo/summit/2008/downloads/pdf/Wednesday_1015am_Rik_Van_Riel_Hot_Topics.pdf
Start from slide #13.
Thanks,
Roman.
* Re: [9fans] Plan 9 and multicores/parallelism/concurrency?
2008-07-14 20:33 ` [9fans] Plan 9 and multicores/parallelism/concurrency? Roman V. Shaposhnik
@ 2008-07-15 1:37 ` Joel C. Salomon
2008-07-15 8:01 ` Bakul Shah
1 sibling, 0 replies; 22+ messages in thread
From: Joel C. Salomon @ 2008-07-15 1:37 UTC (permalink / raw)
To: Fans of the OS Plan 9 from Bell Labs
On Mon, Jul 14, 2008 at 4:33 PM, Roman V. Shaposhnik <rvs@sun.com> wrote:
> the day. I don't think that Plan9 scheduler has had an
> opportunity to be tuned for such an environment. Same goes for
> virtual memory page related algorithms.
The scheduling code does have a heuristic for processor affinity, so
there's a model for what to tune when you have the MSMP machine to
play with.
--Joel
* Re: [9fans] Plan 9 and multicores/parallelism/concurrency?
2008-07-14 20:33 ` [9fans] Plan 9 and multicores/parallelism/concurrency? Roman V. Shaposhnik
2008-07-15 1:37 ` Joel C. Salomon
@ 2008-07-15 8:01 ` Bakul Shah
2008-07-15 17:50 ` Paul Lalonde
1 sibling, 1 reply; 22+ messages in thread
From: Bakul Shah @ 2008-07-15 8:01 UTC (permalink / raw)
To: Fans of the OS Plan 9 from Bell Labs
On Mon, 14 Jul 2008 13:33:01 PDT "Roman V. Shaposhnik" <rvs@sun.com> wrote:
> Solaris's scheduler is not shy when it comes to big iron (100+ CPU SMP
> boxes) but even it had to be heavily tuned when a Batoka box first
> came to the labs. When you have physical threads (CPUs), virtual
> threads and a non trivial memory hierarchy -- the decision of what
> is the best place (hardware-wise) for a given thread to run becomes
> a non-trivial one. Kernels that can track affinity properly rule
> the day.
I suspect a lot of this complexity will end up being dropped
when you don't have to worry about efficiently using the last
N% of cpu cycles. When your bottleneck is memory bandwidth,
using a core 100% is not going to happen in general. And I am
not sure thread placement belongs in the kernel. Why not let
an application manage its allocation of h/w thread x cycle
resources? I am not even sure a full kernel belongs on every
core.
Unlike you I think the kernel should do even less as more and
more cores are added. It should basically stay out of the
way. Less government, more privatization :-) So maybe
the plan9 kernel would be a better starting point than a Unix
kernel.
* Re: [9fans] Plan 9 and multicores/parallelism/concurrency?
2008-07-15 8:01 ` Bakul Shah
@ 2008-07-15 17:50 ` Paul Lalonde
2008-07-17 19:29 ` Bakul Shah
0 siblings, 1 reply; 22+ messages in thread
From: Paul Lalonde @ 2008-07-15 17:50 UTC (permalink / raw)
To: Fans of the OS Plan 9 from Bell Labs
On 15-Jul-08, at 1:01 AM, Bakul Shah wrote:
>
> I suspect a lot of this complexity will end up being dropped
> when you don't have to worry about efficiently using the last
> N% of cpu cycles.
Would that I weren't working on a multi-core graphics part... That N%
is what the game is all about.
> When your bottleneck is memory bandwidth,
> using a core 100% is not going to happen in general.
But in most cases, that memory movement has to share the bus with
increasingly remote cache accesses, which in turn take bandwidth.
Affinity is a serious win for reducing on-chip bandwidth usage in
cache-coherent many-core systems.
> And I am
> not sure thread placement belongs in the kernel. Why not let
> an application manage its allocation of h/w thread x cycle
> resources? I am not even sure a full kernel belongs on every
> core.
I'm still looking for the right scheduler, in kernel or user space,
that lets me deal with affinitizing 3 resources that run at different
granularities: per-core cache, hardware-thread-to-core, and cross-chip
caches. There's a rough hierarchy implied by these three resources,
and perfect scheduling might be possible in a purely cooperative
world, but reality imposes pre-emption and resource virtualization.
> Unlike you I think the kernel should do even less as more and
> more cores are added. It should basically stay out of the
> way. Less government, more privatization :-) So maybe
> the plan9 kernel would be a better starting point than a Unix
> kernel.
Agreed, less and less in the kernel, but *enough*. I like resource
virtualization, and as long as it gets affinity right, I win.
Paul
* Re: [9fans] Plan 9 and multicores/parallelism/concurrency?
2008-07-15 17:50 ` Paul Lalonde
@ 2008-07-17 19:29 ` Bakul Shah
2008-07-18 3:31 ` Paul Lalonde
0 siblings, 1 reply; 22+ messages in thread
From: Bakul Shah @ 2008-07-17 19:29 UTC (permalink / raw)
To: Fans of the OS Plan 9 from Bell Labs
On Tue, 15 Jul 2008 10:50:46 PDT Paul Lalonde <plalonde@telus.net> wrote:
>
> On 15-Jul-08, at 1:01 AM, Bakul Shah wrote:
> >
> > I suspect a lot of this complexity will end up being dropped
> > when you don't have to worry about efficiently using the last
> > N% of cpu cycles.
>
> Would that I weren't working on a multi-core graphics part... That N%
> is what the game is all about.
I was really wondering about what might happen when there are
100s of cores per die. My reasoning was that more and more
cores can be (and will be) put on a die but a corresponding
increase in off chip memory bandwidth will not be possible so
at some point memory bottleneck will prevent 100% use of
cores even if you assume ideal placement of threads and no
thread movement to a different core.
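A back-of-the-envelope version of this argument, as a Python sketch.
The numbers are made up for illustration (100 GB/s of off-chip
bandwidth, 2 GB/s needed per core to stay busy); they are assumptions,
not measurements, and max_core_utilization is a hypothetical helper.
If every core needs a fixed bandwidth to run flat out, the best
average utilization is capped by the ratio of supply to total demand,
however well the threads are placed:

```python
def max_core_utilization(cores, offchip_gbps, demand_per_core_gbps):
    """Upper bound on average core utilization when all cores share
    one fixed off-chip memory bandwidth (a deliberately crude model)."""
    total_demand = cores * demand_per_core_gbps
    return min(1.0, offchip_gbps / total_demand)


# Hypothetical part: 100 GB/s off chip, 2 GB/s per busy core.
few = max_core_utilization(16, 100.0, 2.0)     # demand fits in supply
many = max_core_utilization(100, 100.0, 2.0)   # demand is twice supply
lots = max_core_utilization(400, 100.0, 2.0)   # demand is 8x supply

for n, u in ((16, few), (100, many), (400, lots)):
    print(n, "cores: at most", u, "average utilization")
```

The model ignores caches entirely, which is where the working-set and
latency-hiding arguments elsewhere in this thread come in.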
> > When your bottleneck is memory bandwidth,
> > using a core 100% is not going to happen in general.
>
> But in most cases, that memory movement has to share the bus with
> increasingly remote cache accesses, which in turn take bandwidth.
> Affinity is a serious win for reducing on-chip bandwidth usage in
> cache-coherent many-core systems.
I was certainly not suggesting moving threads around. I was
speculating that as the number of cores goes up perhaps the
kernel is not the right place to do affinity scheduling or
much of any sophisticated scheduling.
> > And I am
> > not sure thread placement belongs in the kernel. Why not let
> > an application manage its allocation of h/w thread x cycle
> > resources? I am not even sure a full kernel belongs on every
> > core.
>
> I'm still looking for the right scheduler, in kernel or user space,
> that lets me deal with affinitizing 3 resources that run at different
> granularities: per-core cache, hardware-thread-to-core, and cross-chip
> caches. There's a rough hierarchy implied by these three resources,
> and perfect scheduling might be possible in a purely cooperative
> world, but reality imposes pre-emption and resource virtualization.
Some friends of mine are able to squeeze a lot of parallelism
out of supposedly hard to parallelize code. But this is in a
purely cooperative world where they assume threads don't
move and where machines are dedicated to specific tasks.
> > Unlike you I think the kernel should do even less as more and
> > more cores are added. It should basically stay out of the
> > way. Less government, more privatization :-) So maybe
> > the plan9 kernel would be a better starting point than a Unix
> > kernel.
>
> Agreed, less and less in the kernel, but *enough*. I like resource
> virtualization, and as long as it gets affinity right, I win.
* Re: [9fans] Plan 9 and multicores/parallelism/concurrency?
2008-07-17 19:29 ` Bakul Shah
@ 2008-07-18 3:31 ` Paul Lalonde
0 siblings, 0 replies; 22+ messages in thread
From: Paul Lalonde @ 2008-07-18 3:31 UTC (permalink / raw)
To: Fans of the OS Plan 9 from Bell Labs
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
On Jul 17, 2008, at 12:29 PM, Bakul Shah wrote:
> My reasoning was that more and more
> cores can be (and will be) put on a die but a corresponding
> increase in off chip memory bandwidth will not be possible so
> at some point memory bottleneck will prevent 100% use of
> cores even if you assume ideal placement of threads and no
> thread movement to a different core.
As the number of cores increases you have to hugely increase the
amount of cache - you need cache enough for a large enough working
set to keep a core busy during the long wait for its next slice of
bandwidth (figurative slice - the multiplexing clearly should be finer
grained). Latency hiding on those fetches is critically important.
>
> I was certainly not suggesting moving threads around. I was
> speculating that as the number of cores goes up perhaps the
> kernel is not the right place to do affinity scheduling or
> much of any sophisticated scheduling.
Largely agreed. The real tension is in virtualizing the resources,
which beats against affinity. Affinity is clearly an early loser in
oversubscribed situations, but it would be a major win to have a
scheduler (in or out of kernel) that could degrade intelligently in
the face of oversubscription, instead of the hard wall you get when
you throw away affinity.
> Some friends of mine are able to squeeze a lot of parallelism
> out of supposedly hard to parallelize code. But this is in a
> purely cooperative world where they assume threads don't
> move and where machines are dedicated to specific tasks.
Envy.
The other part not to forget about is data-parallel. At least in
graphics we get to recast most of our heavy loads to data-parallel,
which has huge benefits. If you can manage data-parallel with a nice
task DAG and decent load-balancing you can do wonders at keeping data
on-chip while pushing lots of flops.
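A small sketch of the "task DAG" idea in Python, under stated
assumptions: the stage names (load, transform, cull, shade, draw) are
invented, run_dag is a hypothetical helper, and a thread pool stands
in for real cores. Tasks whose inputs are all finished form a wave,
and every task in a wave can be handed out in parallel:

```python
from concurrent.futures import ThreadPoolExecutor
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_dag(deps, task, workers=4):
    """Run task(name) for every node, wave by wave.

    deps maps each node to the set of nodes it depends on.
    Returns the list of waves in the order they were executed."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    waves = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while sorter.is_active():
            ready = sorted(sorter.get_ready())  # all inputs are done
            list(pool.map(task, ready))         # one wave, in parallel
            sorter.done(*ready)
            waves.append(ready)
    return waves


# Invented example pipeline: each key depends on the names in its set.
deps = {
    "transform": {"load"},
    "cull": {"load"},
    "shade": {"transform"},
    "draw": {"shade", "cull"},
}
finished = []
waves = run_dag(deps, finished.append)
print(waves)
```

Running in strict waves is the simplest policy, not a load-balanced
one; a real scheduler would start a task as soon as its own inputs
finish and would try to keep dependent tasks near the data they share.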
Paul, patiently awaiting hardware announcements so he can talk freely.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.3 (Darwin)
iD8DBQFIgA6hpJeHo/Fbu1wRAvZUAJ0WxfsfPHZJSclLwhgLj8ibkdgDiwCgx80y
7WT72MW7TsELUwi7jSATr/8=
=5nHw
-----END PGP SIGNATURE-----
* Re: [9fans] Plan 9 and multicores/parallelism/concurrency?
@ 2008-07-14 16:35 erik quanstrom
0 siblings, 0 replies; 22+ messages in thread
From: erik quanstrom @ 2008-07-14 16:35 UTC (permalink / raw)
To: rvs, 9fans
> > Plan 9 makes it easy via 9p, its file system/resource sharing
> > protocol. In plan 9, things like graphics and network drivers export a
> > 9p interface (a filetree). Furthermore, 9p is network transparent
> > which means accesses to remote resources look exactly like accesses to
> > local resources, and this is the main trick - processes do not care
> > whether the file they are interested in is being served by the kernel,
> > a userspace process, or a machine half way across the world.
>
> All very true. And it sure does provide enormous benefits on distributed
> memory architectures. But do you know of any part that would be
> beneficial for highly-SMP systems?
do you have some reason to believe that 9p (or just read and write)
is not effective on such a machine?
since scheduling would be the main shared resource, do you think
it would be the limiting factor?
- erik
* [9fans] Plan 9 and multicores/parallelism/concurrency?
@ 2008-07-14 8:45 ssecorp
2008-07-14 9:08 ` sqweek
` (3 more replies)
0 siblings, 4 replies; 22+ messages in thread
From: ssecorp @ 2008-07-14 8:45 UTC (permalink / raw)
To: 9fans
from wikipedia:
"Plan 9 from Bell Labs is a distributed operating system, primarily
used for research."
but it doesn't say anything more about the distributed part.
I have recently found a big interest in concurrency, distributed
systems and multicore-programming.
So is Plan 9 good for a multicore-computer or what kind of distributed
system is it made for?
In what way does it make it easy?
* Re: [9fans] Plan 9 and multicores/parallelism/concurrency?
2008-07-14 8:45 ssecorp
@ 2008-07-14 9:08 ` sqweek
2008-07-14 16:17 ` Iruata Souza
2008-07-14 16:31 ` Roman V. Shaposhnik
2008-07-14 10:15 ` a
` (2 subsequent siblings)
3 siblings, 2 replies; 22+ messages in thread
From: sqweek @ 2008-07-14 9:08 UTC (permalink / raw)
To: Fans of the OS Plan 9 from Bell Labs
On Mon, Jul 14, 2008 at 4:45 PM, ssecorp <circularfunc@gmail.com> wrote:
> from wikipedia:
> "Plan 9 from Bell Labs is a distributed operating system, primarily
> used for research."
>
> but it doesn't say anything more about the distributed part.
>
> In what way does it make it easy?
Plan 9 makes it easy via 9p, its file system/resource sharing
protocol. In plan 9, things like graphics and network drivers export a
9p interface (a filetree). Furthermore, 9p is network transparent
which means accesses to remote resources look exactly like accesses to
local resources, and this is the main trick - processes do not care
whether the file they are interested in is being served by the kernel,
a userspace process, or a machine half way across the world.
-sqweek
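As a rough illustration of the "processes do not care" point, here is
an analogy in Python on a Unix-like system. It is not 9P and uses no
Plan 9 interface; first_line is a made-up helper. The same client
code reads from a file served by the kernel, from a buffer served by
user-space code, and from a socket, because all three present the
same read interface:

```python
import io
import os
import socket
import tempfile


def first_line(f):
    """Client code: it only ever uses the file interface."""
    return f.readline().rstrip(b"\n")


# 1. A regular file, served by the local kernel.
fd, path = tempfile.mkstemp()
os.write(fd, b"from the kernel\n")
os.close(fd)
with open(path, "rb") as f:
    local = first_line(f)
os.remove(path)

# 2. An in-memory buffer, served by ordinary user-space code.
user = first_line(io.BytesIO(b"from user space\n"))

# 3. A connected socket pair, standing in for a remote machine.
near, far = socket.socketpair()
near.sendall(b"from across the wire\n")
near.close()
with far.makefile("rb") as f:
    remote = first_line(f)
far.close()

print(local, user, remote)
```

In Plan 9 the uniform interface is the 9p file tree itself rather
than a language-level convention, so the substitution works across
programs and machines and not only inside one process.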
* Re: [9fans] Plan 9 and multicores/parallelism/concurrency?
2008-07-14 9:08 ` sqweek
@ 2008-07-14 16:17 ` Iruata Souza
2008-07-14 16:31 ` Roman V. Shaposhnik
1 sibling, 0 replies; 22+ messages in thread
From: Iruata Souza @ 2008-07-14 16:17 UTC (permalink / raw)
To: Fans of the OS Plan 9 from Bell Labs
On 7/14/08, sqweek <sqweek@gmail.com> wrote:
> On Mon, Jul 14, 2008 at 4:45 PM, ssecorp <circularfunc@gmail.com> wrote:
> > from wikipedia:
> > "Plan 9 from Bell Labs is a distributed operating system, primarily
> > used for research."
> >
> > but it doesn't say anything more about the distributed part.
> >
>
> > In what way does it make it easy?
>
>
> Plan 9 makes it easy via 9p, its file system/resource sharing
> protocol. In plan 9, things like graphics and network drivers export a
> 9p interface (a filetree). Furthermore, 9p is network transparent
> which means accesses to remote resources look exactly like accesses to
> local resources, and this is the main trick - processes do not care
> whether the file they are interested in is being served by the kernel,
> a userspace process, or a machine half way across the world.
>
more on 9p at http://9p.cat-v.org
--
iru
* Re: [9fans] Plan 9 and multicores/parallelism/concurrency?
2008-07-14 9:08 ` sqweek
2008-07-14 16:17 ` Iruata Souza
@ 2008-07-14 16:31 ` Roman V. Shaposhnik
1 sibling, 0 replies; 22+ messages in thread
From: Roman V. Shaposhnik @ 2008-07-14 16:31 UTC (permalink / raw)
To: Fans of the OS Plan 9 from Bell Labs
On Mon, 2008-07-14 at 17:08 +0800, sqweek wrote:
> On Mon, Jul 14, 2008 at 4:45 PM, ssecorp <circularfunc@gmail.com> wrote:
> > from wikipedia:
> > "Plan 9 from Bell Labs is a distributed operating system, primarily
> > used for research."
> >
> > but it doesn't say anything more about the distributed part.
> >
> > In what way does it make it easy?
>
> Plan 9 makes it easy via 9p, its file system/resource sharing
> protocol. In plan 9, things like graphics and network drivers export a
> 9p interface (a filetree). Furthermore, 9p is network transparent
> which means accesses to remote resources look exactly like accesses to
> local resources, and this is the main trick - processes do not care
> whether the file they are interested in is being served by the kernel,
> a userspace process, or a machine half way across the world.
All very true. And it sure does provide enormous benefits on distributed
memory architectures. But do you know of any part that would be
beneficial for highly-SMP systems?
Thanks,
Roman.
* Re: [9fans] Plan 9 and multicores/parallelism/concurrency?
2008-07-14 8:45 ssecorp
2008-07-14 9:08 ` sqweek
@ 2008-07-14 10:15 ` a
2008-07-14 15:32 ` David Leimbach
2008-07-14 16:29 ` Roman V. Shaposhnik
3 siblings, 0 replies; 22+ messages in thread
From: a @ 2008-07-14 10:15 UTC (permalink / raw)
To: 9fans
In addition to sqweek's good reply:
The "distributed" part also refers to how a typical installation is
structured. The system responsible for authenticating you, your
file server, the cpu server you run processes on, and the terminal
you're typing at may well all be distinct computers, but for the
most part none of the application code knows anything about
networking. The system takes care of it for you (mostly thanks to
9p, as sqweek described).
In terms of concurrency, that's more of a programming question
than an OS question (which isn't to say the OS isn't relevant).
Plan 9's thread(2) library is probably the most relevant thing
there. It follows a very different (and easier to learn, read, and
write) model than the threads you see in other systems. Plan
9 mostly helps there by making things lighter, encouraging the
use of multiple processes where other systems penalize it; the
library itself is now available on unix through plan9port.
Anthony
* Re: [9fans] Plan 9 and multicores/parallelism/concurrency?
2008-07-14 8:45 ssecorp
2008-07-14 9:08 ` sqweek
2008-07-14 10:15 ` a
@ 2008-07-14 15:32 ` David Leimbach
2008-07-14 16:00 ` erik quanstrom
2008-07-14 16:29 ` Roman V. Shaposhnik
3 siblings, 1 reply; 22+ messages in thread
From: David Leimbach @ 2008-07-14 15:32 UTC (permalink / raw)
To: Fans of the OS Plan 9 from Bell Labs
On Mon, Jul 14, 2008 at 1:45 AM, ssecorp <circularfunc@gmail.com> wrote:
> from wikipedia:
> "Plan 9 from Bell Labs is a distributed operating system, primarily
> used for research."
>
> but it doesn't say anything more about the distributed part.
>
> I have recently found a big interest in concurrency, distributed
> systems and multicore-programming.
>
> So is Plan 9 good for a multicore-computer or what kind of distributed
> system is it made for?
>
> In what way does it make it easy?
>
Assuming that the kernel can address multiple cores and SMP systems (I've
never tried it but I assume it can), one can write code in C via a library
called libthread, which provides a mechanism for writing concurrent programs
(originally meant to help port the programs written in the ill-fated but
very cool language Alef to C).
It features threads and typed data channels for interprocess communication
in a CSP sort of organization.
If libthread is able to grab real processors per thread and get them
scheduled, one's concurrent style code ultimately ends up having potential
to run in parallel on those cores/processors.
I think this method of writing programs designed to work on multi-core
systems is a good one. As do the folks who use Concurrent Haskell, or even
Erlang and perhaps Scala and other "new languages". In a sense this makes
writing code for multiple cores "easy" on Plan 9.
Dave
* Re: [9fans] Plan 9 and multicores/parallelism/concurrency?
2008-07-14 15:32 ` David Leimbach
@ 2008-07-14 16:00 ` erik quanstrom
0 siblings, 0 replies; 22+ messages in thread
From: erik quanstrom @ 2008-07-14 16:00 UTC (permalink / raw)
To: 9fans
> If libthread is able to grab real processors per thread and get them
> scheduled, one's concurrent style code ultimately ends up having potential
> to run in parallel on those cores/processors.
due to the specific meaning of "thread" in the thread
library, this statement is misleading.
only procs may run in parallel. threads are scheduled
cooperatively within a proc. procs are scheduled by the kernel,
so the thread library doesn't grab processors.
- erik
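To make the distinction concrete, here is a toy sketch of cooperative
scheduling in Python. It does not use libthread's API; worker and run
are hypothetical names. Each generator plays the role of a thread: it
runs until it yields, and a single loop, standing in for one proc,
picks the next one. Nothing here runs in parallel; that would take
several procs, which the kernel schedules:

```python
from collections import deque


def worker(name, steps, log):
    """A cooperative 'thread': does one step, then yields the processor."""
    for i in range(steps):
        log.append(f"{name}{i}")
        yield  # nothing else can run until this thread gives up control


def run(threads):
    """Round-robin scheduler for one proc: resume each thread in turn."""
    ready = deque(threads)
    while ready:
        t = ready.popleft()
        try:
            next(t)          # run until the thread's next yield
            ready.append(t)  # still alive: back of the queue
        except StopIteration:
            pass             # thread finished; drop it


log = []
run([worker("a", 2, log), worker("b", 3, log)])
print(log)
```

Because a switch can only happen at a yield, the interleaving is
deterministic and the threads need no locks among themselves, which
is a large part of why this model is easier to reason about.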
* Re: [9fans] Plan 9 and multicores/parallelism/concurrency?
2008-07-14 8:45 ssecorp
` (2 preceding siblings ...)
2008-07-14 15:32 ` David Leimbach
@ 2008-07-14 16:29 ` Roman V. Shaposhnik
2008-07-14 20:08 ` a
3 siblings, 1 reply; 22+ messages in thread
From: Roman V. Shaposhnik @ 2008-07-14 16:29 UTC (permalink / raw)
To: Fans of the OS Plan 9 from Bell Labs
On Mon, 2008-07-14 at 08:45 +0000, ssecorp wrote:
> from wikipedia:
> "Plan 9 from Bell Labs is a distributed operating system, primarily
> used for research."
>
> but it doesn't say anything more about the distributed part.
>
> I have recently found a big interest in concurrency, distributed
> systems and multicore-programming.
>
> So is Plan 9 good for a multicore-computer or what kind of distributed
> system is it made for?
I believe the real question is not whether Plan9 is good for multicore,
but whether multicore is any good as a long term computing strategy.
My personal impression has always been that Plan9 is the best OS for
distributed memory systems. I believe that folks working with IBM can
elaborate on that. As for the shared memory (whether NUMA or not) the
pressure is more on application (and thus application level languages
and tools) than on OS. It'll be interesting to see how a single Plan9
kernel scales on something like a Batoka box (256 hardware threads per
box, 64 physical cores). On the other hand, maybe the trick is not
to scale a single kernel on something like that but have multiple
kernels running under something like Xen or kvm.
Thanks,
Roman.
* Re: [9fans] Plan 9 and multicores/parallelism/concurrency?
2008-07-14 16:29 ` Roman V. Shaposhnik
@ 2008-07-14 20:08 ` a
2008-07-14 20:39 ` Roman V. Shaposhnik
2008-07-14 20:43 ` Charles Forsyth
0 siblings, 2 replies; 22+ messages in thread
From: a @ 2008-07-14 20:08 UTC (permalink / raw)
To: 9fans
// But do you know of any part [of Plan 9] that would be
// beneficial for highly-SMP systems?
Beneficial compared to what, I guess. I agree with your
comment that most of the pressure is on the application
rather than the kernel. The kernel's biggest contribution here
is keeping processes inexpensive compared to unix. As for the
system overall, there's something to be said for decomposing
problems to interfaces that can be represented in the
namespace; then, to a large extent, it doesn't matter whether
we're talking about one box or many.
// On the other hand, maybe the trick is not to scale a single
// kernel on something like that but have multiple kernels
// running under something like Xen or kvm.
There's certainly something to be said for this in many cases,
but it hardly takes away any burden from application
developers. They've just got more practice doing it for logically
distinct machines. It lets kernel developers off the hook, but
I'm not sure that's a good thing.
// It'll be interesting to see how a single Plan9 kernel scales on
// something like a Batoka box (256 hardware threads per box,
// 64 physical cores).
Send me one and I'll see if I can find out. ☺
Anthony
* Re: [9fans] Plan 9 and multicores/parallelism/concurrency?
2008-07-14 20:08 ` a
@ 2008-07-14 20:39 ` Roman V. Shaposhnik
2008-07-14 22:12 ` a
2008-07-14 20:43 ` Charles Forsyth
1 sibling, 1 reply; 22+ messages in thread
From: Roman V. Shaposhnik @ 2008-07-14 20:39 UTC (permalink / raw)
To: Fans of the OS Plan 9 from Bell Labs
On Mon, 2008-07-14 at 16:08 -0400, a@9srv.net wrote:
> // But do you know of any part [of Plan 9] that would be
> // beneficial for highly-SMP systems?
>
> Beneficial compared to what, I guess.
Let's say a typical Linux kernel.
> The kernel's biggest contribution here is keeping processes inexpensive
> compared to unix.
Not just inexpensive, but also better aligned with how
they use compute resources (virtual vs. physical threads)
and memory resources.
> // It'll be interesting to see how a single Plan9 kernel scales on
> // something like a Batoka box (256 hardware threads per box,
> // 64 physical cores).
>
> Send me one and I'll see if I can find out. ☺
Speaking of which -- is the SPARC port of Plan9 still alive?
Thanks,
Roman.
* Re: [9fans] Plan 9 and multicores/parallelism/concurrency?
2008-07-14 20:39 ` Roman V. Shaposhnik
@ 2008-07-14 22:12 ` a
2008-07-17 12:26 ` Roman V. Shaposhnik
0 siblings, 1 reply; 22+ messages in thread
From: a @ 2008-07-14 22:12 UTC (permalink / raw)
To: 9fans
// Not just inexpensive, but also better aligned with how
// they use compute resources (virtual vs. physical threads)
// and memory resources.
Hrm. I know about the memory/cache issues, but it sounds
like there's more on the CPU side I don't know much about.
Is there more here beyond the memory question and
prediction/pipelining?
I'm reading the PDF you referenced now.
// Speaking of which -- is the SPARC port of Plan9 still alive?
Not really. There's a partially-working sparc64 port out there
which ran on the Ultra 1 (I think), but it's neither been kept up
to date nor made to run on anything beyond that. A few folks
(including myself) have poked at it a bit with varying results,
none of which were particularly good.
Something on Batoka would provide better motivation than my
old Ultra 5, though!
Anthony
* Re: [9fans] Plan 9 and multicores/parallelism/concurrency?
2008-07-14 22:12 ` a
@ 2008-07-17 12:26 ` Roman V. Shaposhnik
2008-07-17 12:40 ` erik quanstrom
0 siblings, 1 reply; 22+ messages in thread
From: Roman V. Shaposhnik @ 2008-07-17 12:26 UTC (permalink / raw)
To: Fans of the OS Plan 9 from Bell Labs
On Mon, 2008-07-14 at 18:12 -0400, a@9srv.net wrote:
> // Not just inexpensive, but also better aligned with how
> // they use compute resources (virtual vs. physical threads)
> // and memory resources.
>
> Hrm. I know about the memory/cache issues, but it sounds
> like there's more on the CPU side I don't know much about.
> Is there more here beyond the memory question and
> prediction/pipelining?
With what is called CMT (chip multithreading), we now have the following
hierarchy of compute resources:
-> physical CPU
   -> cores
      -> virtual threads
and the following set (I can't quite call it a hierarchy) of memory
resources:
physical RAM attached to a particular CPU's memory controller
L3/L2 cache
L1 cache
These two sets of resources can be "attached" to each other in a number
of different ways (e.g. L1 could be the only per-core cache or L2
could also be per-core, etc.) and the job of a scheduler is to figure
out the best mapping of tasks to compute resources based on
alignment constraints. Paul had a nice post on these constraints
earlier. Here's an old post from Ingo outlining what is NOT free
with HyperThreading:
http://lwn.net/Articles/8553/
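A minimal sketch of the mapping problem in Python. The cost model is
an assumption made for illustration (hardware threads on one core
share the nearest cache, cores on one chip share a farther one, and
crossing chips is worst); as noted above, real parts attach caches
differently. HwThread, distance and place are hypothetical names, not
taken from any scheduler:

```python
from typing import NamedTuple


class HwThread(NamedTuple):
    chip: int    # physical CPU
    core: int    # core within that chip
    thread: int  # virtual (hardware) thread within that core


def distance(a, b):
    """How much cached state is lost by moving from a to b (assumed)."""
    if a.chip != b.chip:
        return 3  # different chip: nothing shared, remote memory
    if a.core != b.core:
        return 2  # same chip: only the outer cache is shared
    if a.thread != b.thread:
        return 1  # same core: the per-core cache is still warm
    return 0      # same hardware thread


def place(last, idle):
    """Pick the idle hardware thread closest to where the task last ran."""
    return min(idle, key=lambda hw: (distance(last, hw), hw))


last = HwThread(chip=0, core=1, thread=0)
idle = [HwThread(1, 0, 0), HwThread(0, 2, 1), HwThread(0, 1, 3)]
best = place(last, idle)         # a sibling thread on the same core
fallback = place(last, idle[:2])  # same core busy: stay on the chip
print(best, fallback)
```

A real scheduler also has to weigh this preference against load
balance and run-queue length, which is where the tuning effort goes.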
> // Speaking of which -- is the SPARC port of Plan9 still alive?
>
> Not really. There's a partially-working sparc64 port out there
> which ran on the Ultra 1 (I think), but it's neither been kept up
> to date nor made to run on anything beyond that. A few folks
> (including myself) have poked at it a bit with varying results,
> none of which were particularly good.
>
> Something on Batoka would provide better motivation than my
> old Ultra 5, though!
I don't think I can help much with that, but if I ever see a
homeless Batoka in the hallway it'll be FedExed to you in
no time ;-)
Thanks,
Roman.
* Re: [9fans] Plan 9 and multicores/parallelism/concurrency?
2008-07-17 12:26 ` Roman V. Shaposhnik
@ 2008-07-17 12:40 ` erik quanstrom
2008-07-17 13:00 ` ron minnich
0 siblings, 1 reply; 22+ messages in thread
From: erik quanstrom @ 2008-07-17 12:40 UTC (permalink / raw)
To: 9fans
> These two sets of resources can be "attached" to each other in a number
> of different ways (e.g. L1 could be the only per-core cache or L2
> could also be per-core, etc.) and the job of a scheduler is to figure
> out the best mapping of tasks to compute resources based on
> alignment constraints. Paul had a nice post on these constraints
> earlier. Here's an old post from Ingo outlining what is NOT free
> with HyperThreading:
> http://lwn.net/Articles/8553/
in my performance testing, try and theorize as i might, i have not
yet been able to see l2 or other cache effects on intel machines.
i may have seen l1 cache effects, but i rather think the reason
that pinning the process to a cpu helped was that it was being
scheduled when it wasn't needed on the other cpu. (that is, the
design was wrong anyway.)
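for readers who want to try a pinning experiment of their own, a
hedged sketch in python. it uses linux's affinity calls as exposed by
python's os module, not anything from plan 9, and it says nothing
about how the test above was run; current_cpus and run_pinned are
made-up names. on systems without these calls the function simply
runs unpinned:

```python
import os


def current_cpus():
    """CPUs this process may run on, or None where the call is missing."""
    if hasattr(os, "sched_getaffinity"):
        return os.sched_getaffinity(0)  # 0 means the calling process
    return None


def run_pinned(fn):
    """Run fn() with the process restricted to a single CPU, then
    restore the original affinity mask."""
    if not hasattr(os, "sched_setaffinity"):
        return fn()  # no affinity support here: run unpinned
    original = os.sched_getaffinity(0)
    os.sched_setaffinity(0, {min(original)})  # lowest allowed cpu
    try:
        return fn()
    finally:
        os.sched_setaffinity(0, original)


seen = run_pinned(current_cpus)
print(seen)
```

timing the same workload with and without run_pinned is one way to
tell a cache effect from a scheduling effect, though as the message
above suggests the two are easy to confuse.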
what i have seen is that the intel 82598 10gbit chip, by keeping
its tx and rx descriptor rings in cachable regular memory can
mash the fsb to little bits. it's still pretty fast, though.
there's no use going fast, if you have no data to go fast on.
- erik
* Re: [9fans] Plan 9 and multicores/parallelism/concurrency?
2008-07-17 12:40 ` erik quanstrom
@ 2008-07-17 13:00 ` ron minnich
0 siblings, 0 replies; 22+ messages in thread
From: ron minnich @ 2008-07-17 13:00 UTC (permalink / raw)
To: Fans of the OS Plan 9 from Bell Labs
On Thu, Jul 17, 2008 at 5:40 AM, erik quanstrom <quanstro@quanstro.net> wrote:
> what i have seen is that the intel 82598 10gbit chip, by keeping
> its tx and rx descriptor rings in cachable regular memory can
> mash the fsb to little bits. it's still pretty fast, though.
>
it's funny how often this lesson is relearned. But cost is the driver,
and memory-free NICs are cheaper ... there are even memory-free
infiniband cards now. But I remember a Quadrics guy telling me he
never wanted to put descriptors into main memory ever, ever again.
ron
* Re: [9fans] Plan 9 and multicores/parallelism/concurrency?
2008-07-14 20:08 ` a
2008-07-14 20:39 ` Roman V. Shaposhnik
@ 2008-07-14 20:43 ` Charles Forsyth
1 sibling, 0 replies; 22+ messages in thread
From: Charles Forsyth @ 2008-07-14 20:43 UTC (permalink / raw)
To: 9fans
> // But do you know of any part [of Plan 9] that would be
> // beneficial for highly-SMP systems?
one difference from many of the others is that plan 9, both kernel and applications,
were written with multiprocessors in mind, at least up to 32 or so, so data structure
locking was included as the code was written, and processes (kprocs in the kernel)
were used (or not) as appropriate. generally, the granularity used seems appropriate.
kernel processes can be pre-empted. there is plenty of scope for parallelism.
many others started with a non-preemptible kernel and added a big global lock,
and gradually, very gradually changed code to use locks at finer granularity.
sometimes.
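a small sketch of the granularity point, in python rather than plan 9
c. StripedCounter is a hypothetical class, not code from any kernel:
instead of one global lock around the whole table, each stripe of
keys has its own lock, so updates to unrelated keys do not wait on
each other:

```python
import threading


class StripedCounter:
    """Counters guarded by one lock per stripe instead of one big lock."""

    def __init__(self, stripes=8):
        self._locks = [threading.Lock() for _ in range(stripes)]
        self._tables = [{} for _ in range(stripes)]

    def _stripe(self, key):
        return hash(key) % len(self._locks)

    def add(self, key, n=1):
        i = self._stripe(key)
        with self._locks[i]:  # only this stripe is held
            table = self._tables[i]
            table[key] = table.get(key, 0) + n

    def get(self, key):
        i = self._stripe(key)
        with self._locks[i]:
            return self._tables[i].get(key, 0)


counter = StripedCounter()


def work():
    for i in range(1000):
        counter.add(i % 10)


workers = [threading.Thread(target=work) for _ in range(4)]
for w in workers:
    w.start()
for w in workers:
    w.join()
print([counter.get(k) for k in range(10)])
```

python's own interpreter lock means this shows the structure rather
than a speedup; the benefit of finer locks appears when the holders
can really run on different processors at once.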