From: Linus Torvalds
To: Scott Schwartz , Rob Pike
Cc: 9fans@cse.psu.edu
Subject: Re: [9fans] Re: Threads: Sewing badges of honor onto a Kernel
In-Reply-To: <20040227081500.14366.qmail@g.galapagos.bx.psu.edu>
Message-ID: 
References: <20040227081500.14366.qmail@g.galapagos.bx.psu.edu>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Date: Fri, 27 Feb 2004 00:58:30 -0800

On Fri, 27 Feb 2004, Rob Pike wrote:
>
> > So in a C/UNIX-like environment, private stacks are wrong. You could
> > imagine _other_ environments where they might be valid, but even those
> > other environments would not invalidate my points about efficiency and
> > simplicity.
>
> as i said before, the stacks are not private. you're right, that's a
> bad thing. that's why they're not private.
>
> the segment called 'stack' is private, but that's a different thing.
> i stress: stack != stack segment. stack is where your sp is; stack
> segment is a figment of the VM system.

Well, in another email I already said that "private stack" and "segments"
are really exactly the same thing - some people think of segments as
paging things, others think of them in the x86 sense, but in the end it
all comes down to the fact that a "stack address" ends up making sense
only within a specific context (and that context can sometimes be
partially visible to other threads by using explicit segment registers or
other magic, like special instructions that can take another address
space).

And private/segmented stacks are bad. They are bad exactly because they
magically make automatic variables fundamentally different from other
variables, and there is really no reason they should be different. There
is absolutely nothing wrong with having a thread take the address of some
automatic variable, and then just pass that address off to another
routine.
And if that other routine decides that it is going to create a hundred
threads to solve the problem that the variable describes in parallel,
then that should JUST WORK. Anything else would be EVIL. Having a pointer
that sometimes works, and sometimes doesn't, based on who uses it -
that's just crazy talk.

> i ask again: how does linux create per-thread storage?

The same way it creates any other storage: with mmap() and brk(). You
just malloc the thing, and you pass in the new stack as an argument to
the thread creation mechanism (which Linux calls "clone()", just to be
different).

And because that storage is just storage, things like the one I described
above "just work". If you pass another thread a pointer to your stack,
the other thread can happily manipulate it, and never even needs to know
that it's an automatic variable somewhere else.

And sure, you can shoot yourself in the foot that way. You can pass off a
pointer to another thread, and then return from the function without
synchronizing with that other thread properly, and now the other thread
will scribble all over your stack. But that's really no different from
using "alloca()" and passing the result to something that remembers the
address.

> the way the plan 9 thread library works is so different from linux's
> that they're hard to compare. program design in the two worlds is
> radically different. so your claim of 'better' is curious to me. by
> 'better' you seem to mean 'faster' and 'cleaner'. faster at least can
> be measured.

To me, the final decision on "better" tends to be a fairly wide issue.
Performance is part of it - infrastructure that everybody depends on
should always strive to at least _allow_ good performance, even if not
everybody ends up caring.

But the concept, to me, is more important. Basically, I do not see how
you can have a portable and even _remotely_ efficient partial VM sharing.
And if you can't have it, then you shouldn't design the interfaces around
it.
> you speak with certainty. have you seen performance comparisons? i
> haven't, although it wouldn't surprise me to learn that there are useful
> programs for which linux outperforms plan 9, and vice versa of course.

When it comes to threads, I only see three interesting performance
metrics: how fast you can create them, how fast you can synchronize them
(both "join" and locking), and how well you switch between them.

The locking is pretty much OS-independent, since fast locking has to be
done in user space anyway (with just the contention case falling back to
the OS - and if your app cares about performance, it hopefully won't have
much contention).

So we're left with create, tear-down and switch. All of which are
_fundamentally_ faster if you just have a "share everything" model.
Create and tear-down are just an increment/decrement of a reference
counter (there's a spinlock involved too). A task switch is a no-op from
a VM standpoint (except we have a per-thread lazy TLB invalidate that
will trigger).

In contrast, partial sharing is a major pain. You definitely don't just
do a reference-count increment for your VM.

		Linus