9fans - fans of the OS Plan 9 from Bell Labs
* [9fans] Itanium
@ 2005-01-23 21:50 Ben Huntsman
  2005-01-23 23:56 ` geoff
  0 siblings, 1 reply; 14+ messages in thread
From: Ben Huntsman @ 2005-01-23 21:50 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs; +Cc: Inferno mailing list

Quick question-

Have any attempts been made to port Plan 9 to Itanium?
Any attempts to build Inferno hosted on an Itanium system?



* Re: [9fans] Itanium
  2005-01-23 21:50 [9fans] Itanium Ben Huntsman
@ 2005-01-23 23:56 ` geoff
  0 siblings, 0 replies; 14+ messages in thread
From: geoff @ 2005-01-23 23:56 UTC (permalink / raw)
  To: 9fans

I'm not aware of any attempts to port anything to the Itanic.
The Itanic has hit the iceberg x86-64 (AMD64) and all aboard
have been lost.



* Re: [9fans] Itanium
  2009-01-14 10:04     ` Christopher
@ 2009-01-14 10:54       ` erik quanstrom
  0 siblings, 0 replies; 14+ messages in thread
From: erik quanstrom @ 2009-01-14 10:54 UTC (permalink / raw)
  To: 9fans

On Wed Jan 14 05:12:07 EST 2009, nadiasvertex@gmail.com wrote:
> On Jan 12, 10:42 am, quans...@quanstro.net (erik quanstrom) wrote:
> > > [...] Many architectures get register
> > > windows wrong, but the Itanium has a variable-length register fill/
> > > spill engine that gets invoked automatically.  Of course, you can
> > > program the engine too.
> >
> > what's the advantage of this over the stanford style?
>
> I'm not sure what exactly you mean by that.

http://en.wikipedia.org/wiki/Register_window

> ARM is fine, but itanium predicated instructions allow you to have a
> great number of predicate registers.  This isn't like cmov and friends
> either.

does this buy anything in practice?  references?

> > unless it's an 8- or 16-bit part, i don't see why anyone cares
> > if the assembly is simpler.  but since this is an epic part,
> > the assembly is never simple.
>
>
> I don't know why bit size matters. Anyway, making the assembly simpler
> has a lot of benefits.  A human has to write the stuff at some point.
> When there are bugs, a human has to read it.  It also simplifies code
> generation by the compiler.

bit size matters because little 8- and 16-bit parts are so
constrained that one's best option is generally writing
assembler.  (hint: on an 8-bit computer, addressable
memory is 256 bytes.)  for a 64-bit cpu, writing a substantial amount
of code in assembler is a waste of time.  there are
fewer than 2k lines of 386 assembler in the kernel and libc
(1919 on my system).

>   It also simplifies code generation by the compiler.

having to build parallel instructions is a hard enough
problem, it delayed the introduction of the itanium by several
years and it's the reason amd had a window to sneak amd64
through.

> > how do you get around the fact that the parallelism
> > is limited by the instruction set and the fact that one
> > slow sub-instruction could stall the whole instruction?
>
> Parallelism isn't any more limited by the instruction set on Itanium
> than it is anywhere else.  The processor has multiple issue units that
> can crunch multiple instructions in parallel.  Some units can execute
> multiple instructions per cycle.

okay, then.  please explain why it helps to have an explicitly
parallel instruction set with an architecturally defined number of
parallel slots?  adding cores makes life much more flexible and easier
to understand.  (and i don't need to recompile or write a new compiler.)

> There is a massive difference.  As the other poster pointed out,
> closures are cool in and of themselves.

what do they get me?  dlls don't count.  plan 9 doesn't have dynamic
linking.

> On x86 processors, you get 4 stacks.  One for each privilege level.
> You can change a stack anytime you want, but it requires either an
> instruction to do so, or instruction patching by the loader.
> Everything gets stuck there and there are very few restrictions about
> what you do with stuff on the stack.

i don't think anybody cares about the intricate details
of who sets up the stack or how it is managed.  from the user's
perspective, one can have as many stacks as one wishes per user
application with the thread(2) library.

if that's hardware support for any number of stacks or not
is not an interesting question.

> Quite a bit.  Having the processor scan the incoming instruction
> stream to locate potential parallelizations is ludicrous.  It works fine
> when the processor guesses correctly, but it is horrendously expensive
> when the processor guesses wrong.  Requiring that the processor scan
> incoming instructions to suss out potential parallelizations also
> means that much less die space for doing real work.

i don't think explicitly parallel vs. implicitly parallel is a
question that can be answered without a reference in the
real world.  do you have any references telling me why i
can never get epic-like performance out of a non-epic
cpu, transistor for transistor?

one could consider epic a layering violation.  why do i have
to care how many execution units the architecture defines?

by the way, epic still does speculative execution, etc.
so what does epic get me?

http://en.wikipedia.org/wiki/Explicitly_parallel_instruction_computing

i still fail to see how one could call instruction bundles
"simple" at the assembly level.

> IA64 got a bad rap because the first hardware implementations of IA64
> were less than stellar, and the compilers were harder to write than
> expected.  The Itanium-2 and modern compilers are actually quite
> nice.

almost any problem can be worked out in 10 million lines of code
and 2 billion transistors.

i'd really be surprised if itanium could compete with a regular
x86 system for most tasks.  memory bandwidth is so important, and
the fastest fsb available for an itanium is 667MHz.  that's
many x86 generations ago.

- erik




* Re: [9fans] Itanium
  2009-01-12 15:04   ` Christopher
  2009-01-12 15:36     ` erik quanstrom
@ 2009-01-14 10:04     ` Christopher
  2009-01-14 10:54       ` erik quanstrom
  1 sibling, 1 reply; 14+ messages in thread
From: Christopher @ 2009-01-14 10:04 UTC (permalink / raw)
  To: 9fans

On Jan 12, 10:42 am, quans...@quanstro.net (erik quanstrom) wrote:
> > [...] Many architectures get register
> > windows wrong, but the Itanium has a variable-length register fill/
> > spill engine that gets invoked automatically.  Of course, you can
> > program the engine too.
>
> what's the advantage of this over the stanford style?

I'm not sure what exactly you mean by that.

>
> >I also REALLY like predicated instructions.
>
> like arm?

ARM is fine, but itanium predicated instructions allow you to have a
great number of predicate registers.  This isn't like cmov and friends
either.

> > That is, you perform an operation and then predicate the instructions
> > that should execute if it comes out the way you want.  It really
> > simplifies assembly-level if/then and switch-style blocks.
>
> unless it's an 8- or 16-bit part, i don't see why anyone cares
> if the assembly is simpler.  but since this is an epic part,
> the assembly is never simple.


I don't know why bit size matters. Anyway, making the assembly simpler
has a lot of benefits.  A human has to write the stuff at some point.
When there are bugs, a human has to read it.  It also simplifies code
generation by the compiler.

> how do you get around the fact that the parallelism
> is limited by the instruction set and the fact that one
> slow sub-instruction could stall the whole instruction?

Parallelism isn't any more limited by the instruction set on Itanium
than it is anywhere else.  The processor has multiple issue units that
can crunch multiple instructions in parallel.  Some units can execute
multiple instructions per cycle.

> > The hardware also has built-in support for closures.  Every function
> > executed is implicitly paired with a given local memory region.
>
> what's the difference between this and stack?

There is a massive difference.  As the other poster pointed out,
closures are cool in and of themselves.
On x86 processors, you get 4 stacks.  One for each privilege level.
You can change a stack anytime you want, but it requires either an
instruction to do so, or instruction patching by the loader.
Everything gets stuck there and there are very few restrictions about
what you do with stuff on the stack.

 On Itanium you have two kinds of stacks AND a global pointer for
local memory accesses.  One kind of stack is much like what you are
used to.  The other kind of stack is ONLY for the register spill/fill
engine and cannot be programmatically accessed while it's in use.
Which means that you can't smash the stack and have the function
return to an arbitrary location.  The global pointer is for indirect
memory accesses, and allows you to do all sorts of interesting
things.  From .dll to simplified thread-local storage.

> > There is a *lot* to like about Itanium.
>
> there's a lot not to like about itanium.  epic means that
> instructions need to be hand-crufted.  in itanium land, you
> schedule instructions.  in x86-64 land, instructions
> schedule you.
>
> what's to like about that?

Quite a bit.  Having the processor scan the incoming instruction
stream to locate potential parallelizations is ludicrous.  It works fine
when the processor guesses correctly, but it is horrendously expensive
when the processor guesses wrong.  Requiring that the processor scan
incoming instructions to suss out potential parallelizations also
means that much less die space for doing real work.  Finally, the
processor has almost NO context about the instructions.  A compiler
has immensely more context and can do a much better job indicating
which instructions can execute in parallel.

IA64 got a bad rap because the first hardware implementations of IA64
were less than stellar, and the compilers were harder to write than
expected.  The Itanium-2 and modern compilers are actually quite
nice.

-={C}=-




* Re: [9fans] Itanium
  2009-01-12 15:36     ` erik quanstrom
  2009-01-12 16:29       ` Bakul Shah
@ 2009-01-14 10:04       ` Christopher
  1 sibling, 0 replies; 14+ messages in thread
From: Christopher @ 2009-01-14 10:04 UTC (permalink / raw)
  To: 9fans


> However, in general what Itanium does is not a win since in
> practice most functions do not need local storage (even if
> written in a language richer than C!).


That's not true.  .dlls are the primary use case for this.  If a .dll
has its own local memory and local allocator, this is a big, big
deal.  The vast majority of plugin issues are memory ownership issues.




* Re: [9fans] Itanium
  2009-01-12 15:36     ` erik quanstrom
@ 2009-01-12 16:29       ` Bakul Shah
  2009-01-14 10:04       ` Christopher
  1 sibling, 0 replies; 14+ messages in thread
From: Bakul Shah @ 2009-01-12 16:29 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On Mon, 12 Jan 2009 10:36:37 EST erik quanstrom <quanstro@quanstro.net>  wrote:
> how do you get around the fact that the parallelism
> is limited by the instruction set and the fact that one
> slow sub-instruction could stall the whole instruction?
>
> > The hardware also has built-in support for closures.  Every function
> > executed is implicitly paired with a given local memory region.
>
> what's the difference between this and stack?

Consider this:

(define (counter n)
  (lambda () (set! n (+ n 1)) n))

In C like syntax:

(int(*)()) counter(int n) {
	int foo() { return ++n;}
	return foo;
}

...
	int(*c1)() = counter(5);
	int(*c2)() = counter(2);

	int x = c1(); // x is 6
	x += c2(); // x is 6+3
	x += c2(); // x is 9+4

n lives past the lifetime of counter so a stack is not
enough.  So for the returned function from counter(), you
have to allocate n on the heap.  And you need to store ptr
to this space along with the returned function (e.g. c1 has
its own local store, so does c2).
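
The nested-function syntax above is not standard C.  A minimal sketch of the same idea in plain C follows, assuming the closure is represented as a code pointer plus a pointer to heap storage.  The names struct closure, counter_body and call are made up for illustration; this is not how any particular compiler or the Itanium ABI lays things out.

```c
#include <assert.h>
#include <stdlib.h>

/* A closure: a code pointer paired with a pointer to its captured state. */
struct closure {
	int (*fn)(void *env);	/* the code */
	void *env;		/* heap storage that outlives the creating call */
};

/* Body of the returned function: increment the captured n and return it. */
static int
counter_body(void *env)
{
	int *n = env;

	return ++*n;
}

/* counter(n): copy n to the heap so it survives after counter returns. */
struct closure
counter(int n)
{
	struct closure c = { counter_body, NULL };
	int *cell = malloc(sizeof *cell);

	assert(cell != NULL);	/* sketch only: no real error handling */
	*cell = n;
	c.env = cell;
	return c;
}

/* Call a closure by handing its own environment back to its code. */
int
call(struct closure c)
{
	return c.fn(c.env);
}
```

With c1 = counter(5) and c2 = counter(2), call(c1) returns 6, and two calls of call(c2) return 3 and then 4, matching the running totals in the example above.  Each closure owns its own env, which the caller must eventually free.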

However, in general what Itanium does is not a win since in
practice most functions do not need local storage (even if
written in a language richer than C!).

Such a ptr to the local store associated with a function ptr
can be used to implement objects -- ptrs of all "methods" to
the same object point to the same storage.  Not sure if
anyone has implemented objects this way.
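
A rough sketch of that object idea in plain C, assuming each "method" is a code pointer paired with a pointer to the shared storage.  The account example and every name in it (struct method, new_account, invoke) are hypothetical, chosen only to show two methods pointing at the same store.

```c
#include <assert.h>
#include <stdlib.h>

/* A method: a code pointer plus a pointer to the object's storage. */
struct method {
	int (*fn)(void *self, int arg);
	void *self;
};

/* Hypothetical object state shared by every method of one object. */
struct account {
	int balance;
};

/* Hypothetical object: nothing but a bundle of methods. */
struct account_methods {
	struct method deposit;
	struct method balance;
};

/* Add amount to the shared balance and return the new balance. */
static int
deposit_body(void *self, int amount)
{
	struct account *a = self;

	a->balance += amount;
	return a->balance;
}

/* Return the shared balance; the argument is ignored. */
static int
balance_body(void *self, int unused)
{
	struct account *a = self;

	(void)unused;
	return a->balance;
}

/* Allocate one store and point both methods at it. */
struct account_methods
new_account(int initial)
{
	struct account *a = malloc(sizeof *a);
	struct account_methods m;

	assert(a != NULL);	/* sketch only: no real error handling */
	a->balance = initial;
	m.deposit.fn = deposit_body;
	m.deposit.self = a;
	m.balance.fn = balance_body;
	m.balance.self = a;
	return m;
}

/* Invoke a method by passing its own storage pointer back to its code. */
int
invoke(struct method m, int arg)
{
	return m.fn(m.self, arg);
}
```

Because deposit and balance carry the same self pointer, a deposit made through one method is visible through the other.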




* Re: [9fans] Itanium
  2009-01-12 15:04   ` Christopher
@ 2009-01-12 15:36     ` erik quanstrom
  2009-01-12 16:29       ` Bakul Shah
  2009-01-14 10:04       ` Christopher
  2009-01-14 10:04     ` Christopher
  1 sibling, 2 replies; 14+ messages in thread
From: erik quanstrom @ 2009-01-12 15:36 UTC (permalink / raw)
  To: 9fans

> [...] Many architectures get register
> windows wrong, but the Itanium has a variable-length register fill/
> spill engine that gets invoked automatically.  Of course, you can
> program the engine too.

what's the advantage of this over the stanford style?

>I also REALLY like predicated instructions.

like arm?

> That is, you perform an operation and then predicate the instructions
> that should execute if it comes out the way you want.  It really
> simplifies assembly-level if/then and switch-style blocks.

unless it's an 8- or 16-bit part, i don't see why anyone cares
if the assembly is simpler.  but since this is an epic part,
the assembly is never simple.

how do you get around the fact that the parallelism
is limited by the instruction set and the fact that one
slow sub-instruction could stall the whole instruction?

> The hardware also has built-in support for closures.  Every function
> executed is implicitly paired with a given local memory region.

what's the difference between this and stack?

> There is a *lot* to like about Itanium.

there's a lot not to like about itanium.  epic means that
instructions need to be hand-crufted.  in itanium land, you
schedule instructions.  in x86-64 land, instructions
schedule you.

what's to like about that?

- erik





* Re: [9fans] Itanium
  2009-01-08 10:05 ` Christopher
  2009-01-08 13:55   ` erik quanstrom
@ 2009-01-12 15:04   ` Christopher
  2009-01-12 15:36     ` erik quanstrom
  2009-01-14 10:04     ` Christopher
  1 sibling, 2 replies; 14+ messages in thread
From: Christopher @ 2009-01-12 15:04 UTC (permalink / raw)
  To: 9fans

On Jan 8, 9:02 am, quans...@quanstro.net (erik quanstrom) wrote:
> On Thu Jan  8 05:11:37 EST 2009, nadiasver...@gmail.com wrote:
>
> > > Here's my standard true Itanic story. I know a guy who wrote the sin()
> > > intrinsic. His comment: "I do not intend to write cos()".
>
> > I am working on a python ctypes FFI trampoline for IA-64 Windows.  I
> > find the processor architecture lovely.  I am sorry your friend was
> > turned off by it, but it has a number of excellent features.  I wish I
> > could do more IA-64 development.
>
> would you care to share why you think this chip
> is good?

The instruction set is quite lovely.  Many architectures get register
windows wrong, but the Itanium has a variable-length register fill/
spill engine that gets invoked automatically.  Of course, you can
program the engine too.  I also REALLY like predicated instructions.
That is, you perform an operation and then predicate the instructions
that should execute if it comes out the way you want.  It really
simplifies assembly-level if/then and switch-style blocks.  The
hardware also has built-in support for closures.  Every function
executed is implicitly paired with a given local memory region.  There
is a *lot* to like about Itanium.
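
The predication idea can be modelled in C, as a sketch only.  This is not Itanium assembly; guarded_move is a made-up helper standing in for an instruction that takes effect only when its predicate is true, and the example assumes one compare produces a predicate and its complement.

```c
#include <assert.h>

/* Branching form: which instructions run depends on a jump. */
int
max_branch(int a, int b)
{
	if(a > b)
		return a;
	return b;
}

/*
 * Model of a guarded instruction, roughly "(p) mov dst = src":
 * the move takes effect only when predicate p is 1.
 */
static void
guarded_move(int p, int *dst, int src)
{
	assert(p == 0 || p == 1);
	*dst = p ? src : *dst;
}

/*
 * Predicated form: one compare sets p and its complement q.
 * Both moves are issued in straight-line order and exactly
 * one of them takes effect, so no jump is needed.
 */
int
max_predicated(int a, int b)
{
	int p = a > b;
	int q = !p;
	int r = 0;

	guarded_move(p, &r, a);
	guarded_move(q, &r, b);
	return r;
}
```

Both functions return the same result for every input; the difference is that the second has a single straight-line path, which is the property that simplifies if/then blocks at the assembly level.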

With respect to the world passing the IA64 by, sadly you can only get decent
IA64 systems from HP.  I haven't been able to find decent boards or
processors available elsewhere.




* Re: [9fans] Itanium
  2009-01-08 17:09     ` geoff
@ 2009-01-08 18:14       ` Bakul Shah
  0 siblings, 0 replies; 14+ messages in thread
From: Bakul Shah @ 2009-01-08 18:14 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On Thu, 08 Jan 2009 12:09:51 EST geoff@plan9.bell-labs.com  wrote:
> You don't want to use an amd29k (even if you could get one).
> They look cute on paper but their freeze-mode interrupt
> handling is a Chinese puzzle and unless you use Ken's compiler
> (previously called 9c), you're stuck with register windows,
> which tend to need to be spilled when an interrupt occurs,
> thus slowing interrupt response unpredictably.

It was called freeze mode because on *any* trap/interrupt it
disabled interrupts (and IIRC most traps).  Its designers
tried to apply the "RISC philosophy" to interrupt handling as
well and left most everything up to software.

Its freeze mode trap/interrupt handling was actually pretty
simple as the processor did very little for you!  If you
could do everything in the handler, you didn't need to save
any registers (except the ones you need in the handler).
So for instance TLB handling was a few instructions, done
entirely in freeze mode (unless the page table was invalid).

Trap code to vector to a user mode register spill/fill code
handler was about 5 instructions. The user mode spill/fill
handlers were about 10 instructions each and they were
interruptible.

The painful part was preparing things to call a C language
interrupt handler as it required a consistent stack but an
interrupt can occur in the middle of a spill/fill.  You can
save all 128 registers (+ a few special registers) but
typically the code tried to save only the registers in use;
and this added a lot of complexity and variable latency.

It would've been better off with a pair of instructions to
load/store full context (just like in all the CISCs!).  But
of course with so many registers the cost of saving goes up.

But in spite of this wart it was a real pleasure to write
*assembly* code for it.




* Re: [9fans] Itanium
  2009-01-08 13:55   ` erik quanstrom
@ 2009-01-08 17:09     ` geoff
  2009-01-08 18:14       ` Bakul Shah
  0 siblings, 1 reply; 14+ messages in thread
From: geoff @ 2009-01-08 17:09 UTC (permalink / raw)
  To: 9fans

You don't want to use an amd29k (even if you could get one).
They look cute on paper but their freeze-mode interrupt
handling is a Chinese puzzle and unless you use Ken's compiler
(previously called 9c), you're stuck with register windows,
which tend to need to be spilled when an interrupt occurs,
thus slowing interrupt response unpredictably.

The one I used, the 29200, was even worse: it had no main
memory caches.  So a 16MHz processor could only achieve about
3 MIPS.




* Re: [9fans] Itanium
  2009-01-08 10:05 ` Christopher
@ 2009-01-08 13:55   ` erik quanstrom
  2009-01-08 17:09     ` geoff
  2009-01-12 15:04   ` Christopher
  1 sibling, 1 reply; 14+ messages in thread
From: erik quanstrom @ 2009-01-08 13:55 UTC (permalink / raw)
  To: 9fans

On Thu Jan  8 05:11:37 EST 2009, nadiasvertex@gmail.com wrote:
> > Here's my standard true Itanic story. I know a guy who wrote the sin()
> > intrinsic. His comment: "I do not intend to write cos()".
> >
>
> I am working on a python ctypes FFI trampoline for IA-64 Windows.  I
> find the processor architecture lovely.  I am sorry your friend was
> turned off by it, but it has a number of excellent features.  I wish I
> could do more IA-64 development.

would you care to share why you think this chip
is good?

if it is good, the world has passed itanium by.
the fastest i2 chip has a 667MHz fsb and the chipset
it's paired with uses ddr 200 (not ddr2) memory.
i couldn't find pricing on i2 motherboards, they're
not popular enough.

if it were all about style points, we'd be using
the 64-bit version of the amd29k.

- erik




* Re: [9fans] Itanium
  2009-01-05 16:33 Benjamin Huntsman
  2009-01-06  5:38 ` ron minnich
@ 2009-01-08 10:05 ` Christopher
  2009-01-08 13:55   ` erik quanstrom
  2009-01-12 15:04   ` Christopher
  1 sibling, 2 replies; 14+ messages in thread
From: Christopher @ 2009-01-08 10:05 UTC (permalink / raw)
  To: 9fans

> Here's my standard true Itanic story. I know a guy who wrote the sin()
> intrinsic. His comment: "I do not intend to write cos()".
>

I am working on a python ctypes FFI trampoline for IA-64 Windows.  I
find the processor architecture lovely.  I am sorry your friend was
turned off by it, but it has a number of excellent features.  I wish I
could do more IA-64 development.

-={C}=-




* Re: [9fans] Itanium
  2009-01-05 16:33 Benjamin Huntsman
@ 2009-01-06  5:38 ` ron minnich
  2009-01-08 10:05 ` Christopher
  1 sibling, 0 replies; 14+ messages in thread
From: ron minnich @ 2009-01-06  5:38 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On Mon, Jan 5, 2009 at 8:33 AM, Benjamin Huntsman
<BHuntsman@mail2.cu-portland.edu> wrote:
> I know most everyone here hates the Itanium, but it is in some pretty large and fast systems, and it's on the Top500 list.

if you mean thunder, that machine is getting turned off soon. What new
machines have made it on top500? Sorry I can't look just now.

> So out of curiosity, has anyone looked at putting together a compiler for Itanium, or otherwise looked at a Plan 9 port?

Here's my standard true Itanic story. I know a guy who wrote the sin()
intrinsic. His comment: "I do not intend to write cos()".

ron




* [9fans] Itanium
@ 2009-01-05 16:33 Benjamin Huntsman
  2009-01-06  5:38 ` ron minnich
  2009-01-08 10:05 ` Christopher
  0 siblings, 2 replies; 14+ messages in thread
From: Benjamin Huntsman @ 2009-01-05 16:33 UTC (permalink / raw)
  To: 9fans

I know most everyone here hates the Itanium, but it is in some pretty large and fast systems, and it's on the Top500 list.

So out of curiosity, has anyone looked at putting together a compiler for Itanium, or otherwise looked at a Plan 9 port?




Thread overview: 14+ messages
2005-01-23 21:50 [9fans] Itanium Ben Huntsman
2005-01-23 23:56 ` geoff
2009-01-05 16:33 Benjamin Huntsman
2009-01-06  5:38 ` ron minnich
2009-01-08 10:05 ` Christopher
2009-01-08 13:55   ` erik quanstrom
2009-01-08 17:09     ` geoff
2009-01-08 18:14       ` Bakul Shah
2009-01-12 15:04   ` Christopher
2009-01-12 15:36     ` erik quanstrom
2009-01-12 16:29       ` Bakul Shah
2009-01-14 10:04       ` Christopher
2009-01-14 10:04     ` Christopher
2009-01-14 10:54       ` erik quanstrom
