* [9fans] Itanium
From: Benjamin Huntsman @ 2009-01-05 16:33 UTC
To: 9fans

I know most everyone here hates the Itanium, but it is in some pretty large and fast systems, and it's on the Top500 list.

So out of curiosity, has anyone looked at putting together a compiler for Itanium, or otherwise looked at a Plan 9 port?
* Re: [9fans] Itanium
From: ron minnich @ 2009-01-06 5:38 UTC
To: Fans of the OS Plan 9 from Bell Labs

On Mon, Jan 5, 2009 at 8:33 AM, Benjamin Huntsman <BHuntsman@mail2.cu-portland.edu> wrote:
> I know most everyone here hates the Itanium, but it is in some pretty large and fast systems, and it's on the Top500 list.

if you mean thunder, that machine is getting turned off soon. What new machines have made it on top500? Sorry I can't look just now.

> So out of curiosity, has anyone looked at putting together a compiler for Itanium, or otherwise looked at a Plan 9 port?

Here's my standard true Itanic story. I know a guy who wrote the sin() intrinsic. His comment: "I do not intend to write cos()".

ron
* Re: [9fans] Itanium
From: Christopher @ 2009-01-08 10:05 UTC
To: 9fans

> Here's my standard true Itanic story. I know a guy who wrote the sin()
> intrinsic. His comment: "I do not intend to write cos()".

I am working on a python ctypes FFI trampoline for IA-64 Windows. I find the processor architecture lovely. I am sorry your friend was turned off by it, but it has a number of excellent features. I wish I could do more IA-64 development.

-={C}=-
* Re: [9fans] Itanium
From: erik quanstrom @ 2009-01-08 13:55 UTC
To: 9fans

On Thu Jan 8 05:11:37 EST 2009, nadiasvertex@gmail.com wrote:
> > Here's my standard true Itanic story. I know a guy who wrote the sin()
> > intrinsic. His comment: "I do not intend to write cos()".
>
> I am working on a python ctypes FFI trampoline for IA-64 Windows. I
> find the processor architecture lovely. I am sorry your friend was
> turned off by it, but it has a number of excellent features. I wish I
> could do more IA-64 development.

would you care to share why you think this chip is good?

if it is good, the world has passed itanium by. the fastest i2 chip has a 667MHz fsb and the chipset it's paired with uses ddr 200 (not ddr2) memory. i couldn't find pricing on i2 motherboards; they're not popular enough.

if it were all about style points, we'd be using the 64-bit version of the amd29k.

- erik
* Re: [9fans] Itanium
From: geoff @ 2009-01-08 17:09 UTC
To: 9fans

You don't want to use an amd29k (even if you could get one). They look cute on paper but their freeze-mode interrupt handling is a Chinese puzzle and unless you use Ken's compiler (previously called 9c), you're stuck with register windows, which tend to need to be spilled when an interrupt occurs, thus slowing interrupt response unpredictably.

The one I used, the 29200, was even worse: it had no main memory caches. So a 16MHz processor could only achieve about 3 MIPS.
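The two figures in geoff's message (16MHz, about 3 MIPS) imply an average cost per instruction. The following is only a back-of-the-envelope sketch using those two numbers; the variable names are illustrative and nothing else about the 29200 is assumed.

```python
# Figures quoted in the message: a 16 MHz part reaching about 3 MIPS.
clock_hz = 16_000_000
instructions_per_second = 3_000_000

# Average clock cycles spent per instruction (CPI).
cycles_per_instruction = clock_hz / instructions_per_second

print(f"about {cycles_per_instruction:.1f} cycles per instruction")
```

Roughly five cycles per instruction is consistent with a processor that waits on uncached main memory for most instruction fetches.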
* Re: [9fans] Itanium
From: Bakul Shah @ 2009-01-08 18:14 UTC
To: Fans of the OS Plan 9 from Bell Labs

On Thu, 08 Jan 2009 12:09:51 EST geoff@plan9.bell-labs.com wrote:
> You don't want to use an amd29k (even if you could get one).
> They look cute on paper but their freeze-mode interrupt
> handling is a Chinese puzzle and unless you use Ken's compiler
> (previously called 9c), you're stuck with register windows,
> which tend to need to be spilled when an interrupt occurs,
> thus slowing interrupt response unpredictably.

It was called freeze mode because on *any* trap/interrupt it disabled interrupts (and IIRC most traps). Its designers tried to apply the "RISC philosophy" to interrupt handling as well and left most everything up to software.

Its freeze mode trap/interrupt handling was actually pretty simple as the processor did very little for you! If you could do everything in the handler, you didn't need to save any registers (except the ones you need in the handler). So for instance TLB handling was a few instructions, done entirely in freeze mode (unless the page table was invalid).

Trap code to vector to a user mode register spill/fill code handler was about 5 instructions. The user mode spill/fill handlers were about 10 instructions each and they were interruptible.

The painful part was preparing things to call a C language interrupt handler, as it required a consistent stack but an interrupt can occur in the middle of a spill/fill. You can save all 128 registers (+ a few special registers) but typically the code tried to save only the registers in use, and this added a lot of complexity and variable latency. It would've been better off with a pair of instructions to load/store full context (just like in all the CISCs!). But of course with so many registers the cost of saving goes up.

But in spite of this wart it was a real pleasure to write *assembly* code for it.
* Re: [9fans] Itanium
From: Christopher @ 2009-01-12 15:04 UTC
To: 9fans

On Jan 8, 9:02 am, quans...@quanstro.net (erik quanstrom) wrote:
> On Thu Jan 8 05:11:37 EST 2009, nadiasver...@gmail.com wrote:
> > > Here's my standard true Itanic story. I know a guy who wrote the sin()
> > > intrinsic. His comment: "I do not intend to write cos()".
>
> > I am working on a python ctypes FFI trampoline for IA-64 Windows. I
> > find the processor architecture lovely. I am sorry your friend was
> > turned off by it, but it has a number of excellent features. I wish I
> > could do more IA-64 development.
>
> would you care to share why you think this chip
> is good?

The instruction set is quite lovely. Many architectures get register windows wrong, but the Itanium has a variable-length register fill/spill engine that gets invoked automatically. Of course, you can program the engine too.

I also REALLY like predicated instructions. That is, you perform an operation and then predicate the instructions that should execute if it comes out the way you want. It really simplifies assembly-level if/then and switch-style blocks.

The hardware also has built-in support for closures. Every function executed is implicitly paired with a given local memory region.

There is a *lot* to like about Itanium.

w/r to the world passing the IA64 by, sadly you can only get decent IA64 systems from HP. I haven't been able to find decent boards or processors available elsewhere.
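The predication idea described in Christopher's message can be sketched as a toy model. This is not IA-64 code and makes no claim about real instruction encodings; the register names, the `p0`/`p1`/`p2` dictionary, and the `abs_diff` function are all made up for illustration. The point it tries to show is that an if/else becomes straight-line code in which each instruction carries a guard.

```python
def abs_diff(a, b):
    """Compute |a - b| with guarded straight-line code instead of a branch."""
    regs = {"a": a, "b": b, "out": 0}
    # Toy predicate file; p0 is treated as always true in this model.
    preds = {"p0": True, "p1": False, "p2": False}

    # A single compare writes a predicate and its complement.
    preds["p1"] = regs["a"] >= regs["b"]
    preds["p2"] = not preds["p1"]

    # Every instruction is (guard predicate, destination, operation).
    program = [
        ("p1", "out", lambda r: r["a"] - r["b"]),  # then-arm
        ("p2", "out", lambda r: r["b"] - r["a"]),  # else-arm
    ]
    for guard, dest, op in program:
        if preds[guard]:  # a false guard turns the instruction into a no-op
            regs[dest] = op(regs)
    return regs["out"]


print(abs_diff(7, 3), abs_diff(3, 7))
```

Both arms are always present in the instruction stream; only the guard decides which one has an effect, which is why this style removes the branch rather than predicting it.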
* Re: [9fans] Itanium
From: erik quanstrom @ 2009-01-12 15:36 UTC
To: 9fans

> [...] Many architectures get register
> windows wrong, but the Itanium has a variable-length register fill/
> spill engine that gets invoked automatically. Of course, you can
> program the engine too.

what's the advantage of this over the stanford style?

> I also REALLY like predicated instructions.

like arm?

> That is, you perform an operation and then predicate the instructions
> that should execute if it comes out the way you want. It really
> simplifies assembly-level if/then and switch-style blocks.

unless it's an 8- or 16-bit part, i don't see why anyone cares if the assembly is simpler. but since this is an epic part, the assembly is never simple.

how do you get around the fact that the parallelism is limited by the instruction set and the fact that one slow sub-instruction could stall the whole instruction?

> The hardware also has built-in support for closures. Every function
> executed is implicitly paired with a given local memory region.

what's the difference between this and stack?

> There is a *lot* to like about Itanium.

there's a lot not to like about itanium. epic means that instructions need to be hand-crufted. in itanium land, you schedule instructions. in x86-64 land, instructions schedule you.

what's to like about that?

- erik
* Re: [9fans] Itanium
From: Bakul Shah @ 2009-01-12 16:29 UTC
To: Fans of the OS Plan 9 from Bell Labs

On Mon, 12 Jan 2009 10:36:37 EST erik quanstrom <quanstro@quanstro.net> wrote:
> how do you get around the fact that the parallelism
> is limited by the instruction set and the fact that one
> slow sub-instruction could stall the whole instruction?
>
> > The hardware also has built-in support for closures. Every function
> > executed is implicitly paired with a given local memory region.
>
> what's the difference between this and stack?

Consider this:

    (define (counter n)
      (lambda () (set! n (+ 1 n)) n))

In C like syntax:

    (int(*)()) counter(int n) {
        int foo() { return ++n; }
        return foo;
    }
    ...
    int(*c1)() = counter(5);
    int(*c2)() = counter(2);
    int x = c1();   // x is 6
    x += c2();      // x is 6+3
    x += c2();      // x is 9+4

n lives past the lifetime of counter so a stack is not enough. So for the returned function from counter(), you have to allocate n on the heap. And you need to store a ptr to this space along with the returned function (e.g. c1 has its own local store, so does c2).

However, in general what Itanium does is not a win since in practice most functions do not need local storage (even if written in a language richer than C!).

Such a ptr to the local store associated with a function ptr can be used to implement objects -- ptrs of all "methods" to the same object point to the same storage. Not sure if anyone has implemented objects this way.
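Bakul's counter example can be written in runnable form. The sketch below uses Python rather than Scheme or C, first with a native closure and then with the (function, storage) pair spelled out by hand, which is the representation his message describes. The names `make_counter` and `call` are illustrative, not from the thread.

```python
def counter(n):
    """Return a function that increments and returns its own private n."""
    def step():
        nonlocal n  # n outlives counter()'s frame, so it cannot live on a stack
        n += 1
        return n
    return step


c1 = counter(5)
c2 = counter(2)
x = c1()    # 6
x += c2()   # 6 + 3 = 9
x += c2()   # 9 + 4 = 13


def make_counter(n):
    """Same idea with the closure represented explicitly as (code, env)."""
    env = {"n": n}  # heap-allocated store for this particular counter

    def code(env):
        env["n"] += 1
        return env["n"]

    return (code, env)  # function pointer plus pointer to its local store


def call(closure):
    code, env = closure
    return code(env)


d1 = make_counter(5)
d2 = make_counter(2)
y = call(d1) + call(d2) + call(d2)
print(x, y)
```

Each call to `make_counter` shares one `code` body in spirit but gets its own `env`, which is exactly why a single stack slot for `n` would not be enough.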
* Re: [9fans] Itanium
From: Christopher @ 2009-01-14 10:04 UTC
To: 9fans

> However, in general what Itanium does is not a win since in
> practice most functions do not need local storage (even if
> written in a language richer than C!).

That's not true. .dlls are the primary use case for this. If a .dll has its own local memory and local allocator, this is a big, big deal. The vast majority of plugin issues are memory ownership issues.
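The per-.dll local memory argument can be illustrated with a toy model in which a function "pointer" is a pair of entry point and data pointer, so that calling into a module automatically selects that module's own data. This is a hedged sketch only: `Module`, `descriptor`, `call`, and `alloc` are hypothetical names, and the model assumes, without reproducing any real ABI detail, that the data pointer travels with the function pointer.

```python
class Module:
    """A loaded module with its own private data area."""
    def __init__(self, name):
        self.name = name
        self.data = {"allocations": 0}


def alloc(gp):
    """An allocator that only touches the data area it is handed."""
    gp["allocations"] += 1
    return gp["allocations"]


def descriptor(code, module):
    """Model a function pointer as (entry point, data pointer)."""
    return (code, module.data)


def call(desc, *args):
    code, gp = desc
    return code(gp, *args)


host = Module("host")
plugin = Module("plugin")
host_alloc = descriptor(alloc, host)
plugin_alloc = descriptor(alloc, plugin)

call(plugin_alloc)
call(plugin_alloc)
call(host_alloc)
print(host.data["allocations"], plugin.data["allocations"])
```

The same `alloc` code serves both modules, yet each keeps separate bookkeeping, which is the memory-ownership separation the message is arguing for.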
* Re: [9fans] Itanium
From: Christopher @ 2009-01-14 10:04 UTC
To: 9fans

On Jan 12, 10:42 am, quans...@quanstro.net (erik quanstrom) wrote:
> > [...] Many architectures get register
> > windows wrong, but the Itanium has a variable-length register fill/
> > spill engine that gets invoked automatically. Of course, you can
> > program the engine too.
>
> what's the advantage of this over the stanford style?

I'm not sure what exactly you mean by that.

> > I also REALLY like predicated instructions.
>
> like arm?

ARM is fine, but itanium predicated instructions allow you to have a great number of predicate registers. This isn't like cmov and friends either.

> > That is, you perform an operation and then predicate the instructions
> > that should execute if it comes out the way you want. It really
> > simplifies assembly-level if/then and switch-style blocks.
>
> unless it's an 8- or 16-bit part, i don't see why anyone cares
> if the assembly is simpler. but since this is an epic part,
> the assembly is never simple.

I don't know why bit size matters. Anyway, making the assembly simpler has a lot of benefits. A human has to write the stuff at some point. When there are bugs, a human has to read it. It also simplifies code generation by the compiler.

> how do you get around the fact that the parallelism
> is limited by the instruction set and the fact that one
> slow sub-instruction could stall the whole instruction?

Parallelism isn't any more limited by the instruction set on Itanium than it is anywhere else. The processor has multiple issue units that can crunch multiple instructions in parallel. Some units can execute multiple instructions per cycle.

> > The hardware also has built-in support for closures. Every function
> > executed is implicitly paired with a given local memory region.
>
> what's the difference between this and stack?

There is a massive difference. As the other poster pointed out, closures are cool in and of themselves.

On x86 processors, you get 4 stacks. One for each privilege level. You can change a stack anytime you want, but it requires either an instruction to do so, or instruction patching by the loader. Everything gets stuck there and there are very few restrictions about what you do with stuff on the stack.

On Itanium you have two kinds of stacks AND a global pointer for local memory accesses. One kind of stack is much like what you are used to. The other kind of stack is ONLY for the register spill/fill engine and cannot be programmatically accessed while it's in use. Which means that you can't smash the stack and have the function return to an arbitrary location.

The global pointer is for indirect memory accesses, and allows you to do all sorts of interesting things, from .dlls to simplified thread-local storage.

> > There is a *lot* to like about Itanium.
>
> there's a lot not to like about itanium. epic means that
> instructions need to be hand-crufted. in itanium land, you
> schedule instructions. in x86-64 land, instructions
> schedule you.
>
> what's to like about that?

Quite a bit. Having the processor scan the incoming instruction stream to locate potential parallelizations is ludicrous. It works fine when the processor guesses correctly, but it is horrendously expensive when the processor guesses wrong. Requiring that the processor scan incoming instructions to suss out potential parallelizations also means that much less die space for doing real work. Finally, the processor has almost NO context about the instructions. A compiler has immensely more context and can do a much better job indicating which instructions can execute in parallel.

IA64 got a bad rap because the first hardware implementations of IA64 were less than stellar, and the compilers were harder to write than expected. The Itanium-2 and modern compilers are actually quite nice.

-={C}=-
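The claim that a compiler can mark which instructions may run in parallel can be sketched with a toy static scheduler. This is an illustration under stated assumptions, not a description of any real IA-64 compiler: instructions are reduced to (destination, sources) tuples, only register dependences are considered, the `width` limit of 3 is a hypothetical issue width, and `group_instructions` is a made-up name.

```python
def group_instructions(instrs, width=3):
    """Split an in-order instruction list into groups that may issue together.

    Each instruction is (dest, sources). A new group starts when the next
    instruction reads or writes a register written earlier in the current
    group, or when the group already holds `width` instructions.
    """
    groups, current, written = [], [], set()
    for dest, sources in instrs:
        conflict = dest in written or any(s in written for s in sources)
        if current and (conflict or len(current) == width):
            groups.append(current)
            current, written = [], set()
        current.append((dest, sources))
        written.add(dest)
    if current:
        groups.append(current)
    return groups


program = [
    ("r1", ("r10", "r11")),  # r1 = r10 + r11
    ("r2", ("r12", "r13")),  # r2 = r12 + r13  (independent of r1)
    ("r3", ("r1", "r2")),    # r3 = r1 + r2    (needs both earlier results)
    ("r4", ("r14", "r15")),  # r4 = r14 + r15  (independent of r3)
]
groups = group_instructions(program)
for i, group in enumerate(groups):
    print(i, [dest for dest, _ in group])
```

The grouping is decided once, ahead of time, from information visible in the whole program, whereas an out-of-order processor has to rediscover the same dependences at run time within a limited window.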
* Re: [9fans] Itanium
From: erik quanstrom @ 2009-01-14 10:54 UTC
To: 9fans

On Wed Jan 14 05:12:07 EST 2009, nadiasvertex@gmail.com wrote:
> On Jan 12, 10:42 am, quans...@quanstro.net (erik quanstrom) wrote:
> > > [...] Many architectures get register
> > > windows wrong, but the Itanium has a variable-length register fill/
> > > spill engine that gets invoked automatically. Of course, you can
> > > program the engine too.
> >
> > what's the advantage of this over the stanford style?
>
> I'm not sure what exactly you mean by that.

http://en.wikipedia.org/wiki/Register_window

> ARM is fine, but itanium predicated instructions allow you to have a
> great number of predicate registers. This isn't like cmov and friends
> either.

does this buy anything in practice? references?

> > unless it's an 8- or 16-bit part, i don't see why anyone cares
> > if the assembly is simpler. but since this is an epic part,
> > the assembly is never simple.
>
> I don't know why bit size matters. Anyway, making the assembly simpler
> has a lot of benefits. A human has to write the stuff at some point.
> When there are bugs, a human has to read it. It also simplifies code
> generation by the compiler.

bit size matters because little 8- and 16-bit parts are so constrained that one's best option is generally writing assembler. (hint: on an 8-bit computer, addressable memory is 256 bytes.)

for a 64-bit cpu, writing a substantial amount of code in assembler is a waste of time. there are fewer than 2k lines of 386 assembler in the kernel and libc (1919 on my system).

> It also simplifies code generation by the compiler.

having to build parallel instructions is a hard enough problem that it delayed the introduction of the itanium by several years, and it's the reason amd had a window to sneak amd64 through.

> > how do you get around the fact that the parallelism
> > is limited by the instruction set and the fact that one
> > slow sub-instruction could stall the whole instruction?
>
> Parallelism isn't any more limited by the instruction set on Itanium
> than it is anywhere else. The processor has multiple issue units that
> can crunch multiple instructions in parallel. Some units can execute
> multiple instructions per cycle.

okay, then. please explain why it helps to have an explicitly parallel instruction set with an architecturally defined number of parallel slots? adding cores makes life much more flexible and easier to understand. (and i don't need to recompile or write a new compiler.)

> There is a massive difference. As the other poster pointed out,
> closures are cool in and of themselves.

what do they get me? dlls don't count. plan 9 doesn't have dynamic linking.

> On x86 processors, you get 4 stacks. One for each privilege level.
> You can change a stack anytime you want, but it requires either an
> instruction to do so, or instruction patching by the loader.
> Everything gets stuck there and there are very few restrictions about
> what you do with stuff on the stack.

i don't think anybody cares about the intricate details of who sets up the stack or how it is managed. from the user's perspective, one can have as many stacks as one wishes per user application with the thread(2) library. whether there's hardware support for any number of stacks or not is not an interesting question.

> Quite a bit. Having the processor scan the incoming instruction
> stream to locate potential parallelizations is ludicrous. It works fine
> when the processor guesses correctly, but it is horrendously expensive
> when the processor guesses wrong. Requiring that the processor scan
> incoming instructions to suss out potential parallelizations also
> means that much less die space for doing real work.

i don't think explicitly parallel vs. implicitly parallel is a question that can be answered without a reference in the real world. do you have any references telling me why i can never get epic-like performance out of a non-epic cpu, transistor for transistor?

one could consider epic a layering violation. why do i have to care how many execution units the architecture defines?

by the way, epic still does speculative execution, etc. so what does epic get me?

http://en.wikipedia.org/wiki/Explicitly_parallel_instruction_computing

i still fail to see how one could call instruction bundles "simple" at the assembly level.

> IA64 got a bad rap because the first hardware implementations of IA64
> were less than stellar, and the compilers were harder to write than
> expected. The Itanium-2 and modern compilers are actually quite
> nice.

almost any problem can be worked out in 10 million lines of code and 2 billion transistors.

i'd really be surprised if itanium could compete with a regular x86 system for most tasks. since memory bandwidth is so important: the fastest fsb available for an itanium is 667MHz. that's many x86 generations ago.

- erik
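The 256-byte hint in erik's message is simple address-space arithmetic, sketched below. It assumes addresses are exactly as wide as the stated bit size; some 8-bit processors use wider addresses than their data width, so the 8-bit row is the constrained case the hint has in mind rather than a statement about every such part. The function name `addressable_bytes` is illustrative.

```python
def addressable_bytes(address_bits):
    """Number of distinct byte addresses reachable with the given width."""
    return 2 ** address_bits


for bits in (8, 16, 32, 64):
    print(f"{bits:2d}-bit addresses: {addressable_bytes(bits)} bytes")
```

The jump from 256 bytes to 2**64 bytes is the gap behind the argument that hand-written assembler is a necessity on small parts and a poor use of time on a 64-bit cpu.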
* [9fans] Itanium
From: Ben Huntsman @ 2005-01-23 21:50 UTC
To: Fans of the OS Plan 9 from Bell Labs
Cc: Inferno mailing list

Quick question- Have any attempts been made to port Plan 9 to Itanium? Any attempts to build Inferno hosted on an Itanium system?
* Re: [9fans] Itanium
From: geoff @ 2005-01-23 23:56 UTC
To: 9fans

I'm not aware of any attempts to port anything to the Itanic. The Itanic has hit the iceberg x86-64 (AMD64) and all aboard have been lost.