* [9fans] NUMA @ 2011-07-15 15:15 tlaronde 2011-07-15 20:21 ` tlaronde 0 siblings, 1 reply; 32+ messages in thread From: tlaronde @ 2011-07-15 15:15 UTC (permalink / raw) To: Fans of the OS Plan 9 from Bell Labs This is a generic question and not peculiar to plan9. In my main software (KerGIS), there are often two versions of the data. One version is saved, in a filesystem, in a portable format (size of int and float defined; endianness defined) so that the data can be served by a fileserver and used by whatever kind of CPU. In memory, the structures used to manage processing, and the data itself, are as expected by the CPU as a result of the C types used and the compilation. Writing down an explanation of the differences between the on-file saved version and the runtime structures, I wrote that the portable one was for sharing between whatever CPU architectures, while the in-memory one was fitting a particular architecture because the memory is tightly coupled to the cores and not shar... Oops! Hence the question. In my limited view and knowledge of this subject, an elementary CPU (an atom) is not only a processing unit, but also the main memory tightly coupled with it by some main bus. I guess that the main NUMA _hardware_ is composed of same-architecture cores, and there is no mix of different core architectures. But there are also "software" NUMA. Even if I don't plan at all to change this runtime "localization"---so the question is a theoretical one---are there systems designed for migrating portions of main memory between cores of different architectures? Thanks for any lesson. -- Thierry Laronde <tlaronde +AT+ polynum +dot+ com> http://www.kergis.com/ Key fingerprint = 0FF7 E906 FBAF FE95 FD89 250D 52B1 AE95 6006 F40C ^ permalink raw reply [flat|nested] 32+ messages in thread
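A minimal sketch, in C and not taken from KerGIS, of the split described above: the file format fixes the integer width and byte order so that any CPU can read the bytes, while in memory the value simply lives in whatever representation the local C types give it.

    #include <stdio.h>
    #include <stdint.h>

    /* On file: width and byte order are fixed (here 32-bit, big-endian),
     * so the bytes mean the same thing to every CPU that reads them.
     * In memory: the value is a native uint32_t, laid out however the
     * local architecture likes. */
    static void
    put32be(FILE *f, uint32_t v)
    {
        putc((v >> 24) & 0xFF, f);
        putc((v >> 16) & 0xFF, f);
        putc((v >> 8) & 0xFF, f);
        putc(v & 0xFF, f);
    }

    static uint32_t
    get32be(FILE *f)
    {
        uint32_t v;

        v = (uint32_t)(getc(f) & 0xFF) << 24;
        v |= (uint32_t)(getc(f) & 0xFF) << 16;
        v |= (uint32_t)(getc(f) & 0xFF) << 8;
        v |= (uint32_t)(getc(f) & 0xFF);
        return v;
    }

    int
    main(void)
    {
        FILE *f = tmpfile();

        if (f == NULL)
            return 1;
        put32be(f, 123456789);
        rewind(f);
        printf("%lu\n", (unsigned long)get32be(f));
        return 0;
    }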
* Re: [9fans] NUMA 2011-07-15 15:15 [9fans] NUMA tlaronde @ 2011-07-15 20:21 ` tlaronde 2011-07-15 20:47 ` ron minnich 0 siblings, 1 reply; 32+ messages in thread From: tlaronde @ 2011-07-15 20:21 UTC (permalink / raw) To: Fans of the OS Plan 9 from Bell Labs On Fri, Jul 15, 2011 at 05:15:35PM +0200, tlaronde@polynum.com wrote: >[...] > Writing down an explanation of the differences between the on-file > saved version and the runtime structures, I wrote that the portable one was > for sharing between whatever CPU architectures, while the in-memory one was > fitting a particular architecture because the memory is tightly coupled > to the cores and not shar... Oops! Hence the question. Thinking about it a little more: either the whole process memory is migrated, that is not only "data" but instructions; in this case, the new CPU has to understand the whole (natively or by emulation), and the program can "ignore" what's going on. Or the data is separated and shared, and it has to be made "portable", by whatever means, but known to the programmer and/or to the binary tools. Well, I don't know if this is totally or only partially stupid. But this was just an "en passant" question. Sorry for the noise. Back to work... -- Thierry Laronde <tlaronde +AT+ polynum +dot+ com> http://www.kergis.com/ Key fingerprint = 0FF7 E906 FBAF FE95 FD89 250D 52B1 AE95 6006 F40C ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [9fans] NUMA 2011-07-15 20:21 ` tlaronde @ 2011-07-15 20:47 ` ron minnich 2011-07-15 22:59 ` Charles Forsyth 2011-07-16 8:02 ` tlaronde 0 siblings, 2 replies; 32+ messages in thread From: ron minnich @ 2011-07-15 20:47 UTC (permalink / raw) To: Fans of the OS Plan 9 from Bell Labs http://supertech.csail.mit.edu/porch/ long ago, but I saw it checkpoint between x86 and sparc. ron ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [9fans] NUMA 2011-07-15 20:47 ` ron minnich @ 2011-07-15 22:59 ` Charles Forsyth 2011-07-16 8:02 ` tlaronde 1 sibling, 0 replies; 32+ messages in thread From: Charles Forsyth @ 2011-07-15 22:59 UTC (permalink / raw) To: 9fans that's funny! i was looking at that work last week. ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [9fans] NUMA 2011-07-15 20:47 ` ron minnich 2011-07-15 22:59 ` Charles Forsyth @ 2011-07-16 8:02 ` tlaronde 2011-07-16 16:27 ` erik quanstrom 2011-07-17 3:39 ` Joel C. Salomon 1 sibling, 2 replies; 32+ messages in thread From: tlaronde @ 2011-07-16 8:02 UTC (permalink / raw) To: Fans of the OS Plan 9 from Bell Labs On Fri, Jul 15, 2011 at 01:47:40PM -0700, ron minnich wrote: > http://supertech.csail.mit.edu/porch/ > > long ago, but I saw it checkpoint between x86 and sparc. Thanks for the pointer! At least it shows that it is always useful to write down the "axiomatics" of one's code, since making explicit the assumptions that were only implicit can raise questions. Some time ago, reading about "what makes parallel programming difficult", I discovered that, all in all, the problems arise when a sequence of instructions is not "prédicative" in Poincaré's definition, i.e. "is [predicative] an on-going classification that is not disrupted by the adjunction of new elements". The Itanium story, as guessed early by Hennessy and Patterson in "Computer Architecture", shows that efficiency relying on too complex knowledge, asking too much of the programmers and the compilers, is likely to fail. On the other hand, if the programmer doesn't think at all about these problems, distributed and parallel systems will have a hard time and hit limits, and can't do wonders with "spaghetti" code. What is the minimal hints the programmer shall give? At least predicativity. I wonder what minimum set of keywords could be added, say, to C, so that the situation can be greatly improved without the burden being greatly increased. [non-predicative routines being, from a parallel point of view, atomic] -- Thierry Laronde <tlaronde +AT+ polynum +dot+ com> http://www.kergis.com/ Key fingerprint = 0FF7 E906 FBAF FE95 FD89 250D 52B1 AE95 6006 F40C ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [9fans] NUMA 2011-07-16 8:02 ` tlaronde @ 2011-07-16 16:27 ` erik quanstrom 2011-07-16 18:06 ` tlaronde 0 siblings, 1 reply; 32+ messages in thread From: erik quanstrom @ 2011-07-16 16:27 UTC (permalink / raw) To: 9fans > The Itanium story, as guessed early by Hennessy and Patterson in > "Computer Architecture", shows that efficiency relying on too > complex knowledge, asking too much of the programmers and the > compilers, is likely to fail. another way of looking at itanium is that it's like a multicore processor that is programmed with a single instruction stream. given a general-purpose workload, it stands to reason that independent threads are going to be scheduled more efficiently and independent threads can be added at will without changing the architectural model. so it's also easier to scale. - erik ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [9fans] NUMA 2011-07-16 16:27 ` erik quanstrom @ 2011-07-16 18:06 ` tlaronde 2011-07-16 19:29 ` Ethan Grammatikidis 2011-07-16 19:54 ` erik quanstrom 0 siblings, 2 replies; 32+ messages in thread From: tlaronde @ 2011-07-16 18:06 UTC (permalink / raw) To: Fans of the OS Plan 9 from Bell Labs On Sat, Jul 16, 2011 at 12:27:14PM -0400, erik quanstrom wrote: > > The Itanium story, as guessed early by Hennessy and Patterson in > > "Computer Architecture", shows that efficiency relying on too > > complex knowledge, asking too much of the programmers and the > > compilers, is likely to fail. > > another way of looking at itanium is that it's like a multicore > processor that is programmed with a single instruction stream. > given a general-purpose workload, it stands to reason that > independent threads are going to be scheduled more > efficiently and independent threads can be added at will without > changing the architectural model. so it's also easier to scale. That's probably a legitimate view, since the gains from pipelining in current processors were exhausted and engineers were searching for gains elsewhere. But from what I remember when reading the description of the aims of the architecture---in CAQA---, since there was no panacea and no great gain to be easily obtained, optimizations had to rely on special cases, on great knowledge of low-level details by programmers, and on some higher-level knowledge for compilers to do "the right thing(TM)", and that seemed unlikely to work without a lot of pain. If RISC has succeeded, this is precisely because the elements were simple enough to be implemented in hardware, and this simplicity allowed one to work reliably on optimizations. There is an english expression, IIRC: penny wise and pound fool. Having the basis right is the main gain. One can compare Plan9, which can be viewed as achieving what MACH was aiming to achieve while really being a micro-kernel (to start with, by the size of its code), whereas the MACH-like microkernels seem to have survived only in assembly, since that was the only means to get decent efficiency. But people continued to publish theses and papers about it---some paragraph in the plan9 presentation paper is about this, if my english is not totally at fault...---, refusing to conclude that the results were showing there was definitively something wrong to start with. But in what was called "science", there are now fashions too. Storytelling everywhere... -- Thierry Laronde <tlaronde +AT+ polynum +dot+ com> http://www.kergis.com/ Key fingerprint = 0FF7 E906 FBAF FE95 FD89 250D 52B1 AE95 6006 F40C ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [9fans] NUMA 2011-07-16 18:06 ` tlaronde @ 2011-07-16 19:29 ` Ethan Grammatikidis 2011-07-16 19:54 ` erik quanstrom 1 sibling, 0 replies; 32+ messages in thread From: Ethan Grammatikidis @ 2011-07-16 19:29 UTC (permalink / raw) To: 9fans On Sat, 16 Jul 2011 20:06:27 +0200 tlaronde@polynum.com wrote: > There is an english expression, IIRC: penny wise and pound fool. Very close: penny wise and pound foolish. (Possibly capitalise Pound to be correct.) I had not heard this expression for years. Now you've reminded me of it, I wonder what else it could be applied to. Gcc perhaps? ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [9fans] NUMA 2011-07-16 18:06 ` tlaronde 2011-07-16 19:29 ` Ethan Grammatikidis @ 2011-07-16 19:54 ` erik quanstrom 2011-07-16 20:56 ` dexen deVries 1 sibling, 1 reply; 32+ messages in thread From: erik quanstrom @ 2011-07-16 19:54 UTC (permalink / raw) To: 9fans > If RISC has succeeded, this is precisely because the elements were > simple enough to be implemented in hardware, and this simplicity allowed > one to work reliably on optimizations. it's interesting you bring this up. risc has largely been removed from architectures. if you tie the instruction set and machine model to the actual hardware, then you need to write new compilers and recompile everything every few years. instead, the risc is hidden and the instruction set stays the same. this allows for a lot of under-the-hood innovation in isolation from the architecture. isolation is generally a good thing, and i don't believe i've seen a compelling argument that coupling architecture to implementation is necessary. (by architecture, of course, i mean what you read in the ia64 or amd64 programmer's manual, not the implementation, which intel unhelpfully calls the µarch.) - erik ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [9fans] NUMA 2011-07-16 19:54 ` erik quanstrom @ 2011-07-16 20:56 ` dexen deVries 2011-07-16 22:10 ` Charles Forsyth 2011-07-17 10:08 ` Ethan Grammatikidis 0 siblings, 2 replies; 32+ messages in thread From: dexen deVries @ 2011-07-16 20:56 UTC (permalink / raw) To: Fans of the OS Plan 9 from Bell Labs On Saturday 16 July 2011 21:54:33 erik quanstrom wrote: > it's interesting you bring this up. risc has largely been removed > from architectures. if you tie the instruction set and machine model > to the actual hardware, then you need to write new compilers and > recompile everything every few years. instead, the risc is hidden > and the instruction set stays the same. this allows for a lot of > under-the-hood innovation in isolation from the architecture. interesting angle. till now i believed it's easier to innovate in software (even compilers) than in silicon. where did we go wrong that silicon became the easier way? would it be fair to blame GCC and other heavyweight champions? -- dexen deVries > (...) I never use more than 800Mb of RAM. I am running Linux, > a browser and a terminal. rjbond3rd in http://news.ycombinator.com/item?id=2692529 ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [9fans] NUMA 2011-07-16 20:56 ` dexen deVries @ 2011-07-16 22:10 ` Charles Forsyth 2011-07-17 1:44 ` erik quanstrom 2011-07-17 10:08 ` Ethan Grammatikidis 1 sibling, 1 reply; 32+ messages in thread From: Charles Forsyth @ 2011-07-16 22:10 UTC (permalink / raw) To: 9fans > to the actual hardware, then you need to write new compilers and > recompile everything every few years. you do anyway. i don't think i've used a distribution yet where the upgrade doesn't include completely-recompiled versions of everything. ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [9fans] NUMA 2011-07-16 22:10 ` Charles Forsyth @ 2011-07-17 1:44 ` erik quanstrom 2011-07-17 7:38 ` tlaronde 0 siblings, 1 reply; 32+ messages in thread From: erik quanstrom @ 2011-07-17 1:44 UTC (permalink / raw) To: 9fans On Sat Jul 16 18:07:28 EDT 2011, forsyth@terzarima.net wrote: > > to the actual hardware, then you need to write new compilers and > > recompile everything every few years. > > you do anyway. i don't think i've used a distribution yet > where the upgrade doesn't include completely-recompiled versions of > everything. i've been able to upgrade my systems here through a number of µarches (intel xeon 5[0456]00, 3[04]00, atom; amd phenom) that weren't around when i first installed my systems, and i'm still using most of the original binaries. the hardware isn't forcing an upgrade. - erik ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [9fans] NUMA 2011-07-17 1:44 ` erik quanstrom @ 2011-07-17 7:38 ` tlaronde 2011-07-17 8:44 ` Bakul Shah 2011-07-17 15:51 ` erik quanstrom 0 siblings, 2 replies; 32+ messages in thread From: tlaronde @ 2011-07-17 7:38 UTC (permalink / raw) To: Fans of the OS Plan 9 from Bell Labs On Sat, Jul 16, 2011 at 09:44:02PM -0400, erik quanstrom wrote: > On Sat Jul 16 18:07:28 EDT 2011, forsyth@terzarima.net wrote: > > > to the actual hardware, then you need to write new compilers and > > > recompile everything every few years. > > > > you do anyway. i don't think i've used a distribution yet > > where the upgrade doesn't include completely-recompiled versions of > > everything. > > i've been able to upgrade my systems here through a number of µarches > (intel xeon 5[0456]00, 3[04]00, atom; amd phenom) that weren't around > when i first installed my systems, and i'm still using most of the original > binaries. the hardware isn't forcing an upgrade. But that's possible because the "real" hardware is RISC, and there is a software level managing compatibility (microcode). That's also why hardware can be "patched". My point is more that when the optimization path is complex, the probability is the combined one (the product); hence it is useless to expect a non-negligible gain for the total, especially when the "best case" for each single optimization is almost orthogonal to all the others. Furthermore, I don't know for others, but I prefer correctness over speed. I mean, if a program is proved to be correct (and very few are), complex acrobatics from the compiler, namely in the "optimization" area, able to wreak havoc on all the code's assumptions, is something I don't buy. I have an example with gcc4.4, compiling not my source, but D.E.K.'s TeX. With gcc3.x, the "-O2" didn't produce a failing program. With gcc4.4, suddenly the "-O2" does produce a program that does not crash but fails (the offending optimization is `-foptimize-sibling-calls'). The code is not at fault (since it fails not where glue code is added for the WEB to C translation, but in TeX's innards; I mean, it's not my small mundane added code, it's pure original D.E.K.'s). But how can one rely on a binary that is so mangled that the fact that you do not see it fail when testing does not prove it will yield a correct result? And, furthermore, that the code is so chewed that the proofs of correctness on the source level do not guarantee anything about the correctness of the compiled result? My gut feeling is that the whole process is going too far, is too complex to be "maintainable" (to be held in one hand), and that some marginal gains in specific cases are obtained at the price of a general ruin, if not of certainty, at least of some confidence in correctness. And I don't buy that. I would much prefer hints given by the programmer at the source code level. And if the programmer doesn't understand what he's doing, I don't think the compiler will understand it better. But it's the same "philosophy" as that of the autotools, utilities trying to apply heuristic rules about code made without rules at all. But I guess an e-compiler (indeed an i-compiler) will be the software version of the obsolete lollipop... -- Thierry Laronde <tlaronde +AT+ polynum +dot+ com> http://www.kergis.com/ Key fingerprint = 0FF7 E906 FBAF FE95 FD89 250D 52B1 AE95 6006 F40C ^ permalink raw reply [flat|nested] 32+ messages in thread
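As an illustration of the point about `-foptimize-sibling-calls', and only as a sketch (it does not reproduce the TeX failure): GCC's usual -fno- negation lets a single suspect pass be switched off while keeping the rest of -O2, and the toy program below shows what the pass does.

    #include <stdio.h>

    /* Tail-recursive countdown.  With sibling-call optimization the
     * compiler may turn the tail call into a jump, so even a very deep
     * "recursion" runs in constant stack space; compiled with
     *     gcc -O2 -fno-optimize-sibling-calls demo.c
     * the pass is disabled and each call gets its own stack frame,
     * which may exhaust the stack for large n. */
    static long
    countdown(long n, long acc)
    {
        if (n == 0)
            return acc;
        return countdown(n - 1, acc + 1);   /* sibling (tail) call */
    }

    int
    main(void)
    {
        printf("%ld\n", countdown(1000000L, 0L));
        return 0;
    }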
* Re: [9fans] NUMA 2011-07-17 7:38 ` tlaronde @ 2011-07-17 8:44 ` Bakul Shah 2011-07-17 10:02 ` tlaronde 2011-07-17 15:24 ` erik quanstrom 1 sibling, 2 replies; 32+ messages in thread From: Bakul Shah @ 2011-07-17 8:44 UTC (permalink / raw) To: Fans of the OS Plan 9 from Bell Labs On Sun, 17 Jul 2011 09:38:47 +0200 tlaronde@polynum.com wrote: > > Furthermore, I don't know for others, but I prefer correctness over > speed. I mean, if a program is proved to be correct (and very few are), > complex acrobatics from the compiler, namely in the "optimization" area, > able to wreak havoc on all the code's assumptions, is something I don't buy. C's design has compromises in favor of speed over correctness (mainly by underspecifying, by leaving more things up to the implementor). So if you really prefer correctness over speed, you should be using Scheme, ML or Haskell etc but not C! But note that for what C is used for, this compromise is fine (IMHO). But this has made its semantics significantly more complex. C doesn't even have a well defined formal semantics (there have been attempts to define denotational semantics for C subsets but never the whole language, and even such a subset specification is significantly larger than, say, Scheme's). Also note that the ISA implementations these days are quite complex (perhaps even more than your typical program). We don't see this complexity because it is all hidden behind a relatively simple ISA. But remember the F00F bug? Usually the vendor has a long errata list (typically only available on a need to know basis and only under NDA!). And usually they don't formally prove the implementation right; they just run zillions of test vectors! I bet you would be scandalized if you knew what they do :-) > But how can one rely on a binary that is so mangled that the fact > that you do not see it fail when testing does not prove it will > yield a correct result? And, furthermore, that the code is so chewed > that the proofs of correctness on the source level do not guarantee > anything about the correctness of the compiled result? Almost all complex programs have bugs. gcc does, clang does, and so does plan9 cc. The difference is in the degree of bugginess. One uses the best tool available for a given job and then learns to work around its problems. The problem with C/C++ optimization is that these languages are quite complex and it is not always easy to figure out the correct equivalent operations under all conditions. Contrast that with Stalin, which does whole program optimization of R4RS Scheme programs and does it extremely well (but extremely slowly!). > My gut feeling is that the whole process is going too far, is too > complex to be "maintainable" (to be held in one hand), and that some > marginal gains in specific cases are obtained at the price of a general ruin, > if not of certainty, at least of some confidence in correctness. I seriously think you will be happier with Scheme! ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [9fans] NUMA 2011-07-17 8:44 ` Bakul Shah @ 2011-07-17 10:02 ` tlaronde 2011-07-17 12:04 ` dexen deVries 0 siblings, 1 reply; 32+ messages in thread From: tlaronde @ 2011-07-17 10:02 UTC (permalink / raw) To: Fans of the OS Plan 9 from Bell Labs On Sun, Jul 17, 2011 at 01:44:11AM -0700, Bakul Shah wrote: > On Sun, 17 Jul 2011 09:38:47 +0200 tlaronde@polynum.com wrote: > > > > Furthermore, I don't know for others, but I prefer correctness over > > speed. I mean, if a program is proved to be correct (and very few are), > > complex acrobatics from the compiler, namely in the "optimization" area, > > able to wreak havoc on all the code's assumptions, is something I don't buy. > > C's design has compromises in favor of speed over correctness > (mainly by underspecifying, by leaving more things up to the > implementor). So if you really prefer correctness over speed, > you should be using Scheme, ML or Haskell etc but not C! Yes and no. IMHO one of the greatest strengths of C is that the language is small (the standard lib is separate) and its description short and to the point. K&R and, at least, ANSI C were short, and if there are subtleties (promotions... signed/unsigned etc.), knowing what is guaranteed and what is not can be achieved. (And C is a general purpose language, but not an all-purpose language [standard C]: calculus is not its realm, since even integer overflow is unspecified.) My woe is more that an optimization can say "this may improve speed (or may not, even slow down processing...)": OK. But why an optimization that can break a program, that is, an optimization whose correctness is not guaranteed, is even proposed is something I can't understand (since I fail to see why someone would be happy to get an incorrect result more rapidly, one that cannot even be said to be close to the correct one for some epsilon...). >[...] > Also note that the ISA implementations these days are quite > complex (perhaps even more than your typical program). We > don't see this complexity because it is all hidden behind a > relatively simple ISA. But remember the F00F bug? Usually the > vendor has a long errata list (typically only available on a > need to know basis and only under NDA!). And usually they > don't formally prove the implementation right; they just run > zillions of test vectors! I bet you would be scandalized if > you knew what they do :-) Scandalized, perhaps... surprised: not! Because explicitly stating the domain of definition of operations on numbers that are not integers, not scaled, and certainly not "reals" is not trivial at all, precisely because of the lack of uniformity, especially at a higher level when you consider not one manipulation, but a sequence... And the decreasing size of the hardware components will lead to unpredictability by design! >[...] > I seriously think you will be happier with Scheme! That's on my TODO list. For a long time now... ;) -- Thierry Laronde <tlaronde +AT+ polynum +dot+ com> http://www.kergis.com/ Key fingerprint = 0FF7 E906 FBAF FE95 FD89 250D 52B1 AE95 6006 F40C ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [9fans] NUMA 2011-07-17 10:02 ` tlaronde @ 2011-07-17 12:04 ` dexen deVries 0 siblings, 0 replies; 32+ messages in thread From: dexen deVries @ 2011-07-17 12:04 UTC (permalink / raw) To: Fans of the OS Plan 9 from Bell Labs On Sunday 17 July 2011 12:02:45 tlaronde@polynum.com wrote: > My woe is more that an optimization can say "this may improve speed (or > may not, even slow down processing...)": OK. But why an optimization > that can break a program, that is, an optimization whose correctness > is not guaranteed, is even proposed is something I can't understand > (since I fail to see why someone would be happy to get an incorrect > result more rapidly, one that cannot even be said to be close > to the correct one for some epsilon...). optimizations make it more likely to trip over undefined behavior -- code that `was-somehow-working' without them. http://blog.llvm.org/2011/05/what-every-c-programmer-should-know_14.html http://blog.llvm.org/2011/05/what-every-c-programmer-should-know_21.html -- dexen deVries > (...) I never use more than 800Mb of RAM. I am running Linux, > a browser and a terminal. rjbond3rd in http://news.ycombinator.com/item?id=2692529 ^ permalink raw reply [flat|nested] 32+ messages in thread
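A small, hedged illustration of the kind of trap those two posts describe (the exact outcome depends on the compiler and version):

    #include <limits.h>
    #include <stdio.h>

    /* Signed integer overflow is undefined behaviour in C, so an
     * optimizer is allowed to assume that x + 1 never wraps.  At higher
     * optimization levels this test may be folded to a constant 0, and
     * a guard that "was somehow working" at -O0 silently disappears. */
    static int
    will_wrap(int x)
    {
        return x + 1 < x;   /* relies on wraparound: undefined for signed int */
    }

    int
    main(void)
    {
        printf("%d\n", will_wrap(INT_MAX));
        return 0;
    }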
* Re: [9fans] NUMA 2011-07-17 8:44 ` Bakul Shah 2011-07-17 10:02 ` tlaronde @ 2011-07-17 15:24 ` erik quanstrom 2011-07-17 15:28 ` ron minnich ` (2 more replies) 1 sibling, 3 replies; 32+ messages in thread From: erik quanstrom @ 2011-07-17 15:24 UTC (permalink / raw) To: 9fans On Sun Jul 17 04:45:18 EDT 2011, bakul@bitblocks.com wrote: > Also note that the ISA implementations these days are quite > complex (perhaps even more than your typical program). We > don't see this complexity because it is all hidden behind a > relatively simple ISA. But remember the F00F bug? Usually the > vendor has a long errata list (typically only available on a > need to know basis and only under NDA!). And usually they > don't formally prove the implementation right; they just run > zillions of test vectors! I bet you would be scandalized if > you knew what they do :-) i have the errata. i've read them. and i find them reassuring. you might find that surprising, but the longer and more detailed the errata, the longer and more intricate the testing was. also long errata sheets, especially of really arcane bugs indicate the vendor isn't sweeping embarrassing ones under the rug. i've seen parts with 2-3 errata that were just buggy. they hadn't even tested some large bits of functionality once! on the other hand some processors i work with have very long errata, but none of them matter. intel kindly makes the errata available to the public for their gbe controllers. e.g. http://download.intel.com/design/network/specupdt/322444.pdf page 15, erratum #10 is typical. the spec was violated, but it is difficult to imagine working hardware for which this would matter. i can't speak for vendors on why errata is sometimes nda, but i would imagine that the main fear is that the errata can reveal too much about the implementation. on the other hand, many vendors have open errata. i've yet to see need-to-know errata. by the way, proving an implementation correct seems simply impossible. many errata (perhaps like the one i mentioned) come down to variations in the process that might not have met the models. and how would you prove one of the many physical steps in producing a chip correct anyway? - erik ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [9fans] NUMA 2011-07-17 15:24 ` erik quanstrom @ 2011-07-17 15:28 ` ron minnich [not found] ` <CAP6exYL2DJXbKfPZ8+D5uL=fRWKEyr8vY2OVc4NTO3wsFo=Unw@mail.gmail.c> 2011-07-17 17:16 ` Bakul Shah 2 siblings, 0 replies; 32+ messages in thread From: ron minnich @ 2011-07-17 15:28 UTC (permalink / raw) To: Fans of the OS Plan 9 from Bell Labs On Sun, Jul 17, 2011 at 8:24 AM, erik quanstrom <quanstro@quanstro.net> wrote: > i can't speak for vendors on why errata is sometimes nda, one no-longer-existing vendor once told me that some errata could expose them to patent lawsuits. They were not sure so would not release such info until they had no choice. > i've yet to see need-to-know > errata. it exists :-( ron ^ permalink raw reply [flat|nested] 32+ messages in thread
[parent not found: <CAP6exYL2DJXbKfPZ8+D5uL=fRWKEyr8vY2OVc4NTO3wsFo=Unw@mail.gmail.c>]
* Re: [9fans] NUMA [not found] ` <CAP6exYL2DJXbKfPZ8+D5uL=fRWKEyr8vY2OVc4NTO3wsFo=Unw@mail.gmail.c> @ 2011-07-17 15:32 ` erik quanstrom 0 siblings, 0 replies; 32+ messages in thread From: erik quanstrom @ 2011-07-17 15:32 UTC (permalink / raw) To: 9fans > > i've yet to see need-to-know > > errata. > > it exists :-( i'm sure it does. but that's not even the worst case. the worst case is when there are no errata at all! - erik ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [9fans] NUMA 2011-07-17 15:24 ` erik quanstrom 2011-07-17 15:28 ` ron minnich [not found] ` <CAP6exYL2DJXbKfPZ8+D5uL=fRWKEyr8vY2OVc4NTO3wsFo=Unw@mail.gmail.c> @ 2011-07-17 17:16 ` Bakul Shah 2011-07-17 17:21 ` erik quanstrom 2 siblings, 1 reply; 32+ messages in thread From: Bakul Shah @ 2011-07-17 17:16 UTC (permalink / raw) To: Fans of the OS Plan 9 from Bell Labs On Jul 17, 2011, at 8:24 AM, erik quanstrom <quanstro@quanstro.net> wrote: > On Sun Jul 17 04:45:18 EDT 2011, bakul@bitblocks.com wrote: > >> Also note that the ISA implementations these days are quite >> complex (perhaps even more than your typical program). We >> don't see this complexity because it is all hidden behind a >> relatively simple ISA. But remember the F00F bug? Usually the >> vendor has a long errata list (typically only available on a >> need to know basis and only under NDA!). And usually they >> don't formally prove the implementation right; they just run >> zillions of test vectors! I bet you would be scandalized if >> you knew what they do :-) > > i have the errata. i've read them. and i find them reassuring. > you might find that surprising, but the longer and more detailed > the errata, the longer and more intricate the testing was. also > long errata sheets, especially of really arcane bugs indicate the > vendor isn't sweeping embarrassing ones under the rug. i've > seen parts with 2-3 errata that were just buggy. they hadn't even > tested some large bits of functionality once! on the other hand > some processors i work with have very long errata, but none of > them matter. intel kindly makes the errata available to the public > for their gbe controllers. e.g. > > http://download.intel.com/design/network/specupdt/322444.pdf > page 15, erratum #10 is typical. the spec was violated, but it is > difficult to imagine working hardware for which this would matter. > > i can't speak for vendors on why errata is sometimes nda, > but i would imagine that the main fear is that the errata can > reveal too much about the implementation. on the other hand, > many vendors have open errata. i've yet to see need-to-know > errata. I am sure (or sure hope) things have changed, but in at least two cases in the past the vendor reps told me that yes, the bug was known *after* I told them I had logic analyzer traces that showed the bug. One was a very well known CPU vendor, the other a SCSI chip manufacturer. I suspect incidents like the F00F bug changed attitudes quite a bit, at least for vendors like intel. > by the way, proving an implementation correct seems simply > impossible. many errata (perhaps like the one i mentioned) > come down to variations in the process that might not have > met the models. and how would you prove one of the > many physical steps in producing a chip correct anyway? You can perhaps prove logical properties for simpler subsystems (an ALU for instance). Or generate logic from a description in an HLL such as Scheme, which might be easier to prove, but of course then you have to worry about the translator! But not the physical processes. I do think more formal proof methods might get used as more and more parallelism gets exploited. The combinatorial explosion of testing might lead us there! Anyway, my point was just that there are no certainties; just degrees of uncertainty! You should *almost always* opt for speed (and simplicity) by figuring out how much uncertainty will be tolerated by your customers :-) A 99.9% solution available today has more value than a 100% solution that is 10 times slower and a year late. 99.9% of the time! But I guess that is my engineer's bias! > > - erik > ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [9fans] NUMA 2011-07-17 17:16 ` Bakul Shah @ 2011-07-17 17:21 ` erik quanstrom 0 siblings, 0 replies; 32+ messages in thread From: erik quanstrom @ 2011-07-17 17:21 UTC (permalink / raw) To: 9fans > I am sure (or sure hope) things have changed, but in at least two cases in > the past the vendor reps told me that yes, the bug was known *after* I > told them I had logic analyzer traces that showed the bug. One was a very > well known CPU vendor, the other a SCSI chip manufacturer. unfortunately some companies hide behind reps that don't know squat. that's definitely true. unfortunately for them, not believing in karma doesn't make it not exist. - erik ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [9fans] NUMA 2011-07-17 7:38 ` tlaronde 2011-07-17 8:44 ` Bakul Shah @ 2011-07-17 15:51 ` erik quanstrom 2011-07-17 16:12 ` dexen deVries 2011-07-17 16:37 ` tlaronde 1 sibling, 2 replies; 32+ messages in thread From: erik quanstrom @ 2011-07-17 15:51 UTC (permalink / raw) To: 9fans > > i've been able to upgrade my systems here through a number of µarches > > (intel xeon 5[0456]00, 3[04]00, atom; amd phenom) that weren't around > > when i first installed my systems, and i'm still using most of the original > > binaries. the hardware isn't forcing an upgrade. > > But that's possible because the "real" hardware is RISC, and there is a > software level managing compatibility (microcode). That's also why > hardware can be "patched". the "real hardware" depends on the cisc layer. a significant amount of x86 performance depends on the fact that x86 isa code is very dense and is used across much slower links than exist within a core. it's not clear to me that real cpu guys would call the guts of modern intel/amd/whatever risc at all. the µops don't exist at the level of any traditional isa. iirc, almost all isa -> µop translations are handled by hardware for intel. i shouldn't be so lazy and look this up again. > My point is more that when the optimization path is complex, the > probability is the combined one (the product); hence it is useless to > expect a non-negligible gain for the total, especially when the "best > case" for each single optimization is almost orthogonal to all the > others. > > Furthermore, I don't know for others, but I prefer correctness over > speed. I mean, if a program is proved to be correct (and very few are), > complex acrobatics from the compiler, namely in the "optimization" area, > able to wreak havoc on all the code's assumptions, is something I don't buy. > > I have an example with gcc4.4, compiling not my source, but D.E.K.'s > TeX. i think you're mixing apples and oranges. gcc has nothing to do with whatever is running inside a processor, microcode or not. it is appalling that the gcc guys don't appear to do anything most people would call validation or qa. but that doesn't mean that everybody else has such a poor quality record. - erik ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [9fans] NUMA 2011-07-17 15:51 ` erik quanstrom @ 2011-07-17 16:12 ` dexen deVries 0 siblings, 0 replies; 32+ messages in thread From: dexen deVries @ 2011-07-17 16:12 UTC (permalink / raw) To: Fans of the OS Plan 9 from Bell Labs On Sunday 17 July 2011 17:51:04 erik quanstrom wrote: > the "real hardware" depends on the cisc layer. a significant amount of > x86 performance depends on the fact that x86 isa code is very dense and is > used across much slower links than exist within a core. ((at the risk of sounding very silly)) there was that Transmeta Efficeon, which matched the clock-for-clock performance of the Pentium 3, and watt-for-watt was way, way ahead. its core was a 256-bit VLIW (no idea if RISCy), and Transmeta's firmware translated x86 code dynamically into the native format. if it were launched these days, when multicore is a-OK, it'd shine; but back when desktops were single-core and servers few-core-ish, it fell flat because of single-core performance :-( -- dexen deVries > (...) I never use more than 800Mb of RAM. I am running Linux, > a browser and a terminal. rjbond3rd in http://news.ycombinator.com/item?id=2692529 ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [9fans] NUMA 2011-07-17 15:51 ` erik quanstrom 2011-07-17 16:12 ` dexen deVries @ 2011-07-17 16:37 ` tlaronde 1 sibling, 0 replies; 32+ messages in thread From: tlaronde @ 2011-07-17 16:37 UTC (permalink / raw) To: Fans of the OS Plan 9 from Bell Labs On Sun, Jul 17, 2011 at 11:51:04AM -0400, erik quanstrom wrote: > [...] > > iirc, almost all isa -> µop translations are handled > by hardware for intel. i shouldn't be so lazy and look this up > again. From what I read, IIRC (for example in Hennessy and Patterson, some years ago), even the x86 family has RISC underneath. > > > > > I have an example with gcc4.4, compiling not my source, but D.E.K.'s > > TeX. > > i think you're mixing apples and oranges. gcc has nothing to do with > whatever is running inside a processor, microcode or not. It is an illustration of the result of complexity, not a direct match to hardware. This is the evolution that is becoming worrying. At the beginning, programmers were directly programming the machine. Since it was a pain, some assembly languages were born; but their symbolic, almost macro-definition nature made the translation direct, and so easy to guarantee. This is definitely not the case anymore with something in between that does more and more complex (and hidden) things: the compiler set. Languages are more and more "high level", that is, far from the hardware. The hardware is more and more complex and not pure hardware. The result of a "story" (the source) written by someone (the programmer) who does not know exactly what he is saying, with a compiler that does not tell what it does, feeding hardware that cannot guarantee it will do exactly what it's told, is not, shall we say, soothing. I know that English-speaking culture is fond of mystery and magic (from Shakespeare to Harry Potter; Lord of the Rings to Batman and so on). And perhaps the "Red Dragon" book about compilers is meant precisely to emphasize that programming is a kind of Arthurian initiation, fighting amidst the fog. But my French Cartesian brain is not a perfect match for this ;) -- Thierry Laronde <tlaronde +AT+ polynum +dot+ com> http://www.kergis.com/ Key fingerprint = 0FF7 E906 FBAF FE95 FD89 250D 52B1 AE95 6006 F40C ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [9fans] NUMA 2011-07-16 20:56 ` dexen deVries 2011-07-16 22:10 ` Charles Forsyth @ 2011-07-17 10:08 ` Ethan Grammatikidis 2011-07-17 14:50 ` erik quanstrom 1 sibling, 1 reply; 32+ messages in thread From: Ethan Grammatikidis @ 2011-07-17 10:08 UTC (permalink / raw) To: 9fans On Sat, 16 Jul 2011 22:56:53 +0200 dexen deVries <dexen.devries@gmail.com> wrote: > On Saturday 16 July 2011 21:54:33 erik quanstrom wrote: > > it's interesting you bring this up. risc has largely been removed > > from architectures. if you tie the instruction set and machine model > > to the actual hardware, then you need to write new compilers and > > recompile everything every few years. instead, the risc is hidden > > and the instruction set stays the same. this allows for a lot of > > under-the-hood innovation in isolation from the architecture. > > > interesting angle. till now i believed it's easier to innovate in software > (even compilers) than in silicon. where did we go wrong that silicon became > the easier way? would it be fair to blame GCC and other heavyweight champions? Gcc has mutual incompatibilities between different versions of itself, caused by its attempts to correctly interpret the heavyweight C standards we have today, but I wouldn't say gcc is the big problem. Some of the most essential libraries in a Linux system are real bugbears to compile, particularly for a new arch. I'd say it's just part of the ossification of software today. It's become extremely rigid and brittle, perhaps even more so in open source than commercial contexts. ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [9fans] NUMA 2011-07-17 10:08 ` Ethan Grammatikidis @ 2011-07-17 14:50 ` erik quanstrom 2011-07-17 17:01 ` Ethan Grammatikidis 0 siblings, 1 reply; 32+ messages in thread From: erik quanstrom @ 2011-07-17 14:50 UTC (permalink / raw) To: 9fans > Gcc has mutual incompatibilities between different versions of itself, > caused by its attempts to correctly interpret the heavyweight C > standards we have today, but I wouldn't say gcc is the big problem. > Some of the most essential libraries in a Linux system are real > bugbears to compile, particularly for a new arch. actually the incompatibilities are sometimes caused by gcc's changing abi, but most often by gnu libc, which treats backwards compatibility like a fish fillet—something that should be tossed in three days or less. why do you think the size or complexity of the code has anything to do with it? - erik ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [9fans] NUMA 2011-07-17 14:50 ` erik quanstrom @ 2011-07-17 17:01 ` Ethan Grammatikidis 0 siblings, 0 replies; 32+ messages in thread From: Ethan Grammatikidis @ 2011-07-17 17:01 UTC (permalink / raw) To: 9fans On Sun, 17 Jul 2011 10:50:50 -0400 erik quanstrom <quanstro@quanstro.net> wrote: > why do you think the size or complexity of the code has anything > to do with it? Good question. I'm not sure I can give a good answer. I do think systems get less flexible as they get more complex. I suppose that isn't provable or always true, but it's empirically "proven" true in my Linux desktop use. As bad as old X11 was, I have repeatedly seen relatively simple ways of solving problems cut off by desktop "solutions" containing implementation details I could never see any use for at all. ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [9fans] NUMA 2011-07-16 8:02 ` tlaronde 2011-07-16 16:27 ` erik quanstrom @ 2011-07-17 3:39 ` Joel C. Salomon 2011-07-17 7:01 ` tlaronde 1 sibling, 1 reply; 32+ messages in thread From: Joel C. Salomon @ 2011-07-17 3:39 UTC (permalink / raw) To: 9fans On 07/16/2011 04:02 AM, tlaronde@polynum.com wrote: > What is the minimal hints the programmer shall give? At least > predicativity. I wonder what minimum set of keywords could be added, > say, to C, so that the situation can be greatly improved without the > burden being greatly increased. [non-predicative routines being, from > a parallel point of view, atomic] Have a look at what the C1x standard is proposing wrt atomics. --Joel ^ permalink raw reply [flat|nested] 32+ messages in thread
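For reference, a minimal sketch in the style of the atomics the C1x drafts propose; the names follow the draft standard, not any particular compiler shipping today.

    #include <stdio.h>
    #include <stdatomic.h>  /* C1x draft header; not yet widely shipped */

    /* An atomic counter that several threads could increment without an
     * explicit lock; operations are sequentially consistent by default. */
    static atomic_int hits = ATOMIC_VAR_INIT(0);

    static void
    record_hit(void)
    {
        atomic_fetch_add(&hits, 1);
    }

    int
    main(void)
    {
        record_hit();
        printf("%d\n", atomic_load(&hits));
        return 0;
    }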
* Re: [9fans] NUMA 2011-07-17 3:39 ` Joel C. Salomon @ 2011-07-17 7:01 ` tlaronde 2011-07-17 15:05 ` Joel C. Salomon 2011-07-17 15:26 ` erik quanstrom 0 siblings, 2 replies; 32+ messages in thread From: tlaronde @ 2011-07-17 7:01 UTC (permalink / raw) To: Fans of the OS Plan 9 from Bell Labs On Sat, Jul 16, 2011 at 11:39:50PM -0400, Joel C. Salomon wrote: > On 07/16/2011 04:02 AM, tlaronde@polynum.com wrote: > > What is the minimal hints the programmer shall give? At least > > predicativity. I wonder what minimum set of keywords could be added, > > say, to C, so that the situation can be greatly improved without the > > burden being greatly increased. [non-predicative routines being, from > > a parallel point of view, atomic] > > Have a look at what the C1x standard is proposing wrt atomics. Thanks for the tip! BTW, if I understand correctly the purpose of the next C standard, I guess there is no urge for kencc to support C99 since it is already a transitory only partially supported standard. -- Thierry Laronde <tlaronde +AT+ polynum +dot+ com> http://www.kergis.com/ Key fingerprint = 0FF7 E906 FBAF FE95 FD89 250D 52B1 AE95 6006 F40C ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [9fans] NUMA 2011-07-17 7:01 ` tlaronde @ 2011-07-17 15:05 ` Joel C. Salomon 2011-07-17 15:26 ` erik quanstrom 1 sibling, 0 replies; 32+ messages in thread From: Joel C. Salomon @ 2011-07-17 15:05 UTC (permalink / raw) To: 9fans On 07/17/2011 03:01 AM, tlaronde@polynum.com wrote: > On Sat, Jul 16, 2011 at 11:39:50PM -0400, Joel C. Salomon wrote: >> On 07/16/2011 04:02 AM, tlaronde@polynum.com wrote: >>> I wonder what minimum set of keywords could be added, >>> say, to C, so that the situation can be greatly improved without the >>> burden being greatly increased. [non-predicative routines being, from >>> a parallel point of view, atomic] >> >> Have a look at what the C1x standard is proposing wrt atomics. > > Thanks for the tip! > > BTW, if I understand correctly the purpose of the next C standard, I > guess there is no urge for kencc to support C99 > since it is already a transitory only partially supported standard. The only place in which that's relevant is that C1x creates language subsets and some of the new language features are optional. (I.e., if your compiler doesn't implement feature x, predefine this macro X and you can still call your compiler conforming.) The only C99 feature listed as optional is VLAs. BTW, C1x standardizes part of kencc's nested-anonymous-struct feature. --Joel ^ permalink raw reply [flat|nested] 32+ messages in thread
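A sketch of the nested-anonymous-struct feature mentioned above, in C1x draft syntax; kencc's extension is broader (it also promotes the members of an unnamed member of a named struct type), so this shows only the part being standardized.

    #include <stdio.h>

    /* Members of the unnamed inner struct are accessed as if they were
     * members of the outer one. */
    struct Event {
        struct {
            int x;
            int y;
        };          /* anonymous struct: x and y are promoted */
        int button;
    };

    int
    main(void)
    {
        struct Event e;

        e.x = 3;
        e.y = 4;
        e.button = 1;
        printf("%d %d %d\n", e.x, e.y, e.button);
        return 0;
    }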
* Re: [9fans] NUMA 2011-07-17 7:01 ` tlaronde 2011-07-17 15:05 ` Joel C. Salomon @ 2011-07-17 15:26 ` erik quanstrom 2011-07-17 15:52 ` ComeauAt9Fans@gmail.com 1 sibling, 1 reply; 32+ messages in thread From: erik quanstrom @ 2011-07-17 15:26 UTC (permalink / raw) To: 9fans > BTW, if I understand correctly the purpose of the next C standard, I > guess there is no urge for kencc to support C99 > since it is already a transitory only partially supported standard. ken's compiler supports the bits of c99 that people have found important/useful. see /sys/src/cmd/cc/c99. - erik ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [9fans] NUMA 2011-07-17 15:26 ` erik quanstrom @ 2011-07-17 15:52 ` ComeauAt9Fans@gmail.com 0 siblings, 0 replies; 32+ messages in thread From: ComeauAt9Fans@gmail.com @ 2011-07-17 15:52 UTC (permalink / raw) To: Fans of the OS Plan 9 from Bell Labs On Jul 17, 2011, at 11:26 AM, erik quanstrom <quanstro@quanstro.net> wrote: >> BTW, if I understand correctly the purpose of the next C standard, I >> guess there is no urge for kencc to support C99 >> since it is already a transitory only partially supported standard. > > ken's compiler supports the bits of c99 that people have found > important/useful. see /sys/src/cmd/cc/c99. Maybe I'm misunderstanding the inflection, but isn't it so that C99 used "Ken C" for some of its additions at the time? ^ permalink raw reply [flat|nested] 32+ messages in thread