* [TUHS] Re: Maximum Array Sizes in 16 bit C
@ 2024-09-18 23:51 Douglas McIlroy
2024-09-18 23:57 ` Henry Bent
0 siblings, 1 reply; 17+ messages in thread
From: Douglas McIlroy @ 2024-09-18 23:51 UTC (permalink / raw)
To: TUHS main list, henry.r.bent
> The array size limit that I found through trial and error is (2^15)-1.
> Declaring an array that is [larger] results in an error of "Constant
required",
On its face, it states that anything bigger cannot be an integer constant,
which is reasonable because that's the largest (signed) integer value. Does
that version of C support unsigned constants?
Doug
^ permalink raw reply [flat|nested] 17+ messages in thread
* [TUHS] Re: Maximum Array Sizes in 16 bit C
@ 2024-09-18 23:57 Henry Bent
  2024-09-19 13:13 ` Rich Salz
0 siblings, 1 reply; 17+ messages in thread
From: Henry Bent @ 2024-09-18 23:57 UTC (permalink / raw)
To: Douglas McIlroy; +Cc: TUHS main list

On Wed, 18 Sept 2024 at 19:51, Douglas McIlroy <douglas.mcilroy@dartmouth.edu> wrote:

> > The array size limit that I found through trial and error is (2^15)-1.
> > Declaring an array that is [larger] results in an error of "Constant
> > required",
>
> On its face, it states that anything bigger cannot be an integer constant,
> which is reasonable because that's the largest (signed) integer value. Does
> that version of C support unsigned constants?

I believe that it does support (16 bit) unsigned int, but I don't think that it supports (32 bit) unsigned long, only signed long. That's a great suggestion of a place to start.

Following Nelson's suggestion, if there need to be negative references in array accesses (which certainly makes sense to me, on its face), it seems reasonable to have whatever intermediate variable be signed.

-Henry
* [TUHS] Re: Maximum Array Sizes in 16 bit C
@ 2024-09-19 13:13 Rich Salz
  2024-09-20 13:33 ` Paul Winalski
  2024-09-20 19:40 ` Leah Neukirchen
0 siblings, 2 replies; 17+ messages in thread
From: Rich Salz @ 2024-09-19 13:13 UTC (permalink / raw)
To: Henry Bent; +Cc: Douglas McIlroy, TUHS main list

> if there need to be negative references in array accesses (which certainly
> makes sense to me, on its face), it seems reasonable to have whatever
> intermediate variable be signed.

In my first C programming job I saw the source to V7 grep, which had a "foo[-2]" construct. It was a moment of enlightenment and another bit of K&R fell into place.
(https://www.tuhs.org/cgi-bin/utree.pl?file=V7/usr/src/cmd/grep.c; search for "[-")
* [TUHS] Re: Maximum Array Sizes in 16 bit C
@ 2024-09-20 13:33 Paul Winalski
  2024-09-20 15:07 ` Dave Horsfall
  2024-09-20 15:26 ` Rich Salz
1 sibling, 2 replies; 17+ messages in thread
From: Paul Winalski @ 2024-09-20 13:33 UTC (permalink / raw)
To: Rich Salz; +Cc: Douglas McIlroy, TUHS main list

On Thu, Sep 19, 2024 at 7:52 PM Rich Salz <rich.salz@gmail.com> wrote:

> In my first C programming job I saw the source to V7 grep which had a
> "foo[-2]" construct.

That sort of thing is very dangerous with modern compilers. Does K&R C require that variables be allocated in the order that they are declared? If not, you're playing with fire. To get decent performance out of modern processors, the compiler must perform data placement to maximize cache efficiency, and that practically guarantees that you can't rely on out-of-bounds array references.

Unless "foo" were a pointer that the programmer explicitly pointed to the inside of a larger data structure. In that case you could guarantee that the construct would work reliably. But by pointing into the middle of another data structure you've created a data aliasing situation, and that complicates compiler data flow analysis and can block important optimizations.

Things were much simpler when V7 was written.

-Paul W.
* [TUHS] Re: Maximum Array Sizes in 16 bit C
@ 2024-09-20 15:07 Dave Horsfall
  2024-09-20 15:30 ` Larry McVoy
  2024-09-20 15:56 ` Stuff Received
  2024-09-20 16:14 ` Dan Cross
  2024-09-20 17:11 ` G. Branden Robinson
3 siblings, 4 replies; 17+ messages in thread
From: Dave Horsfall @ 2024-09-20 15:07 UTC (permalink / raw)
To: The Eunuchs Hysterical Society

On Fri, 20 Sep 2024, Paul Winalski wrote:

> On Thu, Sep 19, 2024 at 7:52 PM Rich Salz <rich.salz@gmail.com> wrote:
>
> > In my first C programming job I saw the source to V7 grep which
> > had a "foo[-2]" construct.
>
> That sort of thing is very dangerous with modern compilers. Does K&R C
> require that variables be allocated in the order that they are declared?
> If not, you're playing with fire. To get decent performance out of modern
> processors, the compiler must perform data placement to maximize cache
> efficiency, and that practically guarantees that you can't rely on
> out-of-bounds array references.

[...]

Unless I'm mistaken (quite possible at my age), the OP was referring to the fact that in C, pointers and arrays are pretty much the same thing, i.e. "foo[-2]" means "take the pointer 'foo' and go back two things" (whatever a "thing" is).

C is just a high-level assembly language; there is no such object as a "string", for example: it's just an "array of char" with the last element being "\0" (viz. "strlen" vs. "sizeof": what's the length of "abc" vs. how many bytes are needed to store it?).

> Things were much simpler when V7 was written.

Giggle... In a device driver I wrote for V6, I used the expression

    "0123"[n]

and the two programmers whom I thought were better than me had to ask me what it did...

-- Dave, brought up on PDP-11 Unix[*]

[*] I still remember the days of BOS/PICK/etc, and I staked my career on Unix.
* [TUHS] Re: Maximum Array Sizes in 16 bit C
@ 2024-09-20 15:30 Larry McVoy
0 siblings, 0 replies; 17+ messages in thread
From: Larry McVoy @ 2024-09-20 15:30 UTC (permalink / raw)
To: Dave Horsfall; +Cc: The Eunuchs Hysterical Society

On Sat, Sep 21, 2024 at 01:07:11AM +1000, Dave Horsfall wrote:
> On Fri, 20 Sep 2024, Paul Winalski wrote:
>
> > On Thu, Sep 19, 2024 at 7:52 PM Rich Salz <rich.salz@gmail.com> wrote:
> >
> > > In my first C programming job I saw the source to V7 grep which
> > > had a "foo[-2]" construct.
> >
> > That sort of thing is very dangerous with modern compilers. Does K&R C
> > require that variables be allocated in the order that they are declared?
> > If not, you're playing with fire. To get decent performance out of modern
> > processors, the compiler must perform data placement to maximize cache
> > efficiency, and that practically guarantees that you can't rely on
> > out-of-bounds array references.
>
> [...]
>
> Unless I'm mistaken (quite possible at my age), the OP was referring to
> that in C, pointers and arrays are pretty much the same thing i.e.
> "foo[-2]" means "take the pointer 'foo' and go back two things" (whatever
> a "thing" is).

Yes, but that was a stack variable. Let me see if I can say it more clearly:

    foo()
    {
        int a = 1, b = 2;
        int alias[5];

        alias[-2] = 0;  /* try and set a to 0 */
    }

In V7 days, the stack would look like

    [stuff]
    [2 bytes for a]
    [2 bytes for b]
    [2 bytes for the alias address, which I think points forward]
    [10 bytes for alias contents]

I'm hazy on how the space for alias[] is allocated, so I made that up. It's probably something like I said, but Paul (or someone) will correct me.

When using a negative index for alias[], the coder is assuming that the stack variables are placed in the order they were declared. Paul tried to explain that _might_ be true but is not always true.

Modern compilers will look to see which variables are used the most in the function, and place them next to each other so that if you have the cache line for one heavily used variable, the other one is right there next to it. Like so:

    int heavy1 = 1;
    int rarely1 = 2;
    int spacer[10];
    int heavy2 = 3;
    int rarely2 = 4;

The compiler might figure out that heavy{1,2} are used a lot and lay out the stack like so:

    [2 bytes (or 4 or 8 these days) for heavy1]
    [bytes for heavy2]
    [bytes for rarely1]
    [bytes for spacer[10]]
    [bytes for rarely2]

Paul was saying that using a negative index in the array creates an alias, or another name, for the scalar integer on the stack (his description made me understand, for the first time in decades, why compiler writers hate aliases, and I get it now). Aliases mess hard with optimizers. Optimizers may reorder the stack for better cache-line usage, and what you think array[-2] means doesn't work any more unless the optimizer catches that you made an alias and preserves it.

Paul, how did I do? I'm not a compiler guy, just had to learn enough to walk the stack when the kernel panics.

^ permalink raw reply	[flat|nested] 17+ messages in thread
* [TUHS] Re: Maximum Array Sizes in 16 bit C
@ 2024-09-20 15:56 Stuff Received
0 siblings, 0 replies; 17+ messages in thread
From: Stuff Received @ 2024-09-20 15:56 UTC (permalink / raw)
To: COFF

Moved to COFF.

On 2024-09-20 11:07, Dave Horsfall wrote (in part):

> Giggle... In a device driver I wrote for V6, I used the expression
>
>     "0123"[n]
>
> and the two programmers whom I thought were better than me had to ask me
> what it did...
>
> -- Dave, brought up on PDP-11 Unix[*]
>
> [*] I still remember the days of BOS/PICK/etc, and I staked my career on Unix.

Working on embedded systems, we often used constructs such as a[-4] to either read or modify stuff on the stack (for that particular compiler+processor only).

S.

^ permalink raw reply	[flat|nested] 17+ messages in thread
* [TUHS] Re: Maximum Array Sizes in 16 bit C
@ 2024-09-20 16:14 Dan Cross
0 siblings, 0 replies; 17+ messages in thread
From: Dan Cross @ 2024-09-20 16:14 UTC (permalink / raw)
To: Dave Horsfall; +Cc: The Eunuchs Hysterical Society

On Fri, Sep 20, 2024 at 11:17 AM Dave Horsfall <dave@horsfall.org> wrote:

> Unless I'm mistaken (quite possible at my age), the OP was referring to
> that in C, pointers and arrays are pretty much the same thing i.e.
> "foo[-2]" means "take the pointer 'foo' and go back two things" (whatever
> a "thing" is).

Where I've usually seen this idiom is in things like:

    char foo[10];
    char *p = foo + 5;

    p[-2] = 'a';  /* set foo[3] to 'a' */

But as Paul pointed out, a) this relies on aliasing the bytes in `foo`, and b) it is UB if the (negative) index falls below the beginning of the underlying object (e.g., the array `foo`).

> C is just a high level assembly language; there is no such object as a
> "string" for example: it's just an "array of char" with the last element
> being "\0" (viz: "strlen" vs. "sizeof").

Sadly, this hasn't been true for a few decades; arguably since optimizing compilers for C started to become common in the 70s. Trying to treat C as a high-level macro assembler is dangerous, as Paul pointed out, even though a lot of us feel like we can "see" the assembly that a line of C code will likely emit. While in many cases we are probably right (or close to it), C _compiler writers_ don't think in those terms, but rather think in terms of operations targeting the abstract virtual machine loosely described by the language standard. Caveat emptor, there be dragons.

> What's the length of "abc" vs. how many bytes are needed to store it?
>
> > Things were much simpler when V7 was written.
>
> Giggle... In a device driver I wrote for V6, I used the expression
>
>     "0123"[n]
>
> and the two programmers whom I thought were better than me had to ask me
> what it did...

Fortunately, this is still legal. :-)

- Dan C.

^ permalink raw reply	[flat|nested] 17+ messages in thread
* [TUHS] Re: Maximum Array Sizes in 16 bit C
@ 2024-09-20 17:11 G. Branden Robinson
  2024-09-20 20:16 ` Bakul Shah via TUHS
3 siblings, 1 reply; 17+ messages in thread
From: G. Branden Robinson @ 2024-09-20 17:11 UTC (permalink / raw)
To: The Eunuchs Hysterical Society

At 2024-09-21T01:07:11+1000, Dave Horsfall wrote:

> Unless I'm mistaken (quite possible at my age), the OP was referring
> to that in C, pointers and arrays are pretty much the same thing i.e.
> "foo[-2]" means "take the pointer 'foo' and go back two things"
> (whatever a "thing" is).

"in C, pointers and arrays are pretty much the same thing" is a common utterance but misleading, and in my opinion, better replaced with a different one. We should instead say something more like:

    In C, pointers and arrays have compatible dereference syntaxes.
    They do _not_ have compatible _declaration_ syntaxes.

Chapter 4 of van der Linden's _Expert C Programming: Deep C Secrets_ (1994) tackles this issue head-on and at length. Here's the salient point.

"Consider the case of an external declaration `extern char *p;` but a definition of `char p[10];`. When we retrieve the contents of `p[i]` using the extern, we get characters, but we treat it as a pointer. Interpreting ASCII characters as an address is garbage, and if you're lucky the program will coredump at that point. If you're not lucky it will corrupt something in your address space, causing a mysterious failure at some point later in the program."

> C is just a high level assembly language;

I disagree with this common claim too. Assembly languages correspond to well-defined machine models.[1] Those machine models have memory models. C has no memory model--deliberately, because that would have gotten in the way of performance.
(In practice, C's machine model was and remains the PDP-11,[2] with aspects thereof progressively sanded off over the years in repeated efforts to salvage the language's reputation for portability.)

> there is no such object as a "string" for example: it's just an "array
> of char" with the last element being "\0" (viz: "strlen" vs. "sizeof").

Yeah, it turns out we need a well-defined string type much more powerfully than, it seems, anyone at the Bell Labs CSRC appreciated. string.h was tacked on (by Nils-Peter Nelson, as I understand it) at the end of the 1970s and C aficionados have defended the language's purported perfection with such vigor that they annexed the haphazardly assembled standard library into the territory that they defend with much rhetorical violence and overstatement. From useless or redundant return values to const-carelessness to Schlemiel the Painter algorithms in implementations, it seems we've collectively made every mistake that could be made with Nelson's original, minimal API, and taught those mistakes as best practices in tutorials and classrooms. A sorry affair.

So deep was this disdain for the string as a well-defined data type, and moreover one conceptually distinct from an array (or vector) of integral types, that Stroustrup initially repeated the mistake in C++. People can easily roll their own, he seemed to have thought. Eventually he thought again, but C++ took so long to get standardized that by then, damage was done.

"A string is just an array of `char`s, and a `char` is just a byte"--another hasty equivalence that surrendered a priceless hostage to fortune. This is the sort of fallacy indulged by people excessively wedded to machine language programming and who apply its perspective to every problem statement uncritically.

Again and again, with signed vs. unsigned bytes, "wide" vs. "narrow" characters, and "base" vs.
"combining" characters, the champions of the "portable assembly" paradigm charged like Lord Cardigan into the pike and musket lines of the character type as one might envision it in a machine register. (This insistence on visualizing register-level representations has prompted numerous other stupidities, like the use of an integral zero at the _language level_ to represent empty, null, or false literals for as many different data types as possible. "If it ends up as a zero in a register," the thinking appears to have gone, "it should look like a zero in the source code." Generations of code--and language--cowboys have screwed us all over repeatedly with this hasty equivalence.)

Type theorists have known better for decades. But type theory is (1) hard (it certainly is, to cowboys) and (2) has never enjoyed a trendy day in the sun (for which we may be grateful), which means it is seldom on the path one anticipates to a comfortable retirement from a Silicon Valley tech company (or several) on a private yacht.

Why do I rant so splenetically about these issues? Because the result of such confusion is _bugs in programs_. You want something concrete? There it is. Data types protect you from screwing up. And the better your data types are, the more care you give to specifying what sorts of objects your program manipulates, the more thought you give to the invariants that must be maintained for your program to remain in a well-defined state, the fewer bugs you will have.

But, nah, better to slap together a prototype, ship it, talk it up to the moon as your latest triumph while interviewing with a rival of the company you just delivered that prototype to, and look on in amusement when your brilliant achievement either proves disastrous in deployment or soaks up the waking hours of an entire team of your former colleagues cleaning up the steaming pile you voided from your rock star bowels.
We've paid a heavy price for C's slow and seemingly deeply grudging embrace of the type concept. (The lack of controlled scope for enumeration constants is one example; the horrifyingly ill-conceived choice of "typedef" as a keyword indicating _type aliasing_ is another.) Kernighan did not help by trashing Pascal so hard in about 1980. He was dead right that Pascal needed, essentially, polymorphic subprograms in array types. Wirth not speccing the language to accommodate that back in 1973 or so was a sad mistake. But Pascal got a lot of other stuff right--stuff that the partisanship of C advocates refused to countenance such that they ended up celebrating C's flaws as features. No amount of Jonestown tea could quench their thirst.

I suspect the truth was more that they didn't want to bother having to learn any other languages. (Or if they did, not any language that anyone else on their team at work had any facility with.) A rock star plays only one instrument, no? People didn't like it when Eddie Van Halen played keyboards instead of guitar on stage, so he stopped doing that. The less your coworkers understand your work, the more of a genius you must be.

Now, where was I?

> What's the length of "abc" vs. how many bytes are needed to store it?

Even what is meant by "length" has several different correct answers! Quantity of code points in the sequence? Number of "grapheme clusters" a.k.a. "user-perceived characters" as Unicode puts it? Width as represented on the output device? On an ASCII device these usually had the same answer (control characters excepted). But even at the Bell Labs CSRC in the 1970s, thanks to troff, the staff knew that they didn't necessarily have to. (How wide is an em dash? How many bytes represent it, in the formatting language and in the output language?)

> Giggle... In a device driver I wrote for V6, I used the expression
>
>     "0123"[n]
>
> and the two programmers whom I thought were better than me had to ask
> me what it did...
> -- Dave, brought up on PDP-11 Unix[*]

I enjoy this application of that technique, courtesy of Alan Cox. fsck-fuzix: blow 90 bytes on a progress indicator:

    static void progress(void)
    {
        static uint8_t progct;

        progct++;
        progct &= 3;
        printf("%c\010", "-\\|/"[progct]);
        fflush(stdout);
    }

> I still remember the days of BOS/PICK/etc, and I staked my career on
> Unix.

Not a bad choice. Your exposure to and recollection of other ways of doing things, I suspect, made you a more valuable contributor than those who mazed themselves with thoughts of "the Unix way" to the point that they never seriously considered any other.

It's fine to prefer "the C way" or "the Unix way", if you can intelligibly define what that means as applied to the issue in dispute, and coherently defend it. Demonstrating an understanding of the alternatives, and being able to credibly explain why they are inferior approaches, is how to do advocacy correctly.

But it is not the cowboy way. The rock star way.

Regards,
Branden

[1] Unfortunately I must concede that this claim is less true than it used to be thanks to the relentless pursuit of trade-secret means of optimizing hardware performance. Assembly languages now correspond, particularly on x86, to a sort of macro language that imperfectly masks a massive amount of microarchitectural state that the implementors themselves don't completely understand, at least not in time to get the product to market. Hence the field day of speculative execution attacks and similar. It would not be fair to say that CPUs of old had _no_ microarchitectural state--the Z80, for example, had the not-completely-official `W` and `Z` registers--but they did have much less of it, and correspondingly less attack surface for screwing your programs. I do miss the days of deterministic cycle counts for instruction execution.
But I know I'd be sad if all the caches on my workaday machine switched off.

[2] https://queue.acm.org/detail.cfm?id=3212479

^ permalink raw reply	[flat|nested] 17+ messages in thread
* [TUHS] Re: Maximum Array Sizes in 16 bit C
@ 2024-09-20 20:16 Bakul Shah via TUHS
  2024-09-20 20:58 ` Warner Losh
0 siblings, 1 reply; 17+ messages in thread
From: Bakul Shah via TUHS @ 2024-09-20 20:16 UTC (permalink / raw)
To: G. Branden Robinson; +Cc: The Eunuchs Hysterical Society

You are a bit late with your screed. You will find posts with similar sentiments starting back in the 1980s in Usenet groups such as comp.lang.{c,misc,pascal}.

Perhaps a more interesting (but likely pointless) question is what is the *least* that can be done to fix C's major problems. Compilers can easily add bounds checking for the array[index] construct, but ptr[index] can not be checked unless we make a ptr a heavyweight object such as (address, start, limit). One can see how code can be generated for code such as this:

    Foo x[count];
    Foo *p = x + n;  /* or &x[n] */

Code such as "Foo *p = malloc(size);" would require the compiler to know how malloc behaves to be able to compute the limit. But for a user to write a similar function will require some language extension.

[Of course, if we did that, adding proper support for multidimensional slices would be far easier. But that is an exploration for another day!]

Converting enums to behave like Pascal scalars would likely break things. The question is, can such breakage be fixed automatically (by source code conversion)?

C's union type is used in two different ways: 1) similar to a sum type, which can be done type safely, and 2) to cheat. The compiler should produce a warning when it can't verify a typesafe use -- one can add "unsafe" or some such to let the user absolve the compiler of such a check.

[Maybe naively] I tend to think one can evolve C this way and fix a lot of code &/or make a lot of bugs more explicit.

> On Sep 20, 2024, at 10:11 AM, G. Branden Robinson <g.branden.robinson@gmail.com> wrote:
> [...]

^ permalink raw reply	[flat|nested] 17+ messages in thread
* [TUHS] Re: Maximum Array Sizes in 16 bit C 2024-09-20 20:16 ` Bakul Shah via TUHS @ 2024-09-20 20:58 ` Warner Losh 2024-09-20 21:18 ` Rob Pike ` (2 more replies) 0 siblings, 3 replies; 17+ messages in thread From: Warner Losh @ 2024-09-20 20:58 UTC (permalink / raw) To: Bakul Shah; +Cc: The Eunuchs Hysterical Society [-- Attachment #1: Type: text/plain, Size: 13107 bytes --] On Fri, Sep 20, 2024 at 9:16 PM Bakul Shah via TUHS <tuhs@tuhs.org> wrote: > You are a bit late with your screed. You will find posts > with similar sentiments starting back in 1980s in Usenet > groups such as comp.lang.{c,misc,pascal}. > > Perhaps a more interesting (but likely pointless) question > is what is the *least* that can be done to fix C's major > problems. > > Compilers can easily add bounds checking for the array[index] > construct but ptr[index] can not be checked, unless we make > a ptr a heavy weight object such as (address, start, limit). > One can see how code can be generated for code such as this: > > Foo x[count]; > Foo* p = x + n; // or &x[n] > > Code such as "Foo *p = malloc(size);" would require the > compiler to know how malloc behaves to be able to compute > the limit. But for a user to write a similar function will > require some language extension. > > [Of course, if we did that, adding proper support for > multidimensional slices would be far easier. But that > is an exploration for another day!] > The CHERI architecture extensions do this. It pushes this info into hardware, where all pointers point to a region (gross simplification) that also grants you rights to the area (including read/write/execute). It's really cool, but it does come at a cost in performance. Each pointer is a pointer plus a capability: basically an unforgeable, hardware-tagged bit of data carrying the bounds and access permissions associated with the pointer. 
There's more details on their web site: https://www.cl.cam.ac.uk/research/security/ctsrd/cheri/ CHERI-BSD is a FreeBSD variant that runs on both CHERI variants (aarch64 and riscv64) and where most of the research has been done. There's also a Linux variant as well. Members of this project know way too many of the corner cases of the C language from porting most popular software to the CHERI... And have gone on screeds of their own. The only one I can easily find is https://people.freebsd.org/~brooks/talks/asiabsdcon2017-helloworld/helloworld.pdf Warner > Converting enums to behave like Pascal scalars would > likely break things. The question is, can such breakage > be fixed automatically (by source code conversion)? > > C's union type is used in two different ways: 1: similar > to a sum type, which can be done type safely and 2: to > cheat. The compiler should produce a warning when it can't > verify a typesafe use -- one can add "unsafe" or some such > to let the user absolve the compiler of such check. > > [May be naively] I tend to think one can evolve C this way > and fix a lot of code &/or make a lot of bugs more explicit. > > > On Sep 20, 2024, at 10:11 AM, G. Branden Robinson < > g.branden.robinson@gmail.com> wrote: > > > > At 2024-09-21T01:07:11+1000, Dave Horsfall wrote: > >> Unless I'm mistaken (quite possible at my age), the OP was referring > >> to that in C, pointers and arrays are pretty much the same thing i.e. > >> "foo[-2]" means "take the pointer 'foo' and go back two things" > >> (whatever a "thing" is). > > > > "in C, pointers and arrays are pretty much the same thing" is a common > > utterance but misleading, and in my opinion, better replaced with a > > different one. > > > > We should instead say something more like: > > > > In C, pointers and arrays have compatible dereference syntaxes. > > > > They do _not_ have compatible _declaration_ syntaxes. 
> > > > Chapter 4 of van der Linden's _Expert C Programming_: Deep C Secrets_ > > (1994) tackles this issue head-on and at length. > > > > Here's the salient point. > > > > "Consider the case of an external declaration `extern char *p;` but a > > definition of `char p[10];`. When we retrieve the contents of `p[i]` > > using the extern, we get characters, but we treat it as a pointer. > > Interpreting ASCII characters as an address is garbage, and if you're > > lucky the program will coredump at that point. If you're not lucky it > > will corrupt something in your address space, causing a mysterious > > failure at some point later in the program." > > > >> C is just a high level assembly language; > > > > I disagree with this common claim too. Assembly languages correspond to > > well-defined machine models.[1] Those machine models have memory > > models. C has no memory model--deliberately, because that would have > > gotten in the way of performance. (In practice, C's machine model was > > and remains the PDP-11,[2] with aspects thereof progressively sanded off > > over the years in repeated efforts to salvage the language's reputation > > for portability.) > > > >> there is no such object as a "string" for example: it's just an "array > >> of char" with the last element being "\0" (viz: "strlen" vs. "sizeof". > > > > Yeah, it turns out we need a well-defined string type much more > > powerfully than, it seems, anyone at the Bell Labs CSRC appreciated. > > string.h was tacked on (by Nils-Peter Nelson, as I understand it) at the > > end of the 1970s and C aficionados have defended the language's > > purported perfection with such vigor that they annexed the haphazardly > > assembled standard library into the territory that they defend with much > > rhetorical violence and overstatement. 
From useless or redundant return > > values to const-carelessness to Schlemiel the Painter algorithms in > > implementations, it seems we've collectively made every mistake that > > could be made with Nelson's original, minimal API, and taught those > > mistakes as best practices in tutorials and classrooms. A sorry affair. > > > > So deep was this disdain for the string as a well-defined data type, and > > moreover one conceptually distinct from an array (or vector) of integral > > types that Stroustrup initially repeated the mistake in C++. People can > > easily roll their own, he seemed to have thought. Eventually he thought > > again, but C++ took so long to get standardized that by then, damage was > > done. > > > > "A string is just an array of `char`s, and a `char` is just a > > byte"--another hasty equivalence that surrendered a priceless hostage to > > fortune. This is the sort of fallacy indulged by people excessively > > wedded to machine language programming and who apply its perspective to > > every problem statement uncritically. > > > > Again and again, with signed vs. unsigned bytes, "wide" vs. "narrow" > > characters, and "base" vs. "combining" characters, the champions of the > > "portable assembly" paradigm charged like Lord Cardigan into the pike > > and musket lines of the character type as one might envision it in a > > machine register. (This insistence on visualizing register-level > > representations has prompted numerous other stupidities, like the use of > > an integral zero at the _language level_ to represent empty, null, or > > false literals for as many different data types as possible. "If it > > ends up as a zero in a register," the thinking appears to have gone, "it > > should look like a zero in the source code." Generations of code--and > > language--cowboys have screwed us all over repeatedly with this hasty > > equivalence. > > > > Type theorists have known better for decades. 
But type theory is (1) > > hard (it certainly is, to cowboys) and (2) has never enjoyed a trendy > > day in the sun (for which we may be grateful), which means that is > > seldom on the path one anticipates to a comfortable retirement from a > > Silicon Valley tech company (or several) on a private yacht. > > > > Why do I rant so splenetically about these issues? Because the result > > of such confusion is _bugs in programs_. You want something concrete? > > There it is. Data types protect you from screwing up. And the better > > your data types are, the more care you give to specifying what sorts of > > objects your program manipulates, the more thought you give to the > > invariants that must be maintained for your program to remain in a > > well-defined state, the fewer bugs you will have. > > > > But, nah, better to slap together a prototype, ship it, talk it up to > > the moon as your latest triumph while interviewing with a rival of the > > company you just delivered that prototype to, and look on in amusement > > when your brilliant achievement either proves disastrous in deployment > > or soaks up the waking hours of an entire team of your former colleagues > > cleaning up the steaming pile you voided from your rock star bowels. > > > > We've paid a heavy price for C's slow and seemingly deeply grudging > > embrace of the type concept. (The lack of controlled scope for > > enumeration constants is one example; the horrifyingly ill-conceived > > choice of "typedef" as a keyword indicating _type aliasing_ is another.) > > Kernighan did not help by trashing Pascal so hard in about 1980. He was > > dead right that Pascal needed, essentially, polymorphic subprograms in > > array types. Wirth not speccing the language to accommodate that back > > in 1973 or so was a sad mistake. But Pascal got a lot of other stuff > > right--stuff that the partisanship of C advocates refused to countenance > > such that they ended up celebrating C's flaws as features. 
No amount of > > Jonestown tea could quench their thirst. I suspect the truth was more > > that they didn't want to bother having to learn any other languages. > > (Or if they did, not any language that anyone else on their team at work > > had any facility with.) A rock star plays only one instrument, no? > > People didn't like it when Eddie Van Halen played keyboards instead of > > guitar on stage, so he stopped doing that. The less your coworkers > > understand your work, the more of a genius you must be. > > > > Now, where was I? > > > >> What's the length of "abc" vs. how many bytes are needed to store it? > > > > Even what is meant by "length" has several different correct answers! > > Quantity of code points in the sequence? Number of "grapheme clusters" > > a.k.a. "user-perceived characters" as Unicode puts it? Width as > > represented on the output device? On an ASCII device these usually had > > the same answer (control characters excepted). But even at the Bell > > Labs CSRC in the 1970s, thanks to troff, the staff knew that they didn't > > necessarily have to. (How wide is an em dash? How many bytes represent > > it, in the formatting language and in the output language?) > > > >> Giggle... In a device driver I wrote for V6, I used the expression > >> > >> "0123"[n] > >> > >> and the two programmers whom I thought were better than me had to ask > >> me what it did... > >> > >> -- Dave, brought up on PDP-11 Unix[*] > > > > I enjoy this application of that technique, courtesy of Alan Cox. > > > > fsck-fuzix: blow 90 bytes on a progress indicator > > > > static void progress(void) > > { > > static uint8_t progct; > > progct++; > > progct&=3; > > printf("%c\010", "-\\|/"[progct]); > > fflush(stdout); > > } > > > >> I still remember the days of BOS/PICK/etc, and I staked my career on > >> Unix. > > > > Not a bad choice. 
Your exposure to and recollection of other ways of > > doing things, I suspect, made you a more valuable contributor than those > > who mazed themselves with thoughts of "the Unix way" to the point that > > they never seriously considered any other. > > > > It's fine to prefer "the C way" or "the Unix way", if you can > > intelligibly define what that means as applied to the issue in dispute, > > and coherently defend it. Demonstrating an understanding of the > > alternatives, and being able to credibly explain why they are inferior > > approaches, is how to do advocacy correctly. > > > > But it is not the cowboy way. The rock star way. > > > > Regards, > > Branden > > > > [1] Unfortunately I must concede that this claim is less true than it > > used to be thanks to the relentless pursuit of trade-secret means of > > optimizing hardware performance. Assembly languages now correspond, > > particularly on x86, to a sort of macro language that imperfectly > > masks a massive amount of microarchitectural state that the > > implementors themselves don't completely understand, at least not in > > time to get the product to market. Hence the field day of > > speculative execution attacks and similar. It would not be fair to > > say that CPUs of old had _no_ microarchitectural state--the Z80, for > > example, had the not-completely-official `W` and `Z` registers--but > > they did have much less of it, and correspondingly less attack > > surface for screwing your programs. I do miss the days of > > deterministic cycle counts for instruction execution. But I know > > I'd be sad if all the caches on my workaday machine switched off. > > > > [2] https://queue.acm.org/detail.cfm?id=3212479 > > [-- Attachment #2: Type: text/html, Size: 15731 bytes --] ^ permalink raw reply [flat|nested] 17+ messages in thread
* [TUHS] Re: Maximum Array Sizes in 16 bit C 2024-09-20 20:58 ` Warner Losh @ 2024-09-20 21:18 ` Rob Pike 2024-09-20 22:04 ` Bakul Shah via TUHS 2024-09-20 22:19 ` G. Branden Robinson 2 siblings, 0 replies; 17+ messages in thread From: Rob Pike @ 2024-09-20 21:18 UTC (permalink / raw) To: Warner Losh; +Cc: Bakul Shah, The Eunuchs Hysterical Society [-- Attachment #1: Type: text/plain, Size: 13740 bytes --] Here is some code from typo. int table[2]; /*keep these four cards in order*/ int tab1[26]; int tab2[730]; char tab3[19684]; ... er = read(salt,table,21200); Note the use of the word 'card'. The past is a different country. -rob On Sat, Sep 21, 2024 at 7:07 AM Warner Losh <imp@bsdimp.com> wrote: > > > On Fri, Sep 20, 2024 at 9:16 PM Bakul Shah via TUHS <tuhs@tuhs.org> wrote: > >> You are a bit late with your screed. You will find posts >> with similar sentiments starting back in 1980s in Usenet >> groups such as comp.lang.{c,misc,pascal}. >> >> Perhaps a more interesting (but likely pointless) question >> is what is the *least* that can be done to fix C's major >> problems. >> >> Compilers can easily add bounds checking for the array[index] >> construct but ptr[index] can not be checked, unless we make >> a ptr a heavy weight object such as (address, start, limit). >> One can see how code can be generated for code such as this: >> >> Foo x[count]; >> Foo* p = x + n; // or &x[n] >> >> Code such as "Foo *p = malloc(size);" would require the >> compiler to know how malloc behaves to be able to compute >> the limit. But for a user to write a similar function will >> require some language extension. >> >> [Of course, if we did that, adding proper support for >> multidimensional slices would be far easier. But that >> is an exploration for another day!] >> > > The CHERI architecture extensions do this. It pushes this info into > hardware > where all pointers point to a region (gross simplification) that also > grant you > rights the area (including read/write/execute). 
It's really cool, but it > does come > at a cost in performance. Each pointer is a pointer, and a capacity that's > basically > a cryptographically signed bit of data that's the bounds and access > permissions > associated with the pointer. There's more details on their web site: > https://www.cl.cam.ac.uk/research/security/ctsrd/cheri/ > > CHERI-BSD is a FreeBSD variant that runs on both CHERI variants (aarch64 > and > riscv64) and where most of the research has been done. There's also a > Linux > variant as well. > > Members of this project know way too many of the corner cases of the C > language > from porting most popular software to the CHERI... And have gone on > screeds of > their own. The only one I can easily find is > > https://people.freebsd.org/~brooks/talks/asiabsdcon2017-helloworld/helloworld.pdf > > Warner > > >> Converting enums to behave like Pascal scalars would >> likely break things. The question is, can such breakage >> be fixed automatically (by source code conversion)? >> >> C's union type is used in two different ways: 1: similar >> to a sum type, which can be done type safely and 2: to >> cheat. The compiler should produce a warning when it can't >> verify a typesafe use -- one can add "unsafe" or some such >> to let the user absolve the compiler of such check. >> >> [May be naively] I tend to think one can evolve C this way >> and fix a lot of code &/or make a lot of bugs more explicit. >> >> > On Sep 20, 2024, at 10:11 AM, G. Branden Robinson < >> g.branden.robinson@gmail.com> wrote: >> > >> > At 2024-09-21T01:07:11+1000, Dave Horsfall wrote: >> >> Unless I'm mistaken (quite possible at my age), the OP was referring >> >> to that in C, pointers and arrays are pretty much the same thing i.e. >> >> "foo[-2]" means "take the pointer 'foo' and go back two things" >> >> (whatever a "thing" is). 
>> > >> > "in C, pointers and arrays are pretty much the same thing" is a common >> > utterance but misleading, and in my opinion, better replaced with a >> > different one. >> > >> > We should instead say something more like: >> > >> > In C, pointers and arrays have compatible dereference syntaxes. >> > >> > They do _not_ have compatible _declaration_ syntaxes. >> > >> > Chapter 4 of van der Linden's _Expert C Programming_: Deep C Secrets_ >> > (1994) tackles this issue head-on and at length. >> > >> > Here's the salient point. >> > >> > "Consider the case of an external declaration `extern char *p;` but a >> > definition of `char p[10];`. When we retrieve the contents of `p[i]` >> > using the extern, we get characters, but we treat it as a pointer. >> > Interpreting ASCII characters as an address is garbage, and if you're >> > lucky the program will coredump at that point. If you're not lucky it >> > will corrupt something in your address space, causing a mysterious >> > failure at some point later in the program." >> > >> >> C is just a high level assembly language; >> > >> > I disagree with this common claim too. Assembly languages correspond to >> > well-defined machine models.[1] Those machine models have memory >> > models. C has no memory model--deliberately, because that would have >> > gotten in the way of performance. (In practice, C's machine model was >> > and remains the PDP-11,[2] with aspects thereof progressively sanded off >> > over the years in repeated efforts to salvage the language's reputation >> > for portability.) >> > >> >> there is no such object as a "string" for example: it's just an "array >> >> of char" with the last element being "\0" (viz: "strlen" vs. "sizeof". >> > >> > Yeah, it turns out we need a well-defined string type much more >> > powerfully than, it seems, anyone at the Bell Labs CSRC appreciated. 
>> > string.h was tacked on (by Nils-Peter Nelson, as I understand it) at the >> > end of the 1970s and C aficionados have defended the language's >> > purported perfection with such vigor that they annexed the haphazardly >> > assembled standard library into the territory that they defend with much >> > rhetorical violence and overstatement. From useless or redundant return >> > values to const-carelessness to Schlemiel the Painter algorithms in >> > implementations, it seems we've collectively made every mistake that >> > could be made with Nelson's original, minimal API, and taught those >> > mistakes as best practices in tutorials and classrooms. A sorry affair. >> > >> > So deep was this disdain for the string as a well-defined data type, and >> > moreover one conceptually distinct from an array (or vector) of integral >> > types that Stroustrup initially repeated the mistake in C++. People can >> > easily roll their own, he seemed to have thought. Eventually he thought >> > again, but C++ took so long to get standardized that by then, damage was >> > done. >> > >> > "A string is just an array of `char`s, and a `char` is just a >> > byte"--another hasty equivalence that surrendered a priceless hostage to >> > fortune. This is the sort of fallacy indulged by people excessively >> > wedded to machine language programming and who apply its perspective to >> > every problem statement uncritically. >> > >> > Again and again, with signed vs. unsigned bytes, "wide" vs. "narrow" >> > characters, and "base" vs. "combining" characters, the champions of the >> > "portable assembly" paradigm charged like Lord Cardigan into the pike >> > and musket lines of the character type as one might envision it in a >> > machine register. 
(This insistence on visualizing register-level >> > representations has prompted numerous other stupidities, like the use of >> > an integral zero at the _language level_ to represent empty, null, or >> > false literals for as many different data types as possible. "If it >> > ends up as a zero in a register," the thinking appears to have gone, "it >> > should look like a zero in the source code." Generations of code--and >> > language--cowboys have screwed us all over repeatedly with this hasty >> > equivalence. >> > >> > Type theorists have known better for decades. But type theory is (1) >> > hard (it certainly is, to cowboys) and (2) has never enjoyed a trendy >> > day in the sun (for which we may be grateful), which means that is >> > seldom on the path one anticipates to a comfortable retirement from a >> > Silicon Valley tech company (or several) on a private yacht. >> > >> > Why do I rant so splenetically about these issues? Because the result >> > of such confusion is _bugs in programs_. You want something concrete? >> > There it is. Data types protect you from screwing up. And the better >> > your data types are, the more care you give to specifying what sorts of >> > objects your program manipulates, the more thought you give to the >> > invariants that must be maintained for your program to remain in a >> > well-defined state, the fewer bugs you will have. >> > >> > But, nah, better to slap together a prototype, ship it, talk it up to >> > the moon as your latest triumph while interviewing with a rival of the >> > company you just delivered that prototype to, and look on in amusement >> > when your brilliant achievement either proves disastrous in deployment >> > or soaks up the waking hours of an entire team of your former colleagues >> > cleaning up the steaming pile you voided from your rock star bowels. >> > >> > We've paid a heavy price for C's slow and seemingly deeply grudging >> > embrace of the type concept. 
(The lack of controlled scope for >> > enumeration constants is one example; the horrifyingly ill-conceived >> > choice of "typedef" as a keyword indicating _type aliasing_ is another.) >> > Kernighan did not help by trashing Pascal so hard in about 1980. He was >> > dead right that Pascal needed, essentially, polymorphic subprograms in >> > array types. Wirth not speccing the language to accommodate that back >> > in 1973 or so was a sad mistake. But Pascal got a lot of other stuff >> > right--stuff that the partisanship of C advocates refused to countenance >> > such that they ended up celebrating C's flaws as features. No amount of >> > Jonestown tea could quench their thirst. I suspect the truth was more >> > that they didn't want to bother having to learn any other languages. >> > (Or if they did, not any language that anyone else on their team at work >> > had any facility with.) A rock star plays only one instrument, no? >> > People didn't like it when Eddie Van Halen played keyboards instead of >> > guitar on stage, so he stopped doing that. The less your coworkers >> > understand your work, the more of a genius you must be. >> > >> > Now, where was I? >> > >> >> What's the length of "abc" vs. how many bytes are needed to store it? >> > >> > Even what is meant by "length" has several different correct answers! >> > Quantity of code points in the sequence? Number of "grapheme clusters" >> > a.k.a. "user-perceived characters" as Unicode puts it? Width as >> > represented on the output device? On an ASCII device these usually had >> > the same answer (control characters excepted). But even at the Bell >> > Labs CSRC in the 1970s, thanks to troff, the staff knew that they didn't >> > necessarily have to. (How wide is an em dash? How many bytes represent >> > it, in the formatting language and in the output language?) >> > >> >> Giggle... 
In a device driver I wrote for V6, I used the expression >> >> >> >> "0123"[n] >> >> >> >> and the two programmers whom I thought were better than me had to ask >> >> me what it did... >> >> >> >> -- Dave, brought up on PDP-11 Unix[*] >> > >> > I enjoy this application of that technique, courtesy of Alan Cox. >> > >> > fsck-fuzix: blow 90 bytes on a progress indicator >> > >> > static void progress(void) >> > { >> > static uint8_t progct; >> > progct++; >> > progct&=3; >> > printf("%c\010", "-\\|/"[progct]); >> > fflush(stdout); >> > } >> > >> >> I still remember the days of BOS/PICK/etc, and I staked my career on >> >> Unix. >> > >> > Not a bad choice. Your exposure to and recollection of other ways of >> > doing things, I suspect, made you a more valuable contributor than those >> > who mazed themselves with thoughts of "the Unix way" to the point that >> > they never seriously considered any other. >> > >> > It's fine to prefer "the C way" or "the Unix way", if you can >> > intelligibly define what that means as applied to the issue in dispute, >> > and coherently defend it. Demonstrating an understanding of the >> > alternatives, and being able to credibly explain why they are inferior >> > approaches, is how to do advocacy correctly. >> > >> > But it is not the cowboy way. The rock star way. >> > >> > Regards, >> > Branden >> > >> > [1] Unfortunately I must concede that this claim is less true than it >> > used to be thanks to the relentless pursuit of trade-secret means of >> > optimizing hardware performance. Assembly languages now correspond, >> > particularly on x86, to a sort of macro language that imperfectly >> > masks a massive amount of microarchitectural state that the >> > implementors themselves don't completely understand, at least not in >> > time to get the product to market. Hence the field day of >> > speculative execution attacks and similar. 
It would not be fair to >> > say that CPUs of old had _no_ microarchitectural state--the Z80, for >> > example, had the not-completely-official `W` and `Z` registers--but >> > they did have much less of it, and correspondingly less attack >> > surface for screwing your programs. I do miss the days of >> > deterministic cycle counts for instruction execution. But I know >> > I'd be sad if all the caches on my workaday machine switched off. >> > >> > [2] https://queue.acm.org/detail.cfm?id=3212479 >> >> [-- Attachment #2: Type: text/html, Size: 19465 bytes --] ^ permalink raw reply [flat|nested] 17+ messages in thread
* [TUHS] Re: Maximum Array Sizes in 16 bit C 2024-09-20 20:58 ` Warner Losh 2024-09-20 21:18 ` Rob Pike @ 2024-09-20 22:04 ` Bakul Shah via TUHS 2024-09-20 22:19 ` G. Branden Robinson 2 siblings, 0 replies; 17+ messages in thread From: Bakul Shah via TUHS @ 2024-09-20 22:04 UTC (permalink / raw) To: Warner Losh; +Cc: The Eunuchs Hysterical Society [-- Attachment #1: Type: text/plain, Size: 14662 bytes --] > On Sep 20, 2024, at 1:58 PM, Warner Losh <imp@bsdimp.com> wrote: > > > > On Fri, Sep 20, 2024 at 9:16 PM Bakul Shah via TUHS <tuhs@tuhs.org <mailto:tuhs@tuhs.org>> wrote: >> You are a bit late with your screed. You will find posts >> with similar sentiments starting back in 1980s in Usenet >> groups such as comp.lang.{c,misc,pascal}. >> >> Perhaps a more interesting (but likely pointless) question >> is what is the *least* that can be done to fix C's major >> problems. >> >> Compilers can easily add bounds checking for the array[index] >> construct but ptr[index] can not be checked, unless we make >> a ptr a heavy weight object such as (address, start, limit). >> One can see how code can be generated for code such as this: >> >> Foo x[count]; >> Foo* p = x + n; // or &x[n] >> >> Code such as "Foo *p = malloc(size);" would require the >> compiler to know how malloc behaves to be able to compute >> the limit. But for a user to write a similar function will >> require some language extension. >> >> [Of course, if we did that, adding proper support for >> multidimensional slices would be far easier. But that >> is an exploration for another day!] > > The CHERI architecture extensions do this. It pushes this info into hardware > where all pointers point to a region (gross simplification) that also grant you > rights the area (including read/write/execute). It's really cool, but it does come > at a cost in performance. 
Each pointer is a pointer, and a capacity that's basically > a cryptographically signed bit of data that's the bounds and access permissions > associated with the pointer. There's more details on their web site: > https://www.cl.cam.ac.uk/research/security/ctsrd/cheri/ Capabilities are heavier weight and perhaps an overkill to use as pointers. And that doesn't help programs on normal processors. I view a capability architecture better suited for microkernels -- a cap call would be akin to a syscall + upcall to a server running in user code. For example "read(file-cap, buffer-cap, size)" would need to be delivered to a fileserver process etc. Basically a cap. is a ptr *across* a protection domain. We want type safety (including bounds checking) within a protection domain (a process). A compiler can often elide bounds checks or push them out of a loop. Similarly for other smaller changes. The idea is to try to "fix" C with as little rewriting as possible. Nobody is going to fund rewriting all 10M lines of kernel code in C (& more in user code) into Rust (not to mention such from-scratch rewrites usually result in incompatibilities). But we still seem to want maximum performance and maximum security without paying for it (and if pushed, we live with bugs but not lower performance even if processors are orders of magnitude faster now). > > CHERI-BSD is a FreeBSD variant that runs on both CHERI variants (aarch64 and > riscv64) and where most of the research has been done. There's also a Linux > variant as well. > > Members of this project know way too many of the corner cases of the C language > from porting most popular software to the CHERI... And have gone on screeds of > their own. The only one I can easily find is > https://people.freebsd.org/~brooks/talks/asiabsdcon2017-helloworld/helloworld.pdf > > Warner > >> Converting enums to behave like Pascal scalars would >> likely break things. 
The question is, can such breakage >> be fixed automatically (by source code conversion)? >> >> C's union type is used in two different ways: 1: similar >> to a sum type, which can be done type safely and 2: to >> cheat. The compiler should produce a warning when it can't >> verify a typesafe use -- one can add "unsafe" or some such >> to let the user absolve the compiler of such a check. >> >> [Maybe naively] I tend to think one can evolve C this way >> and fix a lot of code &/or make a lot of bugs more explicit. >> >> > On Sep 20, 2024, at 10:11 AM, G. Branden Robinson <g.branden.robinson@gmail.com <mailto:g.branden.robinson@gmail.com>> wrote: >> > >> > At 2024-09-21T01:07:11+1000, Dave Horsfall wrote: >> >> Unless I'm mistaken (quite possible at my age), the OP was referring >> >> to that in C, pointers and arrays are pretty much the same thing i.e. >> >> "foo[-2]" means "take the pointer 'foo' and go back two things" >> >> (whatever a "thing" is). >> > >> > "in C, pointers and arrays are pretty much the same thing" is a common >> > utterance but misleading, and in my opinion, better replaced with a >> > different one. >> > >> > We should instead say something more like: >> > >> > In C, pointers and arrays have compatible dereference syntaxes. >> > >> > They do _not_ have compatible _declaration_ syntaxes. >> > >> > Chapter 4 of van der Linden's _Expert C Programming: Deep C Secrets_ >> > (1994) tackles this issue head-on and at length. >> > >> > Here's the salient point. >> > >> > "Consider the case of an external declaration `extern char *p;` but a >> > definition of `char p[10];`. When we retrieve the contents of `p[i]` >> > using the extern, we get characters, but we treat it as a pointer. >> > Interpreting ASCII characters as an address is garbage, and if you're >> > lucky the program will coredump at that point. If you're not lucky it >> > will corrupt something in your address space, causing a mysterious >> > failure at some point later in the program." 
>> > >> >> C is just a high level assembly language; >> > >> > I disagree with this common claim too. Assembly languages correspond to >> > well-defined machine models.[1] Those machine models have memory >> > models. C has no memory model--deliberately, because that would have >> > gotten in the way of performance. (In practice, C's machine model was >> > and remains the PDP-11,[2] with aspects thereof progressively sanded off >> > over the years in repeated efforts to salvage the language's reputation >> > for portability.) >> > >> >> there is no such object as a "string" for example: it's just an "array >> >> of char" with the last element being "\0" (viz: "strlen" vs. "sizeof"). >> > >> > Yeah, it turns out we need a well-defined string type much more >> > powerfully than, it seems, anyone at the Bell Labs CSRC appreciated. >> > string.h was tacked on (by Nils-Peter Nelson, as I understand it) at the >> > end of the 1970s and C aficionados have defended the language's >> > purported perfection with such vigor that they annexed the haphazardly >> > assembled standard library into the territory that they defend with much >> > rhetorical violence and overstatement. From useless or redundant return >> > values to const-carelessness to Schlemiel the Painter algorithms in >> > implementations, it seems we've collectively made every mistake that >> > could be made with Nelson's original, minimal API, and taught those >> > mistakes as best practices in tutorials and classrooms. A sorry affair. >> > >> > So deep was this disdain for the string as a well-defined data type, and >> > moreover one conceptually distinct from an array (or vector) of integral >> > types, that Stroustrup initially repeated the mistake in C++. People can >> > easily roll their own, he seemed to have thought. Eventually he thought >> > again, but C++ took so long to get standardized that by then, damage was >> > done. 
>> > >> > "A string is just an array of `char`s, and a `char` is just a >> > byte"--another hasty equivalence that surrendered a priceless hostage to >> > fortune. This is the sort of fallacy indulged by people excessively >> > wedded to machine language programming and who apply its perspective to >> > every problem statement uncritically. >> > >> > Again and again, with signed vs. unsigned bytes, "wide" vs. "narrow" >> > characters, and "base" vs. "combining" characters, the champions of the >> > "portable assembly" paradigm charged like Lord Cardigan into the pike >> > and musket lines of the character type as one might envision it in a >> > machine register. (This insistence on visualizing register-level >> > representations has prompted numerous other stupidities, like the use of >> > an integral zero at the _language level_ to represent empty, null, or >> > false literals for as many different data types as possible. "If it >> > ends up as a zero in a register," the thinking appears to have gone, "it >> > should look like a zero in the source code." Generations of code--and >> > language--cowboys have screwed us all over repeatedly with this hasty >> > equivalence.) >> > >> > Type theorists have known better for decades. But type theory is (1) >> > hard (it certainly is, to cowboys) and (2) has never enjoyed a trendy >> > day in the sun (for which we may be grateful), which means that it is >> > seldom on the path one anticipates to a comfortable retirement from a >> > Silicon Valley tech company (or several) on a private yacht. >> > >> > Why do I rant so splenetically about these issues? Because the result >> > of such confusion is _bugs in programs_. You want something concrete? >> > There it is. Data types protect you from screwing up. 
And the better >> > your data types are, the more care you give to specifying what sorts of >> > objects your program manipulates, the more thought you give to the >> > invariants that must be maintained for your program to remain in a >> > well-defined state, the fewer bugs you will have. >> > >> > But, nah, better to slap together a prototype, ship it, talk it up to >> > the moon as your latest triumph while interviewing with a rival of the >> > company you just delivered that prototype to, and look on in amusement >> > when your brilliant achievement either proves disastrous in deployment >> > or soaks up the waking hours of an entire team of your former colleagues >> > cleaning up the steaming pile you voided from your rock star bowels. >> > >> > We've paid a heavy price for C's slow and seemingly deeply grudging >> > embrace of the type concept. (The lack of controlled scope for >> > enumeration constants is one example; the horrifyingly ill-conceived >> > choice of "typedef" as a keyword indicating _type aliasing_ is another.) >> > Kernighan did not help by trashing Pascal so hard in about 1980. He was >> > dead right that Pascal needed, essentially, polymorphic subprograms in >> > array types. Wirth not speccing the language to accommodate that back >> > in 1973 or so was a sad mistake. But Pascal got a lot of other stuff >> > right--stuff that the partisanship of C advocates refused to countenance >> > such that they ended up celebrating C's flaws as features. No amount of >> > Jonestown tea could quench their thirst. I suspect the truth was more >> > that they didn't want to bother having to learn any other languages. >> > (Or if they did, not any language that anyone else on their team at work >> > had any facility with.) A rock star plays only one instrument, no? >> > People didn't like it when Eddie Van Halen played keyboards instead of >> > guitar on stage, so he stopped doing that. 
The less your coworkers >> > understand your work, the more of a genius you must be. >> > >> > Now, where was I? >> > >> >> What's the length of "abc" vs. how many bytes are needed to store it? >> > >> > Even what is meant by "length" has several different correct answers! >> > Quantity of code points in the sequence? Number of "grapheme clusters" >> > a.k.a. "user-perceived characters" as Unicode puts it? Width as >> > represented on the output device? On an ASCII device these usually had >> > the same answer (control characters excepted). But even at the Bell >> > Labs CSRC in the 1970s, thanks to troff, the staff knew that they didn't >> > necessarily have to. (How wide is an em dash? How many bytes represent >> > it, in the formatting language and in the output language?) >> > >> >> Giggle... In a device driver I wrote for V6, I used the expression >> >> >> >> "0123"[n] >> >> >> >> and the two programmers whom I thought were better than me had to ask >> >> me what it did... >> >> >> >> -- Dave, brought up on PDP-11 Unix[*] >> > >> > I enjoy this application of that technique, courtesy of Alan Cox. >> > >> > fsck-fuzix: blow 90 bytes on a progress indicator >> > >> > static void progress(void) >> > { >> > static uint8_t progct; >> > progct++; >> > progct&=3; >> > printf("%c\010", "-\\|/"[progct]); >> > fflush(stdout); >> > } >> > >> >> I still remember the days of BOS/PICK/etc, and I staked my career on >> >> Unix. >> > >> > Not a bad choice. Your exposure to and recollection of other ways of >> > doing things, I suspect, made you a more valuable contributor than those >> > who mazed themselves with thoughts of "the Unix way" to the point that >> > they never seriously considered any other. >> > >> > It's fine to prefer "the C way" or "the Unix way", if you can >> > intelligibly define what that means as applied to the issue in dispute, >> > and coherently defend it. 
Demonstrating an understanding of the >> > alternatives, and being able to credibly explain why they are inferior >> > approaches, is how to do advocacy correctly. >> > >> > But it is not the cowboy way. The rock star way. >> > >> > Regards, >> > Branden >> > >> > [1] Unfortunately I must concede that this claim is less true than it >> > used to be thanks to the relentless pursuit of trade-secret means of >> > optimizing hardware performance. Assembly languages now correspond, >> > particularly on x86, to a sort of macro language that imperfectly >> > masks a massive amount of microarchitectural state that the >> > implementors themselves don't completely understand, at least not in >> > time to get the product to market. Hence the field day of >> > speculative execution attacks and similar. It would not be fair to >> > say that CPUs of old had _no_ microarchitectural state--the Z80, for >> > example, had the not-completely-official `W` and `Z` registers--but >> > they did have much less of it, and correspondingly less attack >> > surface for screwing your programs. I do miss the days of >> > deterministic cycle counts for instruction execution. But I know >> > I'd be sad if all the caches on my workaday machine switched off. >> > >> > [2] https://queue.acm.org/detail.cfm?id=3212479 >> [-- Attachment #2: Type: text/html, Size: 17522 bytes --] ^ permalink raw reply [flat|nested] 17+ messages in thread
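Bakul's "(address, start, limit)" heavyweight pointer from the message above can be sketched in plain C. This is a hypothetical illustration of what a checking compiler might lower `p[i]` and `Foo *p = x + n;` to -- not CHERI's actual representation, and every name below is invented:

```c
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* A "fat" pointer: the address plus the bounds of the object it
   points into.  Real schemes (CHERI, fat-pointer compilers) differ
   in representation and cost. */
struct fat_ptr {
    int *addr;   /* current position, possibly interior */
    int *start;  /* first valid element */
    int *limit;  /* one past the last valid element */
};

/* What a checking compiler might emit for fp[i]: validate, then load. */
static int fat_index(struct fat_ptr fp, ptrdiff_t i)
{
    int *p = fp.addr + i;
    if (p < fp.start || p >= fp.limit) {
        fprintf(stderr, "bounds violation\n");
        abort();
    }
    return *p;
}

/* What "Foo* p = x + n;" might construct when x is a known array. */
static struct fat_ptr fat_from_array(int *x, size_t count, size_t n)
{
    struct fat_ptr fp = { x + n, x, x + count };
    return fp;
}
```

Note that the negative subscripts discussed elsewhere in this thread (`p[-2]`) remain legal under such a scheme whenever they still land inside the original object, since the check is against the object's bounds rather than the interior address.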
* [TUHS] Re: Maximum Array Sizes in 16 bit C 2024-09-20 20:58 ` Warner Losh 2024-09-20 21:18 ` Rob Pike 2024-09-20 22:04 ` Bakul Shah via TUHS @ 2024-09-20 22:19 ` G. Branden Robinson 2 siblings, 0 replies; 17+ messages in thread From: G. Branden Robinson @ 2024-09-20 22:19 UTC (permalink / raw) To: The Eunuchs Hysterical Society [-- Attachment #1: Type: text/plain, Size: 10518 bytes --] Hi Bakul & Warner, At 2024-09-20T13:16:24-0700, Bakul Shah wrote: > You are a bit late with your screed. ...I had hoped that my awareness of that was made evident by my citation of a 30-year-old book. ;-) > You will find posts with similar sentiments starting back in the 1980s in > Usenet groups such as comp.lang.{c,misc,pascal}. Before my time, but I don't doubt it. The sad thing is that not enough people took such posts seriously. I spend a fair amount of time dealing with "legacy" code. Stuff that hasn't been touched in a long time. One thing I'm convinced of: bad idioms are forever. And that means people will keep learning and copying them. Of course no one wants to pay for the cleanup of such technical debt, not in spite of but _because_ it will expose bugs. You can't justify to any manager that we need to set up this one cost center so that we can expand another one. Not unless the manager cares about downside risk. And tech culture absolutely does not. Let the planes fall out of the sky and the reactors melt down. You can justify it all in the name of "ethical altruism", or whatever the trendy label for sociopathic anarcho-capitalism is these days. (I'm kidding, of course. Serious tech bros understand the essential function of government in maintaining structures for the allocation of economic rents [copyrights and patents] and the utility of employment law, police, and if it comes to it, the National Guard in the suppression of organized labor. 
Fortunately for management, software engineers think so highly of themselves that they identify with the billionaire CEO's economic class instead of their own.) > Perhaps a more interesting (but likely pointless) question is what is > the *least* that can be done to fix C's major problems. Not pointless. If we ask ourselves that question after every revision of the language standard, the language _will_ advance. C23 has a `nullptr` constant. K&R-style function declarations are gone, and good riddance. I did notice that some national bodies fought like hell to keep trigraphs, though. :-| > Compilers can easily add bounds checking for the array[index] Pascal expected this. One of Kernighan's complaints in his CSTR #100 paper (the one I mentioned) is that he feared precious machine cycles would be lost validating expressions that pointed within valid bounds. So why not a compiler switch, jeez louise? Develop in paranoid/slow mode and ship in sloppy/fast mode. If you must. It seems that static analysis was in its infancy back then. Compiler writers screeched like banshees at the forms of validation the Ada language spec required them to do, and complained so vociferously that they helped trash the language's reputation. A few years went by and, gosh, folks realized that you sure could prevent a lot of bugs by wiring such checks into compilers for other languages--in the places where the semantics would permit it, a count that was invariably lower than Ada's because, shock, Ada was actually thought out and went through several revisions _before_ being put into production. Did anyone ever repent of their loathsome shrieking? Doubt it. Static analysis became cool and they accepted whatever plaudits fell upon them. > construct but ptr[index] can not be checked, unless we make > a ptr a heavy weight object such as (address, start, limit). Base and bounds registers are an old idea. Older than C. 
But the PDP-11 didn't have them,[1] so C expected to do without and the rest is lamentable history. We would do well to learn from C++'s multiple attempts at "smart pointers". I guess they've got it right in C++11, at last? Not sure. C++'s aggressive promiscuity has not done C a favor, but rather conditioned the latter into reflexive, instead of reasoned, conservatism. > One can see how code can be generated for code such as this: > > Foo x[count]; > Foo* p = x + n; // or &x[n] > > Code such as "Foo *p = malloc(size);" would require the compiler to > know how malloc behaves to be able to compute the limit. C's refusal to specify dynamic memory allocation in the language runtime (as opposed to, eventually, the standard library) was a painful oversight. There was a strange tension between that and code idioms among C's own earliest practitioners to assume dynamically sized storage. I remember when novice C programmers managing strings would get ridiculed by their seniors for setting up and writing to static buffers. Why did they do that? Because it was easy--the language supported it well. Going to `malloc()` was like aiming a gun at your own face. The routine practice of memory overcommit in C on Unix systems led to a sort of perverse synergy. Programmers were actively conditioned _against_ performing algorithmic analysis of their _space_ requirements. (By contrast, seeing how far you could bro down your code's _time_ complexity was where you really showed your mettle. If you spent all of the time you saved waiting on I/O, hey man, that's not YOUR problem.) > But for a user to write a similar function will require some language > extension. > > [Of course, if we did that, adding proper support for multidimensional > slices would be far easier. But that is an exploration for another > day!] When I read about Fortran 90/95/2003's facilities for array reshaping, I rocked back on my heels. > Converting enums to behave like Pascal scalars would likely break > things. 
The question is, can such breakage be fixed automatically (by > source code conversion)? I don't assert that C needs to ape _Pascal_ scalars in particular. Better Ada's. :P Or, equivalently, C++11's "enum class". As with many things in C++, the syntax is an ugly graft, but the idea is as sound as they come. One of the proposals that didn't make it for C23 was similarly ugly: "return break;". But the _idea_ was to mark tail recursion so that the compiler would know it's happening. That saves stack. _That's_ worth having. I worry that it didn't make it just because the syntax was so cringey. But the alternatives, like yet another new keyword, or overloading punctuation some more, seemed worse. C++ indulges both vices amply with every revision. > C's union type is used in two different ways: 1: similar to a sum > type, which can be done type safely and 2: to cheat. The compiler > should produce a warning when it can't verify a typesafe use -- one > can add "unsafe" or some such to let the user absolve the compiler of > such a check. Agreed. C++'s family of typecasting operators is, once again, an ugly feature syntactically, but the benefits in terms of saying what you mean, and _only_ what you mean, are valuable. Casts in C are too often an express ticket to UB. > [Maybe naively] I tend to think one can evolve C this way and fix a > lot of code &/or make a lot of bugs more explicit. If that be naïveté, let's have more of it. At 2024-09-20T21:58:26+0100, Warner Losh wrote: > The CHERI architecture extensions do this. It pushes this info into > hardware where all pointers point to a region (gross simplification) > that also grants you rights to the area (including read/write/execute). > It's really cool, but it does come at a cost in performance. Each > pointer is a pointer, and a capability that's basically a > cryptographically signed bit of data that's the bounds and access > permissions associated with the pointer. 
There are more details on their > web site: > https://www.cl.cam.ac.uk/research/security/ctsrd/cheri/ CHERI is absolutely cool and even if it doesn't conquer the world, I feel sure that there is a lot we can learn from it. > CHERI-BSD is a FreeBSD variant that runs on both CHERI variants > (aarch64 and riscv64) and where most of the research has been done. > There's also a Linux variant as well. > > Members of this project know way too many of the corner cases of the C > language from porting most popular software to the CHERI... And have > gone on screeds of their own. The only one I can easily find is > https://people.freebsd.org/~brooks/talks/asiabsdcon2017-helloworld/helloworld.pdf Oh yes. I remember they presented at the LF's Open Source Summit one year (maybe the last year it was in downtown San Francisco, before the LF moved the conference to wine country to scrape off all the engineers and other tedious techy types who might point out what's wrong with somebody's grandiose sales pitch--conferences are for getting deals done [too many vice cops in SF?], not advancing the state of the art). It was a questionnaire along the lines of "what do you _really_ know about C?" and it opened my eyes wide for sure. Apparently it turns out that the Dunning-Kruger effect isn't what we think it is. https://www.scientificamerican.com/article/the-dunning-kruger-effect-isnt-what-you-think-it-is/ Maybe D&K's findings were so rapidly assimilated into the cultural zeitgeist because far too many people are acquainted with highly confident C programmers. While preparing this message, I ran across this: https://csrc.nist.gov/files/pubs/conference/1998/10/08/proceedings-of-the-21st-nissc-1998/final/docs/early-cs-papers/schi75.pdf "The Design and Specification of a Security Kernel for the PDP-11/45", by Schiller (1975). I'll try to read and absorb its 117 pages before burdening this list with any more of my yammerings. Happy weekend! Regards, Branden [1] I think. 
The PDP-11/20 infamously didn't have memory protection of any sort, and the CSRC wisely ran away from that as fast as they could once they could afford to. (See the preface to the Third Edition Programmer's Manual.) And it was reasonable to not expect support for such things if one wanted portability to embedded systems, but it's not clear to me how seriously the portability of C itself was considered until the first ports were actually _done_, and these were not to embedded systems, but to machines broadly comparable to PDP-11s. London and Reiser's paper on Unix/32V opened my eyes with respect to just how late some portability-impacting changes to "K&R C" were actually made. They sounded many cautionary notes that the community--or maybe it was just compiler writers (banshees again?)--seemed slow to acknowledge. [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 17+ messages in thread
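The thread's distinction between `union` used as a disciplined sum type and `union` used to cheat can be made concrete with the usual tag convention: every access goes through a `switch` on a discriminant, which is exactly the pattern a compiler could verify before warning (or demanding an "unsafe" annotation). This is an illustrative sketch with invented names, not a proposed language extension:

```c
/* A type-safe "sum type" use of union: a tag records which member
   is live, and every read consults the tag first. */
enum shape_kind { CIRCLE, RECT };

struct shape {
    enum shape_kind kind;
    union {
        struct { double r; } circle;
        struct { double w, h; } rect;
    } u;
};

/* Each union member is touched only under its own case label,
   so the access pattern is checkable. */
static double area(const struct shape *s)
{
    switch (s->kind) {
    case CIRCLE:
        return 3.141592653589793 * s->u.circle.r * s->u.circle.r;
    case RECT:
        return s->u.rect.w * s->u.rect.h;
    }
    return 0.0; /* unreachable while the tag invariant holds */
}
```

The "cheat" use of `union` -- reading a member other than the one last written, e.g. to reinterpret bits -- is the case a compiler cannot verify and would flag.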
* [TUHS] Re: Maximum Array Sizes in 16 bit C 2024-09-20 13:33 ` Paul Winalski 2024-09-20 15:07 ` Dave Horsfall @ 2024-09-20 15:26 ` Rich Salz 1 sibling, 0 replies; 17+ messages in thread From: Rich Salz @ 2024-09-20 15:26 UTC (permalink / raw) To: Paul Winalski; +Cc: Douglas McIlroy, TUHS main list [-- Attachment #1: Type: text/plain, Size: 261 bytes --] > > Unless "foo" were a pointer that the programmer explicitly pointed to the > inside of a larger data structure. > It was that. Go look at the source (I included the link) if you want. This was in the context of a sub-thread about array indices, after all. [-- Attachment #2: Type: text/html, Size: 549 bytes --] ^ permalink raw reply [flat|nested] 17+ messages in thread
* [TUHS] Re: Maximum Array Sizes in 16 bit C 2024-09-19 13:13 ` Rich Salz 2024-09-20 13:33 ` Paul Winalski @ 2024-09-20 19:40 ` Leah Neukirchen 1 sibling, 0 replies; 17+ messages in thread From: Leah Neukirchen @ 2024-09-20 19:40 UTC (permalink / raw) To: Rich Salz; +Cc: Douglas McIlroy, TUHS main list Rich Salz <rich.salz@gmail.com> writes: >> >> if there need to be negative references in array accesses (which certainly >> makes sense to me, on its face), it seems reasonable to have whatever >> intermediate variable be signed. >> > > In my first C programming job I saw the source to V7 grep which had a > "foo[-2]" construct. It was a moment of enlightenment and another bit of > K&R fell into place. ( > https://www.tuhs.org/cgi-bin/utree.pl?file=V7/usr/src/cmd/grep.c; search > for "[-") Now this thread already derailed into C undefined behavior semantics, but nobody bothered to look at the actual code, which is perfectly fine: if ((c = *sp++) != '*') lastep = ep; switch (c) { ... case '[': ... neg = 0; if((c = *sp++) == '^') { neg = 1; c = *sp++; } cstart = sp; do { ... if (c=='-' && sp>cstart && *sp!=']') { for (c = sp[-2]; c<*sp; c++) ep[c>>3] |= bittab[c&07]; sp++; } ep[c>>3] |= bittab[c&07]; } while((c = *sp++) != ']'); Since sp has been incremented twice already, accessing sp[-2] is fine in any case, but it's also guarded by cstart, so the regexp range "[-z]" doesn't expand to [[-z]. -- Leah Neukirchen <leah@vuxu.org> https://leahneukirchen.org/ ^ permalink raw reply [flat|nested] 17+ messages in thread
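The character-class machinery Leah quotes separates cleanly from the rest of grep. The sketch below (function names invented) reproduces the `ep[c>>3] |= bittab[c&07]` bit-set and the net effect of the `sp[-2]` range loop -- grep's `for` runs `c` up to but not including `*sp`, and the shared statement after the loop sets the final character, so the range comes out inclusive:

```c
/* Same table V7 grep uses: one bit per position within a byte. */
static const unsigned char bittab[] = { 1, 2, 4, 8, 16, 32, 64, 128 };

/* ep is a 32-byte (256-bit) class table, as in grep's compiled form. */
static void setbit_cc(unsigned char *ep, int c)
{
    ep[c >> 3] |= bittab[c & 07];
}

/* Net effect of "for (c = sp[-2]; c < *sp; c++)" plus the statement
   that follows it: set every character in the inclusive range lo-hi. */
static void setrange(unsigned char *ep, int lo, int hi)
{
    int c;
    for (c = lo; c <= hi; c++)
        setbit_cc(ep, c);
}

/* Membership test, mirroring how grep's matcher consults the table. */
static int inclass(const unsigned char *ep, int c)
{
    return (ep[c >> 3] & bittab[c & 07]) != 0;
}
```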
* [TUHS] Re: Maximum Array Sizes in 16 bit C
@ 2024-09-20 15:24 Douglas McIlroy
0 siblings, 0 replies; 17+ messages in thread
From: Douglas McIlroy @ 2024-09-20 15:24 UTC (permalink / raw)
To: TUHS main list
[-- Attachment #1: Type: text/plain, Size: 831 bytes --]
Apropos of the virtue of negative subscripts.
> by pointing into the middle of another data structure you've created a
data aliasing situation
Not if all references are relative to the offset pointer. The following
example is silly, but I recently used exactly this trick to simplify
calculations on a first-quadrant (x,y) grid by supplying "out-of-bounds"
zeroes at (x,-2), (x,-1) and (-1,y). It was much cleaner to access the
zeroes like ordinary elements than to provide special code to handle the
boundary conditions.
/* Fill an N-element array with Fibonacci numbers, f(i), where 0<=i<N.
   The recursion accesses a zero "out of bounds" at i=-1 */
enum { N = 100 };   /* an enum constant, unlike a const int, can size a file-scope array in C */
int base[N+1];      /* static storage, so base[0] holds the zero read at i=-1 */
#define fib(i) base[(i)+1]
void fill() {
    int i;
    fib(0) = 1;
    for (i = 1; i < N; i++)
        fib(i) = fib(i-2) + fib(i-1);
}
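The same trick scales to the first-quadrant grid described above: the "out-of-bounds" zeroes at (x,-2), (x,-1) and (-1,y) are just extra zero-initialized rows and a column in static storage, hidden behind an offset macro. The particular recurrence and sizes here are invented for illustration; only the indexing technique is from the message:

```c
enum { NX = 8, NY = 8 };

/* Storage padded by one column (x = -1) and two rows (y = -1, -2);
   statics are zero-initialized, so the boundary needs no special code. */
static long store[NX + 1][NY + 2];
#define grid(x, y) store[(x) + 1][(y) + 2]

/* A made-up recurrence that freely reads the boundary zeroes. */
static void fill_grid(void)
{
    int x, y;
    for (x = 0; x < NX; x++)
        for (y = 0; y < NY; y++)
            grid(x, y) = (x == 0 && y == 0)
                ? 1
                : grid(x - 1, y) + grid(x, y - 1) + grid(x, y - 2);
}
```

As in the Fibonacci example, accessing the boundary like ordinary elements replaces the special-case code that explicit bounds handling would otherwise require.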
Doug
[-- Attachment #2: Type: text/html, Size: 1204 bytes --]
^ permalink raw reply [flat|nested] 17+ messages in thread
end of thread, other threads:[~2024-09-20 22:19 UTC | newest] Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2024-09-18 23:51 [TUHS] Re: Maximum Array Sizes in 16 bit C Douglas McIlroy 2024-09-18 23:57 ` Henry Bent 2024-09-19 13:13 ` Rich Salz 2024-09-20 13:33 ` Paul Winalski 2024-09-20 15:07 ` Dave Horsfall 2024-09-20 15:30 ` Larry McVoy 2024-09-20 15:56 ` Stuff Received 2024-09-20 16:14 ` Dan Cross 2024-09-20 17:11 ` G. Branden Robinson 2024-09-20 20:16 ` Bakul Shah via TUHS 2024-09-20 20:58 ` Warner Losh 2024-09-20 21:18 ` Rob Pike 2024-09-20 22:04 ` Bakul Shah via TUHS 2024-09-20 22:19 ` G. Branden Robinson 2024-09-20 15:26 ` Rich Salz 2024-09-20 19:40 ` Leah Neukirchen 2024-09-20 15:24 Douglas McIlroy
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).