* [TUHS] Re: Maximum Array Sizes in 16 bit C
@ 2024-09-20 15:24 Douglas McIlroy
From: Douglas McIlroy @ 2024-09-20 15:24 UTC (permalink / raw)
To: TUHS main list
Apropos of the virtue of negative subscripts.
> by pointing into the middle of another data structure you've created a
> data aliasing situation
Not if all references are relative to the offset pointer. The following
example is silly, but I recently used exactly this trick to simplify
calculations on a first-quadrant (x,y) grid by supplying "out-of-bounds"
zeroes at (x,-2), (x,-1) and (-1,y). It was much cleaner to access the
zeroes like ordinary elements than to provide special code to handle the
boundary conditions.
/* Fill an N-element array with Fibonacci numbers, f(i), where 0<=i<N.
   The recursion accesses a zero "out of bounds" at i=-1 */
const int N = 100;
int base[N+1];
#define fib(i) base[(i)+1]

void fill() {
	int i;
	fib(0) = 1;
	for (i = 1; i < N; i++)
		fib(i) = fib(i-2) + fib(i-1);
}
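(For readers who want to run it: a minimal self-contained variant of the same trick. This is an editorial sketch, not Doug's original; N becomes an enum because a `const int` is not a constant expression for a file-scope array size in C, and a main() is added to print the first few values.)

#include <stdio.h>

enum { N = 100 };                        /* must be a constant expression at file scope */
static unsigned long long base[N + 1];   /* base[0] is the permanent zero at "index" -1 */
#define fib(i) base[(i) + 1]             /* every access goes through the offset macro */

static void fill(void)
{
	fib(0) = 1;
	for (int i = 1; i < N; i++)
		fib(i) = fib(i - 2) + fib(i - 1);  /* at i == 1 this reads the out-of-bounds zero */
}

int main(void)
{
	fill();
	for (int i = 0; i < 10; i++)
		printf("fib(%d) = %llu\n", i, fib(i));
	return 0;
}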
Doug
* [TUHS] Re: Maximum Array Sizes in 16 bit C
@ 2024-09-20 22:19 ` G. Branden Robinson
From: G. Branden Robinson @ 2024-09-20 22:19 UTC (permalink / raw)
To: The Eunuchs Hysterical Society
Hi Bakul & Warner,
At 2024-09-20T13:16:24-0700, Bakul Shah wrote:
> You are a bit late with your screed.
...I had hoped that my awareness of that was made evident by my citation
of a 30-year-old book. ;-)
> You will find posts with similar sentiments starting back in 1980s in
> Usenet groups such as comp.lang.{c,misc,pascal}.
Before my time, but I don't doubt it. The sad thing is that not enough
people took such posts seriously.
I spend a fair amount of time dealing with "legacy" code. Stuff that
hasn't been touched in a long time. One thing I'm convinced of: bad
idioms are forever. And that means people will keep learning and
copying them.
Of course no one wants to pay for the cleanup of such technical debt,
not in spite of but _because_ it will expose bugs. You can't justify to
any manager that we need to set up this one cost center so that we can
expand another one.
Not unless the manager cares about downside risk. And tech culture
absolutely does not. Let the planes fall out of the sky and the
reactors melt down. You can justify it all in the name of "ethical
altruism", or whatever the trendy label for sociopathic anarcho-
capitalism is these days.
(I'm kidding, of course. Serious tech bros understand the essential
function of government in maintaining structures for the allocation of
economic rents [copyrights and patents] and the utility of employment
law, police, and if it comes to it, the National Guard in the
suppression of organized labor. Fortunately for management, software
engineers think so highly of themselves that they identify with the
billionaire CEO's economic class instead of their own.)
> Perhaps a more interesting (but likely pointless) question is what is
> the *least* that can be done to fix C's major problems.
Not pointless. If we ask ourselves that question after every revision
of the language standard, the language _will_ advance. C23 has a
`nullptr` constant. K&R-style function declarations are gone, and good
riddance. I did notice that some national bodies fought like hell to
keep trigraphs, though. :-|
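As a small illustration of the kind of advance meant here (an editorial sketch, assuming a C23-capable compiler such as a recent GCC or Clang with -std=c23):

#include <stddef.h>
#include <stdio.h>

static void take(int *p)
{
	printf("%s\n", p ? "non-null" : "null");
}

int main(void)
{
	int *p = nullptr;   /* C23: a keyword with its own type, nullptr_t, so it
	                       cannot be mistaken for an integer in _Generic or varargs */
	int *q = NULL;      /* the classic macro, typically ((void *)0) */
	take(p);
	take(q);
	return 0;
}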
> Compilers can easily add bounds checking for the array[index]
Pascal expected this. One of Kernighan's complaints in his CSTR #100
paper (the one I mentioned) is that he feared precious machine cycles
would be lost validating expressions that pointed within valid bounds.
So why not a compiler switch, jeez louise? Develop in paranoid/slow
mode and ship in sloppy/fast mode. If you must.
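A sketch of what that switch could look like at the source level (the macro and flag names here are invented for illustration; real-world analogues include assert()/NDEBUG, _FORTIFY_SOURCE, and -fsanitize=bounds):

#include <stdio.h>
#include <stdlib.h>

/* Hypothetical: compile with -DPARANOID for checked accesses, without it
   for the raw access whose lost cycles Kernighan worried about. */
#ifdef PARANOID
static inline int at(const int *a, size_t len, size_t i)
{
	if (i >= len) {
		fprintf(stderr, "index %zu out of bounds (len %zu)\n", i, len);
		abort();
	}
	return a[i];
}
#else
#define at(a, len, i) ((a)[(i)])
#endif

int main(void)
{
	int v[4] = {1, 2, 3, 4};
	for (size_t i = 0; i < 4; i++)
		printf("%d\n", at(v, 4, i));
	return 0;
}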
It seems that static analysis was in its infancy back then. Compiler
writers screeched like banshees at the forms of validation the Ada
language spec required them to do, and complained so vociferously that
they helped trash the language's reputation. A few years went by and,
gosh, folks realized that you sure could prevent a lot of bugs by wiring
such checks into compilers for other languages--in the places where the
semantics would permit it, a count that was invariably lower than Ada's
because, shock, Ada was actually thought out and went through several
revisions _before_ being put into production.
Did anyone ever repent of their loathsome shrieking? Doubt it. Static
analysis became cool and they accepted whatever plaudits fell upon them.
> construct but ptr[index] can not be checked, unless we make
> a ptr a heavy weight object such as (address, start, limit).
Base and bounds registers are an old idea. Older than C. But the
PDP-11 didn't have them,[1] so C expected to do without and the rest is
lamentable history.
We would do well to learn from C++'s multiple attempts at "smart
pointers". I guess they've got it right in C++11, at last? Not sure.
C++'s aggressive promiscuity has not done C a favor, but rather
conditioned the latter into reflexive, instead of reasoned,
conservatism.
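For concreteness, the (address, start, limit) "heavyweight pointer" Bakul describes could be modeled in plain C along these lines. An editorial sketch only, checked in software rather than in hardware as CHERI does:

#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* A "fat" pointer: current position plus the bounds of the object it may
   legally address.  Every dereference is checked; arithmetic is not. */
struct fatptr {
	int *cur;
	int *start;   /* first valid element */
	int *limit;   /* one past the last valid element */
};

static int fat_get(struct fatptr p)
{
	if (p.cur < p.start || p.cur >= p.limit) {
		fprintf(stderr, "bounds violation\n");
		abort();
	}
	return *p.cur;
}

static struct fatptr fat_add(struct fatptr p, ptrdiff_t n)
{
	p.cur += n;   /* the check happens on use, not on arithmetic */
	return p;
}

int main(void)
{
	int x[8] = {0, 1, 2, 3, 4, 5, 6, 7};
	struct fatptr p = { x + 3, x, x + 8 };
	printf("%d\n", fat_get(p));              /* 3 */
	printf("%d\n", fat_get(fat_add(p, -2))); /* 1: negative offsets are fine */
	printf("%d\n", fat_get(fat_add(p, 5)));  /* aborts: past the limit */
	return 0;
}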
> One can see how code can be generated for code such as this:
>
> Foo x[count];
> Foo* p = x + n; // or &x[n]
>
> Code such as "Foo *p = malloc(size);" would require the compiler to
> know how malloc behaves to be able to compute the limit.
C's refusal to specify dynamic memory allocation in the language runtime
(as opposed to, eventually, the standard library) was a painful
oversight. There was a strange tension between that and code idioms
among C's own earliest practitioners to assume dynamically sized
storage. I remember when novice C programmers managing strings would
get ridiculed by their seniors for setting up and writing to static
buffers. Why did they do that? Because it was easy--the language
supported it well. Going to `malloc()` was like aiming a gun at your
own face.
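The contrast in question, roughly (an editorial example, not from the thread):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
	const char *name = "McIlroy";

	/* The easy path the language encouraged: a fixed buffer, fine until
	   the input outgrows it. */
	char fixed[8];
	strncpy(fixed, name, sizeof fixed - 1);
	fixed[sizeof fixed - 1] = '\0';

	/* The "gun at your own face" path: size it yourself, check the
	   allocation, remember to free. */
	char *dyn = malloc(strlen(name) + 1);
	if (dyn == NULL)
		return 1;
	strcpy(dyn, name);

	printf("%s / %s\n", fixed, dyn);
	free(dyn);
	return 0;
}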
The routine practice of memory overcommit in C on Unix systems led to a
sort of perverse synergy. Programmers were actively conditioned
_against_ performing algorithmic analysis of their _space_ requirements.
(By contrast, seeing how far you could bro down your code's _time_
complexity was where you really showed your mettle. If you spent all of
the time you saved waiting on I/O, hey man, that's not YOUR problem.)
> But for a user to write a similar function will require some language
> extension.
>
> [Of course, if we did that, adding proper support for multidimensional
> slices would be far easier. But that is an exploration for another
> day!]
When I read about Fortran 90/95/2003's facilities for array reshaping, I
rocked back on my heels.
> Converting enums to behave like Pascal scalars would likely break
> things. The question is, can such breakage be fixed automatically (by
> source code conversion)?
I don't assert that C needs to ape _Pascal_ scalars in particular.
Better Ada's. :P Or, equivalently, C++11's "enum class". As with many
things in C++, the syntax is an ugly graft, but the idea is as sound as
they come.
One of the proposals that didn't make it for C23 was similarly ugly:
"return break;". But the _idea_ was to mark tail recursion so that the
compiler would know it's happening. That saves stack. _That's_ worth
having. I worry that it didn't make it just because the syntax was so
cringey. But the alternatives, like yet another new keyword, or
overloading punctuation some more, seemed worse. C++ indulges both
vices amply with every revision.
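For readers who haven't met the idea, here is the sort of call the marker was aimed at (my example, not the proposal's): the recursive call is in tail position, so a compiler may reuse the stack frame, but nothing in today's language obliges it to, which is exactly what an explicit marker would change.

#include <stdio.h>

/* Tail-recursive GCD: the recursive call is the last thing the function
   does, so it can run in constant stack -- if the compiler chooses to. */
static unsigned gcd(unsigned a, unsigned b)
{
	if (b == 0)
		return a;
	return gcd(b, a % b);   /* tail call */
}

int main(void)
{
	printf("%u\n", gcd(1071, 462));   /* 21 */
	return 0;
}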
> C's union type is used in two different ways: 1: similar to a sum
> type, which can be done type safely and 2: to cheat. The compiler
> should produce a warning when it can't verify a typesafe use -- one
> can add "unsafe" or some such to let the user absolve the compiler of
> such check.
Agreed. C++'s family of typecasting operators is, once again, an ugly
feature syntactically, but the benefits in terms of saying what you
mean, and _only_ what you mean, are valuable.
Casts in C are too often an express ticket to UB.
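One concrete instance of that express ticket (an editorial example, assuming the common case where float is 32 bits):

#include <stdint.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
	float f = 1.0f;
	uint32_t bits;

	/* Undefined behaviour: bits = *(uint32_t *)&f; accesses a float
	   through an incompatible lvalue type (a strict-aliasing violation),
	   however often it happens to "work". */

	/* Well-defined: copy the representation instead. */
	memcpy(&bits, &f, sizeof bits);
	printf("0x%08lx\n", (unsigned long) bits);   /* 0x3f800000 on IEEE-754 targets */
	return 0;
}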
> [May be naively] I tend to think one can evolve C this way and fix a
> lot of code &/or make a lot of bugs more explicit.
If that be naïveté, let's have more of it.
At 2024-09-20T21:58:26+0100, Warner Losh wrote:
> The CHERI architecture extensions do this. It pushes this info into
> hardware where all pointers point to a region (gross simplification)
> that also grant you rights the area (including read/write/execute).
> It's really cool, but it does come at a cost in performance. Each
> pointer is a pointer, and a capacity that's basically a
> cryptographically signed bit of data that's the bounds and access
> permissions associated with the pointer. There's more details on their
> web site:
> https://www.cl.cam.ac.uk/research/security/ctsrd/cheri/
CHERI is absolutely cool and even if it doesn't conquer the world, I
feel sure that there is a lot we can learn from it.
> CHERI-BSD is a FreeBSD variant that runs on both CHERI variants
> (aarch64 and riscv64) and where most of the research has been done.
> There's also a Linux variant as well.
>
> Members of this project know way too many of the corner cases of the C
> language from porting most popular software to the CHERI... And have
> gone on screeds of their own. The only one I can easily find is
> https://people.freebsd.org/~brooks/talks/asiabsdcon2017-helloworld/helloworld.pdf
Oh yes. I remember they presented at the LF's Open Source Summit one
year (maybe the last year it was in downtown San Francisco, before the
LF moved the conference to wine country to scrape off all the engineers
and other tedious techy types who might point out what's wrong with
somebody's grandiose sales pitch--conferences are for getting deals
done [too many vice cops in SF?], not advancing the state of the art).
It was a questionnaire along the lines of "what do you _really_ know
about C?" and it opened my eyes wide for sure.
Apparently it turns out that the Dunning-Kruger effect isn't what we
think it is.
https://www.scientificamerican.com/article/the-dunning-kruger-effect-isnt-what-you-think-it-is/
Maybe D&K's findings were so rapidly assimilated into the cultural
zeitgeist because far too many people are acquainted with highly
confident C programmers.
While preparing this message, I ran across this:
https://csrc.nist.gov/files/pubs/conference/1998/10/08/proceedings-of-the-21st-nissc-1998/final/docs/early-cs-papers/schi75.pdf
"The Design and Specification of a Security Kernel for the PDP-11/45",
by Schiller (1975).
I'll try to read and absorb its 117 pages before burdening this list
with any more of my yammerings. Happy weekend!
Regards,
Branden
[1] I think. The PDP-11/20 infamously didn't have memory protection of
any sort, and the CSRC wisely ran away from that as fast as they
could once they could afford to. (See the preface to the Third
Edition Programmer's Manual.) And it was reasonable to not expect
support for such things if one wanted portability to embedded
systems, but it's not clear to me how seriously the portability of
C itself was considered until the first ports were actually _done_,
and these were not to embedded systems, but to machines broadly
comparable to PDP-11s. London and Reiser's paper on Unix/32V
opened my eyes with respect to just how late some portability-
impacting changes to "K&R C" were actually made. They sounded many
cautionary notes that the community--or maybe it was just compiler
writers (banshees again?)--seemed slow to acknowledge.
* [TUHS] Re: Maximum Array Sizes in 16 bit C
@ 2024-09-20 22:04 ` Bakul Shah via TUHS
From: Bakul Shah via TUHS @ 2024-09-20 22:04 UTC (permalink / raw)
To: Warner Losh; +Cc: The Eunuchs Hysterical Society
> On Sep 20, 2024, at 1:58 PM, Warner Losh <imp@bsdimp.com> wrote:
>
>
>
> On Fri, Sep 20, 2024 at 9:16 PM Bakul Shah via TUHS <tuhs@tuhs.org <mailto:tuhs@tuhs.org>> wrote:
>> You are a bit late with your screed. You will find posts
>> with similar sentiments starting back in 1980s in Usenet
>> groups such as comp.lang.{c,misc,pascal}.
>>
>> Perhaps a more interesting (but likely pointless) question
>> is what is the *least* that can be done to fix C's major
>> problems.
>>
>> Compilers can easily add bounds checking for the array[index]
>> construct but ptr[index] can not be checked, unless we make
>> a ptr a heavy weight object such as (address, start, limit).
>> One can see how code can be generated for code such as this:
>>
>> Foo x[count];
>> Foo* p = x + n; // or &x[n]
>>
>> Code such as "Foo *p = malloc(size);" would require the
>> compiler to know how malloc behaves to be able to compute
>> the limit. But for a user to write a similar function will
>> require some language extension.
>>
>> [Of course, if we did that, adding proper support for
>> multidimensional slices would be far easier. But that
>> is an exploration for another day!]
>
> The CHERI architecture extensions do this. It pushes this info into hardware
> where all pointers point to a region (gross simplification) that also grant you
> rights the area (including read/write/execute). It's really cool, but it does come
> at a cost in performance. Each pointer is a pointer, and a capacity that's basically
> a cryptographically signed bit of data that's the bounds and access permissions
> associated with the pointer. There's more details on their web site:
> https://www.cl.cam.ac.uk/research/security/ctsrd/cheri/
Capabilities are heavier weight and perhaps overkill to use as pointers.
And that doesn't help programs on normal processors. I view a capability
architecture as better suited to microkernels -- a cap call would be akin
to a syscall + upcall to a server running in user code. For example,
"read(file-cap, buffer-cap, size)" would need to be delivered to a
fileserver process, etc. Basically, a cap. is a ptr *across* a protection
domain. We want type safety (including bounds checking) *within* a
protection domain (a process).
A compiler can often elide bounds checks or push them out of a loop.
Similarly for other smaller changes. The idea is to try to "fix" C with
as little rewriting as possible. Nobody is going to fund rewriting all
10M lines of kernel code in C (& more in user code) into Rust (not to
mention that such from-scratch rewrites usually result in
incompatibilities).
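As a sketch of what "push them out of a loop" means (an editorial illustration): when the bound is loop-invariant, one comparison before the loop can stand in for a check on every access, and a compiler is free to make that transformation itself.

#include <stdio.h>
#include <stdlib.h>

/* Naive form: check every access. */
static long sum_checked(const int *a, size_t len, size_t n)
{
	long s = 0;
	for (size_t i = 0; i < n; i++) {
		if (i >= len)          /* per-iteration check */
			abort();
		s += a[i];
	}
	return s;
}

/* Hoisted form: one check before the loop implies all the others. */
static long sum_hoisted(const int *a, size_t len, size_t n)
{
	if (n > len)
		abort();
	long s = 0;
	for (size_t i = 0; i < n; i++)
		s += a[i];
	return s;
}

int main(void)
{
	int v[5] = {1, 2, 3, 4, 5};
	printf("%ld %ld\n", sum_checked(v, 5, 5), sum_hoisted(v, 5, 5));
	return 0;
}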
But we still seem to want maximum performance and maximum security
without paying for it (and, if pushed, we will live with bugs but not
with lower performance, even though processors are orders of magnitude
faster now).
>
> CHERI-BSD is a FreeBSD variant that runs on both CHERI variants (aarch64 and
> riscv64) and where most of the research has been done. There's also a Linux
> variant as well.
>
> Members of this project know way too many of the corner cases of the C language
> from porting most popular software to the CHERI... And have gone on screeds of
> their own. The only one I can easily find is
> https://people.freebsd.org/~brooks/talks/asiabsdcon2017-helloworld/helloworld.pdf
>
> Warner
>
>> Converting enums to behave like Pascal scalars would
>> likely break things. The question is, can such breakage
>> be fixed automatically (by source code conversion)?
>>
>> C's union type is used in two different ways: 1: similar
>> to a sum type, which can be done type safely and 2: to
>> cheat. The compiler should produce a warning when it can't
>> verify a typesafe use -- one can add "unsafe" or some such
>> to let the user absolve the compiler of such check.
>>
>> [May be naively] I tend to think one can evolve C this way
>> and fix a lot of code &/or make a lot of bugs more explicit.
>>
>> > On Sep 20, 2024, at 10:11 AM, G. Branden Robinson <g.branden.robinson@gmail.com <mailto:g.branden.robinson@gmail.com>> wrote:
>> >
>> > At 2024-09-21T01:07:11+1000, Dave Horsfall wrote:
>> >> Unless I'm mistaken (quite possible at my age), the OP was referring
>> >> to that in C, pointers and arrays are pretty much the same thing i.e.
>> >> "foo[-2]" means "take the pointer 'foo' and go back two things"
>> >> (whatever a "thing" is).
>> >
>> > "in C, pointers and arrays are pretty much the same thing" is a common
>> > utterance but misleading, and in my opinion, better replaced with a
>> > different one.
>> >
>> > We should instead say something more like:
>> >
>> > In C, pointers and arrays have compatible dereference syntaxes.
>> >
>> > They do _not_ have compatible _declaration_ syntaxes.
>> >
>> > Chapter 4 of van der Linden's _Expert C Programming_: Deep C Secrets_
>> > (1994) tackles this issue head-on and at length.
>> >
>> > Here's the salient point.
>> >
>> > "Consider the case of an external declaration `extern char *p;` but a
>> > definition of `char p[10];`. When we retrieve the contents of `p[i]`
>> > using the extern, we get characters, but we treat it as a pointer.
>> > Interpreting ASCII characters as an address is garbage, and if you're
>> > lucky the program will coredump at that point. If you're not lucky it
>> > will corrupt something in your address space, causing a mysterious
>> > failure at some point later in the program."
>> >
>> >> C is just a high level assembly language;
>> >
>> > I disagree with this common claim too. Assembly languages correspond to
>> > well-defined machine models.[1] Those machine models have memory
>> > models. C has no memory model--deliberately, because that would have
>> > gotten in the way of performance. (In practice, C's machine model was
>> > and remains the PDP-11,[2] with aspects thereof progressively sanded off
>> > over the years in repeated efforts to salvage the language's reputation
>> > for portability.)
>> >
>> >> there is no such object as a "string" for example: it's just an "array
>> >> of char" with the last element being "\0" (viz: "strlen" vs. "sizeof".
>> >
>> > Yeah, it turns out we need a well-defined string type much more
>> > powerfully than, it seems, anyone at the Bell Labs CSRC appreciated.
>> > string.h was tacked on (by Nils-Peter Nelson, as I understand it) at the
>> > end of the 1970s and C aficionados have defended the language's
>> > purported perfection with such vigor that they annexed the haphazardly
>> > assembled standard library into the territory that they defend with much
>> > rhetorical violence and overstatement. From useless or redundant return
>> > values to const-carelessness to Schlemiel the Painter algorithms in
>> > implementations, it seems we've collectively made every mistake that
>> > could be made with Nelson's original, minimal API, and taught those
>> > mistakes as best practices in tutorials and classrooms. A sorry affair.
>> >
>> > So deep was this disdain for the string as a well-defined data type, and
>> > moreover one conceptually distinct from an array (or vector) of integral
>> > types that Stroustrup initially repeated the mistake in C++. People can
>> > easily roll their own, he seemed to have thought. Eventually he thought
>> > again, but C++ took so long to get standardized that by then, damage was
>> > done.
>> >
>> > "A string is just an array of `char`s, and a `char` is just a
>> > byte"--another hasty equivalence that surrendered a priceless hostage to
>> > fortune. This is the sort of fallacy indulged by people excessively
>> > wedded to machine language programming and who apply its perspective to
>> > every problem statement uncritically.
>> >
>> > Again and again, with signed vs. unsigned bytes, "wide" vs. "narrow"
>> > characters, and "base" vs. "combining" characters, the champions of the
>> > "portable assembly" paradigm charged like Lord Cardigan into the pike
>> > and musket lines of the character type as one might envision it in a
>> > machine register. (This insistence on visualizing register-level
>> > representations has prompted numerous other stupidities, like the use of
>> > an integral zero at the _language level_ to represent empty, null, or
>> > false literals for as many different data types as possible. "If it
>> > ends up as a zero in a register," the thinking appears to have gone, "it
>> > should look like a zero in the source code." Generations of code--and
>> > language--cowboys have screwed us all over repeatedly with this hasty
>> > equivalence.
>> >
>> > Type theorists have known better for decades. But type theory is (1)
>> > hard (it certainly is, to cowboys) and (2) has never enjoyed a trendy
>> > day in the sun (for which we may be grateful), which means that is
>> > seldom on the path one anticipates to a comfortable retirement from a
>> > Silicon Valley tech company (or several) on a private yacht.
>> >
>> > Why do I rant so splenetically about these issues? Because the result
>> > of such confusion is _bugs in programs_. You want something concrete?
>> > There it is. Data types protect you from screwing up. And the better
>> > your data types are, the more care you give to specifying what sorts of
>> > objects your program manipulates, the more thought you give to the
>> > invariants that must be maintained for your program to remain in a
>> > well-defined state, the fewer bugs you will have.
>> >
>> > But, nah, better to slap together a prototype, ship it, talk it up to
>> > the moon as your latest triumph while interviewing with a rival of the
>> > company you just delivered that prototype to, and look on in amusement
>> > when your brilliant achievement either proves disastrous in deployment
>> > or soaks up the waking hours of an entire team of your former colleagues
>> > cleaning up the steaming pile you voided from your rock star bowels.
>> >
>> > We've paid a heavy price for C's slow and seemingly deeply grudging
>> > embrace of the type concept. (The lack of controlled scope for
>> > enumeration constants is one example; the horrifyingly ill-conceived
>> > choice of "typedef" as a keyword indicating _type aliasing_ is another.)
>> > Kernighan did not help by trashing Pascal so hard in about 1980. He was
>> > dead right that Pascal needed, essentially, polymorphic subprograms in
>> > array types. Wirth not speccing the language to accommodate that back
>> > in 1973 or so was a sad mistake. But Pascal got a lot of other stuff
>> > right--stuff that the partisanship of C advocates refused to countenance
>> > such that they ended up celebrating C's flaws as features. No amount of
>> > Jonestown tea could quench their thirst. I suspect the truth was more
>> > that they didn't want to bother having to learn any other languages.
>> > (Or if they did, not any language that anyone else on their team at work
>> > had any facility with.) A rock star plays only one instrument, no?
>> > People didn't like it when Eddie Van Halen played keyboards instead of
>> > guitar on stage, so he stopped doing that. The less your coworkers
>> > understand your work, the more of a genius you must be.
>> >
>> > Now, where was I?
>> >
>> >> What's the length of "abc" vs. how many bytes are needed to store it?
>> >
>> > Even what is meant by "length" has several different correct answers!
>> > Quantity of code points in the sequence? Number of "grapheme clusters"
>> > a.k.a. "user-perceived characters" as Unicode puts it? Width as
>> > represented on the output device? On an ASCII device these usually had
>> > the same answer (control characters excepted). But even at the Bell
>> > Labs CSRC in the 1970s, thanks to troff, the staff knew that they didn't
>> > necessarily have to. (How wide is an em dash? How many bytes represent
>> > it, in the formatting language and in the output language?)
>> >
>> >> Giggle... In a device driver I wrote for V6, I used the expression
>> >>
>> >> "0123"[n]
>> >>
>> >> and the two programmers whom I thought were better than me had to ask
>> >> me what it did...
>> >>
>> >> -- Dave, brought up on PDP-11 Unix[*]
>> >
>> > I enjoy this application of that technique, courtesy of Alan Cox.
>> >
>> > fsck-fuzix: blow 90 bytes on a progress indicator
>> >
>> > static void progress(void)
>> > {
>> > static uint8_t progct;
>> > progct++;
>> > progct&=3;
>> > printf("%c\010", "-\\|/"[progct]);
>> > fflush(stdout);
>> > }
>> >
>> >> I still remember the days of BOS/PICK/etc, and I staked my career on
>> >> Unix.
>> >
>> > Not a bad choice. Your exposure to and recollection of other ways of
>> > doing things, I suspect, made you a more valuable contributor than those
>> > who mazed themselves with thoughts of "the Unix way" to the point that
>> > they never seriously considered any other.
>> >
>> > It's fine to prefer "the C way" or "the Unix way", if you can
>> > intelligibly define what that means as applied to the issue in dispute,
>> > and coherently defend it. Demonstrating an understanding of the
>> > alternatives, and being able to credibly explain why they are inferior
>> > approaches, is how to do advocacy correctly.
>> >
>> > But it is not the cowboy way. The rock star way.
>> >
>> > Regards,
>> > Branden
>> >
>> > [1] Unfortunately I must concede that this claim is less true than it
>> > used to be thanks to the relentless pursuit of trade-secret means of
>> > optimizing hardware performance. Assembly languages now correspond,
>> > particularly on x86, to a sort of macro language that imperfectly
>> > masks a massive amount of microarchitectural state that the
>> > implementors themselves don't completely understand, at least not in
>> > time to get the product to market. Hence the field day of
>> > speculative execution attacks and similar. It would not be fair to
>> > say that CPUs of old had _no_ microarchitectural state--the Z80, for
>> > example, had the not-completely-official `W` and `Z` registers--but
>> > they did have much less of it, and correspondingly less attack
>> > surface for screwing your programs. I do miss the days of
>> > deterministic cycle counts for instruction execution. But I know
>> > I'd be sad if all the caches on my workaday machine switched off.
>> >
>> > [2] https://queue.acm.org/detail.cfm?id=3212479
>>
* [TUHS] Re: Maximum Array Sizes in 16 bit C
@ 2024-09-20 21:18 ` Rob Pike
From: Rob Pike @ 2024-09-20 21:18 UTC (permalink / raw)
To: Warner Losh; +Cc: Bakul Shah, The Eunuchs Hysterical Society
Here is some code from typo.
int table[2]; /*keep these four cards in order*/
int tab1[26];
int tab2[730];
char tab3[19684];
...
er = read(salt,table,21200);
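/* 21200 = 2*2 + 2*26 + 2*730 + 19684 bytes with the PDP-11's 2-byte ints,
   so this single read() fills all four arrays at once -- hence "keep
   these four cards in order". */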
Note the use of the word 'card'.
The past is a different country.
-rob
On Sat, Sep 21, 2024 at 7:07 AM Warner Losh <imp@bsdimp.com> wrote:
>
>
> On Fri, Sep 20, 2024 at 9:16 PM Bakul Shah via TUHS <tuhs@tuhs.org> wrote:
>
>> You are a bit late with your screed. You will find posts
>> with similar sentiments starting back in 1980s in Usenet
>> groups such as comp.lang.{c,misc,pascal}.
>>
>> Perhaps a more interesting (but likely pointless) question
>> is what is the *least* that can be done to fix C's major
>> problems.
>>
>> Compilers can easily add bounds checking for the array[index]
>> construct but ptr[index] can not be checked, unless we make
>> a ptr a heavy weight object such as (address, start, limit).
>> One can see how code can be generated for code such as this:
>>
>> Foo x[count];
>> Foo* p = x + n; // or &x[n]
>>
>> Code such as "Foo *p = malloc(size);" would require the
>> compiler to know how malloc behaves to be able to compute
>> the limit. But for a user to write a similar function will
>> require some language extension.
>>
>> [Of course, if we did that, adding proper support for
>> multidimensional slices would be far easier. But that
>> is an exploration for another day!]
>>
>
> The CHERI architecture extensions do this. It pushes this info into
> hardware where all pointers point to a region (gross simplification)
> that also grant you rights the area (including read/write/execute).
> It's really cool, but it does come at a cost in performance. Each
> pointer is a pointer, and a capacity that's basically a
> cryptographically signed bit of data that's the bounds and access
> permissions associated with the pointer. There's more details on their
> web site:
> https://www.cl.cam.ac.uk/research/security/ctsrd/cheri/
>
> CHERI-BSD is a FreeBSD variant that runs on both CHERI variants
> (aarch64 and riscv64) and where most of the research has been done.
> There's also a Linux variant as well.
>
> Members of this project know way too many of the corner cases of the C
> language from porting most popular software to the CHERI... And have
> gone on screeds of their own. The only one I can easily find is
>
> https://people.freebsd.org/~brooks/talks/asiabsdcon2017-helloworld/helloworld.pdf
>
> Warner
>
>
>> Converting enums to behave like Pascal scalars would
>> likely break things. The question is, can such breakage
>> be fixed automatically (by source code conversion)?
>>
>> C's union type is used in two different ways: 1: similar
>> to a sum type, which can be done type safely and 2: to
>> cheat. The compiler should produce a warning when it can't
>> verify a typesafe use -- one can add "unsafe" or some such
>> to let the user absolve the compiler of such check.
>>
>> [May be naively] I tend to think one can evolve C this way
>> and fix a lot of code &/or make a lot of bugs more explicit.
>>
>> > On Sep 20, 2024, at 10:11 AM, G. Branden Robinson <
>> g.branden.robinson@gmail.com> wrote:
>> >
>> > At 2024-09-21T01:07:11+1000, Dave Horsfall wrote:
>> >> Unless I'm mistaken (quite possible at my age), the OP was referring
>> >> to that in C, pointers and arrays are pretty much the same thing i.e.
>> >> "foo[-2]" means "take the pointer 'foo' and go back two things"
>> >> (whatever a "thing" is).
>> >
>> > "in C, pointers and arrays are pretty much the same thing" is a common
>> > utterance but misleading, and in my opinion, better replaced with a
>> > different one.
>> >
>> > We should instead say something more like:
>> >
>> > In C, pointers and arrays have compatible dereference syntaxes.
>> >
>> > They do _not_ have compatible _declaration_ syntaxes.
>> >
>> > Chapter 4 of van der Linden's _Expert C Programming_: Deep C Secrets_
>> > (1994) tackles this issue head-on and at length.
>> >
>> > Here's the salient point.
>> >
>> > "Consider the case of an external declaration `extern char *p;` but a
>> > definition of `char p[10];`. When we retrieve the contents of `p[i]`
>> > using the extern, we get characters, but we treat it as a pointer.
>> > Interpreting ASCII characters as an address is garbage, and if you're
>> > lucky the program will coredump at that point. If you're not lucky it
>> > will corrupt something in your address space, causing a mysterious
>> > failure at some point later in the program."
>> >
>> >> C is just a high level assembly language;
>> >
>> > I disagree with this common claim too. Assembly languages correspond to
>> > well-defined machine models.[1] Those machine models have memory
>> > models. C has no memory model--deliberately, because that would have
>> > gotten in the way of performance. (In practice, C's machine model was
>> > and remains the PDP-11,[2] with aspects thereof progressively sanded off
>> > over the years in repeated efforts to salvage the language's reputation
>> > for portability.)
>> >
>> >> there is no such object as a "string" for example: it's just an "array
>> >> of char" with the last element being "\0" (viz: "strlen" vs. "sizeof".
>> >
>> > Yeah, it turns out we need a well-defined string type much more
>> > powerfully than, it seems, anyone at the Bell Labs CSRC appreciated.
>> > string.h was tacked on (by Nils-Peter Nelson, as I understand it) at the
>> > end of the 1970s and C aficionados have defended the language's
>> > purported perfection with such vigor that they annexed the haphazardly
>> > assembled standard library into the territory that they defend with much
>> > rhetorical violence and overstatement. From useless or redundant return
>> > values to const-carelessness to Schlemiel the Painter algorithms in
>> > implementations, it seems we've collectively made every mistake that
>> > could be made with Nelson's original, minimal API, and taught those
>> > mistakes as best practices in tutorials and classrooms. A sorry affair.
>> >
>> > So deep was this disdain for the string as a well-defined data type, and
>> > moreover one conceptually distinct from an array (or vector) of integral
>> > types that Stroustrup initially repeated the mistake in C++. People can
>> > easily roll their own, he seemed to have thought. Eventually he thought
>> > again, but C++ took so long to get standardized that by then, damage was
>> > done.
>> >
>> > "A string is just an array of `char`s, and a `char` is just a
>> > byte"--another hasty equivalence that surrendered a priceless hostage to
>> > fortune. This is the sort of fallacy indulged by people excessively
>> > wedded to machine language programming and who apply its perspective to
>> > every problem statement uncritically.
>> >
>> > Again and again, with signed vs. unsigned bytes, "wide" vs. "narrow"
>> > characters, and "base" vs. "combining" characters, the champions of the
>> > "portable assembly" paradigm charged like Lord Cardigan into the pike
>> > and musket lines of the character type as one might envision it in a
>> > machine register. (This insistence on visualizing register-level
>> > representations has prompted numerous other stupidities, like the use of
>> > an integral zero at the _language level_ to represent empty, null, or
>> > false literals for as many different data types as possible. "If it
>> > ends up as a zero in a register," the thinking appears to have gone, "it
>> > should look like a zero in the source code." Generations of code--and
>> > language--cowboys have screwed us all over repeatedly with this hasty
>> > equivalence.
>> >
>> > Type theorists have known better for decades. But type theory is (1)
>> > hard (it certainly is, to cowboys) and (2) has never enjoyed a trendy
>> > day in the sun (for which we may be grateful), which means that is
>> > seldom on the path one anticipates to a comfortable retirement from a
>> > Silicon Valley tech company (or several) on a private yacht.
>> >
>> > Why do I rant so splenetically about these issues? Because the result
>> > of such confusion is _bugs in programs_. You want something concrete?
>> > There it is. Data types protect you from screwing up. And the better
>> > your data types are, the more care you give to specifying what sorts of
>> > objects your program manipulates, the more thought you give to the
>> > invariants that must be maintained for your program to remain in a
>> > well-defined state, the fewer bugs you will have.
>> >
>> > But, nah, better to slap together a prototype, ship it, talk it up to
>> > the moon as your latest triumph while interviewing with a rival of the
>> > company you just delivered that prototype to, and look on in amusement
>> > when your brilliant achievement either proves disastrous in deployment
>> > or soaks up the waking hours of an entire team of your former colleagues
>> > cleaning up the steaming pile you voided from your rock star bowels.
>> >
>> > We've paid a heavy price for C's slow and seemingly deeply grudging
>> > embrace of the type concept. (The lack of controlled scope for
>> > enumeration constants is one example; the horrifyingly ill-conceived
>> > choice of "typedef" as a keyword indicating _type aliasing_ is another.)
>> > Kernighan did not help by trashing Pascal so hard in about 1980. He was
>> > dead right that Pascal needed, essentially, polymorphic subprograms in
>> > array types. Wirth not speccing the language to accommodate that back
>> > in 1973 or so was a sad mistake. But Pascal got a lot of other stuff
>> > right--stuff that the partisanship of C advocates refused to countenance
>> > such that they ended up celebrating C's flaws as features. No amount of
>> > Jonestown tea could quench their thirst. I suspect the truth was more
>> > that they didn't want to bother having to learn any other languages.
>> > (Or if they did, not any language that anyone else on their team at work
>> > had any facility with.) A rock star plays only one instrument, no?
>> > People didn't like it when Eddie Van Halen played keyboards instead of
>> > guitar on stage, so he stopped doing that. The less your coworkers
>> > understand your work, the more of a genius you must be.
>> >
>> > Now, where was I?
>> >
>> >> What's the length of "abc" vs. how many bytes are needed to store it?
>> >
>> > Even what is meant by "length" has several different correct answers!
>> > Quantity of code points in the sequence? Number of "grapheme clusters"
>> > a.k.a. "user-perceived characters" as Unicode puts it? Width as
>> > represented on the output device? On an ASCII device these usually had
>> > the same answer (control characters excepted). But even at the Bell
>> > Labs CSRC in the 1970s, thanks to troff, the staff knew that they didn't
>> > necessarily have to. (How wide is an em dash? How many bytes represent
>> > it, in the formatting language and in the output language?)
>> >
>> >> Giggle... In a device driver I wrote for V6, I used the expression
>> >>
>> >> "0123"[n]
>> >>
>> >> and the two programmers whom I thought were better than me had to ask
>> >> me what it did...
>> >>
>> >> -- Dave, brought up on PDP-11 Unix[*]
>> >
>> > I enjoy this application of that technique, courtesy of Alan Cox.
>> >
>> > fsck-fuzix: blow 90 bytes on a progress indicator
>> >
>> > static void progress(void)
>> > {
>> > static uint8_t progct;
>> > progct++;
>> > progct&=3;
>> > printf("%c\010", "-\\|/"[progct]);
>> > fflush(stdout);
>> > }
>> >
>> >> I still remember the days of BOS/PICK/etc, and I staked my career on
>> >> Unix.
>> >
>> > Not a bad choice. Your exposure to and recollection of other ways of
>> > doing things, I suspect, made you a more valuable contributor than those
>> > who mazed themselves with thoughts of "the Unix way" to the point that
>> > they never seriously considered any other.
>> >
>> > It's fine to prefer "the C way" or "the Unix way", if you can
>> > intelligibly define what that means as applied to the issue in dispute,
>> > and coherently defend it. Demonstrating an understanding of the
>> > alternatives, and being able to credibly explain why they are inferior
>> > approaches, is how to do advocacy correctly.
>> >
>> > But it is not the cowboy way. The rock star way.
>> >
>> > Regards,
>> > Branden
>> >
>> > [1] Unfortunately I must concede that this claim is less true than it
>> > used to be thanks to the relentless pursuit of trade-secret means of
>> > optimizing hardware performance. Assembly languages now correspond,
>> > particularly on x86, to a sort of macro language that imperfectly
>> > masks a massive amount of microarchitectural state that the
>> > implementors themselves don't completely understand, at least not in
>> > time to get the product to market. Hence the field day of
>> > speculative execution attacks and similar. It would not be fair to
>> > say that CPUs of old had _no_ microarchitectural state--the Z80, for
>> > example, had the not-completely-official `W` and `Z` registers--but
>> > they did have much less of it, and correspondingly less attack
>> > surface for screwing your programs. I do miss the days of
>> > deterministic cycle counts for instruction execution. But I know
>> > I'd be sad if all the caches on my workaday machine switched off.
>> >
>> > [2] https://queue.acm.org/detail.cfm?id=3212479
>>
>>
* [TUHS] Re: Maximum Array Sizes in 16 bit C
@ 2024-09-20 20:58 ` Warner Losh
From: Warner Losh @ 2024-09-20 20:58 UTC (permalink / raw)
To: Bakul Shah; +Cc: The Eunuchs Hysterical Society
On Fri, Sep 20, 2024 at 9:16 PM Bakul Shah via TUHS <tuhs@tuhs.org> wrote:
> You are a bit late with your screed. You will find posts
> with similar sentiments starting back in 1980s in Usenet
> groups such as comp.lang.{c,misc,pascal}.
>
> Perhaps a more interesting (but likely pointless) question
> is what is the *least* that can be done to fix C's major
> problems.
>
> Compilers can easily add bounds checking for the array[index]
> construct but ptr[index] can not be checked, unless we make
> a ptr a heavy weight object such as (address, start, limit).
> One can see how code can be generated for code such as this:
>
> Foo x[count];
> Foo* p = x + n; // or &x[n]
>
> Code such as "Foo *p = malloc(size);" would require the
> compiler to know how malloc behaves to be able to compute
> the limit. But for a user to write a similar function will
> require some language extension.
>
> [Of course, if we did that, adding proper support for
> multidimensional slices would be far easier. But that
> is an exploration for another day!]
>
The CHERI architecture extensions do this. It pushes this info into
hardware, where all pointers point to a region (gross simplification)
that also grants you rights to the area (including read/write/execute).
It's really cool, but it does come at a cost in performance. Each
pointer is a pointer plus a capability that's basically a
cryptographically signed bit of data giving the bounds and access
permissions associated with the pointer. There are more details on
their web site:
https://www.cl.cam.ac.uk/research/security/ctsrd/cheri/
CHERI-BSD is a FreeBSD variant that runs on both CHERI variants (aarch64
and riscv64) and is where most of the research has been done. There's
also a Linux variant.
Members of this project know way too many of the corner cases of the C
language from porting most popular software to CHERI... and have gone
on screeds of their own. The only one I can easily find is
https://people.freebsd.org/~brooks/talks/asiabsdcon2017-helloworld/helloworld.pdf
Warner
> Converting enums to behave like Pascal scalars would
> likely break things. The question is, can such breakage
> be fixed automatically (by source code conversion)?
>
> C's union type is used in two different ways: 1: similar
> to a sum type, which can be done type safely and 2: to
> cheat. The compiler should produce a warning when it can't
> verify a typesafe use -- one can add "unsafe" or some such
> to let the user absolve the compiler of such check.
>
> [May be naively] I tend to think one can evolve C this way
> and fix a lot of code &/or make a lot of bugs more explicit.
>
> > On Sep 20, 2024, at 10:11 AM, G. Branden Robinson <
> g.branden.robinson@gmail.com> wrote:
> >
> > At 2024-09-21T01:07:11+1000, Dave Horsfall wrote:
> >> Unless I'm mistaken (quite possible at my age), the OP was referring
> >> to that in C, pointers and arrays are pretty much the same thing i.e.
> >> "foo[-2]" means "take the pointer 'foo' and go back two things"
> >> (whatever a "thing" is).
> >
> > "in C, pointers and arrays are pretty much the same thing" is a common
> > utterance but misleading, and in my opinion, better replaced with a
> > different one.
> >
> > We should instead say something more like:
> >
> > In C, pointers and arrays have compatible dereference syntaxes.
> >
> > They do _not_ have compatible _declaration_ syntaxes.
> >
> > Chapter 4 of van der Linden's _Expert C Programming_: Deep C Secrets_
> > (1994) tackles this issue head-on and at length.
> >
> > Here's the salient point.
> >
> > "Consider the case of an external declaration `extern char *p;` but a
> > definition of `char p[10];`. When we retrieve the contents of `p[i]`
> > using the extern, we get characters, but we treat it as a pointer.
> > Interpreting ASCII characters as an address is garbage, and if you're
> > lucky the program will coredump at that point. If you're not lucky it
> > will corrupt something in your address space, causing a mysterious
> > failure at some point later in the program."
> >
> >> C is just a high level assembly language;
> >
> > I disagree with this common claim too. Assembly languages correspond to
> > well-defined machine models.[1] Those machine models have memory
> > models. C has no memory model--deliberately, because that would have
> > gotten in the way of performance. (In practice, C's machine model was
> > and remains the PDP-11,[2] with aspects thereof progressively sanded off
> > over the years in repeated efforts to salvage the language's reputation
> > for portability.)
> >
> >> there is no such object as a "string" for example: it's just an "array
> >> of char" with the last element being "\0" (viz: "strlen" vs. "sizeof".
> >
> > Yeah, it turns out we need a well-defined string type much more
> > powerfully than, it seems, anyone at the Bell Labs CSRC appreciated.
> > string.h was tacked on (by Nils-Peter Nelson, as I understand it) at the
> > end of the 1970s and C aficionados have defended the language's
> > purported perfection with such vigor that they annexed the haphazardly
> > assembled standard library into the territory that they defend with much
> > rhetorical violence and overstatement. From useless or redundant return
> > values to const-carelessness to Schlemiel the Painter algorithms in
> > implementations, it seems we've collectively made every mistake that
> > could be made with Nelson's original, minimal API, and taught those
> > mistakes as best practices in tutorials and classrooms. A sorry affair.
> >
> > So deep was this disdain for the string as a well-defined data type, and
> > moreover one conceptually distinct from an array (or vector) of integral
> > types that Stroustrup initially repeated the mistake in C++. People can
> > easily roll their own, he seemed to have thought. Eventually he thought
> > again, but C++ took so long to get standardized that by then, damage was
> > done.
> >
> > "A string is just an array of `char`s, and a `char` is just a
> > byte"--another hasty equivalence that surrendered a priceless hostage to
> > fortune. This is the sort of fallacy indulged by people excessively
> > wedded to machine language programming and who apply its perspective to
> > every problem statement uncritically.
> >
> > Again and again, with signed vs. unsigned bytes, "wide" vs. "narrow"
> > characters, and "base" vs. "combining" characters, the champions of the
> > "portable assembly" paradigm charged like Lord Cardigan into the pike
> > and musket lines of the character type as one might envision it in a
> > machine register. (This insistence on visualizing register-level
> > representations has prompted numerous other stupidities, like the use of
> > an integral zero at the _language level_ to represent empty, null, or
> > false literals for as many different data types as possible. "If it
> > ends up as a zero in a register," the thinking appears to have gone, "it
> > should look like a zero in the source code." Generations of code--and
> > language--cowboys have screwed us all over repeatedly with this hasty
> > equivalence.
> >
> > Type theorists have known better for decades. But type theory is (1)
> > hard (it certainly is, to cowboys) and (2) has never enjoyed a trendy
> > day in the sun (for which we may be grateful), which means that is
> > seldom on the path one anticipates to a comfortable retirement from a
> > Silicon Valley tech company (or several) on a private yacht.
> >
> > Why do I rant so splenetically about these issues? Because the result
> > of such confusion is _bugs in programs_. You want something concrete?
> > There it is. Data types protect you from screwing up. And the better
> > your data types are, the more care you give to specifying what sorts of
> > objects your program manipulates, the more thought you give to the
> > invariants that must be maintained for your program to remain in a
> > well-defined state, the fewer bugs you will have.
> >
> > But, nah, better to slap together a prototype, ship it, talk it up to
> > the moon as your latest triumph while interviewing with a rival of the
> > company you just delivered that prototype to, and look on in amusement
> > when your brilliant achievement either proves disastrous in deployment
> > or soaks up the waking hours of an entire team of your former colleagues
> > cleaning up the steaming pile you voided from your rock star bowels.
> >
> > We've paid a heavy price for C's slow and seemingly deeply grudging
> > embrace of the type concept. (The lack of controlled scope for
> > enumeration constants is one example; the horrifyingly ill-conceived
> > choice of "typedef" as a keyword indicating _type aliasing_ is another.)
> > Kernighan did not help by trashing Pascal so hard in about 1980. He was
> > dead right that Pascal needed, essentially, polymorphic subprograms in
> > array types. Wirth not speccing the language to accommodate that back
> > in 1973 or so was a sad mistake. But Pascal got a lot of other stuff
> > right--stuff that the partisanship of C advocates refused to countenance
> > such that they ended up celebrating C's flaws as features. No amount of
> > Jonestown tea could quench their thirst. I suspect the truth was more
> > that they didn't want to bother having to learn any other languages.
> > (Or if they did, not any language that anyone else on their team at work
> > had any facility with.) A rock star plays only one instrument, no?
> > People didn't like it when Eddie Van Halen played keyboards instead of
> > guitar on stage, so he stopped doing that. The less your coworkers
> > understand your work, the more of a genius you must be.
> >
> > Now, where was I?
> >
> >> What's the length of "abc" vs. how many bytes are needed to store it?
> >
> > Even what is meant by "length" has several different correct answers!
> > Quantity of code points in the sequence? Number of "grapheme clusters"
> > a.k.a. "user-perceived characters" as Unicode puts it? Width as
> > represented on the output device? On an ASCII device these usually had
> > the same answer (control characters excepted). But even at the Bell
> > Labs CSRC in the 1970s, thanks to troff, the staff knew that they didn't
> > necessarily have to. (How wide is an em dash? How many bytes represent
> > it, in the formatting language and in the output language?)
> >
> >> Giggle... In a device driver I wrote for V6, I used the expression
> >>
> >> "0123"[n]
> >>
> >> and the two programmers whom I thought were better than me had to ask
> >> me what it did...
> >>
> >> -- Dave, brought up on PDP-11 Unix[*]
> >
> > I enjoy this application of that technique, courtesy of Alan Cox.
> >
> > fsck-fuzix: blow 90 bytes on a progress indicator
> >
> > static void progress(void)
> > {
> > static uint8_t progct;
> > progct++;
> > progct&=3;
> > printf("%c\010", "-\\|/"[progct]);
> > fflush(stdout);
> > }
> >
> >> I still remember the days of BOS/PICK/etc, and I staked my career on
> >> Unix.
> >
> > Not a bad choice. Your exposure to and recollection of other ways of
> > doing things, I suspect, made you a more valuable contributor than those
> > who mazed themselves with thoughts of "the Unix way" to the point that
> > they never seriously considered any other.
> >
> > It's fine to prefer "the C way" or "the Unix way", if you can
> > intelligibly define what that means as applied to the issue in dispute,
> > and coherently defend it. Demonstrating an understanding of the
> > alternatives, and being able to credibly explain why they are inferior
> > approaches, is how to do advocacy correctly.
> >
> > But it is not the cowboy way. The rock star way.
> >
> > Regards,
> > Branden
> >
> > [1] Unfortunately I must concede that this claim is less true than it
> > used to be thanks to the relentless pursuit of trade-secret means of
> > optimizing hardware performance. Assembly languages now correspond,
> > particularly on x86, to a sort of macro language that imperfectly
> > masks a massive amount of microarchitectural state that the
> > implementors themselves don't completely understand, at least not in
> > time to get the product to market. Hence the field day of
> > speculative execution attacks and similar. It would not be fair to
> > say that CPUs of old had _no_ microarchitectural state--the Z80, for
> > example, had the not-completely-official `W` and `Z` registers--but
> > they did have much less of it, and correspondingly less attack
> > surface for screwing your programs. I do miss the days of
> > deterministic cycle counts for instruction execution. But I know
> > I'd be sad if all the caches on my workaday machine switched off.
> >
> > [2] https://queue.acm.org/detail.cfm?id=3212479
>
>
* [TUHS] Re: Maximum Array Sizes in 16 bit C
@ 2024-09-20 20:16 ` Bakul Shah via TUHS
From: Bakul Shah via TUHS @ 2024-09-20 20:16 UTC (permalink / raw)
To: G. Branden Robinson; +Cc: The Eunuchs Hysterical Society
You are a bit late with your screed. You will find posts
with similar sentiments starting back in 1980s in Usenet
groups such as comp.lang.{c,misc,pascal}.
Perhaps a more interesting (but likely pointless) question
is what is the *least* that can be done to fix C's major
problems.
Compilers can easily add bounds checking for the array[index]
construct but ptr[index] can not be checked, unless we make
a ptr a heavy weight object such as (address, start, limit).
One can see how code can be generated for code such as this:
Foo x[count];
Foo* p = x + n; // or &x[n]
Code such as "Foo *p = malloc(size);" would require the
compiler to know how malloc behaves to be able to compute
the limit. But for a user to write a similar function will
require some language extension.
[Of course, if we did that, adding proper support for
multidimensional slices would be far easier. But that
is an exploration for another day!]
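One existing, partial form of such an extension (an editorial aside, assuming GCC or Clang): the alloc_size attribute lets a user allocator tell the compiler how it behaves, which feeds static and "fortified" checks, though it still doesn't yield a general (address, start, limit) pointer.

#include <stdio.h>
#include <stdlib.h>

/* Tell the compiler that my_alloc returns a pointer to an object whose
   size is its first argument (GCC/Clang extension). */
static void *my_alloc(size_t n) __attribute__((malloc, alloc_size(1)));

static void *my_alloc(size_t n)
{
	return malloc(n);
}

int main(void)
{
	char *p = my_alloc(16);
	if (p == NULL)
		return 1;
	/* With optimization on, the compiler can now see the bound: */
	printf("%zu\n", __builtin_object_size(p, 0));   /* 16, or (size_t)-1 if unknown */
	free(p);
	return 0;
}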
Converting enums to behave like Pascal scalars would
likely break things. The question is, can such breakage
be fixed automatically (by source code conversion)?
C's union type is used in two different ways: 1: similar
to a sum type, which can be done type safely and 2: to
cheat. The compiler should produce a warning when it can't
verify a typesafe use -- one can add "unsafe" or some such
to let the user absolve the compiler of such check.
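As an illustration of the type-safe use (an editorial sketch; the "unsafe" marker itself is the hypothetical part), the discipline a compiler would be asked to verify is the familiar tagged union:

#include <stdio.h>

/* Use 1: a sum type -- the tag says which member is live, and every
   access is guarded by it.  This is what a compiler could verify. */
struct value {
	enum { V_INT, V_FLT } tag;
	union {
		int    i;
		double f;
	} u;
};

static void print_value(const struct value *v)
{
	switch (v->tag) {
	case V_INT: printf("%d\n", v->u.i);  break;
	case V_FLT: printf("%g\n", v->u.f);  break;
	}
}

int main(void)
{
	struct value a = { V_INT, { .i = 42 } };
	struct value b = { V_FLT, { .f = 2.5 } };
	print_value(&a);
	print_value(&b);

	/* Use 2, the "cheat": writing one member and reading another to
	   reinterpret bits -- the case that would need an explicit "unsafe"
	   annotation. */
	return 0;
}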
[Maybe naively] I tend to think one can evolve C this way
and fix a lot of code &/or make a lot of bugs more explicit.
> On Sep 20, 2024, at 10:11 AM, G. Branden Robinson <g.branden.robinson@gmail.com> wrote:
>
> At 2024-09-21T01:07:11+1000, Dave Horsfall wrote:
>> Unless I'm mistaken (quite possible at my age), the OP was referring
>> to that in C, pointers and arrays are pretty much the same thing i.e.
>> "foo[-2]" means "take the pointer 'foo' and go back two things"
>> (whatever a "thing" is).
>
> "in C, pointers and arrays are pretty much the same thing" is a common
> utterance but misleading, and in my opinion, better replaced with a
> different one.
>
> We should instead say something more like:
>
> In C, pointers and arrays have compatible dereference syntaxes.
>
> They do _not_ have compatible _declaration_ syntaxes.
>
> Chapter 4 of van der Linden's _Expert C Programming_: Deep C Secrets_
> (1994) tackles this issue head-on and at length.
>
> Here's the salient point.
>
> "Consider the case of an external declaration `extern char *p;` but a
> definition of `char p[10];`. When we retrieve the contents of `p[i]`
> using the extern, we get characters, but we treat it as a pointer.
> Interpreting ASCII characters as an address is garbage, and if you're
> lucky the program will coredump at that point. If you're not lucky it
> will corrupt something in your address space, causing a mysterious
> failure at some point later in the program."
>
>> C is just a high level assembly language;
>
> I disagree with this common claim too. Assembly languages correspond to
> well-defined machine models.[1] Those machine models have memory
> models. C has no memory model--deliberately, because that would have
> gotten in the way of performance. (In practice, C's machine model was
> and remains the PDP-11,[2] with aspects thereof progressively sanded off
> over the years in repeated efforts to salvage the language's reputation
> for portability.)
>
>> there is no such object as a "string" for example: it's just an "array
>> of char" with the last element being "\0" (viz: "strlen" vs. "sizeof".
>
> Yeah, it turns out we need a well-defined string type much more
> powerfully than, it seems, anyone at the Bell Labs CSRC appreciated.
> string.h was tacked on (by Nils-Peter Nelson, as I understand it) at the
> end of the 1970s and C aficionados have defended the language's
> purported perfection with such vigor that they annexed the haphazardly
> assembled standard library into the territory that they defend with much
> rhetorical violence and overstatement. From useless or redundant return
> values to const-carelessness to Schlemiel the Painter algorithms in
> implementations, it seems we've collectively made every mistake that
> could be made with Nelson's original, minimal API, and taught those
> mistakes as best practices in tutorials and classrooms. A sorry affair.
>
> So deep was this disdain for the string as a well-defined data type, and
> moreover one conceptually distinct from an array (or vector) of integral
> types that Stroustrup initially repeated the mistake in C++. People can
> easily roll their own, he seemed to have thought. Eventually he thought
> again, but C++ took so long to get standardized that by then, damage was
> done.
>
> "A string is just an array of `char`s, and a `char` is just a
> byte"--another hasty equivalence that surrendered a priceless hostage to
> fortune. This is the sort of fallacy indulged by people excessively
> wedded to machine language programming and who apply its perspective to
> every problem statement uncritically.
>
> Again and again, with signed vs. unsigned bytes, "wide" vs. "narrow"
> characters, and "base" vs. "combining" characters, the champions of the
> "portable assembly" paradigm charged like Lord Cardigan into the pike
> and musket lines of the character type as one might envision it in a
> machine register. (This insistence on visualizing register-level
> representations has prompted numerous other stupidities, like the use of
> an integral zero at the _language level_ to represent empty, null, or
> false literals for as many different data types as possible. "If it
> ends up as a zero in a register," the thinking appears to have gone, "it
> should look like a zero in the source code." Generations of code--and
> language--cowboys have screwed us all over repeatedly with this hasty
> equivalence.)
>
> Type theorists have known better for decades. But type theory is (1)
> hard (it certainly is, to cowboys) and (2) has never enjoyed a trendy
> day in the sun (for which we may be grateful), which means that it is
> seldom on the path one anticipates to a comfortable retirement from a
> Silicon Valley tech company (or several) on a private yacht.
>
> Why do I rant so splenetically about these issues? Because the result
> of such confusion is _bugs in programs_. You want something concrete?
> There it is. Data types protect you from screwing up. And the better
> your data types are, the more care you give to specifying what sorts of
> objects your program manipulates, the more thought you give to the
> invariants that must be maintained for your program to remain in a
> well-defined state, the fewer bugs you will have.
>
> But, nah, better to slap together a prototype, ship it, talk it up to
> the moon as your latest triumph while interviewing with a rival of the
> company you just delivered that prototype to, and look on in amusement
> when your brilliant achievement either proves disastrous in deployment
> or soaks up the waking hours of an entire team of your former colleagues
> cleaning up the steaming pile you voided from your rock star bowels.
>
> We've paid a heavy price for C's slow and seemingly deeply grudging
> embrace of the type concept. (The lack of controlled scope for
> enumeration constants is one example; the horrifyingly ill-conceived
> choice of "typedef" as a keyword indicating _type aliasing_ is another.)
> Kernighan did not help by trashing Pascal so hard in about 1980. He was
> dead right that Pascal needed, essentially, polymorphic subprograms in
> array types. Wirth not speccing the language to accommodate that back
> in 1973 or so was a sad mistake. But Pascal got a lot of other stuff
> right--stuff that the partisanship of C advocates refused to countenance
> such that they ended up celebrating C's flaws as features. No amount of
> Jonestown tea could quench their thirst. I suspect the truth was more
> that they didn't want to bother having to learn any other languages.
> (Or if they did, not any language that anyone else on their team at work
> had any facility with.) A rock star plays only one instrument, no?
> People didn't like it when Eddie Van Halen played keyboards instead of
> guitar on stage, so he stopped doing that. The less your coworkers
> understand your work, the more of a genius you must be.
>
> Now, where was I?
>
>> What's the length of "abc" vs. how many bytes are needed to store it?
>
> Even what is meant by "length" has several different correct answers!
> Quantity of code points in the sequence? Number of "grapheme clusters"
> a.k.a. "user-perceived characters" as Unicode puts it? Width as
> represented on the output device? On an ASCII device these usually had
> the same answer (control characters excepted). But even at the Bell
> Labs CSRC in the 1970s, thanks to troff, the staff knew that they didn't
> necessarily have to. (How wide is an em dash? How many bytes represent
> it, in the formatting language and in the output language?)
>
>> Giggle... In a device driver I wrote for V6, I used the expression
>>
>> "0123"[n]
>>
>> and the two programmers whom I thought were better than me had to ask
>> me what it did...
>>
>> -- Dave, brought up on PDP-11 Unix[*]
>
> I enjoy this application of that technique, courtesy of Alan Cox.
>
> fsck-fuzix: blow 90 bytes on a progress indicator
>
> static void progress(void)
> {
>         static uint8_t progct;
>         progct++;
>         progct&=3;
>         printf("%c\010", "-\\|/"[progct]);
>         fflush(stdout);
> }
>
>> I still remember the days of BOS/PICK/etc, and I staked my career on
>> Unix.
>
> Not a bad choice. Your exposure to and recollection of other ways of
> doing things, I suspect, made you a more valuable contributor than those
> who mazed themselves with thoughts of "the Unix way" to the point that
> they never seriously considered any other.
>
> It's fine to prefer "the C way" or "the Unix way", if you can
> intelligibly define what that means as applied to the issue in dispute,
> and coherently defend it. Demonstrating an understanding of the
> alternatives, and being able to credibly explain why they are inferior
> approaches, is how to do advocacy correctly.
>
> But it is not the cowboy way. The rock star way.
>
> Regards,
> Branden
>
> [1] Unfortunately I must concede that this claim is less true than it
> used to be thanks to the relentless pursuit of trade-secret means of
> optimizing hardware performance. Assembly languages now correspond,
> particularly on x86, to a sort of macro language that imperfectly
> masks a massive amount of microarchitectural state that the
> implementors themselves don't completely understand, at least not in
> time to get the product to market. Hence the field day of
> speculative execution attacks and similar. It would not be fair to
> say that CPUs of old had _no_ microarchitectural state--the Z80, for
> example, had the not-completely-official `W` and `Z` registers--but
> they did have much less of it, and correspondingly less attack
> surface for screwing your programs. I do miss the days of
> deterministic cycle counts for instruction execution. But I know
> I'd be sad if all the caches on my workaday machine switched off.
>
> [2] https://queue.acm.org/detail.cfm?id=3212479
^ permalink raw reply [flat|nested] 17+ messages in thread
* [TUHS] Re: Maximum Array Sizes in 16 bit C
2024-09-19 13:13 ` Rich Salz
2024-09-20 13:33 ` Paul Winalski
@ 2024-09-20 19:40 ` Leah Neukirchen
1 sibling, 0 replies; 17+ messages in thread
From: Leah Neukirchen @ 2024-09-20 19:40 UTC (permalink / raw)
To: Rich Salz; +Cc: Douglas McIlroy, TUHS main list
Rich Salz <rich.salz@gmail.com> writes:
>>
>> if there need to be negative references in array accesses (which certainly
>> makes sense to me, on its face), it seems reasonable to have whatever
>> intermediate variable be signed.
>>
>
> In my first C programming job I saw the source to V7 grep which had a
> "foo[-2]" construct. It was a moment of enlightenment and another bit of
> K&R fell into place. (
> https://www.tuhs.org/cgi-bin/utree.pl?file=V7/usr/src/cmd/grep.c; search
> for "[-")
Now this thread already derailed into C undefined behavior semantics,
but nobody bothered to look at the actual code, which is perfectly fine:
if ((c = *sp++) != '*')
        lastep = ep;
switch (c) {
...
case '[':
        ...
        neg = 0;
        if((c = *sp++) == '^') {
                neg = 1;
                c = *sp++;
        }
        cstart = sp;
        do {
                ...
                if (c=='-' && sp>cstart && *sp!=']') {
                        for (c = sp[-2]; c<*sp; c++)
                                ep[c>>3] |= bittab[c&07];
                        sp++;
                }
                ep[c>>3] |= bittab[c&07];
        } while((c = *sp++) != ']');
Since sp has been incremented twice already, accessing sp[-2] is fine
in any case, but it's also guarded by cstart, so the regexp range
"[-z]" doesn't expand to [[-z].
--
Leah Neukirchen <leah@vuxu.org> https://leahneukirchen.org/
^ permalink raw reply [flat|nested] 17+ messages in thread
* [TUHS] Re: Maximum Array Sizes in 16 bit C
2024-09-20 15:07 ` Dave Horsfall
` (2 preceding siblings ...)
2024-09-20 16:14 ` Dan Cross
@ 2024-09-20 17:11 ` G. Branden Robinson
2024-09-20 20:16 ` Bakul Shah via TUHS
3 siblings, 1 reply; 17+ messages in thread
From: G. Branden Robinson @ 2024-09-20 17:11 UTC (permalink / raw)
To: The Eunuchs Hysterical Society
[-- Attachment #1: Type: text/plain, Size: 9598 bytes --]
At 2024-09-21T01:07:11+1000, Dave Horsfall wrote:
> Unless I'm mistaken (quite possible at my age), the OP was referring
> to that in C, pointers and arrays are pretty much the same thing i.e.
> "foo[-2]" means "take the pointer 'foo' and go back two things"
> (whatever a "thing" is).
"in C, pointers and arrays are pretty much the same thing" is a common
utterance but misleading, and in my opinion, better replaced with a
different one.
We should instead say something more like:
In C, pointers and arrays have compatible dereference syntaxes.
They do _not_ have compatible _declaration_ syntaxes.
Chapter 4 of van der Linden's _Expert C Programming: Deep C Secrets_
(1994) tackles this issue head-on and at length.
Here's the salient point.
"Consider the case of an external declaration `extern char *p;` but a
definition of `char p[10];`. When we retrieve the contents of `p[i]`
using the extern, we get characters, but we treat it as a pointer.
Interpreting ASCII characters as an address is garbage, and if you're
lucky the program will coredump at that point. If you're not lucky it
will corrupt something in your address space, causing a mysterious
failure at some point later in the program."
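As a two-file illustration of that mismatch (my sketch; the file names
are invented):

/* def.c -- the definition: p really is an array of ten chars */
char p[10] = "abcdefghi";

/* use.c -- the mismatched declaration: p is claimed to be a pointer.
   The linker resolves the symbol anyway, but p[i] now fetches
   sizeof(char *) bytes of the array's characters, treats them as an
   address, and dereferences it -- garbage or corruption follows. */
extern char *p;

char get(int i)
{
        return p[i];
}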
> C is just a high level assembly language;
I disagree with this common claim too. Assembly languages correspond to
well-defined machine models.[1] Those machine models have memory
models. C has no memory model--deliberately, because that would have
gotten in the way of performance. (In practice, C's machine model was
and remains the PDP-11,[2] with aspects thereof progressively sanded off
over the years in repeated efforts to salvage the language's reputation
for portability.)
> there is no such object as a "string" for example: it's just an "array
> of char" with the last element being "\0" (viz: "strlen" vs. "sizeof".
Yeah, it turns out we need a well-defined string type much more
powerfully than, it seems, anyone at the Bell Labs CSRC appreciated.
string.h was tacked on (by Nils-Peter Nelson, as I understand it) at the
end of the 1970s and C aficionados have defended the language's
purported perfection with such vigor that they annexed the haphazardly
assembled standard library into the territory that they defend with much
rhetorical violence and overstatement. From useless or redundant return
values to const-carelessness to Schlemiel the Painter algorithms in
implementations, it seems we've collectively made every mistake that
could be made with Nelson's original, minimal API, and taught those
mistakes as best practices in tutorials and classrooms. A sorry affair.
So deep was this disdain for the string as a well-defined data type, and
moreover one conceptually distinct from an array (or vector) of integral
types that Stroustrup initially repeated the mistake in C++. People can
easily roll their own, he seemed to have thought. Eventually he thought
again, but C++ took so long to get standardized that by then, damage was
done.
"A string is just an array of `char`s, and a `char` is just a
byte"--another hasty equivalence that surrendered a priceless hostage to
fortune. This is the sort of fallacy indulged by people excessively
wedded to machine language programming and who apply its perspective to
every problem statement uncritically.
Again and again, with signed vs. unsigned bytes, "wide" vs. "narrow"
characters, and "base" vs. "combining" characters, the champions of the
"portable assembly" paradigm charged like Lord Cardigan into the pike
and musket lines of the character type as one might envision it in a
machine register. (This insistence on visualizing register-level
representations has prompted numerous other stupidities, like the use of
an integral zero at the _language level_ to represent empty, null, or
false literals for as many different data types as possible. "If it
ends up as a zero in a register," the thinking appears to have gone, "it
should look like a zero in the source code." Generations of code--and
language--cowboys have screwed us all over repeatedly with this hasty
equivalence.)
Type theorists have known better for decades. But type theory is (1)
hard (it certainly is, to cowboys) and (2) has never enjoyed a trendy
day in the sun (for which we may be grateful), which means that it is
seldom on the path one anticipates to a comfortable retirement from a
Silicon Valley tech company (or several) on a private yacht.
Why do I rant so splenetically about these issues? Because the result
of such confusion is _bugs in programs_. You want something concrete?
There it is. Data types protect you from screwing up. And the better
your data types are, the more care you give to specifying what sorts of
objects your program manipulates, the more thought you give to the
invariants that must be maintained for your program to remain in a
well-defined state, the fewer bugs you will have.
But, nah, better to slap together a prototype, ship it, talk it up to
the moon as your latest triumph while interviewing with a rival of the
company you just delivered that prototype to, and look on in amusement
when your brilliant achievement either proves disastrous in deployment
or soaks up the waking hours of an entire team of your former colleagues
cleaning up the steaming pile you voided from your rock star bowels.
We've paid a heavy price for C's slow and seemingly deeply grudging
embrace of the type concept. (The lack of controlled scope for
enumeration constants is one example; the horrifyingly ill-conceived
choice of "typedef" as a keyword indicating _type aliasing_ is another.)
Kernighan did not help by trashing Pascal so hard in about 1980. He was
dead right that Pascal needed, essentially, polymorphic subprograms in
array types. Wirth not speccing the language to accommodate that back
in 1973 or so was a sad mistake. But Pascal got a lot of other stuff
right--stuff that the partisanship of C advocates refused to countenance
such that they ended up celebrating C's flaws as features. No amount of
Jonestown tea could quench their thirst. I suspect the truth was more
that they didn't want to bother having to learn any other languages.
(Or if they did, not any language that anyone else on their team at work
had any facility with.) A rock star plays only one instrument, no?
People didn't like it when Eddie Van Halen played keyboards instead of
guitar on stage, so he stopped doing that. The less your coworkers
understand your work, the more of a genius you must be.
Now, where was I?
> What's the length of "abc" vs. how many bytes are needed to store it?
Even what is meant by "length" has several different correct answers!
Quantity of code points in the sequence? Number of "grapheme clusters"
a.k.a. "user-perceived characters" as Unicode puts it? Width as
represented on the output device? On an ASCII device these usually had
the same answer (control characters excepted). But even at the Bell
Labs CSRC in the 1970s, thanks to troff, the staff knew that they didn't
necessarily have to. (How wide is an em dash? How many bytes represent
it, in the formatting language and in the output language?)
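A tiny illustration (mine, assuming a UTF-8 execution character set) of
how the answers diverge:

#include <stdio.h>
#include <string.h>

/* Count UTF-8 code points by skipping continuation bytes (10xxxxxx). */
static size_t count_code_points(const char *s)
{
        size_t n = 0;
        for (; *s; s++)
                if (((unsigned char)*s & 0xC0) != 0x80)
                        n++;
        return n;
}

int main(void)
{
        const char *s = "na\xC3\xAFve\xE2\x80\x94ish";  /* "naïve" + em dash + "ish" */
        printf("bytes to store: %zu\n", strlen(s) + 1);         /* 13, with the NUL */
        printf("code points:    %zu\n", count_code_points(s));  /* 9 */
        /* Display width is a third answer: the dash may take one column
           or two, and combining characters would take none at all. */
        return 0;
}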
> Giggle... In a device driver I wrote for V6, I used the expression
>
> "0123"[n]
>
> and the two programmers whom I thought were better than me had to ask
> me what it did...
>
> -- Dave, brought up on PDP-11 Unix[*]
I enjoy this application of that technique, courtesy of Alan Cox.
fsck-fuzix: blow 90 bytes on a progress indicator
static void progress(void)
{
        static uint8_t progct;
        progct++;
        progct&=3;
        printf("%c\010", "-\\|/"[progct]);
        fflush(stdout);
}
> I still remember the days of BOS/PICK/etc, and I staked my career on
> Unix.
Not a bad choice. Your exposure to and recollection of other ways of
doing things, I suspect, made you a more valuable contributor than those
who mazed themselves with thoughts of "the Unix way" to the point that
they never seriously considered any other.
It's fine to prefer "the C way" or "the Unix way", if you can
intelligibly define what that means as applied to the issue in dispute,
and coherently defend it. Demonstrating an understanding of the
alternatives, and being able to credibly explain why they are inferior
approaches, is how to do advocacy correctly.
But it is not the cowboy way. The rock star way.
Regards,
Branden
[1] Unfortunately I must concede that this claim is less true than it
used to be thanks to the relentless pursuit of trade-secret means of
optimizing hardware performance. Assembly languages now correspond,
particularly on x86, to a sort of macro language that imperfectly
masks a massive amount of microarchitectural state that the
implementors themselves don't completely understand, at least not in
time to get the product to market. Hence the field day of
speculative execution attacks and similar. It would not be fair to
say that CPUs of old had _no_ microarchitectural state--the Z80, for
example, had the not-completely-official `W` and `Z` registers--but
they did have much less of it, and correspondingly less attack
surface for screwing your programs. I do miss the days of
deterministic cycle counts for instruction execution. But I know
I'd be sad if all the caches on my workaday machine switched off.
[2] https://queue.acm.org/detail.cfm?id=3212479
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 17+ messages in thread
* [TUHS] Re: Maximum Array Sizes in 16 bit C
2024-09-20 15:07 ` Dave Horsfall
2024-09-20 15:30 ` Larry McVoy
2024-09-20 15:56 ` Stuff Received
@ 2024-09-20 16:14 ` Dan Cross
2024-09-20 17:11 ` G. Branden Robinson
3 siblings, 0 replies; 17+ messages in thread
From: Dan Cross @ 2024-09-20 16:14 UTC (permalink / raw)
To: Dave Horsfall; +Cc: The Eunuchs Hysterical Society
On Fri, Sep 20, 2024 at 11:17 AM Dave Horsfall <dave@horsfall.org> wrote:
> On Fri, 20 Sep 2024, Paul Winalski wrote:
> > On Thu, Sep 19, 2024 at 7:52 PM Rich Salz <rich.salz@gmail.com> wrote:
> > In my first C programming job I saw the source to V7 grep which
> > had a "foo[-2]" construct.
> >
> > That sort of thing is very dangerous with modern compilers. Does K&R C
> > require that variables be allocated in the order that they are declared? If
> > not, you're playing with fire. To get decent performance out of modern
> > processors, the compiler must perform data placement to maximize cache
> > efficiency, and that practically guarantees that you can't rely on
> > out-of-bounds array references.
>
> [...]
>
> Unless I'm mistaken (quite possible at my age), the OP was referring to
> that in C, pointers and arrays are pretty much the same thing i.e.
> "foo[-2]" means "take the pointer 'foo' and go back two things" (whatever
> a "thing" is).
Where I've usually seen this idiom is in things like:
char foo[10];
char *p = foo + 5;
p[-2] = 'a'; /* set foo[3] to 'a' */
But as Paul pointed out, a) this relies on aliasing the bytes in
`foo`, and b) is UB if the (negative) index falls below the beginning
of the underlying object (e.g., the array `foo`).
> C is just a high level assembly language; there is no such object as a
> "string" for example: it's just an "array of char" with the last element
> being "\0" (viz: "strlen" vs. "sizeof".
Sadly, this hasn't been true for a few decades; arguably since
optimizing compilers for C started to become common in the 70s. Trying
to treat C as a high-level macro assembler is dangerous, as Paul
pointed out, even though a lot of us feel like we can "see" the
assembly that a line of C code will likely emit. While in many cases
we are probably right (or close to it), C _compiler writers_ don't
think in those terms, but rather think in terms of operations
targeting the abstract virtual machine loosely described by the
language standard. Caveat emptor, there be dragons.
> What's the length of "abc" vs. how many bytes are needed to store it?
>
> > Things were much simpler when V7 was written.
>
> Giggle... In a device driver I wrote for V6, I used the expression
>
> "0123"[n]
>
> and the two programmers whom I thought were better than me had to ask me
> what it did...
Fortunately, this is still legal. :-)
- Dan C.
^ permalink raw reply [flat|nested] 17+ messages in thread
* [TUHS] Re: Maximum Array Sizes in 16 bit C
2024-09-20 15:07 ` Dave Horsfall
2024-09-20 15:30 ` Larry McVoy
@ 2024-09-20 15:56 ` Stuff Received
2024-09-20 16:14 ` Dan Cross
2024-09-20 17:11 ` G. Branden Robinson
3 siblings, 0 replies; 17+ messages in thread
From: Stuff Received @ 2024-09-20 15:56 UTC (permalink / raw)
To: COFF
Moved to COFF.
On 2024-09-20 11:07, Dave Horsfall wrote (in part):
>
> Giggle... In a device driver I wrote for V6, I used the expression
>
> "0123"[n]
>
> and the two programmers whom I thought were better than me had to ask me
> what it did...
>
> -- Dave, brought up on PDP-11 Unix[*]
>
> [*]
> I still remember the days of BOS/PICK/etc, and I staked my career on Unix.
Working on embedded systems, we often used constructs such as a[-4] to
either read or modify stuff on the stack (for that particular
compiler+processor only).
S.
^ permalink raw reply [flat|nested] 17+ messages in thread
* [TUHS] Re: Maximum Array Sizes in 16 bit C
2024-09-20 15:07 ` Dave Horsfall
@ 2024-09-20 15:30 ` Larry McVoy
2024-09-20 15:56 ` Stuff Received
` (2 subsequent siblings)
3 siblings, 0 replies; 17+ messages in thread
From: Larry McVoy @ 2024-09-20 15:30 UTC (permalink / raw)
To: Dave Horsfall; +Cc: The Eunuchs Hysterical Society
On Sat, Sep 21, 2024 at 01:07:11AM +1000, Dave Horsfall wrote:
> On Fri, 20 Sep 2024, Paul Winalski wrote:
>
> > On Thu, Sep 19, 2024 at 7:52 PM Rich Salz <rich.salz@gmail.com> wrote:
> >
> > In my first C programming job I saw the source to V7 grep which
> > had a "foo[-2]" construct.
> >
> > That sort of thing is very dangerous with modern compilers. Does K&R C
> > require that variables be allocated in the order that they are declared? If
> > not, you're playing with fire. To get decent performance out of modern
> > processors, the compiler must perform data placement to maximize cache
> > efficiency, and that practically guarantees that you can't rely on
> > out-of-bounds array references.
>
> [...]
>
> Unless I'm mistaken (quite possible at my age), the OP was referring to
> that in C, pointers and arrays are pretty much the same thing i.e.
> "foo[-2]" means "take the pointer 'foo' and go back two things" (whatever
> a "thing" is).
Yes, but that was a stack variable. Let me see if I can say it more clearly.
foo()
{
        int a = 1, b = 2;
        int alias[5];
        alias[-2] = 0; // try and set a to 0.
}
In v7 days, the stack would look like
[stuff]
[2 bytes for a]
[2 bytes for b]
[2 bytes for the alias address, which I think points forward]
[10 bytes for alias contents]
I'm hazy on how the space for alias[] is allocated, so I made that up. It's
probably something like I said but Paul (or someone) will correct me.
When using a negative index for alias[], the coder is assuming that the stack
variables are placed in the order they were declared. Paul tried to explain
that _might_ be true but is not always true. Modern compilers will look see
which variables are used the most in the function, and place them next to
each other so that if you have the cache line for one heavily used variable,
the other one is right there next to it. Like so:
int heavy1 = 1;
int rarely1 = 2;
int spacer[10];
int heavy2 = 3;
int rarely2 = 4;
The compiler might figure out that heavy{1,2} are used a lot and lay out the
stack like so:
[2 bytes (or 4 or 8 these days) for heavy1]
[bytes for heavy2]
[bytes for rarely1]
[bytes for spacer[10]]
[bytes for rarely2]
Paul was saying that using a negative index in the array creates an alias,
or another name, for the scalar integer on the stack (his description made
me understand, for the first time in decades, why compiler writers hate
aliases and I get it now). Aliases mess hard with optimizers. Optimizers
may reorder the stack for better cache line usage and what you think
array[-2] means doesn't work any more unless the optimizer catches that
you made an alias and preserves it.
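To make the contrast concrete, a sketch (mine, not from this exchange)
of the only portable way to pin that relative placement down -- put the
related variables in one object:

struct frame {
        int a;
        int b;
        int alias[5];
};

void foo(void)
{
        struct frame f = { 1, 2, { 0 } };

        f.a = 0;        /* say what you mean; no negative index needed */
}

Struct members are laid out in declaration order (padding aside),
whereas independent locals are not -- and declaration order is exactly
the assumption the old alias[-2] trick leans on.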
Paul, how did I do? I'm not a compiler guy, just had to learn enough to
walk the stack when the kernel panics.
^ permalink raw reply [flat|nested] 17+ messages in thread
* [TUHS] Re: Maximum Array Sizes in 16 bit C
2024-09-20 13:33 ` Paul Winalski
2024-09-20 15:07 ` Dave Horsfall
@ 2024-09-20 15:26 ` Rich Salz
1 sibling, 0 replies; 17+ messages in thread
From: Rich Salz @ 2024-09-20 15:26 UTC (permalink / raw)
To: Paul Winalski; +Cc: Douglas McIlroy, TUHS main list
[-- Attachment #1: Type: text/plain, Size: 261 bytes --]
>
> Unless "foo" were a pointer that the programmer explicitly pointed to the
> inside of a larger data structure.
>
It was that. Go look at the source (I included the link) if you want.
This was in the context of a sub-thread about array indices, after all.
[-- Attachment #2: Type: text/html, Size: 549 bytes --]
^ permalink raw reply [flat|nested] 17+ messages in thread
* [TUHS] Re: Maximum Array Sizes in 16 bit C
2024-09-20 13:33 ` Paul Winalski
@ 2024-09-20 15:07 ` Dave Horsfall
2024-09-20 15:30 ` Larry McVoy
` (3 more replies)
2024-09-20 15:26 ` Rich Salz
1 sibling, 4 replies; 17+ messages in thread
From: Dave Horsfall @ 2024-09-20 15:07 UTC (permalink / raw)
To: The Eunuchs Hysterical Society
[-- Attachment #1: Type: text/plain, Size: 1514 bytes --]
On Fri, 20 Sep 2024, Paul Winalski wrote:
> On Thu, Sep 19, 2024 at 7:52 PM Rich Salz <rich.salz@gmail.com> wrote:
>
> In my first C programming job I saw the source to V7 grep which
> had a "foo[-2]" construct.
>
> That sort of thing is very dangerous with modern compilers. Does K&R C
> require that variables be allocated in the order that they are declared? If
> not, you're playing with fire. To get decent performance out of modern
> processors, the compiler must perform data placement to maximize cache
> efficiency, and that practically guarantees that you can't rely on
> out-of-bounds array references.
[...]
Unless I'm mistaken (quite possible at my age), the OP was referring to
the fact that in C, pointers and arrays are pretty much the same thing, i.e.
"foo[-2]" means "take the pointer 'foo' and go back two things" (whatever
a "thing" is).
C is just a high level assembly language; there is no such object as a
"string" for example: it's just an "array of char" with the last element
being "\0" (viz: "strlen" vs. "sizeof".
What's the length of "abc" vs. how many bytes are needed to store it?
> Things were much simpler when V7 was written.
Giggle... In a device driver I wrote for V6, I used the expression
"0123"[n]
and the two programmers whom I thought were better than me had to ask me
what it did...
-- Dave, brought up on PDP-11 Unix[*]
[*]
I still remember the days of BOS/PICK/etc, and I staked my career on Unix.
^ permalink raw reply [flat|nested] 17+ messages in thread
* [TUHS] Re: Maximum Array Sizes in 16 bit C
2024-09-19 13:13 ` Rich Salz
@ 2024-09-20 13:33 ` Paul Winalski
2024-09-20 15:07 ` Dave Horsfall
2024-09-20 15:26 ` Rich Salz
2024-09-20 19:40 ` Leah Neukirchen
1 sibling, 2 replies; 17+ messages in thread
From: Paul Winalski @ 2024-09-20 13:33 UTC (permalink / raw)
To: Rich Salz; +Cc: Douglas McIlroy, TUHS main list
[-- Attachment #1: Type: text/plain, Size: 1018 bytes --]
On Thu, Sep 19, 2024 at 7:52 PM Rich Salz <rich.salz@gmail.com> wrote:
>
> In my first C programming job I saw the source to V7 grep which had a
> "foo[-2]" construct.
>
That sort of thing is very dangerous with modern compilers. Does K&R C
require that variables be allocated in the order that they are declared?
If not, you're playing with fire. To get decent performance out of modern
processors, the compiler must perform data placement to maximize cache
efficiency, and that practically guarantees that you can't rely on
out-of-bounds array references.
Unless "foo" were a pointer that the programmer explicitly pointed to the
inside of a larger data structure. In that case you could guarantee that
the construct would work reliably. But by pointing into the middle of
another data structure you've created a data aliasing situation, and that
complicates compiler data flow analysis and can block important
optimizations.
Things were much simpler when V7 was written.
-Paul W.
[-- Attachment #2: Type: text/html, Size: 1457 bytes --]
^ permalink raw reply [flat|nested] 17+ messages in thread
* [TUHS] Re: Maximum Array Sizes in 16 bit C
2024-09-18 23:57 ` Henry Bent
@ 2024-09-19 13:13 ` Rich Salz
2024-09-20 13:33 ` Paul Winalski
2024-09-20 19:40 ` Leah Neukirchen
0 siblings, 2 replies; 17+ messages in thread
From: Rich Salz @ 2024-09-19 13:13 UTC (permalink / raw)
To: Henry Bent; +Cc: Douglas McIlroy, TUHS main list
[-- Attachment #1: Type: text/plain, Size: 439 bytes --]
>
> if there need to be negative references in array accesses (which certainly
> makes sense to me, on its face), it seems reasonable to have whatever
> intermediate variable be signed.
>
In my first C programming job I saw the source to V7 grep which had a
"foo[-2]" construct. It was a moment of enlightenment and another bit of
K&R fell into place. (
https://www.tuhs.org/cgi-bin/utree.pl?file=V7/usr/src/cmd/grep.c; search
for "[-")
[-- Attachment #2: Type: text/html, Size: 823 bytes --]
^ permalink raw reply [flat|nested] 17+ messages in thread
* [TUHS] Re: Maximum Array Sizes in 16 bit C
2024-09-18 23:51 Douglas McIlroy
@ 2024-09-18 23:57 ` Henry Bent
2024-09-19 13:13 ` Rich Salz
0 siblings, 1 reply; 17+ messages in thread
From: Henry Bent @ 2024-09-18 23:57 UTC (permalink / raw)
To: Douglas McIlroy; +Cc: TUHS main list
[-- Attachment #1: Type: text/plain, Size: 855 bytes --]
On Wed, 18 Sept 2024 at 19:51, Douglas McIlroy <
douglas.mcilroy@dartmouth.edu> wrote:
> > The array size limit that I found through trial and error is (2^15)-1.
> > Declaring an array that is [larger] results in an error of "Constant
> > required",
>
> On its face, it states that anything bigger cannot be an integer constant,
> which is reasonable because that's the largest (signed) integer value. Does
> that version of C support unsigned constants?
>
I believe that it does support (16 bit) unsigned int, but I don't think
that it supports (32 bit) unsigned long, only signed long. That's a great
suggestion of a place to start. Following Nelson's suggestion, if there
need to be negative references in array accesses (which certainly makes
sense to me, on its face), it seems reasonable to have whatever
intermediate variable be signed.
-Henry
[-- Attachment #2: Type: text/html, Size: 1273 bytes --]
^ permalink raw reply [flat|nested] 17+ messages in thread
* [TUHS] Re: Maximum Array Sizes in 16 bit C
@ 2024-09-18 23:51 Douglas McIlroy
2024-09-18 23:57 ` Henry Bent
0 siblings, 1 reply; 17+ messages in thread
From: Douglas McIlroy @ 2024-09-18 23:51 UTC (permalink / raw)
To: TUHS main list, henry.r.bent
[-- Attachment #1: Type: text/plain, Size: 360 bytes --]
> The array size limit that I found through trial and error is (2^15)-1.
> Declaring an array that is [larger] results in an error of "Constant
> required",
On its face, it states that anything bigger cannot be an integer constant,
which is reasonable because that's the largest (signed) integer value. Does
that version of C support unsigned constants?
Doug
[-- Attachment #2: Type: text/html, Size: 490 bytes --]
^ permalink raw reply [flat|nested] 17+ messages in thread
end of thread, other threads:[~2024-09-20 22:19 UTC | newest]
Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-09-20 15:24 [TUHS] Re: Maximum Array Sizes in 16 bit C Douglas McIlroy
-- strict thread matches above, loose matches on Subject: below --
2024-09-18 23:51 Douglas McIlroy
2024-09-18 23:57 ` Henry Bent
2024-09-19 13:13 ` Rich Salz
2024-09-20 13:33 ` Paul Winalski
2024-09-20 15:07 ` Dave Horsfall
2024-09-20 15:30 ` Larry McVoy
2024-09-20 15:56 ` Stuff Received
2024-09-20 16:14 ` Dan Cross
2024-09-20 17:11 ` G. Branden Robinson
2024-09-20 20:16 ` Bakul Shah via TUHS
2024-09-20 20:58 ` Warner Losh
2024-09-20 21:18 ` Rob Pike
2024-09-20 22:04 ` Bakul Shah via TUHS
2024-09-20 22:19 ` G. Branden Robinson
2024-09-20 15:26 ` Rich Salz
2024-09-20 19:40 ` Leah Neukirchen