From: Bakul Shah via TUHS
Reply-To: Bakul Shah
To: "G. Branden Robinson"
CC: The Eunuchs Hysterical Society
Date: Fri, 20 Sep 2024 13:16:24 -0700
Subject: [TUHS] Re: Maximum Array Sizes in 16 bit C
Message-Id: <69643008-F7FE-4AC7-8519-B45E4C1CEA66@iitbombay.org>
In-Reply-To: <20240920171126.vgtl23xwj37kardb@illithid>
References: <11d46ab4-b90c-83fe-131a-ee399eebf342@horsfall.org> <20240920171126.vgtl23xwj37kardb@illithid>
List-Id: The Unix Heritage Society mailing list

You are a bit late with your screed. You will find posts with similar sentiments starting back in the 1980s in Usenet groups such as comp.lang.{c,misc,pascal}.

Perhaps a more interesting (but likely pointless) question is what is the *least* that can be done to fix C's major problems.

Compilers can easily add bounds checking for the array[index] construct but ptr[index] cannot be checked, unless we make a ptr a heavyweight object such as (address, start, limit).
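A minimal sketch of that (address, start, limit) idea, written by hand in today's C. The names `FatPtr`, `fat_from_array`, and `fat_index` are hypothetical, made up for illustration; a compiler taking this route would presumably generate the equivalent checks itself rather than expose a struct.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical "fat pointer": the current address plus the bounds of the
 * object it was derived from. limit points one past the last element. */
typedef struct {
    int *addr;
    int *start;
    int *limit;
} FatPtr;

/* Derive a fat pointer to element n of an array of count elements,
 * i.e. the checked counterpart of "int *p = base + n;". */
static FatPtr fat_from_array(int *base, size_t count, size_t n)
{
    assert(base != NULL && n <= count);   /* one past the end is allowed */
    FatPtr p = { base + n, base, base + count };
    return p;
}

/* Checked counterpart of &p[index]: returns NULL when out of bounds.
 * Offsets are compared instead of pointers, so no out-of-range pointer
 * is ever formed. */
static int *fat_index(FatPtr p, ptrdiff_t index)
{
    ptrdiff_t off = (p.addr - p.start) + index;
    if (off < 0 || off >= p.limit - p.start)
        return NULL;
    return p.start + off;
}
```

With an `int x[8]` and `p = fat_from_array(x, 8, 2)`, `fat_index(p, -2)` yields `&x[0]` while `fat_index(p, 6)` yields NULL, which is the check that a bare `ptr[index]` cannot perform.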
One can see how code can be generated for code such as this:

    Foo x[count];
    Foo* p = x + n; // or &x[n]

Code such as "Foo *p = malloc(size);" would require the compiler to know how malloc behaves to be able to compute the limit. But for a user to write a similar function will require some language extension.

[Of course, if we did that, adding proper support for multidimensional slices would be far easier. But that is an exploration for another day!]

Converting enums to behave like Pascal scalars would likely break things. The question is, can such breakage be fixed automatically (by source code conversion)?

C's union type is used in two different ways: 1: similar to a sum type, which can be done type safely and 2: to cheat. The compiler should produce a warning when it can't verify a typesafe use -- one can add "unsafe" or some such to let the user absolve the compiler of such a check.

[Maybe naively] I tend to think one can evolve C this way and fix a lot of code &/or make a lot of bugs more explicit.

> On Sep 20, 2024, at 10:11 AM, G. Branden Robinson wrote:
>
> At 2024-09-21T01:07:11+1000, Dave Horsfall wrote:
>> Unless I'm mistaken (quite possible at my age), the OP was referring
>> to that in C, pointers and arrays are pretty much the same thing i.e.
>> "foo[-2]" means "take the pointer 'foo' and go back two things"
>> (whatever a "thing" is).
>
> "in C, pointers and arrays are pretty much the same thing" is a common
> utterance but misleading, and in my opinion, better replaced with a
> different one.
>
> We should instead say something more like:
>
>   In C, pointers and arrays have compatible dereference syntaxes.
>
>   They do _not_ have compatible _declaration_ syntaxes.
>
> Chapter 4 of van der Linden's _Expert C Programming: Deep C Secrets_
> (1994) tackles this issue head-on and at length.
>
> Here's the salient point.
>
>   "Consider the case of an external declaration `extern char *p;` but a
>   definition of `char p[10];`. When we retrieve the contents of `p[i]`
>   using the extern, we get characters, but we treat it as a pointer.
>   Interpreting ASCII characters as an address is garbage, and if you're
>   lucky the program will coredump at that point. If you're not lucky it
>   will corrupt something in your address space, causing a mysterious
>   failure at some point later in the program."
>
>> C is just a high level assembly language;
>
> I disagree with this common claim too. Assembly languages correspond to
> well-defined machine models.[1] Those machine models have memory
> models. C has no memory model--deliberately, because that would have
> gotten in the way of performance. (In practice, C's machine model was
> and remains the PDP-11,[2] with aspects thereof progressively sanded off
> over the years in repeated efforts to salvage the language's reputation
> for portability.)
>
>> there is no such object as a "string" for example: it's just an "array
>> of char" with the last element being "\0" (viz: "strlen" vs. "sizeof".
>
> Yeah, it turns out we need a well-defined string type much more
> powerfully than, it seems, anyone at the Bell Labs CSRC appreciated.
> string.h was tacked on (by Nils-Peter Nelson, as I understand it) at the
> end of the 1970s and C aficionados have defended the language's
> purported perfection with such vigor that they annexed the haphazardly
> assembled standard library into the territory that they defend with much
> rhetorical violence and overstatement. From useless or redundant return
> values to const-carelessness to Schlemiel the Painter algorithms in
> implementations, it seems we've collectively made every mistake that
> could be made with Nelson's original, minimal API, and taught those
> mistakes as best practices in tutorials and classrooms. A sorry affair.
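To make the "Schlemiel the Painter" remark concrete, here is a small sketch of my own, not taken from any particular libc. `strcat()` has to walk the destination from the beginning to find the terminator on every call, so building a string out of n pieces costs quadratic time. The names `join_rescan` and `join_tracked` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Schlemiel's way: every strcat() rescans dst from the start to find
 * the '\0', so appending n pieces is O(n^2) overall. The caller must
 * guarantee that dst is large enough; nothing here checks it. */
static void join_rescan(char *dst, const char *const *parts, size_t n)
{
    assert(dst != NULL && parts != NULL);
    dst[0] = '\0';
    for (size_t i = 0; i < n; i++)
        strcat(dst, parts[i]);
}

/* Remember where the string ends instead: each piece is copied once and
 * the capacity is checked. Returns 0 on success, -1 if dst is too small
 * (dst is left holding the pieces that did fit, still terminated). */
static int join_tracked(char *dst, size_t cap,
                        const char *const *parts, size_t n)
{
    size_t used = 0;

    assert(dst != NULL && parts != NULL);
    if (cap == 0)
        return -1;
    for (size_t i = 0; i < n; i++) {
        size_t len = strlen(parts[i]);
        if (len >= cap - used) {      /* need len bytes plus the '\0' */
            dst[used] = '\0';
            return -1;
        }
        memcpy(dst + used, parts[i], len);
        used += len;
    }
    dst[used] = '\0';
    return 0;
}
```

Both produce the same string when the buffer is big enough; the difference is that the second one carries the length along instead of rediscovering it, which is roughly what a real string type would do for you.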
>
> So deep was this disdain for the string as a well-defined data type, and
> moreover one conceptually distinct from an array (or vector) of integral
> types that Stroustrup initially repeated the mistake in C++. People can
> easily roll their own, he seemed to have thought. Eventually he thought
> again, but C++ took so long to get standardized that by then, damage was
> done.
>
> "A string is just an array of `char`s, and a `char` is just a
> byte"--another hasty equivalence that surrendered a priceless hostage to
> fortune. This is the sort of fallacy indulged by people excessively
> wedded to machine language programming and who apply its perspective to
> every problem statement uncritically.
>
> Again and again, with signed vs. unsigned bytes, "wide" vs. "narrow"
> characters, and "base" vs. "combining" characters, the champions of the
> "portable assembly" paradigm charged like Lord Cardigan into the pike
> and musket lines of the character type as one might envision it in a
> machine register. (This insistence on visualizing register-level
> representations has prompted numerous other stupidities, like the use of
> an integral zero at the _language level_ to represent empty, null, or
> false literals for as many different data types as possible. "If it
> ends up as a zero in a register," the thinking appears to have gone, "it
> should look like a zero in the source code." Generations of code--and
> language--cowboys have screwed us all over repeatedly with this hasty
> equivalence.
>
> Type theorists have known better for decades. But type theory is (1)
> hard (it certainly is, to cowboys) and (2) has never enjoyed a trendy
> day in the sun (for which we may be grateful), which means that is
> seldom on the path one anticipates to a comfortable retirement from a
> Silicon Valley tech company (or several) on a private yacht.
>
> Why do I rant so splenetically about these issues? Because the result
> of such confusion is _bugs in programs_. You want something concrete?
> There it is. Data types protect you from screwing up. And the better
> your data types are, the more care you give to specifying what sorts of
> objects your program manipulates, the more thought you give to the
> invariants that must be maintained for your program to remain in a
> well-defined state, the fewer bugs you will have.
>
> But, nah, better to slap together a prototype, ship it, talk it up to
> the moon as your latest triumph while interviewing with a rival of the
> company you just delivered that prototype to, and look on in amusement
> when your brilliant achievement either proves disastrous in deployment
> or soaks up the waking hours of an entire team of your former colleagues
> cleaning up the steaming pile you voided from your rock star bowels.
>
> We've paid a heavy price for C's slow and seemingly deeply grudging
> embrace of the type concept. (The lack of controlled scope for
> enumeration constants is one example; the horrifyingly ill-conceived
> choice of "typedef" as a keyword indicating _type aliasing_ is another.)
> Kernighan did not help by trashing Pascal so hard in about 1980. He was
> dead right that Pascal needed, essentially, polymorphic subprograms in
> array types. Wirth not speccing the language to accommodate that back
> in 1973 or so was a sad mistake. But Pascal got a lot of other stuff
> right--stuff that the partisanship of C advocates refused to countenance
> such that they ended up celebrating C's flaws as features. No amount of
> Jonestown tea could quench their thirst. I suspect the truth was more
> that they didn't want to bother having to learn any other languages.
> (Or if they did, not any language that anyone else on their team at work
> had any facility with.) A rock star plays only one instrument, no?
> People didn't like it when Eddie Van Halen played keyboards instead of
> guitar on stage, so he stopped doing that. The less your coworkers
> understand your work, the more of a genius you must be.
>
> Now, where was I?
>
>> What's the length of "abc" vs. how many bytes are needed to store it?
>
> Even what is meant by "length" has several different correct answers!
> Quantity of code points in the sequence? Number of "grapheme clusters"
> a.k.a. "user-perceived characters" as Unicode puts it? Width as
> represented on the output device? On an ASCII device these usually had
> the same answer (control characters excepted). But even at the Bell
> Labs CSRC in the 1970s, thanks to troff, the staff knew that they didn't
> necessarily have to. (How wide is an em dash? How many bytes represent
> it, in the formatting language and in the output language?)
>
>> Giggle... In a device driver I wrote for V6, I used the expression
>>
>>     "0123"[n]
>>
>> and the two programmers whom I thought were better than me had to ask
>> me what it did...
>>
>> -- Dave, brought up on PDP-11 Unix[*]
>
> I enjoy this application of that technique, courtesy of Alan Cox.
>
> fsck-fuzix: blow 90 bytes on a progress indicator
>
> static void progress(void)
> {
>     static uint8_t progct;
>     progct++;
>     progct&=3;
>     printf("%c\010", "-\\|/"[progct]);
>     fflush(stdout);
> }
>
>> I still remember the days of BOS/PICK/etc, and I staked my career on
>> Unix.
>
> Not a bad choice. Your exposure to and recollection of other ways of
> doing things, I suspect, made you a more valuable contributor than those
> who mazed themselves with thoughts of "the Unix way" to the point that
> they never seriously considered any other.
>
> It's fine to prefer "the C way" or "the Unix way", if you can
> intelligibly define what that means as applied to the issue in dispute,
> and coherently defend it. Demonstrating an understanding of the
> alternatives, and being able to credibly explain why they are inferior
> approaches, is how to do advocacy correctly.
>
> But it is not the cowboy way. The rock star way.
>
> Regards,
> Branden
>
> [1] Unfortunately I must concede that this claim is less true than it
>     used to be thanks to the relentless pursuit of trade-secret means of
>     optimizing hardware performance. Assembly languages now correspond,
>     particularly on x86, to a sort of macro language that imperfectly
>     masks a massive amount of microarchitectural state that the
>     implementors themselves don't completely understand, at least not in
>     time to get the product to market. Hence the field day of
>     speculative execution attacks and similar. It would not be fair to
>     say that CPUs of old had _no_ microarchitectural state--the Z80, for
>     example, had the not-completely-official `W` and `Z` registers--but
>     they did have much less of it, and correspondingly less attack
>     surface for screwing your programs. I do miss the days of
>     deterministic cycle counts for instruction execution. But I know
>     I'd be sad if all the caches on my workaday machine switched off.
>
> [2] https://queue.acm.org/detail.cfm?id=3212479
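On the quoted question about the length of "abc" versus the bytes needed to store it, a short sketch may help. It assumes the text is valid UTF-8, and the names `utf8_code_points`, `quad_digit`, `abc`, and `dash` are hypothetical ones chosen here. It covers only two of the possible answers (bytes and code points); grapheme clusters and display width need Unicode tables that plain C does not provide.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Count code points in a NUL-terminated string, ASSUMING valid UTF-8:
 * every byte that is not a continuation byte (10xxxxxx) starts a new
 * code point. */
static size_t utf8_code_points(const char *s)
{
    size_t n = 0;

    assert(s != NULL);
    for (; *s != '\0'; s++)
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    return n;
}

/* strlen(abc) is 3, but sizeof abc is 4: the array stores the '\0'. */
static const char abc[] = "abc";

/* An em dash, U+2014, spelled out as its three UTF-8 bytes:
 * strlen() says 3, yet it is a single code point. */
static const char dash[] = "\xE2\x80\x94";

/* Dave's idiom: a string literal is an array, so it can be indexed
 * directly. The mask keeps the index inside the four digits. */
static char quad_digit(unsigned n)
{
    return "0123"[n & 3u];
}
```

So for `abc` the byte count, the code point count, and the storage size are 3, 3, and 4, while for `dash` they are 3, 1, and 4. On an ASCII-only system the first two never disagree, which is probably why the distinction was so easy to overlook.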