From: "G. Branden Robinson"
To: The Eunuchs Hysterical Society
Date: Fri, 20 Sep 2024 17:19:12 -0500
Subject: [TUHS] Re: Maximum Array Sizes in 16 bit C
Message-ID: <20240920221912.o7uuxfnonrr2jbht@illithid>
In-Reply-To: <69643008-F7FE-4AC7-8519-B45E4C1CEA66@iitbombay.org>
List-Id: The Unix Heritage Society mailing list

Hi Bakul & Warner,

At 2024-09-20T13:16:24-0700, Bakul Shah wrote:
> You are a bit late with your screed.

...I had hoped that my awareness of that was made evident by my citation
of a 30-year-old book. ;-)

> You will find posts with similar sentiments starting back in 1980s in
> Usenet groups such as comp.lang.{c,misc,pascal}.

Before my time, but I don't doubt it.

The sad thing is that not enough people took such posts seriously.

I spend a fair amount of time dealing with "legacy" code. Stuff that
hasn't been touched in a long time. One thing I'm convinced of: bad
idioms are forever. And that means people will keep learning and
copying them.

Of course no one wants to pay for the cleanup of such technical debt,
not in spite of but _because_ it will expose bugs.
You can't justify to any manager that we need to set up this one cost
center so that we can expand another one. Not unless the manager cares
about downside risk. And tech culture absolutely does not. Let the
planes fall out of the sky and the reactors melt down. You can justify
it all in the name of "ethical altruism", or whatever the trendy label
for sociopathic anarcho-capitalism is these days.

(I'm kidding, of course. Serious tech bros understand the essential
function of government in maintaining structures for the allocation of
economic rents [copyrights and patents] and the utility of employment
law, police, and if it comes to it, the National Guard in the
suppression of organized labor. Fortunately for management, software
engineers think so highly of themselves that they identify with the
billionaire CEO's economic class instead of their own.)

> Perhaps a more interesting (but likely pointless) question is what is
> the *least* that can be done to fix C's major problems.

Not pointless. If we ask ourselves that question after every revision
of the language standard, the language _will_ advance. C23 has a
`nullptr` constant. K&R-style function declarations are gone, and good
riddance. I did notice that some national bodies fought like hell to
keep trigraphs, though. :-|

> Compilers can easily add bounds checking for the array[index]

Pascal expected this. One of Kernighan's complaints in his CSTR #100
paper (the one I mentioned) is that he feared precious machine cycles
would be lost validating expressions that pointed within valid bounds.
So why not a compiler switch, jeez louise? Develop in paranoid/slow
mode and ship in sloppy/fast mode. If you must.

It seems that static analysis was in its infancy back then. Compiler
writers screeched like banshees at the forms of validation the Ada
language spec required them to do, and complained so vociferously that
they helped trash the language's reputation.
A few years went by and, gosh, folks realized that you sure could
prevent a lot of bugs by wiring such checks into compilers for other
languages--in the places where the semantics would permit it, a count
that was invariably lower than Ada's because, shock, Ada was actually
thought out and went through several revisions _before_ being put into
production.

Did anyone ever repent of their loathsome shrieking? Doubt it. Static
analysis became cool and they accepted whatever plaudits fell upon them.

> construct but ptr[index] can not be checked, unless we make
> a ptr a heavy weight object such as (address, start, limit).

Base and bounds registers are an old idea. Older than C. But the
PDP-11 didn't have them,[1] so C expected to do without and the rest is
lamentable history.

We would do well to learn from C++'s multiple attempts at "smart
pointers". I guess they've got it right in C++11, at last? Not sure.

C++'s aggressive promiscuity has not done C a favor, but rather
conditioned the latter into reflexive, instead of reasoned,
conservatism.

> One can see how code can be generated for code such as this:
>
>     Foo x[count];
>     Foo* p = x + n; // or &x[n]
>
> Code such as "Foo *p = malloc(size);" would require the compiler to
> know how malloc behaves to be able to compute the limit.

C's refusal to specify dynamic memory allocation in the language
runtime (as opposed to, eventually, the standard library) was a painful
oversight.

There was a strange tension between that and code idioms among C's own
earliest practitioners to assume dynamically sized storage. I remember
when novice C programmers managing strings would get ridiculed by their
seniors for setting up and writing to static buffers. Why did they do
that? Because it was easy--the language supported it well. Going to
`malloc()` was like aiming a gun at your own face.

The routine practice of memory overcommit in C on Unix systems led to a
sort of perverse synergy.
Programmers were actively conditioned _against_ performing algorithmic
analysis of their _space_ requirements. (By contrast, seeing how far
you could bro down your code's _time_ complexity was where you really
showed your mettle. If you spent all of the time you saved waiting on
I/O, hey man, that's not YOUR problem.)

> But for a user to write a similar function will require some language
> extension.
>
> [Of course, if we did that, adding proper support for multidimensional
> slices would be far easier. But that is an exploration for another
> day!]

When I read about Fortran 90/95/2003's facilities for array reshaping,
I rocked back on my heels.

> Converting enums to behave like Pascal scalars would likely break
> things. The question is, can such breakage be fixed automatically (by
> source code conversion)?

I don't assert that C needs to ape _Pascal_ scalars in particular.
Better Ada's. :P Or, equivalently, C++11's "enum class". As with many
things in C++, the syntax is an ugly graft, but the idea is as sound as
they come.

One of the proposals that didn't make it for C23 was similarly ugly:
"return break;". But the _idea_ was to mark tail recursion so that the
compiler would know it's happening. That saves stack. _That's_ worth
having. I worry that it didn't make it just because the syntax was so
cringey. But the alternatives, like yet another new keyword, or
overloading punctuation some more, seemed worse. C++ indulges both
vices amply with every revision.

> C's union type is used in two different ways: 1: similar to a sum
> type, which can be done type safely and 2: to cheat. The compiler
> should produce a warning when it can't verify a typesafe use -- one
> can add "unsafe" or some such to let the user absolve the compiler of
> such check.

Agreed. C++'s family of typecasting operators is, once again, an ugly
feature syntactically, but the benefits in terms of saying what you
mean, and _only_ what you mean, are valuable.
Casts in C are too often an express ticket to UB.

> [May be naively] I tend to think one can evolve C this way and fix a
> lot of code &/or make a lot of bugs more explicit.

If that be naïveté, let's have more of it.

At 2024-09-20T21:58:26+0100, Warner Losh wrote:
> The CHERI architecture extensions do this. It pushes this info into
> hardware where all pointers point to a region (gross simplification)
> that also grant you rights the area (including read/write/execute).
> It's really cool, but it does come at a cost in performance. Each
> pointer is a pointer, and a capacity that's basically a
> cryptographically signed bit of data that's the bounds and access
> permissions associated with the pointer. There's more details on their
> web site:
> https://www.cl.cam.ac.uk/research/security/ctsrd/cheri/

CHERI is absolutely cool and even if it doesn't conquer the world, I
feel sure that there is a lot we can learn from it.

> CHERI-BSD is a FreeBSD variant that runs on both CHERI variants
> (aarch64 and riscv64) and where most of the research has been done.
> There's also a Linux variant as well.
>
> Members of this project know way too many of the corner cases of the C
> language from porting most popular software to the CHERI... And have
> gone on screeds of their own. The only one I can easily find is
> https://people.freebsd.org/~brooks/talks/asiabsdcon2017-helloworld/helloworld.pdf

Oh yes. I remember they presented at the LF's Open Source Summit one
year (maybe the last year it was in downtown San Francisco, before the
LF moved the conference to wine country to scrape off all the engineers
and other tedious techy types who might point out what's wrong with
somebody's grandiose sales pitch--conferences are for getting deals
done [too many vice cops in SF?], not advancing the state of the art).

It was a questionnaire along the lines of "what do you _really_ know
about C?" and it opened my eyes wide for sure.
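
As an illustration of the pointer-plus-bounds-plus-permissions idea
discussed above (Bakul's "(address, start, limit)" object and, very
loosely, what Warner describes for CHERI), here is a minimal sketch in
plain C. Every name in it (struct cap, cap_make, cap_narrow, cap_load,
cap_store, CAP_READ, CAP_WRITE) is hypothetical and invented for this
example; none of it is CHERI's actual interface, and a software check
like this has none of CHERI's hardware enforcement.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical permission bits; real capabilities carry more than this. */
enum { CAP_READ = 1u << 0, CAP_WRITE = 1u << 1 };

/* A "fat pointer": base address, length in bytes, and access rights. */
struct cap {
    unsigned char *base;
    size_t len;
    unsigned perms;
};

/* Build a capability over an existing buffer. */
static struct cap cap_make(void *buf, size_t len, unsigned perms)
{
    assert(buf != NULL || len == 0);  /* a null base must cover nothing */
    struct cap c = { buf, len, perms };
    return c;
}

/* Derive a narrower capability: bounds and rights may only shrink. */
static bool cap_narrow(struct cap parent, size_t off, size_t len,
                       unsigned perms, struct cap *out)
{
    if (off > parent.len || len > parent.len - off)
        return false;                 /* would exceed the parent's bounds */
    if ((perms & ~parent.perms) != 0)
        return false;                 /* would add rights the parent lacks */
    out->base = parent.base + off;
    out->len = len;
    out->perms = perms;
    return true;
}

/* Checked byte load: refuses instead of reading out of bounds. */
static bool cap_load(struct cap c, size_t i, unsigned char *out)
{
    if (!(c.perms & CAP_READ) || i >= c.len)
        return false;
    *out = c.base[i];
    return true;
}

/* Checked byte store: refuses without write permission or out of bounds. */
static bool cap_store(struct cap c, size_t i, unsigned char v)
{
    if (!(c.perms & CAP_WRITE) || i >= c.len)
        return false;
    c.base[i] = v;
    return true;
}
```

A caller would wrap a buffer once with cap_make() and hand out only
narrowed, possibly read-only views. The cost is visible in the sketch:
every access pays for a permission test and a comparison, which is the
overhead the thread is arguing about.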
Apparently it turns out that the Dunning-Kruger effect isn't what we
think it is.

https://www.scientificamerican.com/article/the-dunning-kruger-effect-isnt-what-you-think-it-is/

Maybe D&K's findings were so rapidly assimilated into the cultural
zeitgeist because far too many people are acquainted with highly
confident C programmers.

While preparing this message, I ran across this:

https://csrc.nist.gov/files/pubs/conference/1998/10/08/proceedings-of-the-21st-nissc-1998/final/docs/early-cs-papers/schi75.pdf

"The Design and Specification of a Security Kernel for the PDP-11/45",
by Schiller (1975).

I'll try to read and absorb its 117 pages before burdening this list
with any more of my yammerings. Happy weekend!

Regards,
Branden

[1] I think. The PDP-11/20 infamously didn't have memory protection of
    any sort, and the CSRC wisely ran away from that as fast as they
    could once they could afford to. (See the preface to the Third
    Edition Programmer's Manual.) And it was reasonable to not expect
    support for such things if one wanted portability to embedded
    systems, but it's not clear to me how seriously the portability of
    C itself was considered until the first ports were actually _done_,
    and these were not to embedded systems, but to machines broadly
    comparable to PDP-11s. London and Reiser's paper on Unix/32V
    opened my eyes with respect to just how late some
    portability-impacting changes to "K&R C" were actually made. They
    sounded many cautionary notes that the community--or maybe it was
    just compiler writers (banshees again?)--seemed slow to
    acknowledge.
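
Returning to the point earlier in this message about unions used as sum
types: the type-safe use can be approximated in today's C by pairing the
union with a tag and going through checked accessors. What follows is
only a sketch under that assumption. The names (struct value,
value_from_int, value_from_real, value_get_int, value_to_real) are
hypothetical, and nothing here makes the compiler verify the discipline;
it merely makes a mismatch detectable at run time.

```c
#include <assert.h>
#include <stdbool.h>

/* The tag enumerates the variants of the sum type. */
enum value_kind { VALUE_INT, VALUE_REAL };

struct value {
    enum value_kind kind;   /* records which union member is live */
    union {
        long i;
        double r;
    } u;
};

/* Constructors set the tag and the matching member together. */
static struct value value_from_int(long i)
{
    struct value v;
    v.kind = VALUE_INT;
    v.u.i = i;
    return v;
}

static struct value value_from_real(double r)
{
    struct value v;
    v.kind = VALUE_REAL;
    v.u.r = r;
    return v;
}

/* Checked accessor: succeeds only when the tag matches. */
static bool value_get_int(const struct value *v, long *out)
{
    if (v->kind != VALUE_INT)
        return false;
    *out = v->u.i;
    return true;
}

/* Conversion defined for every variant.  The switch has no default so
   that a compiler warning such as GCC's -Wswitch can flag a variant
   that was added to the enum but forgotten here. */
static double value_to_real(const struct value *v)
{
    switch (v->kind) {
    case VALUE_INT:
        return (double)v->u.i;
    case VALUE_REAL:
        return v->u.r;
    }
    assert(!"corrupt tag");   /* reached only if kind holds a bad value */
    return 0.0;
}
```

Reading v.u.r directly while the tag says VALUE_INT is the "cheating"
use; nothing in the language stops it, which is why the quoted
suggestion of a warning plus an explicit "unsafe" marker is attractive.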