From: Luther Johnson <luther.johnson@makerlisp.com>
To: Nevin Liber, tuhs@tuhs.org
Date: Fri, 15 Aug 2025 11:25:32 -0700
Subject: [TUHS] Re: C history question: why is signed integer overflow UB?
List-Id: The Unix Heritage Society mailing list

I hear and understand what you're saying. What I'm trying to point out is that in C as it was originally implemented, in expressions like "a + b", "a >> 1", and "++a", C "does what the machine does". That's a very different thing from having rational, safe, predictable language semantics for operations on types, but it was also a strength, and a simple way to describe what C would do: defer to machine semantics. I believe one place in C89/C90 where this is stated explicitly, as "do what the machine does", is "-1 >> 1", as opposed to "-1 / 2". On most machines, this program:

#include <stdio.h>

int main()
{
    printf("%d\n", -1 >> 1);
    printf("%d\n", -1 / 2);

    return 0;
}

prints:

-1
0

directly reflecting the underlying machine shift and divide instructions - but if you made an appeal to rational integer type semantics, you might decide for it to do something else.

Old C was one way. Modern C has gone another way: good tools and rational semantics for safer and/or higher-performance code, or some balance between those and other goals. Old C just did what the machine did, and was a high-leverage tool, but you had to understand your machine.

On 08/15/2025 11:02 AM, Nevin Liber wrote:
On Fri, Aug 15, 2025 at 12:32 PM Luther Johnson <luther.johnson@makerlisp.com> wrote:
My belief is that this was done so compilers could employ optimizations
that did not have to consider or maintain implementation-specific
behavior when integers would wrap. I don't agree with this, I think 2's
complement behavior on integers as an implementation-specific behavior
can be well-specified, and well-understood, machine by machine, but I
think this is one of the places where compilers and benchmarks conspire
to subvert the obvious and change the language to "language-legally"
allow optimizations that can break the used-to-be-expected 2's
complement implementation-specific behavior.

It isn't just about optimizations.

Unsigned math in C is well defined here. The problem is that its wrapping behavior is almost (but not) always a bug. Because of that, for instance, one cannot write a no-false-positive sanitizer to catch this because it cannot tell the difference between an accidental bug and a deliberate use. This is a well-defined case with a very reasonable definition which most of the time leads to bugs.
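
A small C sketch of why a tool cannot tell the two cases apart: both functions below rely on exactly the same well-defined unsigned wraparound, but only the first one intends it. The function names are hypothetical, and the FNV-1a-style hash is an example chosen here for illustration, not something from this thread.

```c
#include <stddef.h>
#include <stdint.h>

/* Deliberate use: an FNV-1a-style hash depends on the multiply
 * wrapping modulo 2^32. */
uint32_t fnv1a(const char *s)
{
    uint32_t h = 2166136261u;           /* FNV offset basis */
    while (*s) {
        h ^= (unsigned char)*s++;
        h *= 16777619u;                 /* wraps modulo 2^32 by design */
    }
    return h;
}

/* Accidental use: if used > capacity, the subtraction wraps to a huge
 * value instead of signalling that the buffer is overfull. */
size_t remaining_buggy(size_t capacity, size_t used)
{
    return capacity - used;
}
```

To the language, and to any checker working from the language rules, remaining_buggy(4, 5) yielding SIZE_MAX is as legitimate as the hash multiply wrapping.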

There are times folks want the wrapping behavior. There are times folks want saturating behavior. There are times folks want such code to error out. There are times folks want the optimizing behavior because their code doesn't go anywhere near wrapping.

Ultimately, one needs different functions for the different behaviors, but if you only have one spelling for that operation, you can only get one behavior. A given type has to pick one of the above behaviors for a given spelling of an operation.
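
One way to picture "different functions for the different behaviors" in C is sketched below. The names add_wrap, add_sat and add_checked are hypothetical. add_wrap also assumes the compiler converts an out-of-range unsigned value to int by reducing it modulo 2^N; that conversion is implementation-defined in C, and GCC documents this choice.

```c
#include <limits.h>
#include <stdbool.h>

/* Wrapping: do the addition in unsigned arithmetic, where wraparound
 * is well defined, then convert back (implementation-defined step). */
int add_wrap(int a, int b)
{
    return (int)((unsigned)a + (unsigned)b);
}

/* Saturating: clamp to INT_MAX / INT_MIN instead of overflowing. */
int add_sat(int a, int b)
{
    if (b > 0 && a > INT_MAX - b)
        return INT_MAX;
    if (b < 0 && a < INT_MIN - b)
        return INT_MIN;
    return a + b;
}

/* Erroring: report overflow to the caller and leave *out untouched. */
bool add_checked(int a, int b, int *out)
{
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
        return false;
    *out = a + b;
    return true;
}
```

Each function checks the range before the signed addition, so none of them ever executes an overflowing "a + b". The fourth behavior, "assume it never overflows", is what the plain "+" operator on int already gives you.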

You can, of course, disagree with what C picked here (many do), but it is unlikely to change in the future.

Not that it hasn't been tried. In 2018 there was a C++ proposal, P0907R0 "Signed Integers are Two's Complement", and if you look at the next revision of that paper, P0907R1, there was no consensus for the wrapping behavior. Quoting the paper:
  • Performance concerns, whereby defining the behavior prevents optimizers from assuming that overflow never occurs;
  • Implementation leeway for tools such as sanitizers;
  • Data from Google suggesting that over 90% of all overflow is a bug, and defining wrapping behavior would not have solved the bug.
Fun fact: in C++ std::atomic<int> does wrap, so you can actually get the behavior you want. I haven't looked to see if that is also true using C's _Atomic type qualifier.
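
On the C side, the C11 standard's description of the atomic_fetch_add family of functions says that for signed integer types the arithmetic uses two's complement representation with silent wraparound, with no undefined results. A minimal sketch, assuming a compiler that provides <stdatomic.h>; the helper name is hypothetical:

```c
#include <limits.h>
#include <stdatomic.h>

/* Start an atomic counter at `start`, add 1 atomically, and return the
 * new value. For signed atomics the addition wraps instead of being
 * undefined. */
int atomic_increment_from(int start)
{
    atomic_int counter;
    atomic_init(&counter, start);
    atomic_fetch_add(&counter, 1);
    return atomic_load(&counter);
}
```

Calling atomic_increment_from(INT_MAX) should therefore yield INT_MIN, where the plain expression INT_MAX + 1 would be undefined behavior.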

Full disclosure: I am on the WG21 (C++) Committee and am starting to participate on the WG14 (C) Committee.
--
Nevin ":-)" Liber <mailto:nevin@eviloverlord.com> +1-847-691-1404
