From: Luther Johnson <luther.johnson@makerlisp.com>
To: Nevin Liber <nevin@eviloverlord.com>, tuhs@tuhs.org
Subject: [TUHS] Re: C history question: why is signed integer overflow UB?
Date: Fri, 15 Aug 2025 11:25:32 -0700
Message-ID: <e1a46242-a9e6-a580-9f05-994e76262f50@makerlisp.com>
In-Reply-To: <CAGg_6+OZOV0+yNm-T0pR3zqkhxOuhZAbY_B5hWg5JVcnXaV5hA@mail.gmail.com>
I hear and understand what you're saying. What I'm trying to point out
is that in C as it was originally implemented, in expressions such as
"a + b", "a >> 1", and "++a", C "does what the machine does". That's a
very different thing from having rational, safe, predictable language
semantics for operations on types, but it was also a strength: deferring
to machine semantics was a simple way to describe what C would do. I
believe one place in C89/C90 where this is stated explicitly, as "do what
the machine does", is "-1 >> 1", as opposed to "-1 / 2". On most
machines, this program:

    #include <stdio.h>

    int main(void)
    {
        printf("%d\n", -1 >> 1);
        printf("%d\n", -1 / 2);
        return 0;
    }

prints:

    -1
    0

directly reflecting the underlying machine shift and divide instructions,
but if you made an appeal to rational integer type semantics, you might
decide that it should do something else.

Old C was one way. Modern C has gone another way: good tools and
rational semantics for safer and/or higher-performance code, or some
balance between those and other goals. Old C just did what the machine
did, and was a high-leverage tool, but you had to understand your machine.

On 08/15/2025 11:02 AM, Nevin Liber wrote:
> On Fri, Aug 15, 2025 at 12:32 PM Luther Johnson
> <luther.johnson@makerlisp.com> wrote:
>
> > My belief is that this was done so compilers could employ optimizations
> > that did not have to consider or maintain implementation-specific
> > behavior when integers would wrap. I don't agree with this, I think 2's
> > complement behavior on integers as an implementation-specific behavior
> > can be well-specified, and well-understood, machine by machine, but I
> > think this is one of the places where compilers and benchmarks conspire
> > to subvert the obvious and change the language to "language-legally"
> > allow optimizations that can break the used-to-be-expected 2's
> > complement implementation-specific behavior.
>
>
> It isn't just about optimizations.
>
> Unsigned math in C is well defined here. The problem is that its
> wrapping behavior is almost (but not) always a bug. Because of that,
> for instance, one cannot write a no-false-positive sanitizer to catch
> this because it cannot tell the difference between an accidental bug
> and a deliberate use. This is a well-defined case with a very
> reasonable definition which most of the time leads to bugs.
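
A small illustration of the kind of bug described in the paragraph above, as a sketch only; the function names are made up for the example. Both functions are fully defined C, which is why a tool cannot tell from the subtraction alone whether the wrap was intended.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: space left in a buffer of capacity cap with used
   bytes consumed. If used > cap, the subtraction wraps modulo SIZE_MAX + 1
   and yields a huge value instead of signalling a problem. */
size_t bytes_left_unchecked(size_t cap, size_t used)
{
    return cap - used;
}

/* Same calculation with the out-of-range case handled explicitly. */
size_t bytes_left(size_t cap, size_t used)
{
    return used > cap ? 0 : cap - used;
}
```

For example, bytes_left_unchecked(4, 5) evaluates to SIZE_MAX, while bytes_left(4, 5) evaluates to 0.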
>
> There are times folks want the wrapping behavior. There are times
> folks want saturating behavior. There are times folks want such code
> to error out. There are times folks want the optimizing behavior
> because their code doesn't go anywhere near wrapping.
>
> Ultimately, one needs different functions for the different
> behaviors, but if you only have one spelling for that operation, you
> can only get one behavior. A given type has to pick one of the above
> behaviors for a given spelling of an operation.
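
One way to picture "different functions for the different behaviors" is the sketch below, with three spellings for int addition. The names add_wrap, add_sat, and add_checked are invented for this illustration and are not from any standard or library. The wrapping version assumes a two's complement implementation where converting an out-of-range unsigned value to int wraps, which is implementation-defined before C23 but is what mainstream compilers do.

```c
#include <limits.h>
#include <stdbool.h>

/* Wrapping add: do the arithmetic in unsigned, where wraparound is defined.
   The conversion back to int is implementation-defined before C23;
   this sketch assumes the usual two's complement wrap. */
int add_wrap(int a, int b)
{
    return (int)((unsigned)a + (unsigned)b);
}

/* Saturating add: clamp to INT_MAX or INT_MIN instead of overflowing. */
int add_sat(int a, int b)
{
    if (b > 0 && a > INT_MAX - b)
        return INT_MAX;
    if (b < 0 && a < INT_MIN - b)
        return INT_MIN;
    return a + b;
}

/* Checked add: report overflow to the caller.
   *out is written only when the sum is representable. */
bool add_checked(int a, int b, int *out)
{
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
        return false;
    *out = a + b;
    return true;
}
```

The plain "a + b" spelling then stays available for code that never goes near the limits, which is the fourth behavior in the list above.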
>
> You can, of course, disagree with what C picked here (many do), but it
> is unlikely to change in the future.
>
> Not that it hasn't been tried. In 2018 there was a C++ proposal,
> P0907R0 "Signed Integers are Two's Complement"
> <https://wg21.link/P0907R0>, and if you look at the next revision of
> that paper, P0907R1 <https://wg21.link/P0907R1>, there was no consensus
> for the wrapping behavior. Quoting the paper:
>
>   * Performance concerns, whereby defining the behavior prevents
>     optimizers from assuming that overflow never occurs;
>   * Implementation leeway for tools such as sanitizers;
>   * Data from Google suggesting that over 90% of all overflow is a
>     bug, and defining wrapping behavior would not have solved the bug.
>
> Fun fact: in C++ std::atomic<int> does wrap, so you can actually get
> the behavior you want. I haven't looked to see if that is also true
> using C's _Atomic type qualifier.
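
On the C side of that question, my reading of C11 is that the atomic_fetch_add family in <stdatomic.h> is specified to wrap silently for signed integer types, with no undefined results. A minimal sketch, assuming a C11 compiler that provides <stdatomic.h>; the function name is made up for the example:

```c
#include <limits.h>
#include <stdatomic.h>

/* Hypothetical demo: increment an atomic int that already holds INT_MAX.
   For signed integer types, C11 defines the arithmetic done by
   atomic_fetch_add as two's complement with silent wraparound. */
int atomic_increment_past_max(void)
{
    atomic_int counter;

    atomic_init(&counter, INT_MAX);
    atomic_fetch_add(&counter, 1);
    return atomic_load(&counter);
}
```

On such an implementation the function returns INT_MIN rather than invoking undefined behavior.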
>
> Full disclosure: I am on the WG21 (C++) Committee and am starting to
> participate on the WG14 (C) Committee.
> --
> Nevin ":-)" Liber <nevin@eviloverlord.com> +1-847-691-1404