There is quite a bit there to work through, but I was struck by one of the responses to a gcc bug report:

   "you're leading to undefined behaviour - do you understand this simple fact?
    in such cases compiler can do *anything* with your code."

I've seen similar comments before (about #pragma, I think). I don't think it was an official response, but it prompts me to discuss
the use of "undefined".

The word "undefined" has historically been used by language definitions as a technical term.
For instance, by Algol 68:

"1.1.4.3. Undefined 
a) If something is left “undefined” or is said to be “undefined”, then this means that it is not defined by this Report alone and that, for its definition, information from outside this Report has to be taken into account. {A distinction must be drawn between the yielding of an undefined value (whereupon elaboration continues with possibly unpredictable results) and the complete undefinedness of the further elaboration. The action to be taken in this latter case is at the discretion of the implementer, and may be some form of continuation (but not necessarily the same as any other implementer’s continuation), or some form of interruption (2.1.4.3.h) brought about by some run-time check.}
b) If some condition is “required” to be satisfied during some elaboration then, if it is not so satisfied, the further elaboration is undefined."

For example, a computation that uses a value from store that has not previously been initialised by the program will, at the machine level,
load and use whatever happened to be there, which is especially exciting if it's used as a pointer or floating-point number;
a language that does not check array bounds will produce a similar effect when an index is out of bounds; and so on.
There are more subtle cases where machine arithmetic at one time did differ in its handling of (say) arithmetic overflow.
Some of the cases are machine-dependent, and others are language-dependent.
Some languages will say things like "the order of evaluation of subexpressions is undefined" to allow different implementations some flexibility,
whereas other languages that emphasise absolute portability might say that evaluation is (say) strictly left-to-right.
Some will state that dereferencing nil must produce a trap; others will leave you with whatever result the machine
and run-time system produce when you do it.
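
To make a couple of those cases concrete, here is a small C sketch (the helper names and the whole program are invented for illustration): the value of an uninitialised local depends on whatever the machine left in that storage, and the order of the two calls in the sum is left to the implementation, so either result is a plausible outcome rather than a licence for arbitrary behaviour.

    #include <stdio.h>

    static int counter;

    /* illustrative helpers (invented for this sketch): each records
       the order in which it was called */
    static int f(void) { return ++counter * 10; }
    static int g(void) { return ++counter; }

    int main(void)
    {
        int uninit;             /* never assigned: at the machine level a use
                                   simply loads whatever happened to be there */
        int sum = f() + g();    /* C leaves the order of the two calls to the
                                   implementation: sum is 12 if f is called
                                   first, 21 if g is called first */

        printf("sum=%d uninit=%d\n", sum, uninit);
        return 0;
    }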

I deliberately did not use the C standard's own definition, to emphasise that the notion is an ancient one, and nothing new with C:
Pascal's standard has a similar concept, as does Ada.
In all cases, however, there has never been any suggestion *whatsoever* that "undefined" allowed
completely arbitrary, capricious or even whimsical effects. It meant either that the results might simply depend on an implementation's choice
between one or other plausible interpretation of a construction, or that they reflected differences in the machine's operations or its run-time state (as in the case of referencing uninitialised storage).

I refer again to Algol 68: "this means that it is not defined by this Report alone and that, for its definition, information from outside this Report has to be taken into account". In the past, that outside information might include the documentation for the compiler you're using, or the machine definition.

No sane compiler writer would ever assume it allowed the compiler to "do *anything* with your code".

The Plan 9 C compiler is firmly in that historical tradition: the compiler takes advantage of flexibility in evaluation order when it seems helpful,
and tries to avoid making assumptions (ie, "optimisations") that frustrate a programmer's explicit attempt to get a particular effect (eg
with overflow checks); the compiler doesn't do much non-local optimisation (which is where many problems arise) above the expression level;
when handling pointers and arithmetic, it very traditionally gives you what the machine gives you for arithmetic, for references to undefined values, and for out-of-range indices, unless the language definition (rarely) defines a particular effect. It also defines several effects that ANSI C
leaves "undefined", for convenience or portability, notably the state of program values after longjmp.
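
For example, an explicit overflow check written in this traditional style (my sketch, not code from the compiler or any particular program) only works if the compiler leaves the wrapped result alone rather than assuming the overflow "cannot happen" and deleting the test:

    /* relies on the machine wrapping on signed overflow; a compiler that
       treats signed overflow as licence to assume it never occurs may
       silently remove the test */
    int
    addwraps(int a, int b, int *sum)
    {
        *sum = a + b;
        if((b > 0 && *sum < a) || (b < 0 && *sum > a))
            return 1;   /* the addition wrapped */
        return 0;
    }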

One of the examples in the first paper was:

    struct tun_struct *tun = ...;
    struct sock *sk = tun->sk;
    if (!tun)
        return POLLERR; /* write to address based on tun */

and the text described its handling by gcc:

"For example, when gcc first sees the dereference tun->sk, it concludes that the pointer tun must be non-null, because the C standard states that dereferencing a null pointer is undefined [24: §6.5.3]. Since tun is non-null, gcc further determines that the null pointer check is unnecessary and eliminates the check, making a privilege escalation exploit possible that would not otherwise be."

I should say, as a compiler writer, that the reasoning there seems back to front, resting on a perverse appeal to "undefined".
Really, in that text, it's only after the conditional test that one can conclude
that the pointer is or is not null in the relevant branch of the if-else. It is what we call "a compiler bug".
Apparently Hoare's "billion dollar mistake" can be made worse by misguided automated reasoning!
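
The obvious repair in the source (a sketch; the kernel's eventual fix may differ in detail) is simply to test before dereferencing, so there is nothing for such reasoning to remove:

    struct tun_struct *tun = ...;
    struct sock *sk;

    if (!tun)
        return POLLERR;
    sk = tun->sk;   /* dereference only after the test */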

On 23 November 2015 at 10:20, Ramakrishnan Muthukrishnan <ram@rkrishnan.org> wrote:
Had been reading the SOSP paper:
<https://pdos.csail.mit.edu/papers/stack:sosp13.pdf>

and this blog post that proposes a simpler C:
<http://blog.regehr.org/archives/1180>

I wonder how Plan 9 C compiler, which is a non-ANSI compliant compiler,
treats those parts that the ANSI C standard treats as undefined.

--
  Ramakrishnan