9fans - fans of the OS Plan 9 from Bell Labs
From: Charles Forsyth <charles.forsyth@gmail.com>
To: Fans of the OS Plan 9 from Bell Labs <9fans@9fans.net>
Subject: Re: [9fans] Undefined Behaviour in C
Date: Mon, 23 Nov 2015 11:32:50 +0000
Message-ID: <CAOw7k5jphEwBiv7OUU_CwXiSZBQ5XHJOg42t0CtgtDL0Yd6+2w@mail.gmail.com>
In-Reply-To: <1448274004.1751482.447419065.2BE466C4@webmail.messagingengine.com>

There is quite a bit there to work through, but I was struck by one of the
responses to a gcc bug report:

   "you're leading to undefined behaviour - do you understand this simple
fact?
    in such cases compiler can do *anything* with your code."

I've seen similar comments before (about #pragma, I think). I don't think
it was an official response, but it prompts me to discuss
the use of "undefined".

The word "undefined" has historically been used by language definitions as
a technical term.
For instance, by Algol 68:

"1.1.4.3. Undefined
a) If something is left “undefined” or is said to be “undefined”, then this
means that it is not defined by this Report alone and that, for its
definition, information from outside this Report has to be taken into
account. {A distinction must be drawn between the yielding of an undefined
value (whereupon elaboration continues with possibly unpredictable results)
and the complete undefinedness of the further elaboration. The action to be
taken in this latter case is at the discretion of the implementer, and may
be some form of continuation (but not necessarily the same as any other
implementer’s continuation), or some form of interruption (2.1.4.3.h)
brought about by some run-time check.}
b) If some condition is “required” to be satisfied during some elaboration
then, if it is not so satisfied, the further elaboration is undefined."

For example, a computation that uses a value from store that has not
previously been initialised by the program will, at the machine level,
load and use whatever happened to be there, which is especially exciting if
it's used as a pointer or floating-point number;
there will be a similar effect in a language that does not check array
bounds if an index is out of bounds; and so on.
There are more subtle cases, where machines at one time differed in their
handling of (say) arithmetic overflow.
Some of the cases are machine-dependent, and others are language-dependent.
Some languages will say things like "the order of evaluation of
subexpressions is undefined" to allow different implementations some
flexibility,
where other languages that emphasise absolute portability might say that
evaluation is (say) strictly left-to-right.
Others will state that dereferencing nil will necessarily produce a trap;
others will leave you with whatever result the machine
and run-time system produce when you do it.
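
As an illustration of the evaluation-order point (a sketch that is not part
of the original message): in C the order in which the two operands of `+`
are evaluated is left to the implementation, which the standard calls
"unspecified". The helper names note() and operand_order() are made up for
this example. Either result is a plausible interpretation, and which one you
get depends on the compiler.

```c
#include <stddef.h>

/* Records the order in which note() was called. */
static int trace[2];
static size_t ntrace;

static int note(int id)
{
    trace[ntrace++] = id;
    return id;
}

/*
 * Returns 12 if the left operand of + was evaluated first,
 * 21 if the right operand was evaluated first.
 * Both answers are permitted; neither is an error.
 */
int operand_order(void)
{
    ntrace = 0;
    (void)(note(1) + note(2));
    return trace[0] * 10 + trace[1];
}
```

A caller can only rely on the result being 12 or 21, not on which of the two.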

I deliberately did not use the C standard's own definition, to emphasise
that the concept is ancient, and nothing new with C:
Pascal's standard will have some similar concept, as does Ada.
In all cases, however, there has never been any suggestion *whatsoever*
that "undefined" allowed
completely arbitrary, capricious or even whimsical effects. It meant either
that the results might simply depend on an implementation choice
between one or other plausible interpretation of a construction, or they
reflected differences in either the machine operations or its run-time
state (in the case of referencing uninitialised storage).

I refer again to Algol 68: "this means that it is not defined by this
Report alone and that, for its definition, information from outside this
Report has to be taken into account". In the past, that outside information
might include the documentation for the compiler you're using, or the
machine definition.

No sane compiler writer would ever assume it allowed the compiler to "do
*anything*" with your code.

The Plan 9 C compiler is firmly in that historical tradition: the compiler
takes advantage of flexibility in evaluation order when it seems helpful,
and tries to avoid making assumptions (ie, "optimisations") that frustrate
a programmer's explicit attempt to get a particular effect (eg
with overflow checks); the compiler doesn't do much non-local optimisation
(which is where many problems arise) above the expression level;
when handling pointers and arithmetic, it very traditionally gives you what
the machine gives you for arithmetic and references to undefined values or
out of range indices, unless the language definition (rarely) defines a
particular effect. It also eliminates several "undefined" effects
in ANSI C, for convenience or portability, notably the state of program
values after longjmp.
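
To make "overflow checks" concrete, here is a minimal sketch (not from the
original message) of a check written so that it does not depend on what the
machine does when signed addition overflows. The function name add_checked()
is hypothetical. A check of the form `a + b < a` relies on wraparound, which
is the kind of test an aggressive optimiser may discard; the form below
compares against the limits before adding, so the addition itself never
overflows.

```c
#include <limits.h>

/*
 * Stores a + b in *sum and returns 1 if the result fits in an int;
 * returns 0 and leaves *sum untouched otherwise.
 * The range test is done before the addition, so no overflow occurs.
 */
int add_checked(int a, int b, int *sum)
{
    if ((b > 0 && a > INT_MAX - b) ||
        (b < 0 && a < INT_MIN - b))
        return 0;
    *sum = a + b;
    return 1;
}
```

For example, add_checked(INT_MAX, 1, &s) returns 0, while
add_checked(1, 2, &s) returns 1 and sets s to 3.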

One of the examples in the first paper was:

    struct tun_struct *tun = ...;
    struct sock *sk = tun->sk;
    if (!tun)
        return POLLERR;
    /* write to address based on tun */

and the text described its handling by gcc:

"For example, when gcc first sees the dereference tun->sk, it concludes
that the pointer tun must be non-null, because the C standard states that
dereferencing a null pointer is undefined [24: §6.5.3]. Since tun is
non-null, gcc further determines that the null pointer check is unnecessary
and eliminates the check, making a privilege escalation exploit possible
that would not otherwise be."

I should say, as a compiler writer, that seems to have the reasoning back
to front, by making a perverse appeal to "undefined".
Really, in that text, it's only after the conditional test that one can
conclude that the pointer is or is not null in the relevant branch of the
if-else.
It is what we call "a compiler bug".
Apparently Hoare's "billion dollar mistake" can be made worse by misguided
automated reasoning!
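
For comparison, a sketch (not from the original message or the paper) of the
same fragment with the test placed before the dereference, so that no
compiler has any earlier use of tun to reason from. The struct layouts and
the function name tun_poll_sketch() are stand-ins invented for this example;
only the names tun, sk and POLLERR come from the quoted code.

```c
#include <poll.h>
#include <stddef.h>

/* Stand-in types: the real kernel structures are much larger. */
struct sock {
    int flags;
};

struct tun_struct {
    struct sock *sk;
};

/*
 * Test the pointer first, then dereference it.
 * Returns POLLERR for a null tun, otherwise the (hypothetical)
 * flags of the attached socket, or 0 if there is none.
 */
int tun_poll_sketch(struct tun_struct *tun)
{
    struct sock *sk;

    if (tun == NULL)
        return POLLERR;
    sk = tun->sk;
    return sk != NULL ? sk->flags : 0;
}
```

With this ordering the null test is reached on every path, whatever view the
compiler takes of "undefined".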

On 23 November 2015 at 10:20, Ramakrishnan Muthukrishnan <ram@rkrishnan.org>
wrote:

> Had been reading the SOSP paper:
> <https://pdos.csail.mit.edu/papers/stack:sosp13.pdf>
>
> and this blog post that proposes a simpler C:
> <http://blog.regehr.org/archives/1180>
>
> I wonder how Plan 9 C compiler, which is a non-ANSI compliant compiler,
> treats those parts that the ANSI C standard treats as undefined.
>
> --
>   Ramakrishnan
>
>
