From mboxrd@z Thu Jan 1 00:00:00 1970
MIME-Version: 1.0
In-Reply-To: <1448274004.1751482.447419065.2BE466C4@webmail.messagingengine.com>
References: <1448274004.1751482.447419065.2BE466C4@webmail.messagingengine.com>
Date: Mon, 23 Nov 2015 14:30:29 +0000
Message-ID:
From: Charles Forsyth
To: Fans of the OS Plan 9 from Bell Labs <9fans@9fans.net>
Content-Type: multipart/alternative; boundary=001a1130c8d01ce1950525361183
Subject: Re: [9fans] Undefined Behaviour in C
Topicbox-Message-UUID: 770f5cd6-ead9-11e9-9d60-3106f5b1d025

--001a1130c8d01ce1950525361183
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

On 23 November 2015 at 10:20, Ramakrishnan Muthukrishnan
<ram@rkrishnan.org> wrote:

> Had been reading the SOSP paper:
> <https://pdos.csail.mit.edu/papers/stack:sosp13.pdf>

As an example of how tricky it can be, one of their examples is

const uint8_t *data = /* buffer head */;
const uint8_t *data_end = /* buffer tail */;
int size = bytestream_get_be16(&data);
if (data + size >= data_end || data + size < data)
    return -1;

They say "A correct fix is to replace data + x >= data_end || data + x
< data with x >= data_end − data, which is simpler and also avoids
invoking undefined behavior; one should also add the check x < 0 if x
can be negative."

Unfortunately, that replacement is itself well-defined only if data and
data_end "point to elements of the same array object, or one past the
last element of the array object" (and there's an
implementation-dependent option for the interpretation of "one past"
when ensuring the address can be represented). It looks from the
comments as though that might be true in this particular case (or it's
intended to be understood), but if not, avoiding the compiler's
"optimisation" that messes up one form of undefined behaviour will lead
you to write code that has different undefined states.

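To make that condition concrete, here is a minimal sketch of the
subtraction-based check with the precondition written down. The helper
name chunk_fits and its signature are made up for illustration; they
come neither from the paper nor from the code it quotes. The sketch
assumes the caller can guarantee that both pointers refer to the same
array object (or one past its end); nothing inside the function can
verify that.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper for illustration only.
 * Precondition (caller's responsibility): data and data_end point into
 * the same array object, or one past its last element, and
 * data <= data_end. If that does not hold, both the comparison and the
 * subtraction below are themselves undefined. */
static int chunk_fits(const uint8_t *data, const uint8_t *data_end, int size)
{
    ptrdiff_t avail;

    assert(data <= data_end);   /* meaningful only under the precondition */
    avail = data_end - data;    /* likewise defined only under the precondition */

    if (size < 0)
        return 0;               /* the extra check the paper asks for */
    return size < avail;        /* original code rejects size >= avail */
}
```

With a 16-byte buffer buf, chunk_fits(buf, buf + 16, 15) accepts and
chunk_fits(buf, buf + 16, 16) rejects, matching the original test. No
pointer is ever advanced past the end, so no out-of-range pointer is
formed; whether the two pointers belong to one object still has to be
argued from the surrounding code.
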
Generally, an optimising compiler for a systems language, especially one
in which pointer values can be manipulated explicitly, needs to be sure
of its ground when second-guessing the effect of a given statement or
expression.

--001a1130c8d01ce1950525361183
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

--001a1130c8d01ce1950525361183--