From mboxrd@z Thu Jan 1 00:00:00 1970 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on inbox.vuxu.org X-Spam-Level: X-Spam-Status: No, score=-0.8 required=5.0 tests=DKIM_ADSP_CUSTOM_MED, DKIM_INVALID,DKIM_SIGNED,FREEMAIL_FROM,HTML_MESSAGE,MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE autolearn=ham autolearn_force=no version=3.4.4 Received: (qmail 30537 invoked from network); 16 May 2020 16:15:49 -0000 Received: from minnie.tuhs.org (45.79.103.53) by inbox.vuxu.org with ESMTPUTF8; 16 May 2020 16:15:49 -0000 Received: by minnie.tuhs.org (Postfix, from userid 112) id 385DC9C5F1; Sun, 17 May 2020 02:15:46 +1000 (AEST) Received: from minnie.tuhs.org (localhost [127.0.0.1]) by minnie.tuhs.org (Postfix) with ESMTP id 4803D9C5E5; Sun, 17 May 2020 02:15:10 +1000 (AEST) Authentication-Results: minnie.tuhs.org; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.b="miTwNdzR"; dkim-atps=neutral Received: by minnie.tuhs.org (Postfix, from userid 112) id 2A6169C5E5; Sun, 17 May 2020 02:15:07 +1000 (AEST) Received: from mail-qv1-f49.google.com (mail-qv1-f49.google.com [209.85.219.49]) by minnie.tuhs.org (Postfix) with ESMTPS id 77DCB9C5E4 for ; Sun, 17 May 2020 02:15:06 +1000 (AEST) Received: by mail-qv1-f49.google.com with SMTP id er16so2125727qvb.0 for ; Sat, 16 May 2020 09:15:06 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=8SFwlgQ+4sowgeuoAdkLrakCVjMu1sAFyFlheWS3IDc=; b=miTwNdzR4UI+rzmT7Qy1utd8v4BUfboGsdsasUTMQwqXFD4BgwgHKRBTDYqGuzvLKJ 1TXOmJieegXyL89nLo8fJoMbyQ58xl6FQfigu4BpU6lYC+QFLLJ2YNzq4skyU+JSYb8F +MzZXiqqaRtMw7IqbT00FcEpHort4l/ymDwRPkGF+1fQvEYEn4OivxvbCwbSCq4xPUtU jvgLqzfKcPFDq51ZnUXmWrhwLrYYnZdr1AVaT1wtZ7gjbWoLWjKAasPKQmbLX2sxacaO gBdObB0RecDV0aCFRbmtl1r5fydcYLVR81XMLQGT0XNyBRiMb42ZWWLPZFCDS8ejVwoh tQew== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=8SFwlgQ+4sowgeuoAdkLrakCVjMu1sAFyFlheWS3IDc=; b=qQM5ONNDusQhs42Pp9PUBxh6TPcT8xlXLwXr8CzMXA4HRTZb3mX8dSYHQVxKDsXWcb /kzkRAD85Z9q9n/zDbO3LsUKUBPMGKHovHMN+RsoCeePgXfNt/Dt0SOb2RqOUGT1i77d 9K5dVCOB9jR2fIXuX/kqc72BP2oIoBAdk7p+vUWR9zdf4LFXot33RYat3p8F3K4Z6D66 dknsUJMyI58Aoc+hSUO1A9121lmKG0Xx+d/bEPToU4thT7C1Pi0vm14hZK4O8+JFH2xA qbM/TeJOpZmgRJLUSSsRgQWj3iqXa9JFz69qvsKqWZiYjzqXSE1ChhNS66fTz48kXvsL 08fg== X-Gm-Message-State: AOAM530efC3HAwgQzH9VIndmQqneE/IyyeUOdqdxMWj2yqM2sTofT6EO f5N4z8MEwDBgpX+Jk2iZfWxXrOUHXrWO9COhBW4= X-Google-Smtp-Source: ABdhPJyAeAYBnKrbq7dNXAKjxCL+pFG7KS0fC8SeqL8bQA1rK7E1qnREExbPCVUrFfzVy1noBYxF/++2E3TcqIMMOm0= X-Received: by 2002:a0c:fb4b:: with SMTP id b11mr2354816qvq.96.1589645705423; Sat, 16 May 2020 09:15:05 -0700 (PDT) MIME-Version: 1.0 References: <202005141841.04EIfvEZ063529@tahoe.cs.Dartmouth.EDU> <20200515150122.GF30160@mcvoy.com> <014001d62af3$9cc209b0$d6461d10$@ronnatalie.com> <019701d62af6$050344b0$0f09ce10$@ronnatalie.com> <0A2C62EF-E43E-43E4-8C53-CE4C99BC5B32@coraid.com> In-Reply-To: <0A2C62EF-E43E-43E4-8C53-CE4C99BC5B32@coraid.com> From: Dan Cross Date: Sat, 16 May 2020 12:14:28 -0400 Message-ID: To: Brantley Coile Content-Type: multipart/alternative; boundary="000000000000972f5d05a5c63b05" Subject: Re: [TUHS] v7 K&R C X-BeenThere: tuhs@minnie.tuhs.org X-Mailman-Version: 2.1.26 Precedence: list List-Id: The Unix Heritage Society mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: The Eunuchs Hysterical Society Errors-To: tuhs-bounces@minnie.tuhs.org Sender: "TUHS" --000000000000972f5d05a5c63b05 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable On Fri, May 15, 2020 at 8:58 PM Brantley Coile wrote: > I always kept local, single characters in ints. This avoided the problem > with loading a character being signed or unsigned. The reason for not > specifying is obvious. Today, you can pick the move-byte-into-word > instruction that either sign extends or doesn't. But when C was defined > that wasn't the case. Some machines sign extended when a byte was loaded > into a register and some filled the upper bits with zero. For machines th= at > filled with zero, a char was unsigned. If you forced the language to do o= ne > or the other, it would be expensive on the opposite kind of machine. > Not only that, but if one used an exactly `char`-width value to hold, er, character data as returned from `getchar` et al, then one would necessarily give up the possibility of handling whatever character value was chosen for the sentinel marking end-of-input stream. `getchar` et al are defined to return EOF on end of input; if they didn't return a wider type than `char`, there would be data that could not be read. On probably every machine I am ever likely to use again in my lifetime, byte value 255 would be -1 as a signed char, but it is also a perfect valid value for a byte. The details of whether char is signed or unsigned aside, use of a wider type is necessary for correctness and ability to completely represent the input data. It's one of the things that made C a good choice on a wide variety of > machines. > > I guess I always "saw" the return value of the getchar() as being in a in= t > sized register, at first namely R0, so kept the character values returned > as ints. The actual EOF indication from a read is a return value of zero > for the number of characters read. > That's certainly true. Had C supported multiple return values or some kind of option type from the outset, it might have been that `getchar`, read, etc, returned a pair with some useful value (e.g., for `getchar` the value of the byte read; for `read` a length) and some indication of an error/EOF/OK value etc. Notably, both Go and Rust support essentially this: in Go, `io.Read()` returns a `(int, error)` pair, and the error is `io.EOF` on end-of-input; in Rust, the `read` method of the `Read` trait returns a `Result`, though a `Result::Ok(n)`, where `n=3D=3D0` indicates EOF. But I'm just making noise because I'm sure everyone knows all this. > I think it's worthwhile stating these things explicitly, sometimes. - Dan C. > On May 15, 2020, at 4:18 PM, ron@ronnatalie.com wrote: > > > > EOF is defined to be -1. > > getchar() returns int, but c is a unsigned char, the value of (c =3D > getchar()) will be 255. This will never compare equal to -1. > > > > > > > > Ron, > > > > Hmmm... getchar/getc are defined as returning int in the man page and C > is traditionally defined as an int in this code.. > > > > On Fri, May 15, 2020 at 4:02 PM wrote: > >> Unfortunately, if c is char on a machine with unsigned chars, or it=E2= =80=99s > of type unsigned char, the EOF will never be detected. > >> > >> > >> > >>> =E2=80=A2 while ((c =3D getchar()) !=3D EOF) if (c =3D=3D '\n') {= /* entire record > is now there */ > > --000000000000972f5d05a5c63b05 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
On Fri, May 15, 2020 at 8:58 PM Brantley = Coile <brantley@coraid.com>= ; wrote:
I always kept local, single characters in ints. This av= oided the problem with loading a character being signed or unsigned. The re= ason for not specifying is obvious. Today, you can pick the move-byte-into-= word instruction that either sign extends or doesn't. But when C was de= fined that wasn't the case. Some machines sign extended when a byte was= loaded into a register and some filled the upper bits with zero. For machi= nes that filled with zero, a char was unsigned. If you forced the language = to do one or the other, it would be expensive on the opposite kind of machi= ne.

Not only that, but if one used an e= xactly `char`-width value to hold, er, character data as returned from `get= char` et al, then one would necessarily give up the possibility of handling= whatever character value was chosen for the sentinel=C2=A0marking end-of-i= nput stream.=C2=A0 `getchar` et al are defined to return EOF on end of inpu= t; if they didn't return a wider type than `char`, there would be data = that could not be read. On probably every machine I am ever likely to use a= gain in my lifetime, byte value 255 would be -1 as a signed char, but it is= also a perfect valid value for a byte.

The detail= s of whether char is signed or unsigned aside, use of a wider type is neces= sary for correctness and ability to completely represent the input data.

It's one of the things that made C a good choice on a wide variety of m= achines.

I guess I always "saw" the return value of the getchar() as being= in a int sized register, at first namely R0, so kept the character values = returned as ints. The actual EOF indication from a read is a return value o= f zero for the number of characters read.

That's certainly true. Had C supported multiple return values or som= e kind of option type from the outset, it might have been that `getchar`, r= ead, etc, returned a pair with some useful value (e.g., for `getchar` the v= alue of the byte read; for `read` a length) and some indication of an error= /EOF/OK value etc. Notably, both Go and Rust support essentially this: in G= o, `io.Read()` returns a `(int, error)` pair, and the error is `io.EOF` on = end-of-input; in Rust, the `read` method of the `Read` trait returns a `Res= ult<usize, io::Error>`, though a `Result::Ok(n)`, where `n=3D=3D0` in= dicates EOF.

But I'm just making noise because I'm sure everyone knows all this.=

I think it's worthwhile stating th= ese things explicitly, sometimes.

=C2=A0 =C2=A0 = =C2=A0 =C2=A0 - Dan C.

> On May 15, 2020, at 4:18 PM, ron@ronnatalie.com wrote:
>
> EOF is defined to be -1.
> getchar() returns int, but c is a unsigned char, the value of (c =3D g= etchar()) will be 255.=C2=A0 =C2=A0 This will never compare equal to -1. >=C2=A0
>=C2=A0
>=C2=A0
> Ron,
>=C2=A0
> Hmmm... getchar/getc are defined as returning int in the man page and = C is traditionally defined as an int in this code..
>=C2=A0
> On Fri, May 15, 2020 at 4:02 PM <ron@ronnatalie.com> wrote:
>> Unfortunately, if c is char on a machine with unsigned chars, or i= t=E2=80=99s of type unsigned char, the EOF will never be detected.
>>=C2=A0
>>=C2=A0
>>=C2=A0
>>>=C2=A0 =C2=A0 =C2=A0=E2=80=A2 while ((c =3D getchar()) !=3D EOF= ) if (c =3D=3D '\n') { /* entire record is now there */

--000000000000972f5d05a5c63b05--