On Fri, May 15, 2020 at 8:58 PM Brantley Coile <brantley@coraid.com> wrote:
I always kept local, single characters in ints. This avoided the problem of a loaded character being signed or unsigned. The reason for not specifying is obvious. Today, you can pick the move-byte-into-word instruction that either sign-extends or doesn't. But when C was defined that wasn't the case. Some machines sign-extended when a byte was loaded into a register and some filled the upper bits with zero. For machines that filled with zero, a char was unsigned. If you forced the language to do one or the other, it would be expensive on the opposite kind of machine.
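
As an aside, a quick probe shows which choice a given compiler made; `CHAR_MIN` from `<limits.h>` is 0 exactly when plain `char` is unsigned (a sketch for illustration, not from the thread):

```c
#include <stdio.h>
#include <limits.h>

int main(void)
{
    /* Whether plain char sign-extends is implementation-defined;
       CHAR_MIN reveals the choice: 0 if char is unsigned,
       SCHAR_MIN (typically -128) if signed. */
    char c = '\xFF';

    printf("CHAR_MIN = %d, byte 0xFF as char = %d\n", CHAR_MIN, (int)c);
    return 0;
}
```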

Not only that, but if one used an exactly `char`-width value to hold, er, character data as returned from `getchar` et al, then one would necessarily give up the ability to handle whatever character value was chosen as the sentinel marking end of input.  `getchar` et al are defined to return EOF at end of input; if they didn't return a wider type than `char`, there would be data that could not be read. On probably every machine I am ever likely to use again in my lifetime, byte value 255 would be -1 as a signed char, but it is also a perfectly valid value for a byte.

The details of whether `char` is signed or unsigned aside, use of a wider type is necessary for correctness and for the ability to represent the input data completely.
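
To make that concrete, here is the standard idiom as a minimal complete program; the point is that `c` is an `int`, so all 257 possible results of `getchar` (256 byte values plus `EOF`) stay distinct:

```c
#include <stdio.h>

int main(void)
{
    int c;  /* int, not char: EOF (negative) must remain
               distinguishable from byte value 255 */

    while ((c = getchar()) != EOF)
        putchar(c);

    /* Had c been declared unsigned char, EOF would be converted to
       255 and the loop would never terminate; as signed char, a
       literal 0xFF byte in the input would falsely compare equal
       to EOF. */
    return 0;
}
```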

It's one of the things that made C a good choice on a wide variety of machines.

I guess I always "saw" the return value of getchar() as being in an int-sized register, at first namely R0, so I kept the character values returned as ints. The actual EOF indication from a read is a return value of zero for the number of characters read.
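
That out-of-band convention is visible in a plain read(2) loop; a minimal sketch:

```c
#include <unistd.h>

int main(void)
{
    char buf[4096];
    ssize_t n;

    /* read(2) keeps data and status separate: n > 0 is a byte
       count, n == 0 is end of file, n < 0 is an error, so every
       byte value 0..255 stays representable in the buffer. */
    while ((n = read(0, buf, sizeof buf)) > 0)
        write(1, buf, (size_t)n);

    return n < 0 ? 1 : 0;
}
```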

That's certainly true. Had C supported multiple return values or some kind of option type from the outset, it might have been that `getchar`, `read`, etc., returned a pair: some useful value (e.g., for `getchar` the value of the byte read; for `read` a length) and some indication of error/EOF/OK. Notably, both Go and Rust support essentially this: in Go, the `Read` method of `io.Reader` returns an `(int, error)` pair, and the error is `io.EOF` at end of input; in Rust, the `read` method of the `Read` trait returns a `Result<usize, io::Error>`, where an `Ok(n)` with `n == 0` indicates EOF.
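
One can sketch that shape in C by hand; the name `getchar2` and the status enum below are invented for illustration, not anything from an actual library:

```c
#include <stdio.h>

enum rstatus { R_OK, R_EOF, R_ERR };

/* Hypothetical variant of getchar with the status out of band:
   the byte travels through *out, the condition through the return
   value, so no sentinel has to be carved out of the data range. */
static enum rstatus getchar2(unsigned char *out)
{
    int c = getchar();
    if (c == EOF)
        return ferror(stdin) ? R_ERR : R_EOF;
    *out = (unsigned char)c;
    return R_OK;
}

int main(void)
{
    unsigned char b;

    while (getchar2(&b) == R_OK)
        putchar(b);
    return 0;
}
```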

But I'm just making noise because I'm sure everyone knows all this.

I think it's worthwhile stating these things explicitly, sometimes.

        - Dan C.

> On May 15, 2020, at 4:18 PM, ron@ronnatalie.com wrote:
>
> EOF is defined to be -1.
> getchar() returns int, but if c is an unsigned char, the value of (c = getchar()) at end of input will be 255. This will never compare equal to -1.

> Ron,

> Hmmm... getchar/getc are defined as returning int in the man page, and c is traditionally defined as an int in this code...

> On Fri, May 15, 2020 at 4:02 PM <ron@ronnatalie.com> wrote:
>> Unfortunately, if c is a char on a machine with unsigned chars, or it's of type unsigned char, EOF will never be detected.
>> 
>> 
>> 
>>>     while ((c = getchar()) != EOF) if (c == '\n') { /* entire record is now there */