On Fri, May 15, 2020 at 8:58 PM Brantley Coile wrote:

> I always kept local, single characters in ints. This avoided the problem
> with loading a character being signed or unsigned. The reason for not
> specifying is obvious. Today, you can pick the move-byte-into-word
> instruction that either sign extends or doesn't. But when C was defined
> that wasn't the case. Some machines sign extended when a byte was loaded
> into a register and some filled the upper bits with zero. For machines
> that filled with zero, a char was unsigned. If you forced the language to
> do one or the other, it would be expensive on the opposite kind of
> machine.

Not only that, but if one used an exactly `char`-width value to hold, er,
character data as returned from `getchar` et al, then one would necessarily
give up the possibility of handling whatever character value was chosen as
the sentinel marking the end of the input stream. `getchar` et al are
defined to return EOF on end of input; if they didn't return a wider type
than `char`, there would be data that could not be read. On probably every
machine I am ever likely to use again in my lifetime, byte value 255 would
be -1 as a signed char, but it is also a perfectly valid value for a byte.
The details of whether char is signed or unsigned aside, use of a wider
type is necessary for correctness and for the ability to represent the
input data completely.

> It's one of the things that made C a good choice on a wide variety of
> machines.
>
> I guess I always "saw" the return value of getchar() as being in an
> int-sized register, at first namely R0, so kept the character values
> returned as ints. The actual EOF indication from a read is a return value
> of zero for the number of characters read.

That's certainly true.
Had C supported multiple return values or some kind of option type from the
outset, it might have been that `getchar`, `read`, etc., returned a pair:
some useful value (e.g., for `getchar` the value of the byte read; for
`read` a length) and some indication of an error/EOF/OK status. Notably,
both Go and Rust support essentially this: in Go, the `Read` method of
`io.Reader` returns an `(int, error)` pair, and the error is `io.EOF` on
end of input; in Rust, the `read` method of the `Read` trait returns a
`Result`, with `Result::Ok(n)` where `n == 0` indicating EOF.

But I'm just making noise because I'm sure everyone knows all this.

> I think it's worthwhile stating these things explicitly, sometimes.

        - Dan C.

> On May 15, 2020, at 4:18 PM, ron@ronnatalie.com wrote:
>
> EOF is defined to be -1.
> getchar() returns int, but if c is an unsigned char, the value of
> (c = getchar()) will be 255. This will never compare equal to -1.
>
>> Ron,
>>
>> Hmmm... getchar/getc are defined as returning int in the man page and c
>> is traditionally defined as an int in this code.
>>
>> On Fri, May 15, 2020 at 4:02 PM wrote:
>>
>>> Unfortunately, if c is char on a machine with unsigned chars, or it's
>>> of type unsigned char, the EOF will never be detected.
>>>
>>>> • while ((c = getchar()) != EOF) if (c == '\n') { /* entire record
>>>> is now there */