More from Yost below. My purpose in relating this was to point out that the original unix implementation choices were mostly fine; they just had to be tweaked a bit. Clearly an independent implementation such as the one in Linux would veer off in a different direction, being done in a different era and with different prior experience. I was a bit surprised that Bruce didn't make this same tweak to the cblock size, but there is no way of knowing his reasons now.

> Begin forwarded message:
>
> From: Dave Yost
> Subject: Re: [TUHS] 386BSD released
> Date: July 16, 2021 at 9:21:53 AM PDT
> To: Bakul Shah
>
> Plz forward this
> thanks
>
> This was in early 1983 or late 1982.
>
> We got the serial driver to go 19200 out and 9600 in.
>
> I did 2 things in the Fortune Systems 68k serial driver:
> • hand-coded asm pseudo-DMA, suggested by Robert P Warnock III
> • cblock size 128 bytes instead of 8, count ’em, 8.
>
> From Lions,
> https://cs3210.cc.gatech.edu/r/unix6.pdf
> the unix v6 serial driver used a clist of cblocks, like this:
>
> [figure: a clist of cblocks]
>
> The pseudo-DMA interrupt handler was a function made up of a few hand-coded 68k instructions, entered into C code as hex data. That code transferred one byte into or out of a cblock, and at the end of the cblock it grabbed the next cblock from a queue and rang the “doorbell” hardware interrupt, which caused a “software interrupt” at lower priority for further processing. Rob put the doorbell into the architecture with a couple of gates on the board because he was well aware of this software interrupt trick, which was already used in bsd. For some reason I didn’t look at the bsd code, probably because Rob’s explanation was lucid and sufficient.
>
> I once had occasion to mention this, and specifically the relaxing of the draconian 8 byte cblock size, to Dennis Ritchie. He said, sure, why not, the 8 byte cblock size was just a neglected holdover from early days.
>
> This approach was just an interrupt version of what I had proposed to Rick Kiessig as a first project at Fortune Systems: to get a 30x speedup when writing to the Fortune Systems memory-mapped character display hardware. I had done the same thing a few years earlier on a Z80, in C code, in a serial CRT terminal. It’s simple and obvious: make the inner loop do as little as possible. The most primitive operation needs to be a block operation, not a byte-at-a-time operation.
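
To make the cblock-size point concrete, here is a minimal sketch in C of a clist-style queue with a configurable per-block payload. It is a simplified model under stated assumptions, not the V6 or Fortune Systems code: the names `clist_init`, `clist_write`, `clist_read`, `blocks_allocated` and `MAX_PAYLOAD` are all hypothetical, blocks come from `calloc` rather than a kernel free list, and the link pointer is kept outside the payload. The primitive operation is one `memcpy` per block rather than one call per byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define MAX_PAYLOAD 128          /* largest per-block payload in this sketch */

struct cblock {                  /* hypothetical layout, not the V6 struct */
    struct cblock *next;
    size_t start;                /* index of first unread byte */
    size_t end;                  /* index one past the last valid byte */
    unsigned char data[MAX_PAYLOAD];
};

struct clist {
    struct cblock *head, *tail;
    size_t count;                /* bytes currently queued */
    size_t payload;              /* usable bytes per block */
    size_t blocks_allocated;     /* how often a new block had to be linked in */
};

void clist_init(struct clist *q, size_t payload)
{
    assert(payload > 0 && payload <= MAX_PAYLOAD);
    q->head = q->tail = NULL;
    q->count = 0;
    q->payload = payload;
    q->blocks_allocated = 0;
}

/* Append n bytes with one memcpy per block. Returns the number of bytes
 * queued, which is short only if allocation fails. */
size_t clist_write(struct clist *q, const unsigned char *src, size_t n)
{
    size_t done = 0;
    while (done < n) {
        struct cblock *b = q->tail;
        if (b == NULL || b->end == q->payload) {   /* need a fresh block */
            b = calloc(1, sizeof *b);
            if (b == NULL)
                break;
            if (q->tail != NULL)
                q->tail->next = b;
            else
                q->head = b;
            q->tail = b;
            q->blocks_allocated++;
        }
        size_t room = q->payload - b->end;
        size_t chunk = (n - done < room) ? n - done : room;
        memcpy(b->data + b->end, src + done, chunk);
        b->end += chunk;
        done += chunk;
    }
    q->count += done;
    return done;
}

/* Remove up to n bytes with one memcpy per block. Returns bytes copied. */
size_t clist_read(struct clist *q, unsigned char *dst, size_t n)
{
    size_t done = 0;
    while (done < n && q->head != NULL) {
        struct cblock *b = q->head;
        size_t avail = b->end - b->start;
        size_t chunk = (n - done < avail) ? n - done : avail;
        memcpy(dst + done, b->data + b->start, chunk);
        b->start += chunk;
        done += chunk;
        if (b->start == b->end) {                  /* drained: unlink, free */
            q->head = b->next;
            if (q->head == NULL)
                q->tail = NULL;
            free(b);
        }
    }
    q->count -= done;
    return done;
}
```

In this model, queueing 1024 bytes links in 128 blocks when the payload is 8 bytes and only 8 blocks when it is 128, so the per-block work (taking the next block from the queue, ringing the doorbell) happens 16 times less often. In a real cblock the link pointer shares the block's size, so the usable payload is somewhat smaller than the block size.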