More from Yost below.

My purpose in relating this was to point out that the original unix
implementation choices were mostly fine; they just had to be tweaked a
bit. Clearly an independent implementation such as in Linux would veer
off in a different direction, done in a different era and with different prior
experience. I was a bit surprised that Bruce didn't make this same
tweak to cblock size, but there is no way of knowing his reasons now.

Begin forwarded message:

From: Dave Yost 
Subject: Re: [TUHS] 386BSD released
Date: July 16, 2021 at 9:21:53 AM PDT
To: Bakul Shah

Plz forward this
thanks

This was in early 1983 or late 1982.

We got the serial driver to go 19200 out and 9600 in.

I did 2 things in the Fortune Systems 68k serial driver:
• hand-coded asm pseudo-DMA, suggested by Robert P Warnock III
• cblock size 128 bytes instead of 8, count ’em, 8.

From Lions,
the unix v6 serial driver used a clist of cblocks.
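
For readers without the listing to hand, here is a minimal C sketch of a clist built from chained cblocks. It is an illustration only, not the v6 source: the real v6 routines were written in assembler and kept their bookkeeping differently. The names CBSIZE, c_info, c_first, c_last, c_head, c_tail, cl_putc, cl_getc and cl_blocks are assumptions made for this example. CBSIZE defaults to 6 data bytes, which together with a 2-byte PDP-11 pointer gives the 8-byte cblock.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Data bytes per cblock. Hypothetical knob for this sketch: 6 data bytes
 * plus a 2-byte link is the 8-byte block; raise it to model a bigger one. */
#ifndef CBSIZE
#define CBSIZE 6
#endif

struct cblock {
    struct cblock *c_next;          /* next block in the chain, or NULL */
    unsigned char  c_info[CBSIZE];  /* character data */
};

struct clist {
    int            c_cc;    /* total characters queued */
    struct cblock *c_first; /* block holding the oldest character */
    struct cblock *c_last;  /* block holding the newest character */
    int            c_head;  /* read offset within c_first */
    int            c_tail;  /* write offset within c_last */
};

/* Append one character. Returns 0 on success, -1 if no block is available. */
int cl_putc(struct clist *q, int c)
{
    if (q->c_last == NULL || q->c_tail == CBSIZE) {
        struct cblock *b = malloc(sizeof *b);
        if (b == NULL)
            return -1;
        b->c_next = NULL;
        if (q->c_last != NULL) {
            q->c_last->c_next = b;
        } else {
            q->c_first = b;
            q->c_head = 0;
        }
        q->c_last = b;
        q->c_tail = 0;
    }
    q->c_last->c_info[q->c_tail++] = (unsigned char)c;
    q->c_cc++;
    return 0;
}

/* Remove the oldest character. Returns it, or -1 if the list is empty. */
int cl_getc(struct clist *q)
{
    if (q->c_cc == 0)
        return -1;
    struct cblock *b = q->c_first;
    assert(b != NULL);
    int c = b->c_info[q->c_head++];
    q->c_cc--;
    /* Release the block once it is drained or the list is empty. */
    if (q->c_cc == 0 || q->c_head == CBSIZE) {
        q->c_first = b->c_next;
        q->c_head = 0;
        if (q->c_first == NULL) {
            q->c_last = NULL;
            q->c_tail = 0;
        }
        free(b);
    }
    return c;
}

/* Count the blocks currently chained on the list. */
int cl_blocks(const struct clist *q)
{
    int n = 0;
    for (const struct cblock *b = q->c_first; b != NULL; b = b->c_next)
        n++;
    return n;
}
```

With the default CBSIZE of 6, queueing 20 characters takes 4 blocks, so the per-block work (allocate, link, free) is paid every 6 characters. Compiling with a larger CBSIZE spreads that cost over many more characters.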

The pseudo-DMA interrupt handler was a function made up of a few hand-coded 68k instructions, entered into C code as hex data. That code transferred one byte into or out of a cblock, and at the end of the cblock it grabbed the next cblock from a queue and rang the “doorbell” hardware interrupt, which caused a “software interrupt” at lower priority for further processing. Rob put the doorbell into the architecture with a couple of gates on the board because he was well aware of this software interrupt trick, which was already used in bsd. For some reason I didn’t look at the bsd code, probably because Rob’s explanation was lucid and sufficient.
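
The split between the per-byte handler and the lower-priority doorbell handler can be sketched in C, transmit direction only. This is not the Fortune code, which was a few hand-coded 68k instructions. Everything here is hypothetical: the txchan structure, the function names, and the uart_tx_data and doorbell_rung variables that stand in for the device register and the doorbell hardware. It also assumes queued blocks are never empty.

```c
#include <assert.h>
#include <stddef.h>

#define CBSIZE 128  /* data bytes per block in this sketch */

struct cblock {
    struct cblock *c_next;
    int            c_len;            /* valid bytes in c_info, at least 1 */
    unsigned char  c_info[CBSIZE];
};

/* Stand-ins for hardware (assumptions, not real registers). */
static volatile unsigned char uart_tx_data;  /* transmit data register */
static volatile int doorbell_rung;           /* pending doorbell requests */

struct txchan {
    struct cblock *cur;    /* block currently being drained */
    int            pos;    /* next byte to send from cur */
    struct cblock *ready;  /* queue of filled blocks waiting to go out */
    struct cblock *done;   /* drained blocks awaiting low-priority cleanup */
};

/* High priority, once per transmitter-ready interrupt: move one byte and
 * do nothing else unless the block just ran out.
 * Returns 1 if a byte was sent, 0 if there was nothing to send. */
int tx_fast_intr(struct txchan *ch)
{
    if (ch->cur == NULL)
        return 0;
    assert(ch->cur->c_len > 0);
    uart_tx_data = ch->cur->c_info[ch->pos++];
    if (ch->pos >= ch->cur->c_len) {
        struct cblock *b = ch->cur;
        /* Grab the next block from the queue, if any. */
        ch->cur = ch->ready;
        if (ch->cur != NULL)
            ch->ready = ch->cur->c_next;
        ch->pos = 0;
        /* Hand the drained block over and ring the doorbell. */
        b->c_next = ch->done;
        ch->done = b;
        doorbell_rung++;
    }
    return 1;
}

/* Low priority, run in response to the doorbell: recycle drained blocks.
 * A real driver would mask the fast interrupt while touching ch->done.
 * Returns the number of blocks recycled. */
int tx_soft_intr(struct txchan *ch)
{
    int n = 0;
    doorbell_rung = 0;
    while (ch->done != NULL) {
        struct cblock *b = ch->done;
        ch->done = b->c_next;
        b->c_next = NULL;  /* a real driver would return b to a free list */
        n++;
    }
    return n;
}
```

In this arrangement the doorbell rings once per block rather than once per byte, so a larger block means fewer trips through the slower path.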

I once had occasion to mention this, and specifically the relaxing of the draconian 8 byte cblock size, to Dennis Ritchie. He said, sure, why not, the 8 byte cblock size was just a neglected holdover from early days.

This approach was just an interrupt version of what I had proposed to Rick Kiessig as a first project at Fortune Systems: to get a 30x speedup when writing to the Fortune Systems memory-mapped character display hardware. I had done the same thing a few years earlier in C code on a Z80 in a serial CRT terminal. It’s simple and obvious: make the inner loop do as little as possible. The most primitive operation needs to be a block operation, not a byte-at-a-time operation.
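
The contrast between a byte primitive and a block primitive can be shown with a small C sketch. It is not the Fortune display code. The screen array stands in for memory-mapped display RAM and is assumed to be a flat 80x24 byte buffer, and scr_putc and scr_write are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define COLS 80
#define ROWS 24

/* Stand-in for memory-mapped display RAM (hypothetical flat layout). */
static unsigned char screen[ROWS * COLS];
static size_t cursor;  /* next cell to write */

/* Byte-at-a-time primitive: every character pays for the call,
 * the bounds check, and the cursor update. */
void scr_putc(unsigned char c)
{
    if (cursor >= sizeof screen)
        return;
    screen[cursor++] = c;
}

/* Block primitive: one bounds check and one cursor update per run,
 * leaving the inner loop as a plain copy.
 * Returns the number of bytes actually stored. */
size_t scr_write(const unsigned char *buf, size_t n)
{
    assert(cursor <= sizeof screen);
    size_t room = sizeof screen - cursor;
    if (n > room)
        n = room;
    memcpy(&screen[cursor], buf, n);
    cursor += n;
    return n;
}
```

Both routines leave the same bytes on the screen. The difference is how much work is repeated per character.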