The Unix Heritage Society mailing list
* [TUHS] non-blocking IO
@ 2020-05-31 11:09 Paul Ruizendaal
  2020-05-31 16:05 ` Clem Cole
  0 siblings, 1 reply; 31+ messages in thread
From: Paul Ruizendaal @ 2020-05-31 11:09 UTC (permalink / raw)
  To: TUHS main list


This time looking into non-blocking file access. I realise that the term has wider application, but right now my scope is “communication files” (tty’s, pipes, network connections).

As far as I can tell, prior to 1979 non-blocking access did not appear in the Spider lineage, nor did it appear in the NCP Unix lineage. First appearance of non-blocking behaviour seems to have been with Chesson’s multiplexed files where it is marked experimental (an experiment within an experiment, so to say) in 1979.

The first appearance resembling the modern form appears to have been with SysIII in 1980, where open() gains an O_NDELAY flag and appears to have had two uses: (i) when used on TTY devices it makes open() return without waiting for a carrier signal (and subsequent read() / write() calls on the descriptor return with 0, until the carrier/data is there); and (ii) on pipes and fifo’s, read() and write() will not block on an empty/full pipe, but return 0 instead. This behaviour seems to have continued into SysVR1; I’m not sure when EAGAIN came into use as a return value for this use case in the SysV lineage. Maybe with SysVR3 networking?
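
To make the contrast with the SysIII return value concrete, here is a minimal sketch (not from the thread) of what a current POSIX-style system does on an empty pipe. It assumes Python 3 on a Unix-like host; the helper name read_empty_pipe_nonblocking is made up for illustration, and os.set_blocking() is used to set O_NONBLOCK, the POSIX flag that took over the role of O_NDELAY.

```python
import errno
import os

def read_empty_pipe_nonblocking():
    """Read from an empty pipe in non-blocking mode; return the errno seen.

    Hypothetical helper for illustration only.
    """
    r, w = os.pipe()
    try:
        os.set_blocking(r, False)      # sets O_NONBLOCK on the read end
        try:
            data = os.read(r, 16)      # nothing has been written yet
        except BlockingIOError as e:
            return e.errno             # modern systems fail with an error
        return len(data)               # a SysIII-style "return 0" would land here
    finally:
        os.close(r)
        os.close(w)

result = read_empty_pipe_nonblocking()
print(errno.errorcode.get(result, result))
```

On a system following the SysIII convention the function would return 0 instead of an error number, which is indistinguishable from end-of-file on the pipe.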

In the Research lineage, the above SysIII approach does not seem to exist, although the V8 manual page for open() says under BUGS "It should be possible [...] to optionally call open without the possibility of hanging waiting for carrier on communication lines.” In the same location for V10 it reads "It should be possible to call open without waiting for carrier on communication lines.”

The July 1981 design proposals for 4.2BSD note that SysIII non-blocking files are a useful feature and should be included in the new system. In Jan/Feb 1982 this appears to be coded up, although not all affected files are under SCCS tracking at that point in time. Non-blocking behaviour is changed from the SysIII semantics, in that EWOULDBLOCK is returned instead of 0 when progress is not possible. The non-blocking behaviour is extended beyond TTY’s and pipes to sockets, with additional errors (such as EINPROGRESS). At this time EWOULDBLOCK is not the same error number as EAGAIN.
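
As an illustration of the BSD-style semantics on sockets, the following sketch (an addition, assuming Python 3 on a Unix-like host) does a non-blocking recv() on a socket with no data queued. Whether EAGAIN and EWOULDBLOCK share one number is platform-dependent, so the code prints the comparison rather than assuming the answer.

```python
import errno
import socket

# A connected pair of local sockets; nothing is ever sent on them.
a, b = socket.socketpair()
a.setblocking(False)                 # put one end in non-blocking mode

try:
    a.recv(16)                       # no data queued, so this cannot progress
    seen = None
except BlockingIOError as e:
    seen = e.errno                   # EWOULDBLOCK / EAGAIN, not a 0-byte read
finally:
    a.close()
    b.close()

print("recv failed with:", errno.errorcode.get(seen, seen))
print("EAGAIN == EWOULDBLOCK here:", errno.EAGAIN == errno.EWOULDBLOCK)
```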

It would seem that the differences between the BSD and SysV lineages in this area persisted until around 2000 or so.

Is that a fair summary?

- - -

I’m not quite sure why the Research lineage did not include non-blocking behaviour, especially in view of the man page comments. Maybe it was seen as against the Unix philosophy, with select() offering sufficient mechanism to avoid blocking (with open() the hard corner case)?

In the SysIII code base, the FNDELAY flag is stored on the file pointer (i.e. with struct file). This has the effect that the flag is shared between processes using the same pointer, but can be changed in one process (using fcntl) without the knowledge of others. It seems more logical to me to have made it a per-process flag (i.e. with struct user) instead. In this aspect the SysIII semantics carry through to today’s Unix/Linux. Was this semantic a deliberate design choice, or simply an overlooked complication?
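
The sharing described above can still be observed today. The sketch below (an addition, assuming Python 3 on a Unix-like host; is_nonblocking is a hypothetical helper) uses dup() within one process as a stand-in for two processes sharing a file pointer after fork(): the flag is set through one descriptor and then read back through the other.

```python
import fcntl
import os

def is_nonblocking(fd):
    """Report whether O_NONBLOCK is set in the file status flags of fd."""
    return bool(fcntl.fcntl(fd, fcntl.F_GETFL) & os.O_NONBLOCK)

r, w = os.pipe()
r2 = os.dup(r)                       # second descriptor, same open file entry

before = is_nonblocking(r2)

# Change the flag through r only; r2 is never passed to F_SETFL.
flags = fcntl.fcntl(r, fcntl.F_GETFL)
fcntl.fcntl(r, fcntl.F_SETFL, flags | os.O_NONBLOCK)

after = is_nonblocking(r2)
print(before, after)

for fd in (r, w, r2):
    os.close(fd)
```

The holder of r2 finds its reads have become non-blocking without having asked for it, which is the complication the question is about.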








* Re: [TUHS] non-blocking IO
@ 2020-06-01 23:17 Noel Chiappa
  0 siblings, 0 replies; 31+ messages in thread
From: Noel Chiappa @ 2020-06-01 23:17 UTC (permalink / raw)
  To: tuhs; +Cc: jnc

    > From: Paul Ruizendaal

    > This time looking into non-blocking file access.  (... right now my
    > scope is 'communication files' (tty's, pipes, network connections).
    > ...
    > First appearance of non-blocking behaviour seems to have been with
    > Chesson's multiplexed files ... in 1979.

At around that point in time (I don't have the very _earliest_ code, to get an
exact date, but the oldest traces I see [in mapalloc(), below] are from
September '78), the CSR group at MIT-LCS (which were the people in LCS doing
networking) was doing a lot with asynchronous I/O (when you're working below
the reliable stream level, you can't just do a blocking 'read' for a packet;
it pretty much has to be asynchronous). I was working in Unix V6 - we were
building an experimental 1Mbit/second ring - and there was work in Multics as
well.

I don't think the wider Unix community heard about the Unix work, but our
group regularly filed updates on our work for the 'Internet Monthly Reports',
which was distributed to the whole TCP/IP experimental community. If you can
find an archive of early issues (I'm too lazy to go look for one), we should
be in there (although our report will also cover the Multics TCP/IP work, and
maybe some other stuff too).


There were two main generations of code; I don't recall the second one well,
and I'm too lazy to go look, but I can tell you off the top of my head a bit
about how the first one worked. Open/read/write all looked standard to the
user in the process (the latter two were oriented to packets, a bit like raw
disks being blocks); multiple operations could be queued in each
direction. (There was only one user allowed at a time for the network device;
no input demultiplexing.)

Whenever an I/O operation completed, the process was sent a signal. Since the
read/write call had long since returned, it had to do a gtty() to get info
about that operation - the size of the packet, error indications, etc.

One complication was that for a variety of reasons (we wanted to avoid having
to copy data, and the interface did not have packet buffers) we did DMA
directly to/from the user's memory; this meant the process had to be locked
in place while I/O was pending.

(I didn't realize it at the time, but we dodged a bullet there; a comment
in xalloc(), which I only absorbed recently, explains the problem. More
here:

  https://gunkies.org/wiki/UNIX_V6_internals#exec()_and_pure-text_images

if anyone wants the gory details.)


That all (the queueing, signals for I/O completion, locking the process to a
fixed location in memory while it continued to run, etc) worked well, as I
recall (although I guess it couldn't do an sbrk() while locked), but one
complication was the UNIBUS map on the -11/70.

The DSSR/RTS group at LCS wanted to have a ring interface, but their machine
was a /70 (ours, the one the driver was initially done on/for, was a /40), so
with DMA we had to use the UNIBUS map.

The stock V6 code had mapalloc() (and mapfree(); both called on all DMA
operations), but... it allocated the whole map to whatever I/O operation asked
for the map. Clearly, if you're about to start a network input operation, and
wait for a packet to show up, you don't want the disk controller to have to sit
and wait for a packet to show up so _it_ can have the map.

Luckily, mapalloc() was called with a pointer to the buffer header (which had
all the info about the xfer), so I added a 'ubmap' array, and called the
existing malloc() on it, to allocate only a big enough chunk of the UNIBUS map
for the I/O operation defined by the header. Since there was 248KB of map
space, and the largest single DMA transfer possible in V6 was about 64KB
(maybe a little more, for a max-sized process with its 'user' block), there
was never a problem with contention for the map, and we didn't have to touch
any of the other drivers at all.
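
For readers unfamiliar with the V6 resource-map allocator, here is a rough sketch of the idea (an addition, not the original code): a first-fit allocator over a list of free ranges, applied to the UNIBUS map. The class and function names are hypothetical, and the unit of 31 mapping registers of 8KB each is an assumption chosen to match the 248KB figure above; alignment details are ignored.

```python
class ResourceMap:
    """First-fit allocator over a range of units, in the spirit of V6 malloc()/mfree()."""

    def __init__(self, start, size):
        self.free = [[start, size]]            # sorted list of [addr, size] holes

    def alloc(self, size):
        """Return the start of the first hole that fits, or None if none does."""
        for i, (addr, avail) in enumerate(self.free):
            if avail >= size:
                if avail == size:
                    del self.free[i]           # hole used up completely
                else:
                    self.free[i] = [addr + size, avail - size]
                return addr
        return None

    def release(self, addr, size):
        """Give a range back and merge it with adjacent holes."""
        self.free.append([addr, size])
        self.free.sort()
        merged = [self.free[0]]
        for a, s in self.free[1:]:
            last = merged[-1]
            if last[0] + last[1] == a:
                last[1] += s                   # touches previous hole: coalesce
            else:
                merged.append([a, s])
        self.free = merged

REG_BYTES = 8 * 1024                           # assumed bytes mapped per register
ubmap = ResourceMap(0, 31)                     # assumed 31 registers = 248KB

def regs_needed(nbytes):
    return -(-nbytes // REG_BYTES)             # ceiling division

net = ubmap.alloc(regs_needed(2048))           # pending network input: 1 register
disk = ubmap.alloc(regs_needed(64 * 1024))     # large disk transfer: 8 registers
print(net, disk, ubmap.free)

ubmap.release(net, regs_needed(2048))
ubmap.release(disk, regs_needed(64 * 1024))
print(ubmap.free)
```

The point is that the disk transfer gets its own slice of the map while the network read is still outstanding, instead of waiting for the whole map to be released.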

That was dandy, and only a couple of lines of extra code, but I somehow made a
math error in my changes, and as I recall I had to debug it with a printf() in
mapalloc(). I was not popular that day! Luckily, the error was quickly
obvious, a fix was applied, and we were on our way.

	 Noel


* Re: [TUHS] non-blocking IO
@ 2020-06-02  0:08 Noel Chiappa
  0 siblings, 0 replies; 31+ messages in thread
From: Noel Chiappa @ 2020-06-02  0:08 UTC (permalink / raw)
  To: tuhs; +Cc: jnc

    > when you're working below the reliable stream level, you can't just do a
    > blocking 'read' for a packet; it pretty much has to be asynchronous

Oh, you should look at the early BBN TCP for V6 Unix - they would have faced
the same issue, with their TCP process. They did have the capac() call (which
kind of alleviates the need for non-blocking I/O), but that may have only been
available for ports/pipes; I'm not sure if the ARPANET device supported it.

(With the NCP as well, that did some amount of demultiplexing in the kernel,
and probably had buffering there, so, if so, in theory capac() could have been
done there. Of course, with the ARPANET link being only 100Kbit/sec maximum -
although only to a host on the same IMP - the overhead of copying buffered
data made kernel buffering more 'affordable'.)

     Noel

* [TUHS] non-blocking IO
@ 2020-06-02  8:22 Paul Ruizendaal
  0 siblings, 0 replies; 31+ messages in thread
From: Paul Ruizendaal @ 2020-06-02  8:22 UTC (permalink / raw)
  To: TUHS main list

>     > when you're working below the reliable stream level, you can't just do a blocking 'read' for a packet; it pretty much has to be asynchronous
> Oh, you should look at the early BBN TCP for V6 Unix - they would have faced the same issue, with their TCP process. They did have the capac() call (which kind of alleviates the need for non-blocking I/O), but that may have only been available for ports/pipes; I'm not sure if the ARPANET device supported it.

I did. There is capac() support also for the IMP interface:
https://www.tuhs.org/cgi-bin/utree.pl?file=BBN-V6/dmr/imp11a.c
(see bottom two functions)

BBN took the same approach as Research: with capac() or select() one can prevent blocking on read() and write().
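
The pattern of asking first and reading only when it is safe can be sketched with select() as follows. This is an illustration added here, assuming Python 3 on a Unix-like host; read_if_ready is a hypothetical helper, and capac() itself is not modelled.

```python
import os
import select

def read_if_ready(fd, nbytes, timeout=0.0):
    """Read only if select() reports data; return None instead of blocking."""
    readable, _, _ = select.select([fd], [], [], timeout)
    if not readable:
        return None                  # read() would have blocked; skip it
    return os.read(fd, nbytes)       # guaranteed not to block now

r, w = os.pipe()                     # descriptors stay in ordinary blocking mode

first = read_if_ready(r, 16)         # pipe is empty
os.write(w, b"ping")
second = read_if_ready(r, 16)        # data is waiting

print(first, second)
os.close(r)
os.close(w)
```

Note that the descriptor never needs a non-blocking flag in this approach; the caller avoids the blocking call rather than changing its behaviour.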


* [TUHS] non-blocking IO
@ 2020-06-02 14:19 Paul Ruizendaal
  2020-06-02 17:45 ` Paul Winalski
  0 siblings, 1 reply; 31+ messages in thread
From: Paul Ruizendaal @ 2020-06-02 14:19 UTC (permalink / raw)
  To: TUHS main list

> At around that point in time (I don't have the very _earliest_ code, to get an exact date, but the oldest traces I see [in mapalloc(), below] are from September '78), the CSR group at MIT-LCS (which were the people in LCS doing networking) was doing a lot with asynchronous I/O (when you're working below the reliable stream level, you can't just do a blocking 'read' for a packet; it pretty much has to be asynchronous). I was working in Unix V6 - we were building an experimental 1Mbit/second ring - and there was work in Multics as well.

> I don't think the wider Unix community heard about the Unix work, but our group regularly filed updates on our work for the 'Internet Monthly Reports', which was distributed to the whole TCP/IP experimental community. If you can find an archive of early issues (I'm too lazy to go look for one), we should be in there (although our report will also cover the Multics TCP/IP work, and maybe some other stuff too).

Sounds very interesting!

Looked around a bit, but I did not find a source for the “Internet Monthly Reports” for the late 70’s (rfc-editor.org/museum/ has them for the 1990’s).

In the 1970’s era, it seems that NCP Unix went in another direction, using newly built message and event facilities to prevent blocking. This is described in "CAC Technical Memorandum No. 84, Illinois Inter-Process Communication Facility for Unix.” - but that document appears lost as well.

Ah, well, topics for another day.

* Re: [TUHS] non-blocking IO
@ 2020-06-02 20:13 Noel Chiappa
  2020-06-02 20:43 ` Clem Cole
  0 siblings, 1 reply; 31+ messages in thread
From: Noel Chiappa @ 2020-06-02 20:13 UTC (permalink / raw)
  To: tuhs; +Cc: jnc

    > From: Paul Winalski

    > I'm curious as to what the rationale was for Unix to have been designed
    > with basic I/O being blocking rather than asynchronous.

It's a combination of two factors, I reckon. One, which is better depends a
lot on the type of thing you're trying to do. For many typical things (e.g.
'ls'), blocking is a good fit. And, as Arnold says, asynchronous I/O is
more complicated, and Unix was (well, back then at least) all about getting
the most bang for the least bucks.

More complicated things do sometimes benefit from asynchronous I/O, but
complicated things weren't Unix's 'target market'. E.g. even though pipes
post-date the I/O decision, they too are a better match to blocking I/O.


    > From: Arnold Skeeve

    > the early Unixs were on smaller -11s, not the /45 or /70 with split I&D
    > space and the ability to address lots more RAM.

Ahem. Lots more _core_. People keep forgetting that we're looking at
decisions made at a time when each bit in main memory was stored in a
physically separate storage device, and having tons of memory was a dream of
the future.

E.g. the -11/40 I first ran Unix on had _48 KB_ of core memory - total!
And that had to hold the resident OS, plus the application! It's no
surprise that Unix was so focused on small size - and as a corollary, on
high bang/buck ratio.

But even in this age of lighting one's cigars with gigabytes of main memory
(literally), small is still beautiful, because it's easier to understand, and
complexity is bad. So it's too bad Unix has lost that extreme parsimony.


    > From: Dan Cross 

    > question whether asynchrony itself remains untamed, as Doug put it, or
    > if rather it has proved difficult to retrofit asynchrony onto a system
    > designed around fundamentally synchronous primitives?

I'm not sure it's 'either or'; I reckon they are both true.


	Noel

* Re: [TUHS] non-blocking IO
@ 2020-06-06 13:29 Noel Chiappa
  0 siblings, 0 replies; 31+ messages in thread
From: Noel Chiappa @ 2020-06-06 13:29 UTC (permalink / raw)
  To: tuhs; +Cc: jnc

    > From: Peter Jeremy <peter@rulingia.com>

    > My view may be unpopular but I've always been disappointed that Unix
    > implemented blocking I/O only and then had to add various hacks to cover
    > up for the lack of asynchronous I/O.  It's trivial to build blocking I/O
    > operations on top of asynchronous I/O operations.  It's impossible to do
    > the opposite without additional functionality.

Back when I started working on networks, I looked at other kinds of systems
to see what general lessons I could learn about the evolution of systems, which
might apply to the networks we were building. (I should have written all that up,
never did, sigh.)

One major one was that a system, when small, often collapses multiple needs
onto one mechanism. Only as the system grows in size do scaling effects, etc
necessitate breaking them up into separate mechanisms. (There are some good
examples in file systems, for example.)

I/O is a perfect example of this; a small system can get away with only one
kind; it's only when the system grows that one benefits from having both
synchronous and asynchronous. Since the latter is more complicated, _both_ in
the system and in the applications which use it, it's no surprise that
synchronous was the pick.


The reasons why synchronous is simpler in applications have a nice
illustration in operating systems, which inevitably support both blocking
(i.e. implied process switching) and non-blocking 'operation initiation' and
'operation completed notification' mechanisms. (The 'timeout/callout'
mechanism in Unix is an example of the latter, albeit specialized to timers.)

Prior to the Master Control Program in the Burroughs B5000 (there may be older
examples, but I don't know of them - I would be more than pleased to be
informed of any such, if there are), the technique of having a per-process
_kernel_ stack, and on a process block (and implied switch), switching stacks,
was not used. This idea was picked up for Jerry Saltzer's PhD thesis, used in
Multics, and then copied by almost every other OS since (including Unix).

The advantage is fairly obvious: if one is deep in some call stack, one can
just wait there until the thing one needs is done, and then resume without
having to work one's way back to that spot - which will inevitably be
complicated (perhaps more in the need to _return_ through all the places that
called down - although the code to handle a 'not yet' return through all those
places, after the initial call down, will not be inconsiderable either).


Exactly the same reasoning applies to blocking I/O; one can sit where one is,
waiting for the I/O to be done, without having to work one's way back there
later. (Examples are legion, e.g. in recursive descent parsers - and can make
the code _much_ simpler.)

It's only when one _can't_ wait for the I/O to complete (e.g. for a packet to
arrive - although others have mentioned other examples in this thread, such as
'having other stuff to do in the meanwhile') that having only blocking I/O
becomes a problem...

In cases where blocking would be better, one can always build a 'blocking' I/O
subsystem on top of asynchronous I/O primitives.
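
A small sketch of that layering, added for illustration and assuming Python 3 on a Unix-like host: the descriptor is in non-blocking mode, and a hypothetical blocking_read() wrapper rebuilds the blocking behaviour on top of it. A non-blocking read plus a readiness wait stands in here for true asynchronous primitives, and a timer thread plays the part of the packet arriving later.

```python
import os
import select
import threading

def blocking_read(fd, nbytes):
    """Emulate a blocking read() on a descriptor that is in non-blocking mode."""
    while True:
        try:
            return os.read(fd, nbytes)        # succeeds once data is present
        except BlockingIOError:
            select.select([fd], [], [])       # sleep until fd becomes readable

r, w = os.pipe()
os.set_blocking(r, False)                     # the primitive itself never blocks

# Simulate data arriving a little later, from another thread.
writer = threading.Timer(0.1, os.write, args=(w, b"packet"))
writer.start()

data = blocking_read(r, 64)                   # waits here, like a plain read()
writer.join()
print(data)

os.close(r)
os.close(w)
```

The caller keeps its place in the call stack while waiting, which is the simplicity argument made above; the cost is the extra wrapper code that a tiny system might not want to carry.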

However, in a _tiny_ system (remember my -11/40 which ran Unix on a system
with _48KB_ of main memory _total_ - i.e. OS and application together had to be
less than 48KB - no virtual memory on that machine :-), building blocking I/O
on top of asynchronous I/O, for those very few cases which need it, may not be
the best use of very limited space - although I agree that it's the way to go,
overall.

	Noel


Thread overview: 31+ messages
2020-05-31 11:09 [TUHS] non-blocking IO Paul Ruizendaal
2020-05-31 16:05 ` Clem Cole
2020-05-31 16:46   ` Warner Losh
2020-05-31 22:01     ` Rob Pike
2020-06-01  3:32       ` Dave Horsfall
2020-06-01 14:58         ` Larry McVoy
2020-06-04  9:04           ` Peter Jeremy
2020-06-04 14:19             ` Warner Losh
2020-06-04 16:34               ` Tony Finch
2020-06-04 16:50               ` Larry McVoy
2020-06-05 16:00                 ` Dan Cross
2020-06-12  8:18                   ` Dave Horsfall
2020-06-01 16:58     ` Heinz Lycklama
2020-06-01 23:17 Noel Chiappa
2020-06-02  0:08 Noel Chiappa
2020-06-02  8:22 Paul Ruizendaal
2020-06-02 14:19 Paul Ruizendaal
2020-06-02 17:45 ` Paul Winalski
2020-06-02 17:59   ` arnold
2020-06-02 18:53     ` Paul Winalski
2020-06-02 19:18       ` Clem Cole
2020-06-02 21:15         ` Lawrence Stewart
2020-06-02 18:23   ` Dan Cross
2020-06-02 18:56     ` Paul Winalski
2020-06-02 19:23       ` Clem Cole
2020-06-02 20:13 Noel Chiappa
2020-06-02 20:43 ` Clem Cole
2020-06-02 22:14   ` Rich Morin
2020-06-03 16:31     ` Paul Winalski
2020-06-03 19:19       ` John P. Linderman
2020-06-06 13:29 Noel Chiappa
