This time looking into non-blocking file access. I realise that the term has wider application, but right now my scope is “communication files” (tty’s, pipes, network connections).

As far as I can tell, prior to 1979 non-blocking access did not appear in the Spider lineage, nor did it appear in the NCP Unix lineage. The first appearance of non-blocking behaviour seems to have been with Chesson’s multiplexed files, where it is marked experimental (an experiment within an experiment, so to speak) in 1979.

The first appearance resembling the modern form seems to have been in SysIII in 1980, where open() gains an O_NDELAY flag that appears to have had two uses: (i) when used on TTY devices it makes open() return without waiting for a carrier signal (and subsequent read() / write() calls on the descriptor return 0 until the carrier/data is there); and (ii) on pipes and fifo’s, read() and write() will not block on an empty/full pipe, but return 0 instead. This behaviour seems to have continued into SysVR1; I’m not sure when EAGAIN came into use as a return value for this use case in the SysV lineage. Maybe with SysVR3 networking?

In the Research lineage, the above SysIII approach does not seem to exist, although the V8 manual page for open() says under BUGS “It should be possible [...] to optionally call open without the possibility of hanging waiting for carrier on communication lines.” In the same location for V10 it reads “It should be possible to call open without waiting for carrier on communication lines.”

The July 1981 design proposals for 4.2BSD note that SysIII non-blocking files are a useful feature and should be included in the new system. In Jan/Feb 1982 this appears to be coded up, although not all affected files are under SCCS tracking at that point in time. Non-blocking behaviour is changed from the SysIII semantics, in that EWOULDBLOCK is returned instead of 0 when progress is not possible. The non-blocking behaviour is extended beyond TTY’s and pipes to sockets, with additional errors (such as EINPROGRESS). At this time EWOULDBLOCK is not the same error number as EAGAIN.

It would seem that the differences between the BSD and SysV lineages in this area persisted until around 2000 or so. Is that a fair summary?

- - -

I’m not quite sure why the Research lineage did not include non-blocking behaviour, especially in view of the man page comments. Maybe it was seen as against the Unix philosophy, with select() offering sufficient mechanism to avoid blocking (with open() the hard corner case)?

In the SysIII code base, the FNDELAY flag is stored on the file pointer (i.e. with struct file). This has the effect that the flag is shared between processes using the same pointer, but can be changed in one process (using fcntl) without the knowledge of others. It seems more logical to me to have made it a per-process flag (i.e. with struct user) instead. In this aspect the SysIII semantics carry through to today’s Unix/Linux. Was this semantic a deliberate design choice, or simply an overlooked complication?
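To see that last point concretely, here is a minimal modern-POSIX sketch (not the SysIII code itself; O_NONBLOCK is the standardised descendant of FNDELAY/O_NDELAY) showing that the flag lives with the shared open file description, so setting it in one process changes behaviour for every process holding a descriptor from the same open()/fork()/dup() chain:

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void)
    {
        int pfd[2];
        if (pipe(pfd) == -1)
            return 1;

        if (fork() == 0) {
            /* Child: switch its copy of the descriptor to non-blocking. */
            int fl = fcntl(pfd[0], F_GETFL);
            fcntl(pfd[0], F_SETFL, fl | O_NONBLOCK);
            _exit(0);
        }
        wait(NULL);

        /* Parent: its descriptor is now non-blocking too, because the flag
         * is kept on the shared open file description (struct file), not
         * per process (struct user). */
        int fl = fcntl(pfd[0], F_GETFL);
        printf("parent sees O_NONBLOCK: %s\n", (fl & O_NONBLOCK) ? "yes" : "no");
        return 0;
    }

On Linux and the BSDs this prints “yes”, which is exactly the shared-flag semantics described above.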
On Sun, May 31, 2020 at 7:10 AM Paul Ruizendaal <pnr@planet.nl> wrote:

> This behaviour seems to have continued into SysVR1, I’m not sure when
> EAGAIN came into use as a return value for this use case in the SysV
> lineage. Maybe with SysVR3 networking?

Actually, I'm pretty sure that was a product of the POSIX discussions. BSD already had networking and EWOULDBLOCK. We had argued about EWOULDBLOCK a great deal, and we were also arguing about signal semantics. I've forgotten many of the details; Heinz may remember more than I do. EAGAIN was created as a compromise -- IIRC neither system had it yet. SVR3 networking was where it went into System V, although some of the AT&T representatives were none too happy about it.

> In the Research lineage, the above SysIII approach does not seem to exist,
> although the V8 manual page for open() says under BUGS "It should be
> possible [...] to optionally call open without the possibility of hanging
> waiting for carrier on communication lines.” In the same location for V10
> it reads "It should be possible to call open without waiting for carrier on
> communication lines.”
>
> The July 1981 design proposals for 4.2BSD note that SysIII non-blocking
> files are a useful feature and should be included in the new system. In
> Jan/Feb 1982 this appears to be coded up, although not all affected files
> are under SCCS tracking at that point in time. Non-blocking behaviour is
> changed from the SysIII semantics, in that EWOULDBLOCK is returned instead
> of 0 when progress is not possible. The non-blocking behaviour is extended
> beyond TTY’s and pipes to sockets, with additional errors (such as
> EINPROGRESS). At this time EWOULDBLOCK is not the same error number as
> EAGAIN.

My memory is that Keith was the BSD (CSRG) person at the POSIX meeting (he, Jim McGinness of DEC, and I created PAX at one point as a compromise). I wish I could remember all of the details, but this was all argued at the POSIX meetings.

As I said before, the folks from AT&T just wanted to take the SVID and rubber-stamp it as the specification. Part of the problem was that they wanted to be free to make choices that the rest of us might or might not like (for instance, they did not want the sockets interface).

> It would seem that the differences between the BSD and SysV lineages in
> this area persisted until around 2000 or so.

Yep - because around then POSIX started to settle out and both systems began to follow it.
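Because the two error names were not required to share a value until much later, portable code from that era typically had to test for both. A small, hedged sketch of the usual shim (modern headers; the exact guards varied from program to program):

    #include <errno.h>

    /* Returns non-zero if err means "the operation would have blocked",
     * regardless of whether the system reports EWOULDBLOCK (BSD style)
     * or EAGAIN (System V style).  On most modern systems the two are
     * the same value, but that was not historically guaranteed. */
    static int would_block(int err)
    {
    #ifdef EWOULDBLOCK
        if (err == EWOULDBLOCK)
            return 1;
    #endif
        return err == EAGAIN;
    }

    /* Typical use:
     *   n = read(fd, buf, sizeof buf);
     *   if (n == -1 && would_block(errno))
     *       ...try again later...                                   */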
[-- Attachment #1: Type: text/plain, Size: 2908 bytes --] Sorry to top post, but LSX or Miniunix had non blocking I/O as well. It was in one of the documents that Clem scanned in the last year. It specifically was an experiment into how to do it. Warner On Sun, May 31, 2020, 10:07 AM Clem Cole <clemc@ccc.com> wrote: > > > On Sun, May 31, 2020 at 7:10 AM Paul Ruizendaal <pnr@planet.nl> wrote: > >> This behaviour seems to have continued into SysVR1, I’m not sure when >> EAGAIN came into use as a return value for this use case in the SysV >> lineage. Maybe with SysVR3 networking? > > Actually, I'm pretty sure that was a product of the POSIX discussions. > BSD already had networking an EWOULDBLOCK. We had argued about > EWOULDBLOCK a great deal, we also were arguing about signal semantics. > I've forgotten many of the details, Heinz may remember more than I do. > EAGAIN was created as a compromise -- IIRC neither system had it yet. > SVR3 networking was where it went into System V, although some of the AT&T > representatives were none too happy about it. > > > >> >> In the Research lineage, the above SysIII approach does not seem to >> exist, although the V8 manual page for open() says under BUGS "It should be >> possible [...] to optionally call open without the possibility of hanging >> waiting for carrier on communication lines.” In the same location for V10 >> it reads "It should be possible to call open without waiting for carrier on >> communication lines.” >> >> The July 1981 design proposals for 4.2BSD note that SysIII non-blocking >> files are a useful feature and should be included in the new system. In >> Jan/Feb 1982 this appears to be coded up, although not all affected files >> are under SCCS tracking at that point in time. Non-blocking behaviour is >> changed from the SysIII semantics, in that EWOULDBLOCK is returned instead >> of 0 when progress is not possible. The non-blocking behaviour is extended >> beyond TTY’s and pipes to sockets, with additional errors (such as >> EINPROGRESS). At this time EWOULDBLOCK is not the same error number as >> EGAIN. >> > My memory is that Keith was the BSD (CSRG) person at the POSIX meeting > (he, Jim McGinness of DEC, and I created PAX at one point as a > compromise). I wish I could remember all of the details, but this was all > argued at the POSIX meetings. > > As I said before the folks from AT&T just wanted to take the SVID and > rubber stamp it at the specification. Part of it the problem was they > wanted to be free to do what things/make choices that the rest of us might > or might not like (for instance, they did not want the sockets interface). > > > >> >> It would seem that the differences between the BSD and SysV lineages in >> this area persisted until around 2000 or so. >> > Yep - cause around then POSIX started to settle out and both systems began > to follow it. > > > [-- Attachment #2: Type: text/html, Size: 4249 bytes --]
> > I’m not quite sure why the Research lineage did not include non-blocking > behaviour, especially in view of the man page comments. Maybe it was seen > as against the Unix philosophy, with select() offering sufficient mechanism > to avoid blocking (with open() the hard corner case)? That's it. Select was good enough for our purposes. -rob
On Mon, 1 Jun 2020, Rob Pike wrote: > I’m not quite sure why the Research lineage did not include > non-blocking behaviour, especially in view of the man page comments. > Maybe it was seen as against the Unix philosophy, with select() > offering sufficient mechanism to avoid blocking (with open() the hard > corner case)? > > That's it. Select was good enough for our purposes. After being dragged through both Berserkley and SysVile, I never did get the hang of poll()/select() etc... -- Dave
On Mon, Jun 01, 2020 at 01:32:56PM +1000, Dave Horsfall wrote:
> On Mon, 1 Jun 2020, Rob Pike wrote:
>
> > I’m not quite sure why the Research lineage did not include
> > non-blocking behaviour, especially in view of the man page comments.
> > Maybe it was seen as against the Unix philosophy, with select()
> > offering sufficient mechanism to avoid blocking (with open() the hard
> > corner case)?
> >
> >That's it. Select was good enough for our purposes.
>
> After being dragged through both Berserkley and SysVile, I never did get the
> hang of poll()/select() etc...
I'm sure you could, select is super handy, think a network server like
apache.
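Since the thread keeps coming back to select(), here is a minimal sketch of the pattern Larry is describing -- one process multiplexing many connections, with select() reporting which descriptors can be serviced without blocking. Error handling is trimmed, and listen_fd is assumed to be a socket that has already been bound and put into listening state elsewhere:

    #include <sys/select.h>
    #include <sys/socket.h>
    #include <unistd.h>

    void serve(int listen_fd)
    {
        fd_set all, ready;
        int maxfd = listen_fd;

        FD_ZERO(&all);
        FD_SET(listen_fd, &all);

        for (;;) {
            ready = all;
            if (select(maxfd + 1, &ready, NULL, NULL, NULL) == -1)
                continue;                      /* e.g. EINTR: just retry */

            for (int fd = 0; fd <= maxfd; fd++) {
                if (!FD_ISSET(fd, &ready))
                    continue;
                if (fd == listen_fd) {         /* new connection arriving */
                    int c = accept(listen_fd, NULL, NULL);
                    if (c >= 0) {
                        FD_SET(c, &all);
                        if (c > maxfd)
                            maxfd = c;
                    }
                } else {                       /* data ready: read won't block */
                    char buf[512];
                    ssize_t n = read(fd, buf, sizeof buf);
                    if (n <= 0) {              /* EOF or error: drop client */
                        close(fd);
                        FD_CLR(fd, &all);
                    } else {
                        write(fd, buf, n);     /* trivial echo back */
                    }
                }
            }
        }
    }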
[-- Attachment #1: Type: text/plain, Size: 3428 bytes --] I did add a few new features to LSX to deal with contiguous files and to handle asynchronous read/write's for real time applications. They are described in the LSX paper in the 1978 BSTJ on the UNIX Time-Sharing System. Heinz On 5/31/2020 9:46 AM, Warner Losh wrote: > Sorry to top post, but LSX or Miniunix had non blocking I/O as well. > It was in one of the documents that Clem scanned in the last year. It > specifically was an experiment into how to do it. > > Warner > > On Sun, May 31, 2020, 10:07 AM Clem Cole <clemc@ccc.com > <mailto:clemc@ccc.com>> wrote: > > > > On Sun, May 31, 2020 at 7:10 AM Paul Ruizendaal <pnr@planet.nl > <mailto:pnr@planet.nl>> wrote: > > This behaviour seems to have continued into SysVR1, I’m not > sure when EAGAIN came into use as a return value for this use > case in the SysV lineage. Maybe with SysVR3 networking? > > Actually, I'm pretty sure that was a product of the POSIX > discussions. BSD already had networking an EWOULDBLOCK. We had > argued about EWOULDBLOCK a great deal, we also were arguing about > signal semantics. I've forgotten many of the details, Heinz may > remember more than I do. EAGAIN was created as a compromise -- > IIRC neither system had it yet. SVR3 networking was where it > went into System V, although some of the AT&T representatives were > none too happy about it. > > > In the Research lineage, the above SysIII approach does not > seem to exist, although the V8 manual page for open() says > under BUGS "It should be possible [...] to optionally call > open without the possibility of hanging waiting for carrier on > communication lines.” In the same location for V10 it reads > "It should be possible to call open without waiting for > carrier on communication lines.” > > The July 1981 design proposals for 4.2BSD note that SysIII > non-blocking files are a useful feature and should be included > in the new system. In Jan/Feb 1982 this appears to be coded > up, although not all affected files are under SCCS tracking at > that point in time. Non-blocking behaviour is changed from the > SysIII semantics, in that EWOULDBLOCK is returned instead of 0 > when progress is not possible. The non-blocking behaviour is > extended beyond TTY’s and pipes to sockets, with additional > errors (such as EINPROGRESS). At this time EWOULDBLOCK is not > the same error number as EGAIN. > > My memory is that Keith was the BSD (CSRG) person at the POSIX > meeting (he, Jim McGinness of DEC, and I created PAX at one point > as a compromise). I wish I could remember all of the details, > but this was all argued at the POSIX meetings. > > As I said before the folks from AT&T just wanted to take the SVID > and rubber stamp it at the specification. Part of it the problem > was they wanted to be free to do what things/make choices that the > rest of us might or might not like (for instance, they did not > want the sockets interface). > > > It would seem that the differences between the BSD and SysV > lineages in this area persisted until around 2000 or so. > > Yep - cause around then POSIX started to settle out and both > systems began to follow it. > [-- Attachment #2: Type: text/html, Size: 6759 bytes --]
> From: Paul Ruizendaal

> This time looking into non-blocking file access. (... right now my
> scope is 'communication files' (tty's, pipes, network connections).
> ...
> First appearance of non-blocking behaviour seems to have been with
> Chesson's multiplexed files ... in 1979.

At around that point in time (I don't have the very _earliest_ code, to get an exact date, but the oldest traces I see [in mapalloc(), below] are from September '78), the CSR group at MIT-LCS (which were the people in LCS doing networking) was doing a lot with asynchronous I/O (when you're working below the reliable stream level, you can't just do a blocking 'read' for a packet; it pretty much has to be asynchronous). I was working in Unix V6 - we were building an experimental 1Mbit/second ring - and there was work in Multics as well.

I don't think the wider Unix community heard about the Unix work, but our group regularly filed updates on our work for the 'Internet Monthly Reports', which was distributed to the whole TCP/IP experimental community. If you can find an archive of early issues (I'm too lazy to go look for one), we should be in there (although our report will also cover the Multics TCP/IP work, and maybe some other stuff too).

There were two main generations of code; I don't recall the second one well, and I'm too lazy to go look, but I can tell you off the top of my head a bit about how the first one worked.

Open/read/write all looked standard to the user in the process (the latter two were oriented to packets, a bit like raw disks being blocks); multiple operations could be queued in each direction. (There was only one user allowed at a time for the network device; no input demultiplexing.) Whenever an I/O operation completed, the process was sent a signal. Since the read/write call had long since returned, it had to do a getty() to get info about that operation - the size of the packet, error indications, etc.

One complication was that for a variety of reasons (we wanted to avoid having to copy data, and the interface did not have packet buffers) we did DMA directly to/from the user's memory; this meant the process had to be locked in place while I/O was pending. (I didn't realize it at the time, but we dodged a bullet there; a comment in xalloc(), which I only absorbed recently, explains the problem. More here: https://gunkies.org/wiki/UNIX_V6_internals#exec()_and_pure-text_images if anyone wants the gory details.)

All that (the queuing, signals for I/O completion, locking the process to a fixed location in memory while it continued to run, etc.) worked well, as I recall (although I guess it couldn't do an sbrk() while locked), but one complication was the UNIBUS map on the -11/70. The DSSR/RTS group at LCS wanted to have a ring interface, but their machine was a /70 (ours, the one the driver was initially done on/for, was a /40), so with DMA we had to use the UNIBUS map. The stock V6 code had mapalloc() (and mapfree(), both called on all DMA operations), but... it allocated the whole map to whatever I/O operation asked for the map. Clearly, if you're about to start a network input operation and wait for a packet to show up, you don't want the disk controller to have to sit and wait for a packet to show up so _it_ can have the map.
Luckily, mapalloc() was called with a pointer to the buffer header (which had all the info about the xfer), so I added a 'ubmap' array, and called the existing malloc() on it, to allocate only a big enough chunk of the UNIBUS map for the I/O operation defined by the header. Since there was 248KB of map space, and the largest single DMA transfer possible in V6 was about 64KB (maybe a little more, for a max-sized process with its 'user' block), there was never a problem with contention for the map, and we didn't have to touch any of the other drivers at all. That was dandy, and only a couple of lines of extra code, but I somehow made a math error in my changes, and as I recall I had to debug it with a printf() in mapalloc(). I was not popular that day! Luckily, the error was quickly obvious, a fix was applied, and we were on our way. Noel
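For anyone who hasn't read that part of V6: the "existing malloc()" mentioned here is not the C library allocator but the kernel's tiny first-fit resource-map allocator, which manages any resource describable as (address, size) extents. A simplified modern-C rendition of the idea (not the original code; the real version also closes up the array when an extent is consumed exactly):

    /* One free extent of the resource; a zero size terminates the array. */
    struct mapent {
        unsigned m_size;    /* number of free units in this extent          */
        unsigned m_addr;    /* first free unit (e.g. a UNIBUS map register) */
    };

    /* First fit: carve the request off the front of the first extent that
     * is big enough; return 0 if nothing fits (the kernel would then wait). */
    unsigned map_alloc(struct mapent *mp, unsigned size)
    {
        struct mapent *bp;

        for (bp = mp; bp->m_size != 0; bp++) {
            if (bp->m_size >= size) {
                unsigned a = bp->m_addr;
                bp->m_addr += size;
                bp->m_size -= size;
                return a;
            }
        }
        return 0;
    }

Giving the UNIBUS map its own little map array, as described above, means a network transfer only ties up as many map registers as its buffer needs, rather than the whole map.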
> when you're working below the reliable stream level, you can't just do a > blocking 'read' for a packet; it pretty much has to be asynchronous Oh, you should look at the early BBN TCP for V6 Unix - they would have faced the same issue, with their TCP process. They did have the capac() call (which kind of alleviates the need for non-blocking I/O), but that may have only been available for ports/pipes; I'm not sure if the ARPANET device supported it. (With the NCP as well, that did some amount of demultiplexing in the kernel, and probably had buffering there, so, if so, in theory capac() could have been done there. Of course, with the ARPANET link being only 100Kbit/sec maximum - although only to a host on the same IMP - the overhead of copying buffered data made kernel buffering more 'affordable'.) Noel
> > when you're working below the reliable stream level, you can't just do a > blocking 'read' for a packet; it pretty much has to be asynchronous > Oh, you should look at the early BBN TCP for V6 Unix - they would have faced the same issue, with their TCP process. They did have the capac() call (which kind of alleviates the need for non-blocking I/O), but that may have only been available for ports/pipes; I'm not sure if the ARPANET device supported it. I did. There is capac() support also for the IMP interface: https://www.tuhs.org/cgi-bin/utree.pl?file=BBN-V6/dmr/imp11a.c (see bottom two functions) BBN took the same approach as Research: with capac() or select() one can prevent blocking on read() and write().
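For readers who haven't seen it, capac() is a query-before-you-read style of interface. The exact BBN semantics aren't spelled out in this thread, so the sketch below simply assumes a call that reports how many characters the descriptor can currently transfer without blocking; it is meant to show the usage pattern, not the historical code:

    #include <unistd.h>

    /* Assumed interface: number of characters that can be read from (or,
     * for a descriptor open for writing, written to) fd without blocking. */
    int capac(int fd);

    /* Read whatever is available right now, never blocking; returns 0 if
     * there is nothing to do, so the caller can go do other work. */
    int read_if_ready(int fd, char *buf, int want)
    {
        int avail = capac(fd);

        if (avail <= 0)
            return 0;
        if (avail < want)
            want = avail;
        return read(fd, buf, want);
    }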
> At around that point in time (I don't have the very _earliest_ code, to get an exact date, but the oldest traces I see [in mapalloc(), below] are from September '78), the CSR group at MIT-LCS (which were the people in LCS doing networking) was doing a lot with asynchronous I/O (when you're working below the reliable stream level, you can't just do a blocking 'read' for a packet; it pretty much has to be asynchronous). I was working in Unix V6 - we were building an experimental 1Mbit/second ring - and there was work in Multics as well. > I don't think the wider Unix community heard about the Unix work, but our group regularly filed updates on our work for the 'Internet Monthly Reports', which was distributed to the whole TCP/IP experimental community. If you can find an archive of early issues (I'm too lazy to go look for one), we should be in there (although our report will alsocover the Multics TCP/IP work, and maybe some other stuff too). Sounds very interesting! Looked around a bit, but I did not find a source for the “Internet Monthly Reports” for the late 70’s (rfc-editor.org/museum/ has them for the 1990’s). In the 1970’s era, it seems that NCP Unix went in another direction, using newly built message and event facilities to prevent blocking. This is described in "CAC Technical Memorandum No. 84, Illinois Inter-Process Communication Facility for Unix.” - but that document appears lost as well. Ah, well, topics for another day.
The operating systems that I cut my teeth on (OS/360, DOS/360, VAX/VMS) all had basic I/O system calls that were non-blocking. Blocking I/O calls were all built on top of that framework. I thus found it curious that Unix took the opposite tack, and non-blocking I/O was an afterthought. So I'm curious as to what the rationale was for Unix to have been designed with basic I/O being blocking rather than asynchronous. Especially that non-blocking I/O primitives were the norm for OSes in those days. -Paul W.
Paul Winalski <paul.winalski@gmail.com> wrote:
> So I'm curious as to what the rationale was for Unix to have been
> designed with basic I/O being blocking rather than asynchronous.
I don't doubt that it was "simplify, simplify, simplify". Async I/O
is much messier than Unix's read/write model. The Unix model was
simpler to design, simpler to code, simpler to get right, and undoubtedly
took much less OS code than an async model would have; on the PDP-11
that would have mattered.
Also, the early Unixs were on smaller -11s, not the /45 or /70 with
split I&D space and the ability to address lots more RAM.
My guess, anyway. :-)
Arnold
[-- Attachment #1: Type: text/plain, Size: 1807 bytes --] On Tue, Jun 2, 2020 at 1:47 PM Paul Winalski <paul.winalski@gmail.com> wrote: > The operating systems that I cut my teeth on (OS/360, DOS/360, > VAX/VMS) all had basic I/O system calls that were non-blocking. > Blocking I/O calls were all built on top of that framework. I thus > found it curious that Unix took the opposite tack, and non-blocking > I/O was an afterthought. > > So I'm curious as to what the rationale was for Unix to have been > designed with basic I/O being blocking rather than asynchronous. > Especially that non-blocking I/O primitives were the norm for OSes in > those days. Doug addressed this, albeit in an oblique manner, on this list back in 2015: https://minnie.tuhs.org/pipermail/tuhs/2015-September/007509.html Quoting him: """ Unix was what the authors wanted for a productive computing environment, not a bag of everything they thought somebody somewhere might want. One objective, perhaps subliminal originally, was to make program behavior easy to reason about. Thus pipes were accepted into research Unix, but more general (and unruly) IPC mechanisms such as messages and events never were. The infrastructure had to be asynchronous. The whole point was to surmount that difficult model and keep everyday programming simple. User visibility of asynchrony was held to a minimum: fork(), signal(), wait(). Signal() was there first and foremost to support SIGKILL; it did not purport to provide a sound basis for asynchronous IPC. The complexity of sigaction() is evidence that asynchrony remains untamed 40 years on. """ My response at the time was to question whether asynchrony itself remains untamed, as Doug put it, or if rather it has proved difficult to retrofit asynchrony onto a system designed around fundamentally synchronous primitives? - Dan C. [-- Attachment #2: Type: text/html, Size: 2453 bytes --]
On 6/2/20, arnold@skeeve.com <arnold@skeeve.com> wrote:
> Paul Winalski <paul.winalski@gmail.com> wrote:
>
>> So I'm curious as to what the rationale was for Unix to have been
>> designed with basic I/O being blocking rather than asynchronous.
>
> Also, the early Unixs were on smaller -11s, not the /45 or /70 with
> split I&D space and the ability to address lots more RAM.
I first encountered DOS/360 on a System/360 model 25 with 48K of
memory. This was a one-job-at-a-time batch system, but the I/O
primitive (EXCP--execute channel program) was asynchronous. So I
don't think the small memory rationale really applies.
-Paul W.
On 6/2/20, Dan Cross <crossd@gmail.com> wrote:
>
> My response at the time was to question whether asynchrony itself remains
> untamed, as Doug put it, or if rather it has proved difficult to retrofit
> asynchrony onto a system designed around fundamentally synchronous
> primitives?
I think that's a very good question. It's analogous to
record-oriented I/O vs. byte stream I/O. It's easy to build
record-oriented I/O on top of a byte stream, but it's a real bear to
do it the other way around. Similarly, it's easy to build synchronous
I/O on top of asynchronous I/O but the reverse ends up looking
contrived.
-Paul W.
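To make that concrete: with an asynchronous primitive in hand, the synchronous call is just "start, then wait". A sketch using the POSIX aio interface (an API from much later than the systems under discussion, used here only because it is the async primitive most readers can try today):

    #include <aio.h>
    #include <errno.h>
    #include <string.h>
    #include <sys/types.h>

    /* A blocking positioned read built from async primitives: issue the
     * request, then block until it completes. */
    ssize_t blocking_read(int fd, void *buf, size_t len, off_t off)
    {
        struct aiocb cb;
        const struct aiocb *list[1] = { &cb };
        int err;

        memset(&cb, 0, sizeof cb);
        cb.aio_fildes = fd;
        cb.aio_buf    = buf;
        cb.aio_nbytes = len;
        cb.aio_offset = off;

        if (aio_read(&cb) == -1)                /* start the operation     */
            return -1;
        while (aio_suspend(list, 1, NULL) == -1 && errno == EINTR)
            ;                                   /* wait for it to complete */
        err = aio_error(&cb);
        if (err != 0) {
            errno = err;
            return -1;
        }
        return aio_return(&cb);                 /* bytes transferred       */
    }

Going the other way -- synthesising a genuinely asynchronous operation when the kernel only offers the blocking call -- needs an extra thread or process, which is the "contrived" part.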
On Tue, Jun 2, 2020 at 2:54 PM Paul Winalski <paul.winalski@gmail.com> wrote:

> I first encountered DOS/360 on a System/360 model 25 with 48K of
> memory. This was a one-job-at-a-time batch system, but the I/O
> primitive (EXCP--execute channel program) was asynchronous. So I
> don't think the small memory rationale really applies.

Hrrmpt... it was single task. Being asynchronous in the I/O and allowing process asynchrony takes a lot more housekeeping.

Paul, you know I agree with you; it was always an issue with UNIX IMO. The problem is how to do it differently. At Masscomp (RTU), our solution was not to try to 'fix it' so much as to add a new scheme beside the synchronous one. We added a general AST mechanism that anyone could use (very much like RSX and VMS in semantics), but left signals alone. We added new async calls, which were just implemented via ioctl's (the universal hack) for the specific HW that supported it. In retrospect, that was an error; it should have been aread(2)/awrite(2), with the completion routine/callback added as the 4th & 5th parameters. Since Stellix was not Real-Time, we left them out. [ tjt and I have argued about that for years ].

So back to Doug/Dan's answer -- I think for a small system, like the original PDP-7 and the PDP-11/20, putting in the effort to make it multiprocess while leaving the I/O synchronous definitely made it easier, particularly given the signal semantics that were created. To me, UNIX got way more right than wrong.

I suspect if signals had been more like ASTs, the idea of being async might have gone farther. But to use Dennis's line about C being quirky, signals are very quirky. wnj tried to 'fix' them and frankly it just made matters worse. And IMO, sigaction(3), as Doug says, is a nightmare. I've generally been a fan of introducing a new idea separately as a 'stronger strain' and then seeing if people like it. FWIW: a couple of us did try to get ASTs into POSIX.4 (they were there in a draft), but ultimately *.4 was rejected as 'not UNIX.'
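Purely as an illustration of the interface Clem says he wishes had shipped, here is what an aread(2)/awrite(2) pair with a completion callback as the 4th and 5th parameters might have looked like. Nothing like this ever existed; the names, types and delivery model below are all assumptions made only to show the shape of the idea:

    #include <sys/types.h>

    /* Called (AST-style) when the queued transfer finishes. */
    typedef void (*io_done_t)(int fd, ssize_t result, void *context);

    /* Queue the transfer and return immediately; 'done' runs on completion
     * with the byte count (or -1 plus an error) and the caller's context. */
    int aread (int fd, void *buf, size_t len, io_done_t done, void *context);
    int awrite(int fd, const void *buf, size_t len, io_done_t done, void *context);

The POSIX.1b aio_read()/aio_write() calls that eventually appeared carry much the same information, but bundle the parameters into a control block and deliver completion through a sigevent rather than a bare function-pointer argument.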
On Tue, Jun 2, 2020 at 2:58 PM Paul Winalski <paul.winalski@gmail.com> wrote: > I think that's a very good question. It's analogous to > record-oriented I/O vs. byte stream I/O. It's easy to build > record-oriented I/O on top of a byte stream, but it's a real bear to > do it the other way around. Similarly, it's easy to build synchronous > I/O on top of asynchronous I/O but the reverse ends up looking contrived. > Which was exactly the point I tried to make in the POSIX.4 discussions, but it does take more work in the basic housekeeping and you need a way to handle events and completions that are priority based, queued, and a few other details. As Doug said, they stayed away from some features (like messaging). async I/O was one of them. But as I said, Ken, Dennis and the rest of the crew did an amazing job with very little.
> From: Paul Winalski

> I'm curious as to what the rationale was for Unix to have been designed
> with basic I/O being blocking rather than asynchronous.

It's a combination of two factors, I reckon. One, which is better depends a lot on the type of thing you're trying to do. For many typical things (e.g. 'ls'), blocking is a good fit. And, as Arnold says, asynchronous I/O is more complicated, and Unix was (well, back then at least) all about getting the most bang for the least bucks. More complicated things do sometimes benefit from asynchronous I/O, but complicated things weren't Unix's 'target market'. E.g. even though pipes post-date the I/O decision, they too are a better match to blocking I/O.

> From: Arnold Skeeve

> the early Unixs were on smaller -11s, not the /45 or /70 with split I&D
> space and the ability to address lots more RAM.

Ahem. Lots more _core_. People keep forgetting that we're looking at decisions made at a time when each bit in main memory was stored in a physically separate storage device, and having tons of memory was a dream of the future.

E.g. the -11/40 I first ran Unix on had _48 KB_ of core memory - total! And that had to hold the resident OS, plus the application! It's no surprise that Unix was so focused on small size - and as a corollary, on high bang/buck ratio.

But even in this age of lighting one's cigars with gigabytes of main memory (literally), small is still beautiful, because it's easier to understand, and complexity is bad. So it's too bad Unix has lost that extreme parsimony.

> From: Dan Cross

> question whether asynchrony itself remains untamed, as Doug put it, or
> if rather it has proved difficult to retrofit asynchrony onto a system
> designed around fundamentally synchronous primitives?

I'm not sure it's 'either or'; I reckon they are both true.

Noel
On Tue, Jun 2, 2020 at 4:14 PM Noel Chiappa <jnc@mercury.lcs.mit.edu> wrote:

> Ahem. Lots more _core_. People keep forgetting that we're looking at
> decisions made at a time when each bit in main memory was stored in a
> physically separate storage device, and having tons of memory was a dream of
> the future.

Yeah -- that is something that gets forgotten. There's a kit/hackday project to make a 32-byte core memory for an Arduino that I did with some of my boy scouts doing the electronics MB a while back, just to try to give them a feel for what a 'bit' was. Similarly, there is an update of a late-1960s children's book originally called 'A Million'; it's now called: A Million Dots <https://www.amazon.com/Million-Dots-Andrew-Clements/dp/0689858248/ref=sr_1_1?crid=2AX8H8L2EM0HL&dchild=1&keywords=a+million+dots+by+andrew+clements&qid=1591129965&sprefix=a+million+dots%2Caps%2C155&sr=8-1> Each page has 10K dots. The idea is to help young readers get a real feel for what 'a million' means visually.

> E.g. the -11/40 I first ran Unix on had _48 KB_ of core memory - total!
> And that had to hold the resident OS, plus the application! It's no
> surprise that Unix was so focused on small size - and as a corollary, on
> high bang/buck ratio.

Amen -- I ran an 11/34 with 64K under V6 for about 3-6 months while we were awaiting the 256K memory upgrade.

> But even in this age of lighting one's cigars with gigabytes of main memory
> (literally), small is still beautiful, because it's easier to understand, and
> complexity is bad. So it's too bad Unix has lost that extreme parsimony.

Yep -- I think we were discussing this last week WRT cat -v/fmt et al. I fear some people confuse 'progress' with 'feature creep.' Just because we can do something does not mean we should.

As I said, I'm a real fan of async I/O and, like Paul, feel that it is a 'better' primitive. But I fully understand and accept that, given the tradeoffs of the time, UNIX did really well, and I much prefer what we got to the alternative. I'm happy we ended up with something simple that just works.
I remember working on getting Arpanet access on an 11/34 running V7 around 1978 or 1979. (SU-ISL). We used an 11/23 as a front end to run NCP, using a variation of Rand’s code. I wrote some sort of bisync driver for packet communications between the /23 and the /34, and I think added an IOCTL or some hack to ask if there was a message ready. So a polling variety of non-blocking I/O :) Meanwhile, on the Alto, surely an underpowered machine, the style for UI programming and asynch I/O was mostly event driven, much like libevent in more recent years. I found that very easy to understand. The main trajectory of PARC was lightweight threads calling synchronous I/O which really has to be counted as a Bad Idea not yet fully stamped out. I’ve always thought select(2) was a pretty good thing. In 1995 at Open Market we had a single-process web server that had no difficulties running 1000 to 1200 connections. I think that was BSD, and later OSF-1. -L
Case study (read, war story :-) of non-blocking I/O... Back in the mid-70's, I was doing scientific support programming for an X-Ray sky survey, using Fortran-63 under DRUM SCOPE on a CDC 3800 system which NRL's Space Sciences Division had (sigh) acquired and cobbled together from the US Government "excess property" list. Most of the time, our satellite was used in "scan mode", spinning on an axis which pointed at the Sun. This let us collect data on a two degree wide strip of sky. So, every six months we scanned the entire X-ray sky. The track was basically a helix, bent to match the shape of the Earth's orbit. Every so often, in order to look at a particularly interesting area, the spinning would be stopped, so the instrument could be put into "point mode". The challenge was to "bin" this data for further analysis. We did this by loading sets of 7-track tapes onto a CDC 813 drive, then dumping it out into 36 2x10 degree bins. This put a quarter of the data onto a set of tapes. Rinse, repeat for the other 3/4 of the data (after skipping past the already-written data). The result was 36 tapes, each holding an 8x10 degree bin (plus one extra tape for any point mode data). IIRC, we had five tape drives; my challenge was to keep them all as busy as possible, so as to dump the data set expeditiously. Because I had asynchronous I/O (mostly in the form of BUFFER IN and BUFFER OUT commands), I was able to implement a simple but quite effective polling loop. The machine room was a bit of a madhouse, but the tapes were written about as quickly as the hardware allowed. Asynchronous I/O FTW... https://en.wikipedia.org/wiki/United_States_Naval_Research_Laboratory#Space_sciences https://en.wikipedia.org/wiki/High_Energy_Astronomy_Observatory_1#A1:_Large-Area_Sky_Survey_instrument https://ub.fnwi.uva.nl/computermuseum/cdcdisk.html -r
On 6/2/20, Rich Morin <rdm@cfcl.com> wrote:
>
> IIRC, we had five tape drives; my challenge was to keep them all as busy as
> possible, so as
> to dump the data set expeditiously. Because I had asynchronous I/O (mostly
> in the form of
> BUFFER IN and BUFFER OUT commands), I was able to implement a simple but
> quite effective
> polling loop. The machine room was a bit of a madhouse, but the tapes were
> written about
> as quickly as the hardware allowed. Asynchronous I/O FTW...
With 9-track magnetic tape devices, reading and writing can't start
until the tape is up to speed. Once up to speed the drive can read
and write records while keeping the tape moving at speed. This is
called streaming. If there's a pause in the read/write requests from
the CPU, time is lost as the drive stops and starts moving the tape.
It was essential that applications doing large amounts of tape I/O
keep up the I/O requests at a rate that allows streaming.
Asynchronous I/O with multi-buffering is a straightforward way to
accomplish this. The IBM S/360 channel commands for tape devices
provided a mechanism for the tape control unit to send an interrupt to
the CPU when a read or write channel command completed. This notified
the sequential access method (the user program I/O interface) when I/O
to each buffer had completed and the buffer was available for reuse.
OS/360's Sequential Access Method could read or write an entire tape
using a single SIO (start I/O) instruction, as long as no read or
write errors were encountered.
-Paul W.
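Here is a sketch of the multi-buffering technique Paul describes, translated into modern POSIX aio rather than S/360 channel programs (produce() and the buffer size are placeholders for whatever generates the records): fill one buffer while the previous one is still being written, so the drive never has to stop streaming.

    #include <aio.h>
    #include <errno.h>
    #include <string.h>
    #include <sys/types.h>

    #define BUFSZ (64 * 1024)

    extern ssize_t produce(char *buf, size_t len);   /* assumed data source */

    void stream_out(int fd)
    {
        static char buf[2][BUFSZ];
        struct aiocb cb[2];
        int cur = 0, prev = -1;
        off_t off = 0;

        memset(cb, 0, sizeof cb);
        for (;;) {
            /* Fill the current buffer while the previous one drains. */
            ssize_t n = produce(buf[cur], BUFSZ);

            if (prev >= 0) {                 /* wait for the previous write */
                const struct aiocb *l[1] = { &cb[prev] };
                while (aio_error(&cb[prev]) == EINPROGRESS)
                    aio_suspend(l, 1, NULL);
                aio_return(&cb[prev]);
            }
            if (n <= 0)
                break;                       /* no more data; all writes done */

            cb[cur].aio_fildes = fd;
            cb[cur].aio_buf    = buf[cur];
            cb[cur].aio_nbytes = (size_t)n;
            cb[cur].aio_offset = off;
            aio_write(&cb[cur]);             /* start this buffer on its way */
            off += n;
            prev = cur;
            cur ^= 1;
        }
    }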
[-- Attachment #1: Type: text/plain, Size: 3062 bytes --] On a distantly related note, when I worked part time for the MIT administration in the 60's, we'd do processing on the 7094 in the main comp center, then bring a tape of results back to our office to do the printing and card punching. For reasons that were never explained to me, the blocks of print and punch data were combined on a single tape, and were distinguished by being written with different parities. If you attempted to read a block with the correct parity setting, the tape could stream. If you guessed wrong, you had to stop, back up a block, change the parity, and try again. So I wrote a simple 360 assembly program to keep track how often blocks of each parity followed the observed parities of the previous 8 blocks. Essentially, 256 pairs of parity observations, indexed by the previous 8 parity observations. Blocks of print and punch data tended to fall into patterns that depended on what job was being run on the 7094, so detecting those patterns and correctly anticipating the upcoming parity made the tapes move much more smoothly. It was fun to watch the tape at the start of a run. It was mostly just a coin-toss, so the tape was jerking around fitfully. As the patterns started to emerge, the predictions got better, the jerking got less and less common, and the tapes were streaming most of the time. My introduction to learning algorithms. On Wed, Jun 3, 2020 at 12:33 PM Paul Winalski <paul.winalski@gmail.com> wrote: > On 6/2/20, Rich Morin <rdm@cfcl.com> wrote: > > > > IIRC, we had five tape drives; my challenge was to keep them all as busy > as > > possible, so as > > to dump the data set expeditiously. Because I had asynchronous I/O > (mostly > > in the form of > > BUFFER IN and BUFFER OUT commands), I was able to implement a simple but > > quite effective > > polling loop. The machine room was a bit of a madhouse, but the tapes > were > > written about > > as quickly as the hardware allowed. Asynchronous I/O FTW... > > With 9-track magnetic tape devices, reading and writing can't start > until the tape is up to speed. Once up to speed the drive can read > and write records while keeping the tape moving at speed. This is > called streaming. If there's a pause in the read/write requests from > the CPU, time is lost as the drive stops and starts moving the tape. > It was essential that applications doing large amounts of tape I/O > keep up the I/O requests at a rate that allows streaming. > Asynchronous I/O with multi-buffering is a straightforward way to > accomplish this. The IBM S/360 channel commands for tape devices > provided a mechanism for the tape control unit to send an interrupt to > the CPU when a read or write channel command completed. This notified > the sequential access method (the user program I/O interface) when I/O > to each buffer had completed and the buffer was available for reuse. > OS/360's Sequential Access Method could read or write an entire tape > using a single SIO (start I/O) instruction, as long as no read or > write errors were encountered. > > -Paul W. > [-- Attachment #2: Type: text/html, Size: 3600 bytes --]
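That predictor is small enough to sketch. In modern C it is essentially a 256-entry history table, indexed by the parities of the previous eight blocks, with a pair of counts per entry; details such as counter width are guesses, since the original was written in 360 assembly:

    #include <stdint.h>

    static uint16_t seen[256][2];   /* seen[history][parity] occurrence counts   */
    static uint8_t  history;        /* parities of the last 8 blocks, 1 bit each */

    /* Guess the parity of the next block: whichever parity has followed
     * this 8-block pattern more often so far. */
    int predict(void)
    {
        return seen[history][1] > seen[history][0];
    }

    /* Once the block has actually been read and its parity (0 or 1) is
     * known, record the outcome and shift it into the history. */
    void learn(int parity)
    {
        seen[history][parity]++;
        history = (uint8_t)((history << 1) | (parity & 1));
    }

Each run begins with an empty table (the coin-toss phase); as patterns from the 7094 job mix accumulate, predict() starts guessing the right parity before the block is read, and the tape can keep streaming.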
On 2020-Jun-01 07:58:02 -0700, Larry McVoy <lm@mcvoy.com> wrote:
>On Mon, Jun 01, 2020 at 01:32:56PM +1000, Dave Horsfall wrote:
>> On Mon, 1 Jun 2020, Rob Pike wrote:
>>
>> > I’m not quite sure why the Research lineage did not include
>> > non-blocking behaviour, especially in view of the man page comments.
>> > Maybe it was seen as against the Unix philosophy, with select()
>> > offering sufficient mechanism to avoid blocking (with open() the hard
>> > corner case)?
>> >
>> >That's it. Select was good enough for our purposes.
>>
>> After being dragged through both Berserkley and SysVile, I never did get the
>> hang of poll()/select() etc...
>
>I'm sure you could, select is super handy, think a network server like
>apache.

My view may be unpopular but I've always been disappointed that Unix implemented blocking I/O only and then had to add various hacks to cover up for the lack of asynchronous I/O. It's trivial to build blocking I/O operations on top of asynchronous I/O operations. It's impossible to do the opposite without additional functionality.

I also found it disappointing that poll()/select() only worked on TTY and network operations. HDDs are really slow compared to CPUs and it would be really nice if a process could go and do something else whilst waiting for a file to open.

-- Peter Jeremy
On Thu, Jun 4, 2020 at 3:06 AM Peter Jeremy <peter@rulingia.com> wrote:

> On 2020-Jun-01 07:58:02 -0700, Larry McVoy <lm@mcvoy.com> wrote:
> >On Mon, Jun 01, 2020 at 01:32:56PM +1000, Dave Horsfall wrote:
> >> On Mon, 1 Jun 2020, Rob Pike wrote:
> >>
> >> > I’m not quite sure why the Research lineage did not include
> >> > non-blocking behaviour, especially in view of the man page comments.
> >> > Maybe it was seen as against the Unix philosophy, with select()
> >> > offering sufficient mechanism to avoid blocking (with open() the hard
> >> > corner case)?
> >> >
> >> >That's it. Select was good enough for our purposes.
> >>
> >> After being dragged through both Berserkley and SysVile, I never did get the
> >> hang of poll()/select() etc...
> >
> >I'm sure you could, select is super handy, think a network server like
> >apache.
>
> My view may be unpopular but I've always been disappointed that Unix
> implemented blocking I/O only and then had to add various hacks to cover
> up for the lack of asynchronous I/O. It's trivial to build blocking I/O
> operations on top of asynchronous I/O operations. It's impossible to do
> the opposite without additional functionality.
>
> I also found it disappointing that poll()/select() only worked on TTY and
> network operations. HDDs are really slow compared to CPUs and it would be
> really nice if a process could go and do something else whilst waiting for
> a file to open.

Lest anybody think this is a theoretical concern, Netflix has spent quite a bit of effort to reduce the sources of latency in our system. The latency for open doesn't happen often, due to caching, but when it does it causes a hiccup for an nginx worker thread (since open is blocking). If you get enough hiccups, you wind up consuming all your worker threads and latency for everybody suffers while waiting for these to complete (think of a flaky disk that suddenly takes a really long time for each of its I/Os, for one example). We've hacked FreeBSD in various ways to reduce or eliminate this delay....

In FreeBSD we have to translate from the pathname to a vnode, and to do that we have to look up directories, indirect block tables, etc. All these are surprising places that one could bottleneck at... And if you try to access the vnode before all this is done, you'll wait for it (though in the case of sendfile it doesn't matter since that's async and only affects the one I/O)...

So I could see how having an async open could introduce a lot of hair into the mix depending on how you do it. Without a robust callback/AST mechanism, my brain is recoiling from the EALREADY errors in sockets for things that are already in progress... reads and writes are easy by comparison :)

The kicker is that all of the kernel is callback driven. The upper half queues the request and then sleeps until the lower half signals it to wake up. And that signal is often just a wakeup done from the completion routine in the original request. All of that would be useful in userland for high volume activity, yet none of it is exposed...

Warner
Warner Losh <imp@bsdimp.com> wrote: > > So I could see how having a async open could introduce a lot of hair into > the mix depending on how you do it. Linux io_uring, for example. I would be interested if anyone has a historical perspective on its design :-) Tony. -- f.anthony.n.finch <dot@dotat.at> http://dotat.at/ Wight, Portland, Plymouth, Biscay, Fitzroy, Sole, Lundy, Fastnet: North or northwest 4 to 6, occasionally 7 in Biscay. Slight or moderate at first except in Biscay and Fitzroy, otherwise moderate or rough. Showers. Good, occasionally moderate.
On Thu, Jun 04, 2020 at 08:19:58AM -0600, Warner Losh wrote:
> The kicker is that all of the kernel is callback driven. The
> upper half queues the request and then sleeps until the lower half signals
> it to wakeup. And that signal is often just a wakeup done from the
> completion routine in the original request. All of that would be useful in
> userland for high volume activity, none of it is exposed...
Yeah, I've often wondered why this stuff wasn't exposed. We already have
signal handlers, seems like that maps.
I tried to get the NFS guys at Sun to rethink the biod junk and do it like
UFS does, where it queues something and gets a callback. I strongly suspect
that two processes, one to queue, one to handle callbacks, would be more
efficient and actually faster than the biod nonsense.
That's one of the arguments I lost unfortunately.
Warner, exposing that stuff in FreeBSD is not really that hard, I suspect.
Might be a fun project for a young kernel hacker with some old dude like
you or me or someone, watching over it and thinking about the API.
--lm
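The closest thing that did get exposed along these lines is 4.2BSD-style signal-driven I/O, where the kernel sends SIGIO when a descriptor becomes ready. A minimal sketch with the modern flag names (the historical BSD spelling was FASYNC); note this delivers readiness, not completion, which is part of why it never grew into the callback interface being wished for here:

    #include <fcntl.h>
    #include <signal.h>
    #include <unistd.h>

    static volatile sig_atomic_t io_ready;

    static void on_sigio(int sig)
    {
        (void)sig;
        io_ready = 1;            /* a descriptor we own became ready */
    }

    void enable_sigio(int fd)
    {
        signal(SIGIO, on_sigio);
        fcntl(fd, F_SETOWN, getpid());            /* deliver SIGIO to us */
        fcntl(fd, F_SETFL, fcntl(fd, F_GETFL) | O_ASYNC | O_NONBLOCK);
    }

    /* The main loop can then do other work and drain the descriptor with
     * non-blocking reads whenever io_ready has been set by the handler. */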
On Thu, Jun 4, 2020 at 12:51 PM Larry McVoy <lm@mcvoy.com> wrote:

> On Thu, Jun 04, 2020 at 08:19:58AM -0600, Warner Losh wrote:
> > The kicker is that all of the kernel is callback driven. The
> > upper half queues the request and then sleeps until the lower half signals
> > it to wakeup. And that signal is often just a wakeup done from the
> > completion routine in the original request. All of that would be useful in
> > userland for high volume activity, none of it is exposed...
>
> Yeah, I've often wondered why this stuff wasn't exposed. We already have
> signal handlers, seems like that maps.

Was it Rob who said that signals were really just for SIGKILL? Here, signals would be gang-pressed into service as a general IPC mechanism. In fairness, they've mutated that way, but they didn't start out that way. While I obviously wasn't there, the strong impression I get is that by the time people were seriously thinking about async IO in Unix, the die had already been cast for better or worse.

> I tried to get the NFS guys at Sun to rethink the biod junk and do it like
> UFS does, where it queues something and gets a callback. I strongly suspect
> that two processes, one to queue, one to handle callbacks, would be more
> efficient and actually faster than the biod nonsense.
>
> That's one of the arguments I lost unfortunately.
>
> Warner, exposing that stuff in FreeBSD is not really that hard, I suspect.
> Might be a fun project for a young kernel hacker with some old dude like
> you or me or someone, watching over it and thinking about the API.

I'm going to actually disagree with you here, Larry. While I think a basic mechanism wouldn't be THAT hard to implement, it wouldn't compose nicely with the existing primitives. I suspect the edge cases would be really thorny, particularly without a real AST abstraction. For instance, what happens if you initiate an async IO operation, then block on a `read`? Where does the callback happen if it's on the same thread? The real challenge isn't providing the operation, it's integrating it into the existing model.

As a counter-point to the idea that it's completely unruly, in Akaros this was solved in the C library: all IO operations were fundamentally asynchronous, but the C library provided blocking read(), write(), etc. by building those from the async primitives. It worked well, but Akaros had something akin to an AST environment and fine-grain scheduling decisions were made in userspace: in Akaros the unit of processor allocation is a CPU core, not a thread, and support exists for determining the status of all cores allocated to a process. There are edge cases (you can't roll your own mutex, for example, and the basic threading library does a lot of heavy lifting for you, making it challenging to integrate into the runtime of a language that doesn't use the same ABI), but by and large it worked. It was also provided by a kernel that was a pretty radical departure from a Unix-like kernel.

- Dan C.
> From: Peter Jeremy <peter@rulingia.com>

> My view may be unpopular but I've always been disappointed that Unix
> implemented blocking I/O only and then had to add various hacks to cover
> up for the lack of asynchronous I/O. It's trivial to build blocking I/O
> operations on top of asynchronous I/O operations. It's impossible to do
> the opposite without additional functionality.

Back when I started working on networks, I looked at other kinds of systems to see what general lessons I could learn about the evolution of systems, which might apply to the networks we were building. (I should have written all that up, never did, sigh.)

One major one was that a system, when small, often collapses multiple needs onto one mechanism. Only as the system grows in size do scaling effects, etc. necessitate breaking them up into separate mechanisms. (There are some good examples in file systems, for example.) I/O is a perfect example of this; a small system can get away with only one kind; it's only when the system grows that one benefits from having both synchronous and asynchronous. Since the latter is more complicated, _both_ in the system and in the applications which use it, it's no surprise that synchronous was the pick.

The reasons why synchronous is simpler in applications have a nice illustration in operating systems, which inevitably support both blocking (i.e. implied process switching) and non-blocking 'operation initiation' and 'operation completed notification' mechanisms. (The 'timeout/callout' mechanism in Unix is an example of the latter, albeit specialized to timers.)

Prior to the Master Control Program in the Burroughs B5000 (there may be older examples, but I don't know of them - I would be more than pleased to be informed of any such, if there are), the technique of having a per-process _kernel_ stack, and on a process block (and implied switch), switching stacks, was not used. This idea was picked up for Jerry Saltzer's PhD thesis, used in Multics, and then copied by almost every other OS since (including Unix).

The advantage is fairly obvious: if one is deep in some call stack, one can just wait there until the thing one needs is done, and then resume without having to work one's way back to that spot - which will inevitably be complicated (perhaps more in the need to _return_ through all the places that called down - although the code to handle a 'not yet' return through all those places, after the initial call down, will not be inconsiderable either).

Exactly the same reasoning applies to blocking I/O; one can sit where one is, waiting for the I/O to be done, without having to work one's way back there later. (Examples are legion, e.g. in recursive descent parsers - and can make the code _much_ simpler.) It's only when one _can't_ wait for the I/O to complete (e.g. for a packet to arrive - although others have mentioned other examples in this thread, such as 'having other stuff to do in the meanwhile') that having only blocking I/O becomes a problem...

In cases where blocking would be better, one can always build a 'blocking' I/O subsystem on top of asynchronous I/O primitives. However, in a _tiny_ system (remember my -11/40 which ran Unix on a system with _48KB_ of main memory _total_ - i.e. OS and application together had to be less than 48KB - no virtual memory on that machine :-), building blocking I/O on top of asynchronous I/O, for those very few cases which need it, may not be the best use of very limited space - although I agree that it's the way to go, overall.

Noel
[-- Attachment #1: Type: text/plain, Size: 1501 bytes --] On Fri, 5 Jun 2020, Dan Cross wrote: > Was it Rob who said that signals were really just for SIGKILL? Here, > signals would be gang-pressed into service as a general IPC mechanism. > In fairness, they've mutated that way, but they didn't start out that > way. While I obviously wasn't there, the strong impression I get is that > by the time people were seriously thinking about async IO in Unix, the > die had already been cast for better or worse. I will quite happily strangle anyone who uses signals for IPC. Why? I got bitten quite badly by that, if anyone here remembers BSD/OS... It seemed that "fdump" forked off several kiddies, and they chatted amongst themselves using signals. Anyway, let's just say that after a disk crash this was a poor time to discover that *some* of my backups were screwed in a weird way; for example, there was a 1/4" QIC[*] tape with files, but no inodes to put them into their corresponding home directories... I wrote something to extract whatever I could, excavate what I could, and then either rebuild or rely upon memory from there. Not fun. Did I mention that I will quite happily strangle anyone who uses signals for IPC? Signals mean "stop what you're doing now, do this instead, then hopefully go back to whatever you thought you were doing". [*] Don't ask me about those !@#$% QIC tapes, that chose to use whatever density they wished depending upon the phase of the moon etc. -- Dave, who lost a lot of valuable files that day