I always used the design principle "Write locally, read over NFS". This obviated locking issues and fit in with the idea of fate-sharing: a write would always succeed, even if reading would have to wait until R (the machine doing the reading) was up. The only additional thing I needed was the ability for W (the machine doing the writing) to notify R that something had changed, which I did by having R run a process that listened on a port that W would open and then close: no data flowed over this connection. If the connection could not be made, the process on the W side would loop in bounded exponential backoff. (A sketch of that notifier loop is at the end of this message.)

On Sun, Jul 5, 2020 at 4:09 PM Clem Cole wrote:
>
> On Sun, Jul 5, 2020 at 10:43 AM Larry McVoy wrote:
>
>> My guess is that other people didn't understand the "rules" and did
>> things that created problems. Sun's clients did understand and did
>> not push NFS in ways that would break it.
>
> I >>believe<< that a difference was that file I/O was based on mmap on
> SunOS and not on other systems (I don't know about Solaris). The error
> was handled by the OS memory system. You tell me how SGI handled I/O.
> Tru64 used mmap, and I think macOS does also, from the Mach heritage.
> RTU/Ultrix was traditional BSD. Stellix was SVR3. Both had a file
> system cache with write-behind.
>
> I never knew for sure, but I always suspected that was the crux of the
> difference in how/where the write failures were handled. But as you
> pointed out, many production NFS sites not running Suns had huge
> problems with holes in files that were not discovered until it was too
> late to fix them. SCCS/RCS repositories were particularly suspect, and
> because people tried to use them for shared development areas, it could
> be a big issue.
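
For concreteness, here is a minimal sketch of the W-side notifier as I described it above. The host name, port, and the 64-second backoff cap are illustrative, not what I actually ran:

    /* W-side notifier: connect to R's listening port and close
     * immediately; no data flows.  Host, port, and backoff bounds
     * are illustrative. */
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <netdb.h>
    #include <sys/socket.h>

    #define MAX_BACKOFF 64          /* cap the backoff at 64 seconds */

    static int
    notify(const char *host, const char *port)
    {
        struct addrinfo hints, *res, *rp;
        int s;

        memset(&hints, 0, sizeof hints);
        hints.ai_family = AF_UNSPEC;
        hints.ai_socktype = SOCK_STREAM;

        if (getaddrinfo(host, port, &hints, &res) != 0)
            return -1;

        for (rp = res; rp != NULL; rp = rp->ai_next) {
            s = socket(rp->ai_family, rp->ai_socktype, rp->ai_protocol);
            if (s < 0)
                continue;
            if (connect(s, rp->ai_addr, rp->ai_addrlen) == 0) {
                close(s);           /* connection made: R has been poked */
                freeaddrinfo(res);
                return 0;
            }
            close(s);
        }
        freeaddrinfo(res);
        return -1;
    }

    int
    main(int argc, char **argv)
    {
        unsigned delay = 1;

        if (argc != 3) {
            fprintf(stderr, "usage: %s host port\n", argv[0]);
            return 1;
        }
        /* Bounded exponential backoff until R answers. */
        while (notify(argv[1], argv[2]) != 0) {
            sleep(delay);
            if (delay < MAX_BACKOFF)
                delay *= 2;
        }
        return 0;
    }

The R side is just a process that accepts and immediately closes the connection; the zero-byte connection is the whole message, a hint to go read the files over NFS.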