Re: [TUHS] Paging code in SysV R2

The Unix Heritage Society mailing list
 help / color / mirror / Atom feed

From: Larry McVoy <lm@mcvoy.com>
To: Paul Ruizendaal <pnr@planet.nl>
Cc: TUHS main list <tuhs@minnie.tuhs.org>
Subject: Re: [TUHS] Paging code in SysV R2
Date: Tue, 29 Mar 2022 09:01:28 -0700	[thread overview]
Message-ID: <20220329160128.GD26960@mcvoy.com> (raw)
In-Reply-To: <9316583A-2461-40B9-8B87-15AC4A719198@planet.nl>

On Tue, Mar 29, 2022 at 05:24:17PM +0200, Paul Ruizendaal wrote:
> No, sorry, my scope of interest is mostly 1975-1985.
> 
> I did read the Mach virtual memory paper from 1988 - from that paper I gather that the data structures used are totally different from those in Sys V or BSD.
> 
> There is also the VM implementation that Richard Miller did on SysV r1
> in 1983.  Interestingly, his design seems to parallel the choices made
> by Reiser a few years before, but it is lighter touch. Both Reiser and
> Miller refer to Denning and Tenex as prior art. Miller's 1984 Usenix
> paper about this project argues that doing approximated LRU from the
> page table data results in a process local working set view, which he
> argued was preferable to the system global working set view generated
> in the BSD clock algorithm.

I have HUGE respect for the BSD pageout code.  I was the guy who first
got rid of the rotational delay in UFS which doubled the speed at which
you could read data off the disk.  Which was nice, but it filled memory
with pages and the pageout code could not free them as fast as UFS 
was making more.

So I wrote 13 different attempts at a better pageout system (no, I didn't
save the code, it belonged to Sun, I think it is lost at this point).
While some of those attempts were better in very specific cases,
none of them, not one of them, were as good, in all cases, as the BSD
pageout code.

That said, it couldn't keep up now that UFS was faster (it couldn't 
keep up with a single disk, it had no hope of keeping up with multiple
disks).  So I did an awful hack, that last I checked was still there:

    if (we are doing sequential reads AND we are about to start paging out) {
    	start freeing pages a window behind where we are reading
    }

The effect was that if you were about to be out of free memory, sequential
I/O became a sliding 256K window of pages, the vnode that was using up all
the memory became the one that started freeing memory.  It was much faster
and simpler than having the pageout code scan through every page.

It was also an ugly hack, it was just the best I could come up with at the
time.

The whole effort left me in awe of the BSD pageout code.  It's pretty
simple but I'd challenge anyone to do better than that code in all
use cases.  Not saying it can't be done, but when I was at my coding
prime, I couldn't do it.

I did start down the path of doing a vnode based working set design.
I gave up when I realized that the swap vnode was by far the biggest
vnode, everyone's brk() came from that and it was all jumbled together.
Trying to do a vnode per process got really messy around fork().

I had many a late night design discussion at Antonio's Nut House in
Palo Alto with my office mate, Howard Chartok, trying to come up with
a design that couldn't be shot down.  Perhaps too much beer because we
never got to one.  

I still suspect there is something to be said for a vnode based working
set approach.  Especially if you brought back the hack I did that put
the basename of the vnode in the vnode (Shannon took it out because of
"hard links").  Too bad that didn't survive, I wrote a topvn (like top
but for vnodes) that was really good at giving you some insight into
what was going on.

     prev parent reply	other threads:[~2022-03-29 16:01 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-03-29 11:21 Paul Ruizendaal via TUHS
2022-03-29 12:38 ` Rob Pike
2022-03-29 14:05 ` Clem Cole
2022-03-29 15:24   ` Paul Ruizendaal
2022-03-29 15:57     ` Dan Cross
2022-03-29 16:01     ` Larry McVoy [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220329160128.GD26960@mcvoy.com \
    --to=lm@mcvoy.com \
    --cc=pnr@planet.nl \
    --cc=tuhs@minnie.tuhs.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).