9fans - fans of the OS Plan 9 from Bell Labs
From: "John S. Dyson" <dyson@iquest.net>
To: 9fans@cse.psu.edu
Subject: Re: [9fans] Virtual memory in BSD and Plan9
Date: Wed, 31 Oct 2001 10:00:32 +0000
Message-ID: <c66cd78b.0110301602.3c436ece@posting.google.com>
In-Reply-To: <vi4itcxxj8q.fsf@blue.cs.yorku.ca>

Ozan Yigit <oz@blue.cs.yorku.ca> wrote in message news:<vi4itcxxj8q.fsf@blue.cs.yorku.ca>...
> talking about VM, here is an interesting piece on the politics and
> technology of linux VM. apparently there is a "simpler and faster"
> [quotes for conceptual relativism :-] version...
> 
> http://www.byte.com/documents/s=1436/byt20011024s0002/1029_moshe.html
> 
> [i wish some of this hacker energy could be directed to plan9 projects]
> 
Actually, doing the VM code 'correctly' takes only a little more work
than doing it at all.  Some of the advanced Linux VM code was a result
of discussions between Matt Dillon (the guy who took over my code on
FreeBSD) and the Linux developers.  The basic concept of the various
page queues was actually an evolution of the original MACH code.

One key to 'better' page selection is being able to identify the
pages that have been used often and recently, so that they won't be
thrown away in a flurry of paging activity.  To make the VM work well,
there has to be some resistance to throwing out the first page that
seems not to have been used during the most recent scan for free
pages; the pageout code has to delve deeper into both recent and not
so recent usage.
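The message gives no code, but the 'resistance' described above can be sketched as per-page activity aging. The following is a minimal Python sketch under stated assumptions: `Page`, `scan`, and the `ACT_*` constants are hypothetical names, the `referenced` flag stands in for a hardware reference bit, and the constants are illustrative rather than FreeBSD's actual values. Each reference adds credit, each idle scan removes a little, and a page becomes an eviction candidate only once its credit is exhausted.

```python
from dataclasses import dataclass

# Hypothetical tuning constants, chosen only for illustration.
ACT_INIT = 5     # credit given to a page when it is faulted in or referenced
ACT_MAX = 64     # cap, so one burst of use cannot buy unlimited protection
ACT_DECLINE = 1  # credit lost per scan in which the page was not referenced


@dataclass
class Page:
    name: str
    referenced: bool = False   # stands in for the hardware reference bit
    act_count: int = ACT_INIT  # accumulated activity credit


def scan(pages):
    """One pass of a hypothetical pageout scan; returns eviction candidates.

    A page is not evicted merely because it was idle since the last pass:
    its activity credit must first drain to zero.
    """
    victims = []
    for p in pages:
        if p.referenced:
            # Recent use: add credit (capped) and clear the bit so the
            # next pass only sees references made after this one.
            p.act_count = min(ACT_MAX, p.act_count + ACT_INIT)
            p.referenced = False
        else:
            # Idle since the last pass: lose a little credit.
            p.act_count = max(0, p.act_count - ACT_DECLINE)
            if p.act_count == 0:
                victims.append(p)
    return victims
```

A page touched on several passes therefore survives many idle passes, while a page faulted in and never used again drains out after `ACT_INIT` idle scans.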

The pageout daemon can only do so much, with its limited knowledge
of recent activity, and too much page invalidation (and dissociation
from previous mappings) will only cause thrashing, and will NOT
improve overall system behavior.

Another common mistake is allowing an abrupt (impulse) change in
buffer cache size, which then destabilizes the balance of page usage.
Think of the tension between the VM and the buffer cache as a feedback
control system: if you change things too much too quickly, even a
relatively 'stable' situation will become unstable, often because it
hits limits (both rate limits and absolute limits).   The work on
FreeBSD was meant to avoid errant behavior under changing loads, and
that is one reason why it APPEARS to be self-tuning.   In fact, it
isn't really self-tuning; it simply tries to behave as one would
intuitively expect.  FreeBSD seldom needed the 'tuning' one might
expect, and the system would mostly compensate for meddling tuning
attempts, as any stable feedback system would.   The really useful
tunables included changing the paging queue lengths when I/O could be
interleaved across more devices, and changing the size of the page
queues based on memory size.
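One simple way to avoid impulse changes is to clamp the buffer cache target to absolute limits and slew-rate limit each adjustment. The Python below is a minimal sketch of that idea only, offered as an assumption rather than a description of FreeBSD's actual mechanism; `next_cache_target` and its parameters are hypothetical names, and the units (pages per control interval) are illustrative.

```python
def next_cache_target(current, demand, lo, hi, max_step):
    """Move a hypothetical buffer-cache size toward `demand`.

    The result never leaves [lo, hi] (absolute limits) and never moves
    by more than `max_step` per call (rate limit), so a sudden jump in
    demand becomes a ramp instead of an impulse.
    """
    desired = min(hi, max(lo, demand))              # absolute limits
    delta = desired - current
    delta = max(-max_step, min(max_step, delta))    # rate limit
    return current + delta
```

Given a sudden jump in demand, the target ramps toward the new value by at most `max_step` per interval instead of jumping, so the rest of the page-usage balance has time to adjust.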

In all VM paging systems (those that allow for pageouts), mistakes
are made in which desirable pages are removed from working sets.
Second-chance-type algorithms (which Linux appears to have picked up)
also really help.   In a way, one might think of this as a second
level of paging, where little real work is done and paging mistakes
can be corrected.   I found that second chance helped, but not as
much as I had hoped, so other second-chance-type mechanisms were
added to hide pageout choice mistakes.
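For reference, here is the textbook second-chance ('clock') replacement algorithm as a minimal Python sketch. It is not Linux's or FreeBSD's code; the `Clock` class and its fields are illustrative names. A referenced page found under the clock hand has its bit cleared and is skipped once, which is the 'second chance' that lets a recently used page survive a pageout pass.

```python
class Clock:
    """Second-chance ("clock") page replacement over a fixed set of frames."""

    def __init__(self, nframes):
        self.frames = [None] * nframes   # page id held by each frame
        self.ref = [False] * nframes     # reference bit per frame
        self.where = {}                  # page id -> frame index
        self.hand = 0                    # next frame the clock hand inspects

    def access(self, page):
        """Touch `page`; return True on a hit, False on a fault."""
        if page in self.where:
            self.ref[self.where[page]] = True
            return True
        # Fault: advance the hand, clearing reference bits (the second
        # chance) until an empty frame or an unreferenced page is found.
        # This terminates within one full sweep, since every bit it
        # passes over is cleared.
        while True:
            victim = self.frames[self.hand]
            if victim is None or not self.ref[self.hand]:
                break
            self.ref[self.hand] = False
            self.hand = (self.hand + 1) % len(self.frames)
        if victim is not None:
            del self.where[victim]
        self.frames[self.hand] = page
        self.ref[self.hand] = True
        self.where[page] = self.hand
        self.hand = (self.hand + 1) % len(self.frames)
        return False
```

The sketch also shows a known weakness: if every frame is referenced, the hand clears every bit and evicts whatever it started on, which is one reason additional mechanisms on top of plain second chance can pay off.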

One of the simplest changes that I made to the MACH VM was very
obvious, and wouldn't really be applicable to most sane VM designs:
a kind of free page queue was added that allowed a page to continue
to hold its data.   This page queue was the first one consulted when
userland needed memory, so it maximized and buffered the availability
of pages without hurting memory usage efficiency.   Most (good) VM
systems already have this facility, but it was something that helped
FreeBSD marginally.
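The queue described above can be sketched, under assumptions, as a set of pages that count as free but still remember which (object, offset) they held. The Python below is a hypothetical illustration, not the MACH or FreeBSD code: `CachedFreeQueue`, `release`, `reclaim`, and `allocate` are invented names, and only clean pages are assumed to be placed on the queue, so reusing one needs no writeback.

```python
from collections import OrderedDict


class CachedFreeQueue:
    """Hypothetical queue of free pages whose contents are still valid.

    Allocation consumes the oldest entry (discarding its identity);
    a fault on an identity still in the queue is reclaimed without I/O.
    """

    def __init__(self):
        # identity (e.g. (object, offset)) -> frame number, oldest first
        self._q = OrderedDict()

    def release(self, identity, frame):
        """Put a clean page on the queue; its contents stay valid."""
        self._q[identity] = frame
        self._q.move_to_end(identity)

    def reclaim(self, identity):
        """On a fault: return the frame if the data is still here, else None."""
        return self._q.pop(identity, None)

    def allocate(self):
        """Hand out the oldest page for reuse, forgetting what it held."""
        if not self._q:
            return None
        _identity, frame = self._q.popitem(last=False)
        return frame

    def __len__(self):
        return len(self._q)
```

A fault on data still sitting in the queue is satisfied without I/O, so a pageout mistake costs only a cheap reclaim, while allocation simply recycles the oldest entry.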

Frankly, with today's memory costs, and the premise that a machine
MUST contain enough memory, not a lot of effort needs to go into the
choice of pages to invalidate.   However, it doesn't take a lot of
effort to make 'good choices.'   Unfortunately, I HATE to write long
papers, and neither my earlier notes on the VM code terminology nor
Matt Dillon's cursory explanation of the code gets into the meat of
the subtle nature of the code.   When Matt first wanted to 'improve'
the code, after I left, I had to seriously petition that he not make
any changes until he could benchmark and quantify them.   Once that
petition succeeded, he tended to agree with my position that it works
'especially well.'   There is some 'junk' in the code that isn't
magic to me, but looks like 'magic' on initial inspection.

ANYONE who is interested in specific ideas for VM paging code
(and VM code in general) is welcome to contact me (without prejudice.)
I don't really think that it is necessary to do things in the way
that I did the FreeBSD stuff, but there are concepts that are fairly
universal that often seem to be discounted.

I'd like to see 'cool' alternatives like Plan 9 make progress in the
marketplace, but whatever geek-talk I might spew, and whatever the
REAL Plan 9 experts (who really do know what they are talking about
WRT that OS) might say, technical excellence isn't sufficient for (or
even the primary predictor of) success.

I really don't think that the VM code needs to fully hide the size
of available physical memory, because such elite positions only
create polarization.   I do believe that if paging performance under
exceptional loading conditions is a criterion, then considering UP
FRONT a design that avoids control-loop instabilities and makes wise
page choices would be a responsible position to take.

John

