The Unix Heritage Society mailing list
From: Dan Cross <crossd@gmail.com>
To: Bakul Shah <bakul@iitbombay.org>
Cc: TUHS main list <tuhs@minnie.tuhs.org>
Subject: Re: [TUHS] FreeBSD behind the times? (was: Favorite unix design principles?)
Date: Fri, 5 Feb 2021 21:06:45 -0500
Message-ID: <CAEoi9W7ZYojXNJYcUo5pw2+v1kApAz1MA5qGW6+EQkK=8nDAxA@mail.gmail.com>
In-Reply-To: <0253BE0F-94CB-41BB-921D-6BD09A188601@iitbombay.org>

On Fri, Feb 5, 2021 at 7:04 PM Bakul Shah <bakul@iitbombay.org> wrote:

> On Feb 5, 2021, at 6:18 AM, Larry McVoy <lm@mcvoy.com> wrote:
> >
> > On Thu, Feb 04, 2021 at 09:17:54PM -0800, Bakul Shah wrote:
> >> On Feb 4, 2021, at 4:33 PM, Larry McVoy <lm@mcvoy.com> wrote:
> >>>
> >>> Ignoring the page cache and making their own cache has big problems.
> >>> You can mmap() ZFS files and doing so means that when a page is referenced
> >>> it is copied from the ZFS cache to the page cache.  That creates a
> >>> coherency problem, I can write via the mapping and I can write via
> >>> write(2) and now you have two copies of the data that don't match,
> >>> that's pretty much OS no-no #1.
> >>
> >> Write(2)ing to a mapped page sounds pretty dodgy. Likely to get you
> >> in trouble in any case. Similarly read(2)ing.
> >
> > The entire point of the SunOS 4.0 VM system was that the page you
> > saw via mmap(2) is the exact same page you saw via read(2).  It's
> > the page cache, it has page sized chunks of memory that cache
> > file,offset pairs.
> >
> > There is one, and only one, copy of the truth.  Doesn't matter how
> > you get at it, there is only one "it".
> >
> > ZFS broke that contract and that was a step backwards in terms of
> > OS design.
>
> Let me repeat a part of my response you cut out:
>
>     And you can keep track of mapped pages and read/write from them if
>     necessary even if you have a separate cache for any compressed pages.
>
> In essence you pass the ownership of a page's data from a compressed
> page cache to the mapped page. Just like in processor cache coherence
> algorithms there is one source of truth: the current owner of a cached
> unit (line or page or whatever). In other words, the page you see via mmap(2)
> will be the exact same page you will see via read(2). Not having actually
> tried this I may have missed corner cases + any practical considerations
> complicating things but *conceptually* this doesn't seem hard.


In essence, that's what the merged page/buffer cache is all about: file IO
(read/write) operations are satisfied from the same memory cache that backs
up VM objects. I agree that conceptually it's not that complex; but that's
not what ZFS does.
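
A minimal sketch in C of the property under discussion, assuming a POSIX
system. The function name check_mmap_coherence and the temporary file path
are illustrative only and come from no system mentioned in this thread. It
writes through the file descriptor and through a shared mapping, and checks
that each path sees the other's data. With a merged page/buffer cache both
checks pass because there is only one copy; a filesystem that keeps its own
cache has to do extra work to keep the same promise.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Returns 1 if pwrite(2) data is visible through a MAP_SHARED mapping and
 * a store through the mapping is visible to pread(2); 0 if either view is
 * stale; -1 on a setup or IO error. Name and path are hypothetical. */
int check_mmap_coherence(void)
{
    char path[] = "/tmp/coherence-XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                       /* file lives until fd is closed */

    long pagesz = sysconf(_SC_PAGESIZE);
    if (pagesz <= 0 || ftruncate(fd, pagesz) != 0) {
        close(fd);
        return -1;
    }

    char *map = mmap(NULL, (size_t)pagesz, PROT_READ | PROT_WRITE,
                     MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) {
        close(fd);
        return -1;
    }

    int ok = 1;

    /* Direction 1: write via the descriptor, read via the mapping. */
    if (pwrite(fd, "write", 5, 0) != 5)
        ok = -1;
    else if (memcmp(map, "write", 5) != 0)
        ok = 0;

    /* Direction 2: store via the mapping, read via the descriptor. */
    if (ok == 1) {
        char buf[5];
        memcpy(map, "store", 5);
        if (pread(fd, buf, sizeof buf, 0) != 5)
            ok = -1;
        else if (memcmp(buf, "store", 5) != 0)
            ok = 0;
    }

    munmap(map, (size_t)pagesz);
    close(fd);
    return ok;
}
```

Calling check_mmap_coherence() from a small main() and printing the result is
enough to try it on a given filesystem; point the template path at the
filesystem you want to test.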

Of course the original Unix buffer cache didn't do that either, because no
one was mmap'ing files on the PDP-11, let alone the PDP-7. A RAM-based
buffer cache for blocks as the nexus around which the system serialized
access to the disc-resident filesystem sufficed. When virtual address
spaces got bigger (starting on the VAX) and folks wanted to start being
more clever with marrying virtual memory and IO, you had an impedance
mismatch with a fairly large extant kernel that had been developed without
memory-mapped IO to/from files in mind. Sun fixed this, at what
I take to be great expense (I followed keenly the same path of development
in *BSD and Linux at the time and saw how long that took, so I believe
this). But then the same Sun broke it again.

> Warner mentions not using ZFS for its double copying. Maybe something
> like the above can be a step in the direction of integrating the caches?
>

But the cache was integrated! Until it wasn't again....

> As Ron says, I too would like to hear what the authors of ZFS have to
> say....
>

Sounds like they thought it was too hard because compression means the
place on storage where an offset in a file lands is no longer a linear
function of that offset. Presumably the compressed contents are not
kept in RAM in any of the caches (aside from a temporary buffer to which
the compressed contents are read or something).
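
To make the compression point concrete, here is a rough sketch in C under
stated assumptions. The record size, struct extent, and both function names
are hypothetical and are not ZFS's actual on-disk format. Without
compression, a file offset maps to storage by simple arithmetic; with
compression, each logical record is stored at a variable length, so finding
it takes a per-record table lookup.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RECORD_SIZE 4096u   /* hypothetical logical record size in bytes */

/* Uncompressed, contiguous file: storage location is linear in the offset. */
uint64_t linear_location(uint64_t base, uint64_t file_off)
{
    return base + file_off;
}

/* Compressed file: record k starts wherever records 0..k-1 happened to end,
 * which depends on how well each of them compressed. */
struct extent {
    uint64_t disk_off;      /* where the stored (compressed) record begins */
    uint32_t stored_len;    /* its length on storage, at most RECORD_SIZE */
};

/* Returns the extent holding file_off, or NULL if the offset is past the
 * last record. The caller must decompress the whole record to reach the
 * byte it wants. */
const struct extent *compressed_location(const struct extent *table,
                                         size_t nrecords, uint64_t file_off)
{
    uint64_t idx = file_off / RECORD_SIZE;
    return idx < nrecords ? &table[idx] : NULL;
}
```

The second function returns a whole stored record rather than a byte
address, which is the awkward part for a page cache: the stored bytes are
not page-sized and cannot be mapped directly.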

        - Dan C.
