On Fri, Feb 5, 2021, 7:19 AM Larry McVoy <lm@mcvoy.com> wrote:
On Thu, Feb 04, 2021 at 09:17:54PM -0800, Bakul Shah wrote:
> On Feb 4, 2021, at 4:33 PM, Larry McVoy <lm@mcvoy.com> wrote:
> >
> > Ignoring the page cache and making their own cache has big problems.
> > You can mmap() ZFS files and doing so means that when a page is referenced
> > it is copied from the ZFS cache to the page cache.  That creates a
> > coherency problem: I can write via the mapping and I can write via
> > write(2), and now you have two copies of the data that don't match.
> > That's pretty much OS no-no #1.
>
> Write(2)ing to a mapped page sounds pretty dodgy. Likely to get you
> in trouble in any case. Similarly read(2)ing.

The entire point of the SunOS 4.0 VM system was that the page you
saw via mmap(2) is the exact same page you saw via read(2).  It's
the page cache: it has page-sized chunks of memory that cache
(file, offset) pairs.

There is one, and only one, copy of the truth.  Doesn't matter how
you get at it, there is only one "it".
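
The "one copy" contract can be checked from user space. What follows is
a minimal sketch, not a ZFS diagnostic: the function name
mapping_sees_write is made up for illustration, it assumes a Unix-like
system (os.pwrite and os.pread), and it only tests that a shared mapping
and the write(2)/read(2) path agree on the same bytes. A filesystem that
keeps a second cache can still pass this check by copying between the
two caches; the test says nothing about what that costs.

```python
import mmap
import os
import tempfile


def mapping_sees_write(page_size: int = mmap.PAGESIZE) -> bool:
    """Return True if a MAP_SHARED mapping and read/write agree on file data."""
    fd, path = tempfile.mkstemp()
    try:
        # Fill one page so the mapping has backing data.
        os.write(fd, b"A" * page_size)
        with mmap.mmap(fd, page_size, mmap.MAP_SHARED,
                       mmap.PROT_READ | mmap.PROT_WRITE) as m:
            # Store through the write(2) path, then look through the mapping.
            os.pwrite(fd, b"B", 0)
            seen_via_map = m[0:1]
            # Store through the mapping, then look through the read(2) path.
            m[1:2] = b"C"
            seen_via_read = os.pread(fd, 1, 1)
        return seen_via_map == b"B" and seen_via_read == b"C"
    finally:
        os.close(fd)
        os.unlink(path)


if __name__ == "__main__":
    print("coherent" if mapping_sees_write() else "two copies disagree")
```

With a unified page cache both stores land in the same physical page, so
no msync() or explicit flush is needed for either side to see the other.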

ZFS broke that contract and that was a step backwards in terms of
OS design.

The double copy is the primary reason we don't use ZFS to store the videos we serve. It's a performance bottleneck as well.

And fixing it is... rather involved... possible, but a lot of work to teach the ARC about the buffer cache or the buffer cache about the ARC...

But for everything else I do, I accept the imperfect design because of all the other features it unlocks.

Warner