9fans - fans of the OS Plan 9 from Bell Labs
From: "Frank D. Engel, Jr." <fde101@fjrhome.net>
To: 9fans@9fans.net
Subject: Re: mmaping on plan9? (was Re: [9fans] venti /plan9port mmapped
Date: Mon, 16 Feb 2026 05:55:55 -0500
Message-ID: <d0135221-5344-4d1a-aff2-e88f865b24c8@fjrhome.net>
In-Reply-To: <20260215221737.76b53658158e65fe6d1b853d@eigenstate.org>

Technically, it could also be worked around on a more traditionally
designed OS when working with strictly local filesystems, even shared
ones. As long as the local system is in full control of the filesystem,
all writes go through a common kernel, which can manage the changes and
either control or sync them in some acceptable manner when multiple
processes access the same file.

It breaks completely on Plan 9 systems because every filesystem, even a
local one, is accessed through a server process, and a client process
cannot safely assume it has exclusive access to that server. The server
may or may not be remote and may or may not be in use by multiple
consumers, so the kernel can't safely assume exclusive access either.
That makes this something of a non-starter unless it is baked into the
protocol at a fundamental level.

Since 9P was never designed for this, you would have to break
compatibility with existing filesystem implementations to get this to
work safely in a sane manner.

You could theoretically work around that by extending the protocol in a 
detectable way to provide the required support and only enabling this 
feature for filesystems which declare correct implementation of the 
extensions, but I am also of the school that it is not clear how much of 
a benefit this modification really provides and whether or not it would 
be worth putting in the effort.
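
As a rough illustration of what "detectable" could mean, here is a toy
model in Python of 9P-style version negotiation. The extension name
"9P2000.snap" and both helper functions are invented for this sketch;
no such dialect exists. It relies only on the documented rule that a
server which does not know a dotted version string may answer with the
part before the period, or with "unknown".

```python
# Toy model of 9P-style version negotiation with a hypothetical
# "9P2000.snap" extension (the name is invented for illustration).

BASE = "9P2000"
SNAP = "9P2000.snap"   # hypothetical: server promises snapshot-on-read


def server_reply(offered: str, supported: set[str]) -> str:
    """What a server answers to the client's offered version string."""
    if offered in supported:
        return offered
    # Unknown extension: fall back to the part before the first period.
    base = offered.split(".", 1)[0]
    return base if base in supported else "unknown"


def deferred_reads_allowed(reply: str) -> bool:
    """Enable the feature only if the server echoed the extension."""
    return reply == SNAP


old_server = {BASE}         # an existing file server, unchanged
new_server = {BASE, SNAP}   # a server that declares the extension
```

With this, a client offering "9P2000.snap" to old_server gets "9P2000"
back and keeps using ordinary reads, so existing servers keep working.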

Of course if you were creating something completely new without the need 
to keep compatibility with existing 9p filesystems then you could 
engineer your new system with this goal in mind from the beginning - 
that would be an entirely different matter.


On 2/15/26 22:17, Ori Bernstein wrote:
> The difficulty here is that having read mark a region
> as paged in "later" delays the actual I/O, by which
> time the file contents may have changed, and your
> read returns incorrect results.
>
> This idea can work if your OS has a page cache, the
> data is already in the page cache, and you eagerly
> read the data that is not loaded -- but the delayed
> i/o semantics otherwise simply break.
>
> fixing this would need deep filesystem-level help,
> where the filesystem would need to take a snapshot
> when the read is invoked, in order to prevent any
> subsequent mutations from being visible to the reader.
>
> (on most Plan 9 file systems, this per-file snapshot
> is fairly expensive; on gefs, for example, this would
> snapshot all files within the mount)
>
> On Sun, 15 Feb 2026 21:24:32 -0500
> "Alyssa M via 9fans" <9fans@9fans.net> wrote:
>
>> I think the difficulty here is thinking about this as memory mapping. What I'm really doing is deferred I/O. By the time a read completes, the read has logically happened, it's just that not all of the data has been transferred yet.
>> That happens later as the buffer is examined, and if pages of the buffer are not examined, it doesn't happen in those pages at all.
>>
>> My implementation (on my hobby OS) only does this in a custom segment type. A segment of this type can be of any size, but is not pre-allocated pages in memory or the swap file - I do this to allow it to be very large, and because a read has to happen within the boundaries of a segment. I back it with a file system temporary file, so when pages migrate to the swap area the disk allocation can be sparse. You can load or store bytes anywhere in this segment. Touching pages allocates them, first in memory and eventually in the swap file as they get paged out.
>>
>> On Saturday, February 14, 2026, at 2:27 PM, Dan Cross wrote:
>>> but read/write work in terms of byte buffers that have no obligation to be
>>> byte aligned. Put another way, read and write relate the contents of a
>>> "file" with an arbitrarily sized and aligned byte-buffer in memory,
>>> but there is no obligation that those byte buffers have the properties
>>> required to be a "page" in the virtual memory sense.
>>
>> Understood. My current implementation does conventional I/O with any fragments of pages at the beginning and end of the read/write buffers. So small reads and writes happen traditionally. At the moment that's done before the read completes, so your example of doing lots of adjacent reads of small areas would work very badly (few pages would get the deferred loading), but I think I can do better by deferring the fragment I/O, so adjacent reads can coalesce the snapshots. My main scenario of interest though is for very large reads and writes, because that's where the sparse access has value.
>>
>> Because reads are copies and not memory mapping, it doesn't matter if the reads are not page-aligned. The process's memory pages are not being shared with the cache of the file (snapshot), so if the data is not aligned then page faults will copy bytes from two cached file blocks (assuming they're the same size). In practice I'm expecting that large reads will be into large allocations, which will be aligned, so there's an opportunity to steal blocks from the file cache. But I'm not expecting to implement this. There's no coherence problem here because the snapshot is private to the process. And readonly.
>>
>> When I do a read call into the segment, firstly a snapshot is made of the data to be read. This is functionally equivalent to making a temporary file and copying the data into it. Making this copy-on-write so the snapshot costs nothing is a key part of this without which there would be no point.
>> The pages of the read buffer in the segment are then associated with parts of the snapshot - rather than the swap file. So rather than zero filling (or reloading paged-out data) when a load instruction is executed, the memory pages are filled from the snapshot.
>> When a store instruction happens, the page becomes dirty, and loses its association with the snapshot. It's then backed by the swap file. If you alter all pages of the buffer, then all pages are disconnected from the snapshot, and the snapshot is deleted. At that point you can't tell that anything unconventional happened.
>> If I 'read over' a buffer with something else, the pages get associated with the new snapshot, and disassociated from the old one.
>>
>> When I do a write call, the write call looks at each page, and decides whether it is part of a snapshot. If it is, and we're writing back to the same part of the same file (an update) and the corresponding block has not been changed in the file, then the write call can skip that page. In other cases it actually writes to the file. Any other writing to the file that we made a snapshot from invokes the copy-on-write mechanism, so the file changes, but the snapshot doesn't.
>>
>> If you freed the read buffer memory, then parts of it might get demand loaded in the act of writing malloc's book-keeping information into it - depending on how the malloc works. If you later use calloc (or memset), it will zero the memory, which will detach it all from the snapshot, albeit loading every page from the snapshot as it goes...
>> One could change calloc to read from /dev/zero for allocations over a certain size, and special-case that to set up pages for zero-fill when it happens in this type of segment, which would disassociate the pages from the old snapshot without loading them, just as any other subsequent read does. A memset syscall might be better.
>> Practically, though, I think malloc and free are not likely to be used in this type of segment. You'd probably just detach the segment rather than free parts of it, but I've illustrated how you could drop the deferred snapshot if you needed to.
>>
>> So this is not mmap by another name. It's an optimization of the standard read/write approach that has some of the desirable characteristics of mmap. In particular: it lets you do an arbitrarily large read call instantly, and fault in just the pages you actually need as you need them. So like demand-paging, but from a snapshot of a file. Similarly, if you're writing back to the same file region, write will only write the pages that have altered - either in memory or in the file. This is effectively an update, somewhat like msync.
>>
>> It's different from mmap in some ways: the data read is always a copy of the file contents, so there's never any spooky changing of memory under your feet. The behaviour is not detectably different to the program from the traditional implementation - except for where and if the time is spent.
>>
>> There's still more I could add, but if I'm still not making sense, perhaps I'd better stop there. I think I've ended up making it sound more complicated than it is.
>>
>> On Sunday, February 15, 2026, at 10:19 AM, hiro wrote:
>>> since you give no reasons yourself, let me try to hallucinate a reason
>>> why you might be doing what you're doing here:
>>
>> Here was my example for you:
>>
>> On Thursday, February 12, 2026, at 1:34 PM, Alyssa M wrote:
>>> I've built a couple of simple disk file systems. I'm thinking of taking the cache code out of one of them and mapping the whole file system image into the address space - to see how much it simplifies the code. I'm not expecting it will be faster.
>> This is interesting because it's a large data structure that's very sparsely read or written. I'd read the entire file system image into the segment in one gulp, respond to some file protocol requests (e.g. over 9P) by treating the segment as a single data structure, and write the entire image out periodically to implement what we used to call 'sync'.
>> With traditional I/O that would be ridiculous. With the above mechanism it should work about as well as mmap would. And without all that cache code and block fetching. Which is the point of this.
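
For readers following the quoted description, the page life cycle it
lays out (snapshot at read time, fill on first load, detach on store,
skip untouched pages on write-back) can be modelled roughly as below.
This is a toy sketch in Python, not the actual implementation: the
class names, the 4-byte page size, the per-block generation counters
and the whole-page store are all assumptions made to keep it short.

```python
PAGE = 4  # toy page size in bytes (assumption)


class File:
    """A file made of blocks; writes replace a block rather than edit it."""
    def __init__(self, data: bytes):
        self.blocks = [data[i:i + PAGE] for i in range(0, len(data), PAGE)]
        self.gen = [0] * len(self.blocks)   # bumped on every block write

    def snapshot(self):
        # Copies references and counters only, never block contents.
        return list(self.blocks), list(self.gen)

    def write_block(self, n, data):
        self.blocks[n] = data               # snapshots keep the old block
        self.gen[n] += 1


class Buffer:
    """Read buffer in the special segment; pages fault in from a snapshot."""
    def __init__(self, f: File):
        self.file = f
        self.snap, self.snap_gen = f.snapshot()  # the read "happens" here
        self.pages = [None] * len(self.snap)     # nothing transferred yet
        self.dirty = [False] * len(self.snap)
        self.loads = 0                           # pages filled from the snapshot

    def load(self, n):
        if self.pages[n] is None:                # page fault: fill from snapshot
            self.pages[n] = self.snap[n]
            self.loads += 1
        return self.pages[n]

    def store(self, n, data):
        # Whole-page store for simplicity; the page detaches from the snapshot.
        self.pages[n] = data
        self.dirty[n] = True

    def write_back(self):
        """Write to the same file region; return the number of blocks written."""
        written = 0
        for n in range(len(self.snap)):
            unchanged_in_file = self.file.gen[n] == self.snap_gen[n]
            if not self.dirty[n] and unchanged_in_file:
                continue                         # identical to the file: skip
            self.file.write_block(n, self.load(n))
            written += 1
        return written


f = File(b"aaaabbbbccccdddd")
buf = Buffer(f)               # "instant" read: no data copied yet
f.write_block(1, b"XXXX")     # another writer changes the file afterwards
print(buf.load(1))            # b'bbbb': the reader still sees the snapshot
buf.store(2, b"CCCC")
print(buf.write_back())       # 2: block 1 (changed in file), block 2 (dirty)
```

Only one page is ever filled from the snapshot here, and the write-back
touches two of the four blocks. The model only works because the "file"
and the snapshot live in the same process, which is exactly the
assumption that does not hold across a 9P connection.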
