9fans - fans of the OS Plan 9 from Bell Labs
* Re: [9fans] zero copy & 9p (was Re: PDP11 (Was: Re: what heavy negativity!)
@ 2018-10-10 23:58 cinap_lenrek
  2018-10-11  0:56 ` Dan Cross
  0 siblings, 1 reply; 6+ messages in thread
From: cinap_lenrek @ 2018-10-10 23:58 UTC (permalink / raw)
  To: 9fans

> Fundamentally zero-copy requires that the kernel and user process
> share the same virtual address space mapped for the given operation.

and it is. this doesn't make your point clear. the kernel is always mapped.
(you meant a 1:1 identity mapping of *PHYSICAL* pages to make the lookup cheap?)

the difference is that *USER* pages are (unless you use special segments)
scattered randomly in physical memory, or not even realized, and you need
to look up the pages in the page table to get the physical
addresses needed to hand them to the hardware for DMA.
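
The lookup described above can be sketched as a toy simulation in C; this is not kernel code, and `lookup` merely stands in for a real page-table walk, but it shows why a user buffer becomes a list of per-page physical segments before it can be handed to a device:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { PGSIZE = 4096 };

/* Toy stand-in for a page-table walk: maps a user virtual address to a
   "physical" address. User pages are scattered, so page n is placed at
   an arbitrary-looking spot. */
static uintptr_t
lookup(uintptr_t va)
{
	return 0x100000 + (va/PGSIZE)*3*PGSIZE + va%PGSIZE;
}

typedef struct Seg Seg;
struct Seg {
	uintptr_t pa;	/* physical address for the device */
	size_t len;
};

/* Split the user range [va, va+len) into per-page DMA segments,
   looking up each page's physical address. Returns the segment count. */
static int
sglist(uintptr_t va, size_t len, Seg *sg, int max)
{
	int n;

	for (n = 0; len > 0 && n < max; n++) {
		size_t chunk = PGSIZE - va%PGSIZE;
		if (chunk > len)
			chunk = len;
		sg[n].pa = lookup(va);
		sg[n].len = chunk;
		va += chunk;
		len -= chunk;
	}
	return n;
}
```

An unaligned two-page read thus turns into three segments: a partial first page, a whole middle page, and a partial last page.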

now the *INTERESTING* thing is what happens to the original virtual
address space that covered the I/O when someone touches it while
the I/O is in flight. so do we cut it out of the TLBs of ALL processes
*SHARING* the segment? and then have the pagefault handler wait until
the I/O is finished? fuck your go routines... he wants the D.

> This can't always be done and the kernel will be forced to perform a
> copy anyway.

explain *WHEN*, that would give some insight into what you're trying to
explain.

> To wit, one of the things I added to the exynos kernel
> early on was a 1:1 mapping of the virtual kernel address space such
> that something like zero-copy could be possible in the future (it was
> also very convenient to limit MMU swaps on the Cortex-A15). That said,
> the problem gets harder when you're working on something more general
> that can handle the entire address space. In the end, you trade the
> complexity/performance hit of MMU management versus making a copy.

don't forget the code complexity of dealing with these scattered
pages in the *DRIVERS*.

> Believe it or not, sometimes copies can be faster, especially on
> larger NUMA systems.

--
cinap



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [9fans] zero copy & 9p (was Re: PDP11 (Was: Re: what heavy negativity!)
  2018-10-10 23:58 [9fans] zero copy & 9p (was Re: PDP11 (Was: Re: what heavy negativity!) cinap_lenrek
@ 2018-10-11  0:56 ` Dan Cross
  2018-10-11  2:26   ` Steven Stallion
  2018-10-11  2:30   ` Bakul Shah
  0 siblings, 2 replies; 6+ messages in thread
From: Dan Cross @ 2018-10-11  0:56 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs


On Wed, Oct 10, 2018 at 7:58 PM <cinap_lenrek@felloff.net> wrote:

> > Fundamentally zero-copy requires that the kernel and user process
> > share the same virtual address space mapped for the given operation.
>
> and it is. this doesnt make your point clear. the kernel is always mapped.
>

Meltdown has shown this to be a bad idea.

> (you meant a 1:1 identity mapping of *PHYSICAL* pages to make the lookup cheap?)
>

plan9 doesn't use an identity mapping; it uses an offset mapping for most
of the address space, and on 64-bit systems a separate mapping for the
kernel. An identity mapping from P to V is a function f such that f(a) = a.
But on 32-bit plan9, VADDR(p) = p + KZERO and PADDR(v) = v - KZERO. On
64-bit plan9 systems it's a little more complex because of the two
mappings, which vary between sub-projects: 9front appears to map the kernel
into the top 2 gigs of the address space, which means that, on large
machines, the entire physical address space can't be mapped into the kernel. Of
course in such situations one maps the top part of the canonical address
space for the exclusive use of supervisor code, so in that way it's a
distinction without a difference.
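
The 32-bit offset mapping above can be written down directly. This sketch uses the traditional 386-kernel KZERO value, but treat the constant as illustrative:

```c
#include <assert.h>
#include <stdint.h>

/* Offset mapping a la 32-bit Plan 9: kernel virtual = physical + KZERO.
   The KZERO value follows the traditional 386 kernel; it is shown here
   only for illustration. */
#define KZERO		0xF0000000u

#define VADDR(p)	((uintptr_t)(p) + KZERO)
#define PADDR(v)	((uintptr_t)(v) - KZERO)
```

The point of the distinction: VADDR/PADDR are cheap constant-offset functions, but VADDR(p) != p, so this is an offset map, not an identity map.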

Of course, there are tricks to make lookups of arbitrary addresses
relatively cheap by using the MMU hardware and dedicating part of the
address space to a recursive self-map. That is, if you don't want to walk
page tables yourself, or keep a more elaborate data structure to describe
the address space.
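
The recursive self-map trick can be sketched for 32-bit x86 two-level paging; the slot number and helper names here are illustrative, not from any particular kernel:

```c
#include <assert.h>
#include <stdint.h>

/* Recursive self-map on 32-bit x86 two-level paging: if page-directory
   slot R points back at the page directory itself, every PTE and PDE
   becomes visible at a fixed virtual address, so no manual walk is
   needed. Slot 0x3FF (the top slot) is a common but arbitrary choice. */
enum { R = 0x3FF };

/* virtual address at which the PTE mapping v can be read/written */
static uint32_t
pte_va(uint32_t v)
{
	return ((uint32_t)R<<22) | ((v>>12)<<2);
}

/* virtual address of the PDE covering v */
static uint32_t
pde_va(uint32_t v)
{
	return ((uint32_t)R<<22) | ((uint32_t)R<<12) | ((v>>22)<<2);
}
```

With slot 0x3FF, all page tables appear in the top 4MB of the address space, and the page directory itself at the very top page.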

> the difference is that *USER* pages are (unless you use special segments)
> scattered randomly in physical memory or not even realized and you need
> to lookup the pages in the virtual page table to get to the physical
> addresses needed to hand them to the hardware for DMA.
>

So...walking page tables is hard? Ok....

> now the *INTERESTING* thing is what happens to the original virtual
> address space that covered the I/O when someone touches into it while
> the I/O is in flight. so do we cut it out of the TLB's of ALL processes
> *SHARING* the segment? and then have the pagefault handler wait until
> the I/O is finished?


You seem to be mixing multiple things here. The physical page has to be
pinned while the DMA operation is active (unless it can be reliably
canceled). This can be done any number of ways; but so what? It's not new
and it's not black magic. Who cares about the virtual address space? If
some other processor (nb, not process -- processes don't have TLB entries,
processors do) might have a TLB entry for that mapping that you just
changed you need to shoot it down anyway: what's that have to do with
making things wait for page faulting?

The simplicity of the current scheme comes from the fact that the kernel
portion of the address *space* is effectively immutable once the kernel
gets going. That's easy, but it's not particularly flexible and other
systems do things differently (not just Linux and its ilk). I'm not saying
you *should* do it in plan9, but it's not like it hasn't been done
elegantly before.


> fuck your go routines... he wants the D.
>
> > This can't always be done and the kernel will be forced to perform a
> > copy anyway.
>
> > explain *WHEN*, that would give some insight into what you're trying to
> > explain.
>
> > To wit, one of the things I added to the exynos kernel
> > early on was a 1:1 mapping of the virtual kernel address space such
> > that something like zero-copy could be possible in the future (it was
> > also very convenient to limit MMU swaps on the Cortex-A15). That said,
> > the problem gets harder when you're working on something more general
> > that can handle the entire address space. In the end, you trade the
> > complexity/performance hit of MMU management versus making a copy.
>
> don't forget the code complexity with dealing with these scattered
> pages in the *DRIVERS*.
>

It's really not that hard. The way Linux does it is pretty bad, but it's
not like that's the only way to do it.

Or don't.

        - Dan C.



* Re: [9fans] zero copy & 9p (was Re: PDP11 (Was: Re: what heavy negativity!)
  2018-10-11  0:56 ` Dan Cross
@ 2018-10-11  2:26   ` Steven Stallion
  2018-10-11  2:30   ` Bakul Shah
  1 sibling, 0 replies; 6+ messages in thread
From: Steven Stallion @ 2018-10-11  2:26 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On Wed, Oct 10, 2018 at 8:20 PM Dan Cross <crossd@gmail.com> wrote:
>> don't forget the code complexity with dealing with these scattered
>> pages in the *DRIVERS*.
>
> It's really not that hard. The way Linux does it is pretty bad, but it's not like that's the only way to do it.

SunOS and Win32 (believe it or not) managed to get this "right";
dealing with zero-copy in those kernels was a non-event. I'm not sure
I understand the assertion that this would affect constituent drivers.
This sort of detail is handled at a higher level: the driver
generally operates on a buffer that gets jammed into a ring for DMA
transfer. Apart from grabbing the physical address, the worst you may
have to do is pin/unpin the block for the DMA operation. From the
driver's perspective, it's memory. It doesn't matter where it came
from (or who owns it, for that matter).
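
The driver-side picture described above, a buffer jammed into a ring by physical address, can be sketched like this. All names and the descriptor layout are hypothetical, not from any real device:

```c
#include <assert.h>
#include <stdint.h>

enum { NDESC = 8 };

/* One DMA descriptor: where the buffer is physically, and how long. */
typedef struct Desc Desc;
struct Desc {
	uint64_t pa;
	uint32_t len;
	uint32_t busy;	/* owned by "hardware" until completion */
};

typedef struct Ring Ring;
struct Ring {
	Desc d[NDESC];
	unsigned head;
};

/* Jam a (pinned) buffer into the ring by physical address.
   Returns -1 if the ring is full and the caller must back off. */
static int
submit(Ring *r, uint64_t pa, uint32_t len)
{
	Desc *d = &r->d[r->head % NDESC];

	if (d->busy)
		return -1;
	d->pa = pa;
	d->len = len;
	d->busy = 1;
	r->head++;
	return 0;
}
```

Note the driver never asks where the buffer came from: it only sees a physical address and a length, which is the point being made above.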

> Or don't.

There's a lot to be said for keeping it simple...




* Re: [9fans] zero copy & 9p (was Re: PDP11 (Was: Re: what heavy negativity!)
  2018-10-11  0:56 ` Dan Cross
  2018-10-11  2:26   ` Steven Stallion
@ 2018-10-11  2:30   ` Bakul Shah
  2018-10-11  3:20     ` Steven Stallion
  1 sibling, 1 reply; 6+ messages in thread
From: Bakul Shah @ 2018-10-11  2:30 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On Wed, 10 Oct 2018 20:56:20 -0400 Dan Cross <crossd@gmail.com> wrote:
>
> On Wed, Oct 10, 2018 at 7:58 PM <cinap_lenrek@felloff.net> wrote:
>
> > > Fundamentally zero-copy requires that the kernel and user process
> > > share the same virtual address space mapped for the given operation.
> >
> > and it is. this doesnt make your point clear. the kernel is always mapped.
> >
>
> Meltdown has shown this to be a bad idea.

People still do this.

> > (you meant 1:1 identity mapping *PHYSICAL* pages to make the lookup cheap?)

Steve wrote "1:1 mapping of the virtual kernel address space such
that something like zero-copy could be possible"

Not sure what he meant. For zero copy you need to *directly*
write to the memory allocated to a process. 1:1 mapping is
really not needed.

> plan9 doesn't use an identity mapping; it uses an offset mapping for most
> of the address space and on 64-bit systems a separate mapping for the
> kernel. An identity mapping from P to V is a function f such that f(a) = a.
> But on 32-bit plan9, VADDR(p) = p + KZERO and PADDR(v) = v - KZERO. On
> 64-bit plan9 systems it's a little more complex because of the two
> mappings, which vary between sub-projects: 9front appears to map the kernel
> into the top 2 gigs of the address space which means that, on large
> machines, the entire physical address space can't fit into the kernel.  Of
> course in such situations one maps the top part of the canonical address
> space for the exclusive use of supervisor code, so in that way it's a
> distinction without a difference.
>
> Of course, there are tricks to make lookups of arbitrary addresses
> relatively cheap by using the MMU hardware and dedicating part of the
> address space to a recursive self-map. That is, if you don't want to walk
> page tables yourself, or keep a more elaborate data structure to describe
> the address space.
>
> > the difference is that *USER* pages are (unless you use special segments)
> > scattered randomly in physical memory or not even realized and you need
> > to lookup the pages in the virtual page table to get to the physical
> > addresses needed to hand them to the hardware for DMA.

If you don't copy, you do need to find all the physical pages.
This is not really expensive, and many OSes do precisely this.

If you copy, you can avoid walking the page table. But for
that to work, the kernel virtual space needs to be mapped 1:1 in
*every* process -- this is because any cached data will be in
kernel space and must be available in all processes.

In fact the *main* reason this was done was to facilitate such
copying. Had we always done zero-copy, we could've avoided
Meltdown altogether. copyin/copyout of syscall arguments
shouldn't be expensive.
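
A minimal copyin along these lines can be sketched as follows; user memory is simulated by an array so the sketch runs in an ordinary process, and the user/kernel split value is purely hypothetical:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical top of user space; real systems pick their own split. */
#define USTKTOP	0x80000000u

/* Toy copyin: check that the user range stays below the split, then
   copy. `umem` stands in for user memory; a real kernel would copy
   through its 1:1 mapping of physical memory instead. */
static int
copyin(void *dst, uint32_t uaddr, uint32_t len, const uint8_t *umem)
{
	if (len > USTKTOP || uaddr > USTKTOP - len)
		return -1;	/* range escapes user space */
	memcpy(dst, umem + uaddr, len);
	return 0;
}
```

The bounds check is a couple of compares and the copy is a memcpy, which is why copyin/copyout of small syscall arguments is cheap.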

> So...walking page tables is hard? Ok....
>
> > now the *INTERESTING* thing is what happens to the original virtual
> > address space that covered the I/O when someone touches into it while
> > the I/O is in flight. so do we cut it out of the TLB's of ALL processes
> > *SHARING* the segment? and then have the pagefault handler wait until
> > the I/O is finished?

In general, the way this works is a bit different. In an
mmap() scenario, the initial mapping simply allocates the
necessary PTEs and marks them so that *any* read/write access
will incur a page fault. At fault time, if the underlying page
is found to be cached, it is linked to the PTE and the
relevant access bit changed to allow the access. If not, the
process has to wait until the page is read in, at which time
it is linked with the relevant PTE(s). Even if the same file
page is mapped in N processes, the same thing happens. The
kernel does have to do some bookkeeping, as the same file data
may be referenced from multiple places.
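
The fault path just described can be modeled as a tiny state machine; this is a toy illustration, with made-up page numbers standing in for the page cache and for completed I/O:

```c
#include <assert.h>

enum { NPTE = 4, NOPAGE = -1 };

/* Toy address space: page[] is what each PTE points at, cache[] is the
   page cache, waits counts faults that had to block on I/O. */
typedef struct As As;
struct As {
	int page[NPTE];
	int cache[NPTE];
	int waits;
};

/* Fault on PTE n: link a cached page immediately; otherwise "wait"
   for the read-in to finish, then link the freshly read page. */
static int
fault(As *as, int n)
{
	if (as->page[n] != NOPAGE)
		return as->page[n];	/* already mapped: nothing to do */
	if (as->cache[n] == NOPAGE) {
		as->waits++;		/* block until the page is read in */
		as->cache[n] = 100 + n;	/* pretend I/O completed here */
	}
	as->page[n] = as->cache[n];	/* link page, flip access bits */
	return as->page[n];
}
```

A second process faulting on the same file page takes the cached branch and never waits, which is the N-processes case mentioned above.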

> You seem to be mixing multiple things here. The physical page has to be
> pinned while the DMA operation is active (unless it can be reliably
> canceled). This can be done any number of ways; but so what? It's not new
> and it's not black magic. Who cares about the virtual address space? If
> some other processor (nb, not process -- processes don't have TLB entries,
> processors do) might have a TLB entry for that mapping that you just
> changed you need to shoot it down anyway: what's that have to do with
> making things wait for page faulting?

Indeed.

> The simplicity of the current scheme comes from the fact that the kernel
> portion of the address *space* is effectively immutable once the kernel
> gets going. That's easy, but it's not particularly flexible and other
> systems do things differently (not just Linux and its ilk). I'm not saying
> you *should* do it in plan9, but it's not like it hasn't been done
> elegantly before.
>
>
> > fuck your go routines... he wants the D.

What?!

> > > This can't always be done and the kernel will be forced to perform a
> > > copy anyway.

In general this is wrong. None of this is new -- by decades.
Theoretically even regular read/write can use mapping behind
the scenes. [Save the old V->P map to deal with any I/O error,
remove the same from the caller's page table, read in the first (few)
pages and mark the rest as if newly allocated, commence a
fetch in the background and return.]

But note that the I/O driver should *not* do any prefetch --
that is left up to the caching or FS layer.

> > explain *WHEN*, that would give some insight into what you're trying to
> > explain.
> >
> > > To wit, one of the things I added to the exynos kernel
> > > early on was a 1:1 mapping of the virtual kernel address space such
> > > that something like zero-copy could be possible in the future (it was
> > > also very convenient to limit MMU swaps on the Cortex-A15). That said,
> > > the problem gets harder when you're working on something more general
> > > that can handle the entire address space. In the end, you trade the
> > > complexity/performance hit of MMU management versus making a copy.
> >
> > don't forget the code complexity with dealing with these scattered
> > pages in the *DRIVERS*.
> >
>
> It's really not that hard. The way Linux does it is pretty bad, but it's
> not like that's the only way to do it.
>
> Or don't.

People should think about how things were done prior to Linux
so as to avoid its reality distortion field.




* Re: [9fans] zero copy & 9p (was Re: PDP11 (Was: Re: what heavy negativity!)
  2018-10-11  2:30   ` Bakul Shah
@ 2018-10-11  3:20     ` Steven Stallion
  2018-10-11 14:21       ` [9fans] ..... UNSUBSCRIBE_HELP NEEDED DHAN HURLEY
  0 siblings, 1 reply; 6+ messages in thread
From: Steven Stallion @ 2018-10-11  3:20 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On Wed, Oct 10, 2018 at 9:32 PM Bakul Shah <bakul@bitblocks.com> wrote:
> Steve wrote "1:1 mapping of the virtual kernel address space such
> that something like zero-copy could be possible"
>
> Not sure what he meant. For zero copy you need to *directly*
> write to the memory allocated to a process. 1:1 mapping is
> really not needed.

Ugh. I could have worded that better. That was a (very) clumsy attempt
at stating that the kernel would have to support remapping the user
buffer to virtual kernel space. Fortunately Plan 9 doesn't page out
kernel memory, so pinning wouldn't be required.

Cheers,
Steve




* [9fans] ..... UNSUBSCRIBE_HELP NEEDED
  2018-10-11  3:20     ` Steven Stallion
@ 2018-10-11 14:21       ` DHAN HURLEY
  0 siblings, 0 replies; 6+ messages in thread
From: DHAN HURLEY @ 2018-10-11 14:21 UTC (permalink / raw)


Hi Steven,
Could you please tell me how to unsubscribe from this EMAIL LIST.
I don't have the time, or my own computer, to find the webpage.
Today I have nearly 20 messages from the PLAN 9 LIST.
I am also getting a lot from the HAIKU LIST.
Many thanks,
Dhan Hurley

DIASPORA COMMUNITY,MY PROFILE PAGE :-

IRISH, MUSICIAN, POET, PHYSICS, "FREE-ENERGY", LINUX PROFI, ALTERNATIVE SYSTEMS, GEOPOLYMERS, ALTERNATIVE HEALTH-FARMING-MATERIALS, MYSTIC, etc.


https://despora.de/people/6d39a7e04a610132027a42cdb1fcde73








