From mboxrd@z Thu Jan  1 00:00:00 1970
MIME-Version: 1.0
References: <3C62CE67D1FF8260C5450B3D0AE7AA1A@felloff.net>
In-Reply-To: <3C62CE67D1FF8260C5450B3D0AE7AA1A@felloff.net>
From: Dan Cross <crossd@gmail.com>
Date: Wed, 10 Oct 2018 20:56:20 -0400
Message-ID: <CAEoi9W62qFgB2tj=aKEZ=PusHObnTwTDbGBqFBmzgmu=Sfepjg@mail.gmail.com>
To: Fans of the OS Plan 9 from Bell Labs <9fans@9fans.net>
Content-Type: multipart/alternative; boundary="00000000000088ee5b0577e9725f"
Subject: Re: [9fans] zero copy & 9p (was Re: PDP11 (Was: Re: what heavy
	negativity!)
Topicbox-Message-UUID: ebafce90-ead9-11e9-9d60-3106f5b1d025

--00000000000088ee5b0577e9725f
Content-Type: text/plain; charset="UTF-8"

On Wed, Oct 10, 2018 at 7:58 PM <cinap_lenrek@felloff.net> wrote:

> > Fundamentally zero-copy requires that the kernel and user process
> > share the same virtual address space mapped for the given operation.
>
> and it is. this doesnt make your point clear. the kernel is always mapped.
>

Meltdown has shown that keeping the kernel mapped into every user address
space is a bad idea.

> (you ment 1:1 identity mapping *PHYSICAL* pages to make the lookup cheap?)
>

plan9 doesn't use an identity mapping; it uses an offset mapping for most
of the address space and, on 64-bit systems, a separate mapping for the
kernel. An identity mapping from P to V is a function f such that f(a) = a.
But on 32-bit plan9, VADDR(p) = p + KZERO and PADDR(v) = v - KZERO. On
64-bit plan9 systems it's a little more complex because of the two
mappings, which vary between sub-projects: 9front appears to map the kernel
into the top 2 gigs of the address space, which means that on large
machines (more than 2 gigs of RAM) the entire physical address space can't
fit into that kernel window. Of course, in such situations one maps the top
part of the canonical address space for the exclusive use of supervisor
code, so in that way it's a distinction without a difference.
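
To make the identity-versus-offset distinction concrete, here is a minimal sketch in C. The KZERO32 and KZERO64 values are assumptions chosen for illustration (a 32-bit window base, and a 64-bit base that puts the kernel in the top 2 GiB as described for 9front); vaddr32, paddr32 and window64 are hypothetical helpers in the style of plan9's VADDR/PADDR, not the real macros.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed value, for illustration only: base of a 32-bit kernel's
 * offset-mapped window. */
#define KZERO32 0xF0000000u
/* Assumed 64-bit base placing the kernel in the top 2 GiB. */
#define KZERO64 0xFFFFFFFF80000000ull

/* Identity mapping: f(a) = a. */
static uint32_t identity(uint32_t a) { return a; }

/* Offset mapping in the style of VADDR: v = p + KZERO. */
static uint32_t vaddr32(uint32_t pa)
{
    assert(pa < 0u - KZERO32);  /* physical address must fit the window */
    return pa + KZERO32;
}

/* Offset mapping in the style of PADDR: p = v - KZERO. */
static uint32_t paddr32(uint32_t va)
{
    assert(va >= KZERO32);      /* only addresses inside the kernel window */
    return va - KZERO32;
}

/* Bytes of physical memory an offset map based at kzero can cover
 * before running off the top of the 64-bit address space. */
static uint64_t window64(uint64_t kzero)
{
    return (uint64_t)0 - kzero;
}
```

With the assumed 64-bit base, window64(KZERO64) is 0x80000000, i.e. 2 GiB, so a machine with more RAM than that cannot have all of physical memory offset-mapped into that window.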

Of course, there are tricks to make lookups of arbitrary addresses
relatively cheap by using the MMU hardware and dedicating part of the
address space to a recursive self-map. That helps if you don't want to walk
the page tables yourself or keep a more elaborate data structure describing
the address space.
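
As an illustration of the recursive self-map trick, here is a minimal sketch assuming x86-64 4-level paging with 4 KiB pages (not necessarily what any plan9 kernel does). RSLOT is a hypothetical PML4 index whose entry points back at the PML4 itself. With that entry installed, every page-table page shows up in a 512 GiB virtual region, and the PTE for any virtual address can be found with plain arithmetic instead of a software walk.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical PML4 slot whose entry points back at the PML4 itself. */
#define RSLOT 510u

/* Canonicalize: copy bit 47 into bits 48..63. */
static uint64_t canon(uint64_t va)
{
    return (va & (1ull << 47)) ? (va | 0xFFFF000000000000ull) : va;
}

/* Base of the 512 GiB region through which all page-table pages become
 * visible once the self-referencing PML4 entry is installed. */
static uint64_t selfmap_base(void)
{
    assert(RSLOT < 512u);
    return canon((uint64_t)RSLOT << 39);
}

/* Virtual address of the last-level PTE that maps va: the 36 bits of
 * PML4/PDPT/PD/PT indices, scaled by the 8-byte entry size. Each access
 * through the self-map goes one level "shallower" in the tables. */
static uint64_t pte_va(uint64_t va)
{
    return selfmap_base() + ((va >> 12) & 0xFFFFFFFFFull) * 8;
}
```

For RSLOT = 510 the region starts at 0xFFFFFF0000000000, and the PTE for virtual address 0x1000 sits 8 bytes in. The cost of the trick is the reserved slice of address space.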

> the difference is that *USER* pages are (unless you use special segments)
> scattered randomly in physical memory or not even realized and you need
> to lookup the pages in the virtual page table to get to the physical
> addresses needed to hand them to the hardware for DMA.
>

So...walking page tables is hard? Ok....
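
For scale, a software page-table walk is a few lines. This is a minimal sketch assuming a toy two-level 32-bit layout (10-bit directory index, 10-bit table index, 12-bit offset, as in classic non-PAE x86) and a hypothetical present bit PTE_P. To keep it runnable in user space, the directory holds C pointers rather than physical addresses. A return of 0 means the page is not resident ("not even realized"), and the caller would fault it in before starting DMA.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { PGSHIFT = 12, PGSIZE = 1 << PGSHIFT, NENT = 1024, PTE_P = 1 };

/* Last-level table: physical frame address | flags in each entry. */
typedef struct Ptab { uint32_t e[NENT]; } Ptab;
/* Toy directory: a real kernel stores physical addresses of page tables
 * here; C pointers keep the sketch runnable in user space. */
typedef struct Pdir { Ptab *pt[NENT]; } Pdir;

/* Translate va to a physical address; return 0 if not resident. */
static int walk(const Pdir *pd, uint32_t va, uint32_t *pa)
{
    const Ptab *pt = pd->pt[va >> 22];
    if (pt == NULL)
        return 0;                       /* no page table for this 4 MiB */
    uint32_t pte = pt->e[(va >> PGSHIFT) & (NENT - 1)];
    if (!(pte & PTE_P))
        return 0;                       /* page not realized yet */
    *pa = (pte & ~(uint32_t)(PGSIZE - 1)) | (va & (PGSIZE - 1));
    return 1;
}
```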

> now the *INTERESTING* thing is what happens to the original virtual
> address space that covered the I/O when someone touches into it while
> the I/O is in flight. so do we cut it out of the TLB's of ALL processes
> *SHARING* the segment? and then have the pagefault handler wait until
> the I/O is finished?


You seem to be mixing multiple things here. The physical page has to be
pinned while the DMA operation is active (unless the operation can be
reliably canceled). This can be done any number of ways; but so what? It's
not new and it's not black magic. Who cares about the virtual address space?
If some other processor (nb, not process -- processes don't have TLB
entries, processors do) might have a TLB entry for a mapping you just
changed, you need to shoot it down anyway: what does that have to do with
making things wait for page faulting?
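
One common way to pin, assumed here for illustration: take an extra reference on the physical page for the duration of the DMA. Unmapping the virtual address (after clearing the PTE and shooting down stale TLB entries) then drops only the mapping's reference. The page is not reused until the device is done with it. The Page struct and the functions below are hypothetical names, not any plan9 kernel's API.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical physical page descriptor. */
typedef struct Page {
    atomic_int ref;   /* one per mapping plus one per in-flight DMA */
    bool freed;       /* stand-in for "returned to the free list" */
} Page;

static void page_get(Page *p) { atomic_fetch_add(&p->ref, 1); }

static void page_put(Page *p)
{
    int old = atomic_fetch_sub(&p->ref, 1);
    assert(old > 0);
    if (old == 1)
        p->freed = true;   /* last reference gone: page may be reused */
}

/* Pin before handing the page's physical address to the device. */
static void dma_start(Page *p) { page_get(p); }

/* Called when the device signals completion. */
static void dma_done(Page *p) { page_put(p); }

/* Remove a mapping: assumed to run after the PTE is cleared and remote
 * TLB entries are shot down; only the mapping's reference is dropped. */
static void unmap(Page *p) { page_put(p); }
```

In this scheme, a process that touches the old virtual address after unmap simply takes an ordinary fault. Nothing about the in-flight I/O requires the fault handler to wait.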

The simplicity of the current scheme comes from the fact that the kernel
portion of the address *space* is effectively immutable once the kernel
gets going. That's easy, but it's not particularly flexible and other
systems do things differently (not just Linux and its ilk). I'm not saying
you *should* do it in plan9, but it's not like it hasn't been done
elegantly before.


> fuck your go routines... he wants the D.
>
> > This can't always be done and the kernel will be forced to perform a
> > copy anyway.
>
> explain *WHEN*, that would be an insight in what you'r trying to
> explain.
>
> > To wit, one of the things I added to the exynos kernel
> > early on was a 1:1 mapping of the virtual kernel address space such
> > that something like zero-copy could be possible in the future (it was
> > also very convenient to limit MMU swaps on the Cortex-A15). That said,
> > the problem gets harder when you're working on something more general
> > that can handle the entire address space. In the end, you trade the
> > complexity/performance hit of MMU management versus making a copy.
>
> don't forget the code complexity with dealing with these scattered
> pages in the *DRIVERS*.
>

It's really not that hard. The way Linux does it is pretty bad, but it's
not like that's the only way to do it.
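
On the driver side, the scattered pages usually become a scatter/gather list. This is a minimal sketch assuming the caller has already walked the page tables and produced frames[], the physical address of each 4 KiB page covering the buffer. build_sg and Seg are hypothetical names. Physically adjacent pages are merged, so a driver sees one segment per contiguous run rather than one per page.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PGSIZE 4096u

/* One DMA segment: physical start and length in bytes. */
typedef struct Seg { uint64_t pa; uint32_t len; } Seg;

/* frames[i] is the physical address of the i-th page covering the
 * buffer; off is the buffer's offset within the first page. Returns the
 * number of segments, or -1 if more than max would be needed. */
static int build_sg(const uint64_t *frames, uint32_t off, uint32_t len,
                    Seg *sg, int max)
{
    int n = 0;
    size_t i = 0;

    assert(off < PGSIZE);
    while (len > 0) {
        uint32_t chunk = PGSIZE - off;   /* bytes left in this page */
        if (chunk > len)
            chunk = len;
        uint64_t pa = frames[i] + off;
        if (n > 0 && sg[n - 1].pa + sg[n - 1].len == pa) {
            sg[n - 1].len += chunk;      /* physically contiguous: merge */
        } else {
            if (n == max)
                return -1;
            sg[n].pa = pa;
            sg[n].len = chunk;
            n++;
        }
        len -= chunk;
        off = 0;                         /* later pages start at offset 0 */
        i++;
    }
    return n;
}
```

For example, with frames {0x10000, 0x11000, 0x50000}, an offset of 0x800 and a length of 0x2000, the first two pages merge into one 0x1800-byte segment at 0x10800, followed by a 0x800-byte segment at 0x50000.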

Or don't.

        - Dan C.

--00000000000088ee5b0577e9725f--