From mboxrd@z Thu Jan 1 00:00:00 1970
MIME-Version: 1.0
References: <7aa20c07322147e6@orthanc.ca> <20181011230430.8ec148a3cb2a4d95180a4ad2@eigenstate.org> <4A1686D0-C80C-417F-A3D6-3F9EA327D35F@lsub.org>
In-Reply-To:
From: Charles Forsyth
Date: Mon, 15 Oct 2018 17:48:34 +0100
Message-ID:
To: Fans of the OS Plan 9 from Bell Labs <9fans@9fans.net>
Content-Type: multipart/alternative; boundary="000000000000e709af0578473542"
Subject: Re: [9fans] zero copy & 9p (was Re: PDP11 (Was: Re: what heavy negativity!)
Topicbox-Message-UUID: edc3beda-ead9-11e9-9d60-3106f5b1d025

--000000000000e709af0578473542
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

It's useful internally in protocol implementation, specifically to avoid copying in transport protocols (for later retransmission), and the modifications aren't vast.
A few changes were trickier, often because of small bugs in the original code. icmp does some odd things, I think.

Btw, "zero copy" isn't the right term and I preferred another term that I've now forgotten. Minimal copying, perhaps.
For one thing, messages can eventually end up being copied to contiguous blocks for devices without decent scatter-gather DMA.
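
A minimal sketch of that last copy, assuming a message's data is held as an array of read-only slices. The `Slice` type and the `flatten` function are hypothetical names for illustration, not taken from the kernel code discussed here. It gathers the slices, in order, into one contiguous buffer of the kind a device without scatter-gather DMA would need.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical slice: a read-only view of some bytes owned elsewhere. */
typedef struct Slice Slice;
struct Slice {
	const unsigned char *p;	/* start of the bytes */
	size_t n;		/* number of bytes */
};

/*
 * Copy nslice slices, in order, into the contiguous buffer dst of
 * capacity cap.  Returns the number of bytes copied, or (size_t)-1
 * if the data does not fit (dst may then be partially written).
 */
size_t
flatten(unsigned char *dst, size_t cap, const Slice *s, int nslice)
{
	size_t off = 0;
	int i;

	assert(dst != NULL && s != NULL);
	for(i = 0; i < nslice; i++){
		if(s[i].n > cap - off)
			return (size_t)-1;
		memcpy(dst + off, s[i].p, s[i].n);
		off += s[i].n;
	}
	return off;
}
```

This is the one copy that "minimal copying" still pays. Everything upstream of the driver can keep passing the slices around untouched.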

Messages are a tuple (mutable header stack, immutable slices of immutable data).
Originally the data was organised as a tree, but Nemo suggested using just an array, so I changed it.
It's important that the data is (logically) immutable. Headers are pushed onto and popped from the header stack, and the current stack top is mutable.
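
The shape of such a message might be sketched as follows. This is an illustration under assumptions, not the original kernel structure. The names (`Msg`, `Slice`, `pushhdr`, `pophdr`), the fixed limits, and the choice to grow the header stack downward are all hypothetical. Ownership and reference counting of the underlying data are left out.

```c
#include <assert.h>
#include <stddef.h>

enum { Maxhdr = 128, Maxslice = 8 };	/* arbitrary limits for the sketch */

typedef struct Slice Slice;
typedef struct Msg Msg;

struct Slice {
	const unsigned char *p;	/* immutable bytes, owned elsewhere */
	size_t n;
};

struct Msg {
	unsigned char hdr[Maxhdr];	/* header stack, grows down from the end */
	size_t hp;			/* offset of the stack top; Maxhdr when empty */
	Slice data[Maxslice];		/* array of slices, in wire order */
	int ndata;
};

void
msginit(Msg *m)
{
	assert(m != NULL);
	m->hp = Maxhdr;
	m->ndata = 0;
}

/* Append a slice of existing data without copying it; -1 if full. */
int
msgadd(Msg *m, const void *p, size_t n)
{
	if(m->ndata == Maxslice)
		return -1;
	m->data[m->ndata].p = p;
	m->data[m->ndata].n = n;
	m->ndata++;
	return 0;
}

/* Push an n-byte header; returns the writable new top, or NULL if no room. */
unsigned char*
pushhdr(Msg *m, size_t n)
{
	if(n > m->hp)
		return NULL;
	m->hp -= n;
	return m->hdr + m->hp;
}

/* Pop an n-byte header; returns it (valid until the next push), or NULL. */
unsigned char*
pophdr(Msg *m, size_t n)
{
	unsigned char *h;

	if(n > Maxhdr - m->hp)
		return NULL;
	h = m->hdr + m->hp;
	m->hp += n;
	return h;
}

/* Total length on the wire: headers currently on the stack plus all slices. */
size_t
msglen(const Msg *m)
{
	size_t len = Maxhdr - m->hp;
	int i;

	for(i = 0; i < m->ndata; i++)
		len += m->data[i].n;
	return len;
}
```

Growing the stack downward means each protocol layer prepends its header in front of the previous one, so the headers already sit in wire order and the payload slices are never touched.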

There were new readmsg and writemsg system calls to carry message structures between kernel and user level.
The message was immutable on writemsg. Between processes in the same program, message transfers could be done by exchanging pointers into a shared region.
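
The signatures of readmsg and writemsg are not given here, so no attempt is made to reproduce them. The pointer exchange within one program can be sketched on its own, assuming one producer and one consumer that share memory. The `Ring` type and its functions are hypothetical, and C11 atomics stand in for whatever synchronisation the real code used.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

enum { Nslot = 8 };	/* must be a power of two */

typedef struct Ring Ring;
struct Ring {
	const void *slot[Nslot];	/* pointers to immutable messages */
	atomic_size_t head;	/* next slot to read; written only by the consumer */
	atomic_size_t tail;	/* next slot to write; written only by the producer */
};

void
ringinit(Ring *r)
{
	assert((Nslot & (Nslot - 1)) == 0);
	atomic_init(&r->head, 0);
	atomic_init(&r->tail, 0);
}

/* Producer side: hand over a pointer.  Returns 0, or -1 if the ring is full. */
int
ringput(Ring *r, const void *msg)
{
	size_t t = atomic_load_explicit(&r->tail, memory_order_relaxed);
	size_t h = atomic_load_explicit(&r->head, memory_order_acquire);

	if(t - h == Nslot)
		return -1;
	r->slot[t % Nslot] = msg;
	/* release: the slot contents become visible before the new tail */
	atomic_store_explicit(&r->tail, t + 1, memory_order_release);
	return 0;
}

/* Consumer side: returns the next pointer, or NULL if the ring is empty. */
const void*
ringget(Ring *r)
{
	size_t h = atomic_load_explicit(&r->head, memory_order_relaxed);
	size_t t = atomic_load_explicit(&r->tail, memory_order_acquire);
	const void *msg;

	if(h == t)
		return NULL;
	msg = r->slot[h % Nslot];
	atomic_store_explicit(&r->head, h + 1, memory_order_release);
	return msg;
}
```

Only the pointer crosses between the two sides. That is safe without further locking only because the message it points to is treated as immutable once handed over.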

I'll see if I wrote up some of it. I think there were manual pages for the Messages replacing Blocks.

My MCS lock implementation was probably more useful, and I use that in my copy of the kernel known as 9k.

Also, NUMA effects are more important in practice on big multicores. Some of the off-chip delays are brutal.
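
For readers unfamiliar with MCS locks, here is a minimal sketch of the textbook Mellor-Crummey and Scott queue lock in C11 atomics. It is not the 9k implementation, and the names `Mcslock`, `Mcsnode`, `mcslock` and `mcsunlock` are chosen for illustration. Each waiter spins on a flag in its own node rather than on a shared word, which is what keeps lock traffic local on machines where off-chip accesses are expensive.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

typedef struct Mcsnode Mcsnode;
typedef struct Mcslock Mcslock;

struct Mcsnode {
	_Atomic(Mcsnode*) next;	/* successor in the queue, if any */
	atomic_int locked;	/* 1 while this waiter must keep spinning */
};

struct Mcslock {
	_Atomic(Mcsnode*) tail;	/* last waiter in the queue; NULL when free */
};

void
mcsinit(Mcslock *l)
{
	atomic_init(&l->tail, NULL);
}

/* Acquire l, using the caller-supplied node n as this waiter's queue entry. */
void
mcslock(Mcslock *l, Mcsnode *n)
{
	Mcsnode *pred;

	assert(l != NULL && n != NULL);
	atomic_store_explicit(&n->next, NULL, memory_order_relaxed);
	atomic_store_explicit(&n->locked, 1, memory_order_relaxed);
	pred = atomic_exchange_explicit(&l->tail, n, memory_order_acq_rel);
	if(pred == NULL)
		return;	/* lock was free */
	atomic_store_explicit(&pred->next, n, memory_order_release);
	while(atomic_load_explicit(&n->locked, memory_order_acquire))
		;	/* spin on our own node only */
}

/* Release l; n must be the node passed to the matching mcslock. */
void
mcsunlock(Mcslock *l, Mcsnode *n)
{
	Mcsnode *succ, *expect;

	succ = atomic_load_explicit(&n->next, memory_order_acquire);
	if(succ == NULL){
		expect = n;
		if(atomic_compare_exchange_strong_explicit(&l->tail, &expect, NULL,
				memory_order_acq_rel, memory_order_acquire))
			return;	/* nobody was waiting */
		/* a waiter has swapped itself in but not yet linked to us */
		while((succ = atomic_load_explicit(&n->next, memory_order_acquire)) == NULL)
			;
	}
	atomic_store_explicit(&succ->locked, 0, memory_order_release);
}
```

A caller keeps one `Mcsnode` per acquisition, typically on its stack or per CPU, and passes the same node to both calls.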

On Sun, 14 Oct 2018 at 09:50, hiro <23hiro@gmail.com> wrote:
thanks, this will allow us to know where to look more closely.

On 10/14/18, Francisco J Ballesteros <nemo@lsub.org> wrote:
> Pure "producer/consumer" stuff, like sending things through a pipe as long as
> the source didn't need to touch the data ever more.
> Regarding bugs, I meant "producing bugs" not "fixing bugs", btw.
>
>> On 14 Oct 2018, at 09:34, hiro <23hiro@gmail.com> wrote:
>>
>> well, finding bugs is always good :)
>> but since i got curious could you also tell which things exactly got
>> much faster, so that we know what might be possible?
>>
>> On 10/14/18, FJ Ballesteros <nemo@lsub.org> wrote:
>>> yes. bugs, on my side at least.
>>> The copy isolates from others.
>>> But some experiments in nix and in a thing I wrote for leanxcale show that
>>> some things can be much faster.
>>> It’s fun either way.
>>>
>>>> On 13 Oct 2018, at 23:11, hiro <23hiro@gmail.com> wrote:
>>>>
>>>> and, did it improve anything noticeably?
>>>>
>>>>> On 10/13/18, Charles Forsyth <charles.forsyth@gmail.com> wrote:
>>>>> I did several versions of one part of zero copy, inspired by several things
>>>>> in x-kernel, replacing Blocks by another structure throughout the network
>>>>> stacks and kernel, then made messages visible to user level. Nemo did
>>>>> another part, on his way to Clive
>>>>>
>>>>>> On Fri, 12 Oct 2018, 07:05 Ori Bernstein, <ori@eigenstate.org> wrote:
>>>>>>
>>>>>> On Thu, 11 Oct 2018 13:43:00 -0700, Lyndon Nerenberg <lyndon@orthanc.ca> wrote:
>>>>>>
>>>>>>> Another case to ponder ...  We're handling the incoming I/Q data
>>>>>>> stream, but need to fan that out to many downstream consumers.  If
>>>>>>> we already read the data into a page, then flip it to the first
>>>>>>> consumer, is there a benefit to adding a reference counter to that
>>>>>>> read-only page and leaving the page live until the counter expires?
>>>>>>>
>>>>>>> Hiro clamours for benchmarks.  I agree.  Some basic searches I've
>>>>>>> done don't show anyone trying this out with P9 (and publishing
>>>>>>> their results).  Anybody have hints/references to prior work?
>>>>>>>
>>>>>>> --lyndon
>>>>>>>
>>>>>>
>>>>>> I don't believe anyone has done the work yet. I'd be interested
>>>>>> to see what you come up with.
>>>>>>
>>>>>>
>>>>>> --
>>>>>>    Ori Bernstein
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>>
>>>
>>
>
>
>

--000000000000e709af0578473542--