From mboxrd@z Thu Jan  1 00:00:00 1970
MIME-Version: 1.0
References: <F022808EFD5F0B8A438B332081972DD2@felloff.net>
	<CAGGHmKGPVcmq2XkYe21rmPMf2JjdYbUn4GgjanMrQEWg0TW41A@mail.gmail.com>
	<CAHqDL_-zc16LegTzxKABnDmxaMonFUf0D513D_usha53KmQzcw@mail.gmail.com>
	<CAFSF3XPbCOHb6Wbobud7S1dn124ekN9uJMO=KBx-c5fifWz=oQ@mail.gmail.com>
In-Reply-To: <CAFSF3XPbCOHb6Wbobud7S1dn124ekN9uJMO=KBx-c5fifWz=oQ@mail.gmail.com>
From: Ole-Hjalmar Kristensen <ole.hjalmar.kristensen@gmail.com>
Date: Sun, 14 Oct 2018 19:34:35 +0200
Message-ID: <CAHqDL__cjaCwSm86XW5vcQvUWm6aYyogD6jrAAxoOM6oY04maw@mail.gmail.com>
To: Fans of the OS Plan 9 from Bell Labs <9fans@9fans.net>
Content-Type: multipart/alternative; boundary="000000000000ad90b3057833bc0d"
Subject: Re: [9fans] PDP11 (Was: Re: what heavy negativity!)
Topicbox-Message-UUID: eda4f5a4-ead9-11e9-9d60-3106f5b1d025

--000000000000ad90b3057833bc0d
Content-Type: text/plain; charset="UTF-8"

OK, that makes sense. So it would not stop a client from, for example, first
reading an index block in a B-tree, waiting for the result, and then issuing
read operations for all the data blocks in parallel. That's exactly the same
as any asynchronous disk subsystem I am acquainted with. Reordering is the
norm.
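
To make the pattern concrete, here is a minimal Python sketch of what I have
in mind. It is not real 9P: TagPool, ToyServer, read_blocks and
read_via_index are names I made up for illustration, there is no wire
format, and the "server" is an in-memory toy that deliberately shuffles its
replies. The only things taken from the manual are that outstanding tags
must be unique and that NOTAG is (ushort)~0.

```python
import random

NOTAG = 0xFFFF  # (ushort)~0, reserved for version messages


class TagPool:
    """Hands out tags so that no two outstanding requests share one."""

    def __init__(self):
        self.in_use = set()

    def alloc(self):
        for tag in range(NOTAG):  # 0 .. 0xFFFE; NOTAG is never handed out
            if tag not in self.in_use:
                self.in_use.add(tag)
                return tag
        raise RuntimeError("no free tags")

    def free(self, tag):
        self.in_use.remove(tag)


class ToyServer:
    """Hypothetical stand-in for a server that may answer in any order."""

    def __init__(self, blocks, seed=0):
        self.blocks = blocks
        self.queue = []
        self.rng = random.Random(seed)

    def submit(self, tag, blockno):
        self.queue.append((tag, blockno))

    def drain(self):
        self.rng.shuffle(self.queue)  # deliberately reorder the replies
        replies = [(tag, self.blocks[blockno]) for tag, blockno in self.queue]
        self.queue = []
        return replies


def read_blocks(server, pool, blocknos):
    """Issue all reads at once, then match replies to requests by tag."""
    pending = {}
    for blockno in blocknos:
        tag = pool.alloc()
        pending[tag] = blockno
        server.submit(tag, blockno)
    results = {}
    for tag, data in server.drain():
        results[pending.pop(tag)] = data  # the tag tells us which request this answers
        pool.free(tag)
    return [results[b] for b in blocknos]


def read_via_index(server, pool, index_blockno):
    # Step 1: one read for the index block; wait for the reply.
    (index,) = read_blocks(server, pool, [index_blockno])
    # Step 2: the index lists the data blocks; request them all together.
    return read_blocks(server, pool, index)
```

With blocks = {0: [3, 1, 2], 1: b"one", 2: b"two", 3: b"three"}, calling
read_via_index(ToyServer(blocks), TagPool(), 0) returns the data blocks in
index order no matter how the replies were shuffled, because each reply is
matched to its request by tag rather than by arrival order.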

On Sun, Oct 14, 2018 at 1:21 PM hiro <23hiro@gmail.com> wrote:

> there's no tyranny involved.
>
> a client that is fine with the *responses* coming in reordered could
> remember the tag obviously and do whatever you imagine.
>
> the problem is potential reordering of the messages in the kernel
> before responding, even if the 9p transport has guaranteed ordering.
>
> On 10/14/18, Ole-Hjalmar Kristensen <ole.hjalmar.kristensen@gmail.com>
> wrote:
> > I'm not going to argue with someone who has got his hands dirty by
> > actually doing this but I don't really get this about the tyranny of
> > 9p. Isn't the point of the tag field to identify the request? What is
> > stopping the client from issuing multiple requests and matching the
> > replies based on the tag? From the manual:
> >
> >           Each T-message has a tag field, chosen and used by the
> >           client to identify the message.  The reply to the message
> >           will have the same tag.  Clients must arrange that no two
> >           outstanding messages on the same connection have the same
> >           tag.  An exception is the tag NOTAG, defined as (ushort)~0
> >           in <fcall.h>: the client can use it, when establishing a
> >           connection, to override tag matching in version messages.
> >
> >
> >
> > On Wed, 10 Oct 2018 at 23:56, Steven Stallion <sstallion@gmail.com> wrote:
> >
> >> As the guy who wrote the majority of the code that pushed those 1M 4K
> >> random IOPS erik mentioned, this thread annoys the shit out of me. You
> >> don't get an award for writing a driver. In fact, it's probably better
> >> not to be known at all considering the bloody murder one has to commit
> >> to marry hardware and software together.
> >>
> >> Let's be frank, the I/O handling in the kernel is anachronistic. To
> >> hit those rates, I had to add support for asynchronous and vectored
> >> I/O, not to mention a sizable bit of work by a co-worker to properly
> >> handle NUMA on our appliances to hit those speeds. As I recall, we had
> >> to rewrite the scheduler and re-implement locking, which even Charles
> >> Forsyth had a hand in. Had we the time and resources to implement
> >> something like zero-copy we'd have done it in a heartbeat.
> >>
> >> In the end, it doesn't matter how "fast" a storage driver is in Plan 9
> >> - as soon as you put a 9P-based filesystem on it, it's going to be
> >> limited to a single outstanding operation. This is the tyranny of 9P.
> >> We (Coraid) got around this by avoiding filesystems altogether.
> >>
> >> Go solve that problem first.
> >> On Wed, Oct 10, 2018 at 12:36 PM <cinap_lenrek@felloff.net> wrote:
> >> >
> >> > > But the reason I want this is to reduce latency to the first
> >> > > access, especially for very large files. With read() I have
> >> > > to wait until the read completes. With mmap() processing can
> >> > > start much earlier and can be interleaved with background
> >> > > data fetch or prefetch. With read() a lot more resources
> >> > > are tied down. If I need random access and don't need to
> >> > > read all of the data, the application has to do pread(),
> >> > > pwrite() a lot thus complicating it. With mmap() I can just
> >> > > map in the whole file and excess reading (beyond what the
> >> > > app needs) will not be a large fraction.
> >> >
> >> > you think doing single 4K page sized reads in the pagefault
> >> > handler is better than doing precise >4K reads from your
> >> > application? possibly in a background thread so you can
> >> > overlap processing with data fetching?
> >> >
> >> > the advantage of mmap is not prefetch. it's about not doing
> >> > any I/O when data is already in the *SHARED* buffer cache!
> >> > which plan9 does not have (except the mntcache, but that is
> >> > optional and only works for the disk fileservers that maintain
> >> > their file qid ver info consistently). it *IS* really a linux
> >> > thing where all block device i/o goes thru the buffer cache.
> >> >
> >> > --
> >> > cinap
> >> >
> >>
> >>
> >
>
>

--000000000000ad90b3057833bc0d--