From mboxrd@z Thu Jan  1 00:00:00 1970
MIME-Version: 1.0
References: <F022808EFD5F0B8A438B332081972DD2@felloff.net>
	<CAGGHmKGPVcmq2XkYe21rmPMf2JjdYbUn4GgjanMrQEWg0TW41A@mail.gmail.com>
In-Reply-To: <CAGGHmKGPVcmq2XkYe21rmPMf2JjdYbUn4GgjanMrQEWg0TW41A@mail.gmail.com>
From: Skip Tavakkolian <skip.tavakkolian@gmail.com>
Date: Wed, 10 Oct 2018 17:26:55 -0700
Message-ID: <CAJSxfm+0Jkfkny2aj1bQRFmxjeW5XymNZy=X7Pmb7PqcpNohZQ@mail.gmail.com>
To: Fans of the OS Plan 9 from Bell Labs <9fans@9fans.net>
Content-Type: multipart/alternative; boundary="000000000000e87e560577e9077f"
Subject: Re: [9fans] PDP11 (Was: Re: what heavy negativity!)
Topicbox-Message-UUID: eba335f4-ead9-11e9-9d60-3106f5b1d025

--000000000000e87e560577e9077f
Content-Type: text/plain; charset="UTF-8"

For operations that matter in this context (read, write), there can be
multiple outstanding tags. A while back rsc implemented fcp, partly to
prove this point.
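
A minimal sketch of the idea, not rsc's code: it assumes a Unix-like
host where Python's os.pread and os.pwrite are available, and the name
parallel_copy, the block size, and the worker count are made up for
illustration. Each worker blocks in its own read or write, so several
requests are in flight at once; on a 9P mount each of those would be
expected to go out under its own tag, assuming the mount driver
multiplexes them.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def parallel_copy(src_path, dst_path, block_size=64 * 1024, workers=8):
    """Copy src to dst with up to `workers` block transfers in flight.

    Hypothetical helper for illustration only; returns the byte count.
    """
    size = os.path.getsize(src_path)
    src = os.open(src_path, os.O_RDONLY)
    dst = os.open(dst_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        def copy_block(offset):
            end = min(offset + block_size, size)
            # pread may return fewer bytes than asked for, so loop
            # until this block has been fully transferred.
            while offset < end:
                data = os.pread(src, end - offset, offset)
                if not data:
                    raise IOError("unexpected end of file")
                written = 0
                while written < len(data):
                    written += os.pwrite(dst, data[written:], offset + written)
                offset += len(data)

        # Positioned I/O carries its own offset, so the workers share
        # the two descriptors without touching a common file pointer.
        with ThreadPoolExecutor(max_workers=workers) as pool:
            # list() forces evaluation so worker exceptions propagate.
            list(pool.map(copy_block, range(0, size, block_size)))
    finally:
        os.close(src)
        os.close(dst)
    return size
```

Calling parallel_copy("in.dat", "out.dat", workers=16) on a file served
over a high-latency link is where the difference from a one-block-at-a-
time copy should show up; on a local disk it may not matter much.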

On Wed, Oct 10, 2018 at 2:54 PM Steven Stallion <sstallion@gmail.com> wrote:

> As the guy who wrote the majority of the code that pushed those 1M 4K
> random IOPS erik mentioned, this thread annoys the shit out of me. You
> don't get an award for writing a driver. In fact, it's probably better
> not to be known at all considering the bloody murder one has to commit
> to marry hardware and software together.
>
> Let's be frank, the I/O handling in the kernel is anachronistic. To
> hit those rates, I had to add support for asynchronous and vectored
> I/O, not to mention a sizable bit of work by a co-worker to properly
> handle NUMA on our appliances. As I recall, we had
> to rewrite the scheduler and re-implement locking, which even Charles
> Forsyth had a hand in. Had we the time and resources to implement
> something like zero-copy we'd have done it in a heartbeat.
>
> In the end, it doesn't matter how "fast" a storage driver is in Plan 9
> - as soon as you put a 9P-based filesystem on it, it's going to be
> limited to a single outstanding operation. This is the tyranny of 9P.
> We (Coraid) got around this by avoiding filesystems altogether.
>
> Go solve that problem first.
>
> On Wed, Oct 10, 2018 at 12:36 PM <cinap_lenrek@felloff.net> wrote:
> >
> > > But the reason I want this is to reduce latency to the first
> > > access, especially for very large files. With read() I have
> > > to wait until the read completes. With mmap() processing can
> > > start much earlier and can be interleaved with background
> > > data fetch or prefetch. With read() a lot more resources
> > > are tied down. If I need random access and don't need to
> > > read all of the data, the application has to do pread(),
> > > pwrite() a lot thus complicating it. With mmap() I can just
> > > map in the whole file and excess reading (beyond what the
> > > app needs) will not be a large fraction.
> >
> > you think doing single 4K page sized reads in the pagefault
> > handler is better than doing precise >4K reads from your
> > application? possibly in a background thread so you can
> > overlap processing with data fetching?
> >
> > the advantage of mmap is not prefetch. it's about not doing
> > any I/O when data is already in the *SHARED* buffer cache!
> > which plan9 does not have (except the mntcache, but that is
> > optional and only works for the disk fileservers that maintain
> > their file qid ver info consistently). it *IS* really a linux
> > thing where all block device i/o goes thru the buffer cache.
> >
> > --
> > cinap
> >
>
>

--000000000000e87e560577e9077f--