From mboxrd@z Thu Jan 1 00:00:00 1970
MIME-Version: 1.0
References:
In-Reply-To:
From: Skip Tavakkolian
Date: Wed, 10 Oct 2018 17:26:55 -0700
Message-ID:
To: Fans of the OS Plan 9 from Bell Labs <9fans@9fans.net>
Content-Type: multipart/alternative; boundary="000000000000e87e560577e9077f"
Subject: Re: [9fans] PDP11 (Was: Re: what heavy negativity!)
Topicbox-Message-UUID: eba335f4-ead9-11e9-9d60-3106f5b1d025

--000000000000e87e560577e9077f
Content-Type: text/plain; charset="UTF-8"

For operations that matter in this context (read, write), there can be
multiple outstanding tags. A while back rsc implemented fcp, partly to
prove this point.

On Wed, Oct 10, 2018 at 2:54 PM Steven Stallion <sstallion@gmail.com> wrote:

> As the guy who wrote the majority of the code that pushed those 1M 4K
> random IOPS erik mentioned, this thread annoys the shit out of me. You
> don't get an award for writing a driver. In fact, it's probably better
> not to be known at all considering the bloody murder one has to commit
> to marry hardware and software together.
>
> Let's be frank, the I/O handling in the kernel is anachronistic. To
> hit those rates, I had to add support for asynchronous and vectored
> I/O not to mention a sizable bit of work by a co-worker to properly
> handle NUMA on our appliances to hit those speeds. As I recall, we had
> to rewrite the scheduler and re-implement locking, which even Charles
> Forsyth had a hand in. Had we the time and resources to implement
> something like zero-copy we'd have done it in a heartbeat.
>
> In the end, it doesn't matter how "fast" a storage driver is in Plan 9
> - as soon as you put a 9P-based filesystem on it, it's going to be
> limited to a single outstanding operation. This is the tyranny of 9P.
> We (Coraid) got around this by avoiding filesystems altogether.
>
> Go solve that problem first.
> On Wed, Oct 10, 2018 at 12:36 PM <cinap_lenrek@felloff.net> wrote:
> >
> > > But the reason I want this is to reduce latency to the first
> > > access, especially for very large files. With read() I have
> > > to wait until the read completes. With mmap() processing can
> > > start much earlier and can be interleaved with background
> > > data fetch or prefetch. With read() a lot more resources
> > > are tied down. If I need random access and don't need to
> > > read all of the data, the application has to do pread(),
> > > pwrite() a lot thus complicating it. With mmap() I can just
> > > map in the whole file and excess reading (beyond what the
> > > app needs) will not be a large fraction.
> >
> > you think doing single 4K page sized reads in the pagefault
> > handler is better than doing precise >4K reads from your
> > application? possibly in a background thread so you can
> > overlap processing with data fetching?
> >
> > the advantage of mmap is not prefetch. it's about not doing
> > any I/O when data is already in the *SHARED* buffer cache!
> > which plan9 does not have (except the mntcache, but that is
> > optional and only works for the disk fileservers that maintain
> > their file qid ver info consistently). it *IS* really a linux
> > thing where all block device i/o goes thru the buffer cache.
> >
> > --
> > cinap
> >
> >

--000000000000e87e560577e9077f
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

For operations that matter in this context (read, write), there can be multiple outstanding tags. A while back rsc implemented fcp, partly to prove this point.
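
A rough sketch of the pattern fcp relies on, written in Python rather than Plan 9 C: several workers issue positional reads and writes on disjoint offsets, so more than one request is in flight at once. Over a 9P mount, each in-flight request would carry its own tag; this sketch only shows the client-side pattern against local files. The name parallel_copy, the chunk size, and the worker count are illustrative assumptions and are not taken from fcp's source.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def parallel_copy(src_path, dst_path, chunk=64 * 1024, workers=8):
    """Copy src_path to dst_path with several concurrent pread/pwrite calls.

    Hypothetical helper for illustration; chunk and workers are arbitrary.
    Returns the number of bytes copied.
    """
    size = os.path.getsize(src_path)
    src = os.open(src_path, os.O_RDONLY)
    dst = os.open(dst_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        def copy_chunk(offset):
            # Positional I/O carries its own offset, so the workers share
            # no file-position state and can proceed independently.
            remaining = min(chunk, size - offset)
            while remaining > 0:
                data = os.pread(src, remaining, offset)
                if not data:
                    raise IOError("unexpected end of file at offset %d" % offset)
                written = 0
                while written < len(data):
                    written += os.pwrite(dst, data[written:], offset + written)
                offset += len(data)
                remaining -= len(data)

        with ThreadPoolExecutor(max_workers=workers) as pool:
            # list() waits for every chunk and re-raises any worker error.
            list(pool.map(copy_chunk, range(0, size, chunk)))
    finally:
        os.close(src)
        os.close(dst)
    return size
```

With workers=1 this degenerates to one request at a time, which is the behaviour the quoted message below describes; raising the worker count is what lets latency overlap.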

On Wed, Oct 10, 2018 at 2:54 PM Steven Stallion <sstallion@gmail.com> wrote:
As the guy who wrote the majority of the code that pushed those 1M 4K
random IOPS erik mentioned, this thread annoys the shit out of me. You
don't get an award for writing a driver. In fact, it's probably better
not to be known at all considering the bloody murder one has to commit
to marry hardware and software together.

Let's be frank, the I/O handling in the kernel is anachronistic. To
hit those rates, I had to add support for asynchronous and vectored
I/O not to mention a sizable bit of work by a co-worker to properly
handle NUMA on our appliances to hit those speeds. As I recall, we had
to rewrite the scheduler and re-implement locking, which even Charles
Forsyth had a hand in. Had we the time and resources to implement
something like zero-copy we'd have done it in a heartbeat.

In the end, it doesn't matter how "fast" a storage driver is in Plan 9
- as soon as you put a 9P-based filesystem on it, it's going to be
limited to a single outstanding operation. This is the tyranny of 9P.
We (Coraid) got around this by avoiding filesystems altogether.

Go solve that problem first.
On Wed, Oct 10, 2018 at 12:36 PM <cinap_lenrek@felloff.net> wrote:
>
> > But the reason I want this is to reduce latency to the first
> > access, especially for very large files. With read() I have
> > to wait until the read completes. With mmap() processing can
> > start much earlier and can be interleaved with background
> > data fetch or prefetch. With read() a lot more resources
> > are tied down. If I need random access and don't need to
> > read all of the data, the application has to do pread(),
> > pwrite() a lot thus complicating it. With mmap() I can just
> > map in the whole file and excess reading (beyond what the
> > app needs) will not be a large fraction.
>
> you think doing single 4K page sized reads in the pagefault
> handler is better than doing precise >4K reads from your
> application? possibly in a background thread so you can
> overlap processing with data fetching?
>
> the advantage of mmap is not prefetch. it's about not doing
> any I/O when data is already in the *SHARED* buffer cache!
> which plan9 does not have (except the mntcache, but that is
> optional and only works for the disk fileservers that maintain
> their file qid ver info consistently). it *IS* really a linux
> thing where all block device i/o goes thru the buffer cache.
>
> --
> cinap
>
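
The approach cinap describes in the quoted text (application-sized reads larger than 4K, issued from a background thread so that processing overlaps fetching) might look roughly like the following Python sketch. The name read_ahead and the chunk and depth defaults are hypothetical choices made for illustration only.

```python
import queue
import threading


def read_ahead(f, chunk=1 << 20, depth=4):
    """Yield chunks of binary file f while a background thread fetches ahead.

    Hypothetical helper; chunk and depth are arbitrary illustrative values.
    """
    q = queue.Queue(maxsize=depth)  # bounds how far the reader runs ahead

    def fetch():
        try:
            while True:
                data = f.read(chunk)
                q.put(data)
                if not data:  # empty bytes marks end of file
                    return
        except Exception as e:
            q.put(e)  # hand the error over to the consumer

    threading.Thread(target=fetch, daemon=True).start()
    while True:
        item = q.get()
        if isinstance(item, Exception):
            raise item
        if not item:
            return
        yield item
```

A caller would iterate with something like `for block in read_ahead(open(path, "rb")): process(block)`. While the caller works on one block, the thread is already reading the next ones, up to depth blocks ahead. If the caller stops iterating early, the background thread simply stays blocked on the full queue; a production version would need a way to cancel it.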

--000000000000e87e560577e9077f--