9fans - fans of the OS Plan 9 from Bell Labs
From: Bakul Shah <bakul@bitblocks.com>
To: Fans of the OS Plan 9 from Bell Labs <9fans@9fans.net>
Subject: [9fans] zero copy & 9p (was Re: PDP11 (Was: Re: what heavy negativity!)
Date: Wed, 10 Oct 2018 15:26:04 -0700
Message-ID: <B48DC5FB-1125-48A4-A685-BBB7854F305C@bitblocks.com>
In-Reply-To: <CAGGHmKGPVcmq2XkYe21rmPMf2JjdYbUn4GgjanMrQEWg0TW41A@mail.gmail.com>

Excellent response! Just what I was hoping for!

On Oct 10, 2018, at 2:54 PM, Steven Stallion <sstallion@gmail.com> wrote:
>
> As the guy who wrote the majority of the code that pushed those 1M 4K
> random IOPS erik mentioned, this thread annoys the shit out of me. You
> don't get an award for writing a driver. In fact, it's probably better
> not to be known at all considering the bloody murder one has to commit
> to marry hardware and software together.
>
> Let's be frank, the I/O handling in the kernel is anachronistic. To
> hit those rates, I had to add support for asynchronous and vectored
> I/O not to mention a sizable bit of work by a co-worker to properly
> handle NUMA on our appliances to hit those speeds. As I recall, we had
> to rewrite the scheduler and re-implement locking, which even Charles
> Forsyth had a hand in. Had we the time and resources to implement
> something like zero-copy we'd have done it in a heartbeat.
>
> In the end, it doesn't matter how "fast" a storage driver is in Plan 9
> - as soon as you put a 9P-based filesystem on it, it's going to be
> limited to a single outstanding operation. This is the tyranny of 9P.
> We (Coraid) got around this by avoiding filesystems altogether.
>
> Go solve that problem first.

You seem to be saying zero-copy wouldn't buy anything until these
other problems are solved, right?
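
Just to make that concrete for myself (made-up numbers, not a
measurement): with a single outstanding operation and, say, an 8K
msize and a 100us round trip per Tread/Rread, the ceiling is about

    8192 bytes / 100us  ~=  80 MB/s

no matter how many IOPS the driver underneath can deliver. Pipelining
or batching the RPCs seems like the only way past that.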

Suppose you could replace the 9p-based FS with something of your
choice. Would it have made your job easier? The code less grotty?
In other words, is the complexity of a driver that achieves high
throughput due to the complexity of the hardware, or due to 9p's
RPC model? For streaming data you pretty much need some sort of
windowing protocol (prefetch, or write-behind with mmap, is a
similar thing).
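
To illustrate what I mean by a window, here is a minimal sketch: a
few reads kept in flight so the transfer overlaps with processing.
POSIX pread and pthreads are used purely for illustration; the file
name, chunk size, and window depth are made up (build with cc -pthread):

    #include <fcntl.h>
    #include <pthread.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    enum { NWIN = 4, CHUNK = 128*1024 };

    static int fd;

    static void *
    reader(void *arg)
    {
        long id = (long)arg;
        char *buf = malloc(CHUNK);
        off_t off = (off_t)id * CHUNK;
        ssize_t n;

        /* each worker strides NWIN chunks apart, so NWIN
         * reads are outstanding at any given time */
        while((n = pread(fd, buf, CHUNK, off)) > 0){
            /* process(buf, n) would happen here, overlapped
             * with the other workers' reads */
            off += (off_t)NWIN * CHUNK;
        }
        free(buf);
        return NULL;
    }

    int
    main(void)
    {
        pthread_t t[NWIN];
        long i;

        fd = open("bigfile", O_RDONLY);    /* hypothetical input */
        if(fd < 0){
            perror("open");
            return 1;
        }
        for(i = 0; i < NWIN; i++)
            pthread_create(&t[i], NULL, reader, (void *)i);
        for(i = 0; i < NWIN; i++)
            pthread_join(t[i], NULL);
        close(fd);
        return 0;
    }

A real consumer would also have to stitch the chunks back together
in order, but that's the shape of the thing.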

Looks like people who have worked on the plan9 kernel have learned
a lot of lessons and have a lot of good advice to offer. I'd love
to learn from that, yet I rarely see anyone criticizing plan9.


> On Wed, Oct 10, 2018 at 12:36 PM <cinap_lenrek@felloff.net> wrote:
>>
>>> But the reason I want this is to reduce latency to the first
>>> access, especially for very large files. With read() I have
>>> to wait until the read completes. With mmap() processing can
>>> start much earlier and can be interleaved with background
>>> data fetch or prefetch. With read() a lot more resources
>>> are tied down. If I need random access and don't need to
>>> read all of the data, the application has to do pread(),
>>> pwrite() a lot thus complicating it. With mmap() I can just
>>> map in the whole file and excess reading (beyond what the
>>> app needs) will not be a large fraction.
>>
>> you think doing single 4K page sized reads in the pagefault
>> handler is better than doing precise >4K reads from your
>> application? possibly in a background thread so you can
>> overlap processing with data fetching?
>>
>> the advantage of mmap is not prefetch. it's about not doing
>> any I/O when the data is already in the *SHARED* buffer cache!
>> which plan9 does not have (except the mntcache, but that is
>> optional and only works for the disk fileservers that maintain
>> their file qid version info consistently). it *IS* really a linux
>> thing where all block device i/o goes thru the buffer cache.
>>
>> --
>> cinap
>>
>
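
On the mntcache point: the whole trick is that cached file data can
be trusted only while the server keeps reporting the same qid.vers
for that file. A toy, single-entry sketch of the idea (the names and
layout here are invented, nothing like the actual kernel code):

    #include <stdint.h>
    #include <stdlib.h>
    #include <string.h>

    /* one cached extent of one file, identified by its qid */
    typedef struct Centry Centry;
    struct Centry {
        uint64_t path;    /* qid.path: which file */
        uint32_t vers;    /* qid.vers: bumped by the server on each write */
        char    *data;
        size_t   len;
    };

    /* hand back cached data only if the qid still matches; a changed
     * vers means the caller has to go back to the server */
    char *
    cacheget(Centry *c, uint64_t path, uint32_t vers, size_t *len)
    {
        if(c->data != NULL && c->path == path && c->vers == vers){
            *len = c->len;
            return c->data;
        }
        return NULL;
    }

    void
    cacheput(Centry *c, uint64_t path, uint32_t vers, char *data, size_t len)
    {
        free(c->data);
        c->data = malloc(len);
        memcpy(c->data, data, len);
        c->path = path;
        c->vers = vers;
        c->len = len;
    }

A file server that doesn't maintain qid.vers consistently defeats
exactly this check, which as I read it is cinap's point about why
the mntcache is only safe with the disk file servers.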



