OK, that makes sense. So it would not stop a client from, for example,
first reading an index block in a B-tree, waiting for the result, and
then issuing read operations for all the data blocks in parallel (a
rough sketch of that is at the end of this mail). That's exactly the
same as any asynchronous disk subsystem I am acquainted with.
Reordering is the norm.

On Sun, Oct 14, 2018 at 1:21 PM hiro <23hiro@gmail.com> wrote:
> there's no tyranny involved.
>
> a client that is fine with the *responses* coming in reordered could
> remember the tag obviously and do whatever you imagine.
>
> the problem is the potential reordering of the messages in the kernel
> before responding, even if the 9p transport has guaranteed ordering.
>
> On 10/14/18, Ole-Hjalmar Kristensen wrote:
> > I'm not going to argue with someone who has got his hands dirty by
> > actually doing this, but I don't really get this about the tyranny
> > of 9P. Isn't the point of the tag field to identify the request?
> > What is stopping the client from issuing multiple requests and
> > matching the replies based on the tag? From the manual:
> >
> >     Each T-message has a tag field, chosen and used by the
> >     client to identify the message. The reply to the message
> >     will have the same tag. Clients must arrange that no two
> >     outstanding messages on the same connection have the same
> >     tag. An exception is the tag NOTAG, defined as (ushort)~0
> >     in <fcall.h>: the client can use it, when establishing a
> >     connection, to override tag matching in version messages.
> >
> > On Wed, Oct 10, 2018, 23:56 Steven Stallion wrote:
> >
> >> As the guy who wrote the majority of the code that pushed those
> >> 1M 4K random IOPS erik mentioned, this thread annoys the shit out
> >> of me. You don't get an award for writing a driver. In fact, it's
> >> probably better not to be known at all considering the bloody
> >> murder one has to commit to marry hardware and software together.
> >>
> >> Let's be frank: the I/O handling in the kernel is anachronistic.
> >> To hit those rates, I had to add support for asynchronous and
> >> vectored I/O, not to mention a sizable bit of work by a co-worker
> >> to properly handle NUMA on our appliances. As I recall, we had to
> >> rewrite the scheduler and re-implement locking, which even Charles
> >> Forsyth had a hand in. Had we the time and resources to implement
> >> something like zero-copy we'd have done it in a heartbeat.
> >>
> >> In the end, it doesn't matter how "fast" a storage driver is in
> >> Plan 9 - as soon as you put a 9P-based filesystem on it, it's
> >> going to be limited to a single outstanding operation. This is
> >> the tyranny of 9P. We (Coraid) got around this by avoiding
> >> filesystems altogether.
> >>
> >> Go solve that problem first.
> >>
> >> On Wed, Oct 10, 2018 at 12:36 PM wrote:
> >> >
> >> > > But the reason I want this is to reduce latency to the first
> >> > > access, especially for very large files. With read() I have
> >> > > to wait until the read completes. With mmap() processing can
> >> > > start much earlier and can be interleaved with background
> >> > > data fetch or prefetch. With read() a lot more resources
> >> > > are tied down. If I need random access and don't need to
> >> > > read all of the data, the application has to do pread(),
> >> > > pwrite() a lot, thus complicating it. With mmap() I can just
> >> > > map in the whole file and excess reading (beyond what the
> >> > > app needs) will not be a large fraction.
> >> > > >> > you think doing single 4K page sized reads in the pagefault > >> > handler is better than doing precise >4K reads from your > >> > application? possibly in a background thread so you can > >> > overlap processing with data fetching? > >> > > >> > the advantage of mmap is not prefetch. its about not to do > >> > any I/O when data is already in the *SHARED* buffer cache! > >> > which plan9 does not have (except the mntcache, but that is > >> > optional and only works for the disk fileservers that maintain > >> > ther file qid ver info consistently). its *IS* really a linux > >> > thing where all block device i/o goes thru the buffer cache. > >> > > >> > -- > >> > cinap > >> > > >> > >> > > > >