From mboxrd@z Thu Jan 1 00:00:00 1970
To: Fans of the OS Plan 9 from Bell Labs <9fans@9fans.net>
In-reply-to: Your message of "Thu, 29 Apr 2010 13:23:00 EDT."
References: <5fa9fbfe115a9cd5a81d0feefe413192@quintile.net> <4fa1305e0f56a0ef89c2e05320fa5997@coraid.com> <40cf59cfc2735e232f0fd67df725e65d@kw.quanstro.net> <046b9c874815d108c6ffb7a857e847a7@kw.quanstro.net> <20100429170859.2EBE05B78@mail.bitblocks.com>
From: Bakul Shah
Date: Thu, 29 Apr 2010 20:47:37 -0700
Message-Id: <20100430034737.577235B81@mail.bitblocks.com>
Subject: Re: [9fans] A simple experiment
Topicbox-Message-UUID: 143fd174-ead6-11e9-9d60-3106f5b1d025

On Thu, 29 Apr 2010 13:23:00 EDT erik quanstrom wrote:
> > > 9p, like aoe, is a ping-pong protocol. each message requires an ack.
> > > therefore, the transport layer doesn't need flow control.
> >
> > Therefore, it is also not able to utilise bandwidth
> > effectively over longhaul links. As an example, US coast
> > to coast round trip time latency is about 100ms. Now consider
> > fcp. Each worker thread of fcp does 8K read/writes. Due to
> > pingponging, the *most* a thread can xfer coast to coast is
> > 80KBps (for 16 threads, 1.28MBps). It is actually much worse
> > since each thread doesn't even overlap reads with writes.
>
> i think you are conflating a ping-pong protocol with the
> limitation of a single outstanding message. there is no
> reason that one needs a 1:1 correspondence between threads
> and outstanding messages. nor does the protocol inherently
> prohibit 1 write from becoming multiple messages.

What I am saying is that RPC (which is how 9p is mostly used)
has this inherent performance limitation due to latency. If,
to get around that, you use multiple outstanding messages
(regardless of whether it is one per thread or many per
thread), you *will* require flow control or you run into
problems -- imagine having thousands of outstanding messages.
Or imagine having a low-bandwidth link for fcp.
Fcp will hog the link, to the detriment of other programs. So
on the high end fcp throughput is limited to 1.28MBps, and on
the low end it will hog the link completely (unless your
kernel implements some sort of fair queuing)! If you want to
make optimum use of the available bandwidth while adjusting to
changing conditions and not completely clog up the pipe, you
need a feedback mechanism and a window-opening algorithm a la
TCP or something. And anyway, why do all this in fcp or every
other program that needs streaming? It should be factored out.

> > Short of a sliding window that is as large as the capacity of
> > the pipe, you can't expect to keep it full. As usual one has
> > to trade off simplicity vs performance.
>
> i don't think so. the only thing a sliding window gives you
> is < 1 ack per message sent.

If the sender->receiver pipe can hold N bytes and the sender
is streaming (that is, keeping the pipe full), the sender
*will* be ahead of the receiver by N bytes. So a *streaming*
protocol has to allow it to be N bytes ahead. Even if you have
one ack per message, in a sliding window of N messages (or
bytes), the sender is allowed to get ahead of the receiver by
up to N more messages (or bytes). Here I am not concerned
about in-order delivery (though typically people assume this
when they talk about sliding windows). If you assume in-order
delivery, you can coalesce multiple acks, but that can be seen
as an ack optimization (and then you either throw away
out-of-order deliveries or have to add selective acks or
something).
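The arithmetic in the thread is easy to check. Below is a
small sketch (Python, not part of the original exchange) of
the stop-and-wait throughput limit quoted above, and of the
bandwidth-delay product that the sliding-window argument says
the sender must be allowed to stay ahead by; the 1 Gb/s link
speed in the last example is purely an assumed figure for
illustration:

```python
def stop_and_wait_throughput(msg_bytes, rtt_s):
    """Best-case bytes/sec when only one message may be in flight:
    each msg_bytes transfer costs one full round trip."""
    return msg_bytes / rtt_s

rtt = 0.100      # coast-to-coast round trip, ~100 ms
msg = 8 * 1024   # fcp's 8K read/write size

per_thread = stop_and_wait_throughput(msg, rtt)
print(per_thread)       # 81920.0 B/s, i.e. the 80KBps in the thread
print(16 * per_thread)  # 1310720.0 B/s, i.e. ~1.28MBps for 16 threads

def window_needed(bandwidth_bps, rtt_s):
    """Bandwidth-delay product: how far (in bytes) the sender must be
    allowed to run ahead of the receiver to keep the pipe full."""
    return bandwidth_bps * rtt_s

# Assumed example: a 1 Gb/s (125,000,000 B/s) coast-to-coast link
# needs ~12.5 MB of unacked data in flight to stay full.
print(window_needed(125_000_000, rtt))  # 12500000.0
```

The point of the second function is the "N bytes ahead"
argument: any window smaller than the bandwidth-delay product
caps throughput below the link rate, no matter how acks are
batched.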