9fans - fans of the OS Plan 9 from Bell Labs
* Re: [9fans] Nagle algorithm
@ 2001-11-22 13:24 forsyth
  2001-11-22 13:29 ` Boyd Roberts
  2001-11-23  9:34 ` Thomas Bushnell, BSG
  0 siblings, 2 replies; 37+ messages in thread
From: forsyth @ 2001-11-22 13:24 UTC (permalink / raw)
  To: 9fans

[-- Attachment #1: Type: text/plain, Size: 1493 bytes --]

i think it's fundamentally misconceived and should at least never be
on by default.  dhog's description was ``[it] attempts to buffer up
multiple writes and send them as a single TCP packet -- where the
writes are small''.  what that actually means is that if the writes
are small it typically delays sending out anything at all (subject to
certain specific conditions) until a given interval has elapsed or
its idea of enough data has arrived, and that interval can be large.  in other
words, give it a few bytes you wish to send now and it demands more,
delaying if that demand is not met.  it's not just RPC that's affected.

i claim it's not the TCP/IP subsystem's responsibility to delay
sending something so that it can buffer up writes to make larger
packets.  that's for stdio or bio (or local OS equivalent).  (an
interesting variant was in a version of the Unix streams subsystem
where there was a buffering module that could be pushed onto a tcp/ip
stream.)  it's fine for TCP/IP to adapt to the network and adjust the
rate at which bytes are shuffled in and out, and adapt retransmission
efforts to apparent error rates, and, since TCP/IP doesn't preserve
delimiters, coalesce new data (from writes large or small) into already
queued data to make larger output packets.  those are all within its
responsibility, i think, and any delays that result are arguably
inherent in the network.

if applications are stuttering bytes needlessly, deal with it there.


[-- Attachment #2: Type: message/rfc822, Size: 1994 bytes --]

To: 9fans@cse.psu.edu
Subject: Re: [9fans] Nagle algorithm
Date: Thu, 22 Nov 2001 09:57:11 GMT
Message-ID: <87itc31syv.fsf@becket.becket.net>

andrew@mbmnz.co.nz (Andrew Simmons) writes:

> That makes sense, thanks. Not a problem for my application, but it's good
> to be aware of a potential problem.

(It should be noted that the same thing proved true of X applications;
Nagle isn't just bad for Plan 9 RPC's; it's bad for *any* RPC
application.)

* Re: [9fans] Nagle algorithm
@ 2001-11-29 17:33 jmk
  0 siblings, 0 replies; 37+ messages in thread
From: jmk @ 2001-11-29 17:33 UTC (permalink / raw)
  To: 9fans

On Thu Nov 29 11:56:19 EST 2001, matt@proweb.co.uk wrote:
> XP tries tcp managing :
> http://theregister.co.uk/content/4/23090.html

http://www.theregister.co.uk/content/28/23081.html

is much more useful.


* Re: [9fans] Nagle algorithm
@ 2001-11-27 11:28 forsyth
  0 siblings, 0 replies; 37+ messages in thread
From: forsyth @ 2001-11-27 11:28 UTC (permalink / raw)
  To: 9fans

>>nagle delay will do sod all.   as the rfc defines it, it isn't invoked,

sorry, that got posted before i could add this: ``provided the data continues
to arrive in a timely way, as is normal, because the MSS is often
not an exact multiple of the block size''.  actually monitoring various
protocols here i don't see much of a problem.



* Re: [9fans] Nagle algorithm
@ 2001-11-27 11:00 forsyth
  0 siblings, 0 replies; 37+ messages in thread
From: forsyth @ 2001-11-27 11:00 UTC (permalink / raw)
  To: 9fans

[-- Attachment #1: Type: text/plain, Size: 1417 bytes --]

and that's what happened even with nagle enabled?

that's one reason i asked about the source of the ``vast majority''
of packets on the net.  if they are as you say, then i can't see nagle
being invoked by any of those applications.   they already do their
own buffering (i was looking at the code for them just now).
ftp for instance writes in blocks of 4k (it's fairly old code) and
others do better than that.  that's much larger than the
typical MSS (which is indeed rather smaller than 4k), so the
nagle delay will do sod all.   as the rfc defines it, it isn't invoked,
except for the FTP requests, which are small,
and then all it does is delay them pointlessly, since nothing
will be added to those!

there is buffering in the tcp/ip implementation,
so concurrency isn't affected by the mismatch
between MSS and write buffer size.   (ie, ftp can dump its 4k and read
the next block of the file whilst the previous ones are going out.)
same for http.  does apache dribble the bytes out in titchy quantities?

the second point is that the MSS provided by TCP/IP, if based only on
the host interface's MTU configuration, isn't sufficient anyway.  path
MTU discovery could be used i suppose except that it was added in retrospect,
relies too much on routers, gateways and firewalls all behaving well,
all the way there, and even then there are non-trivial problems
with it and tcp/ip.


[-- Attachment #2: Type: message/rfc822, Size: 1925 bytes --]

To: 9fans@cse.psu.edu
Subject: Re: [9fans] Nagle algorithm
Date: Tue, 27 Nov 2001 10:16:18 GMT
Message-ID: <87g070wwg7.fsf@becket.becket.net>

cross@math.psu.edu (Dan Cross) writes:

> If you wanted to buffer 200 ``a's'' before sending them, why not do that
> explicitly?

Because the only way to know the correct number to buffer involves
knowing transport level information like the MSS and so forth.

* Re: [9fans] Nagle algorithm
@ 2001-11-27  5:31 David Gordon Hogan
  0 siblings, 0 replies; 37+ messages in thread
From: David Gordon Hogan @ 2001-11-27  5:31 UTC (permalink / raw)
  To: 9fans

> > That makes sense, thanks. Not a problem for my application, but it's good
> > to be aware of a potential problem.
>
> (It should be noted that the same thing proved true of X applications;
> Nagle isn't just bad for Plan 9 RPC's; it's bad for *any* RPC
> application.)

Sorry to quibble...  technically, the X protocol isn't an RPC
protocol.  Events and errors are asynchronous; many of
the requests do not require a response.  There are some
so-called ``round trip requests'' which is where the RPC
behaviour comes in.

One such request is the one that fetches a property from
a window; this accounts for the often excruciatingly slow
startup times for X applications, which have to communicate
with the window manager through a whole slew of XGetProperty
requests and their ilk.  Motif is particularly bad in this regard: the
ICCCM wasn't complex enough for them, so they added more
properties, which basically only MWM knows about.

But, anyway, X was deliberately engineered to be as
insensitive to round trip delays as the designers thought
possible, by the liberal use of its asynchronous model.  Plan
9 achieves much the same effect by allowing multiple draw(3)
requests to be sent in a single write.  They are buffered up by
the draw library.  I run rio and acme over my cpu or drawterm
connection from home; they are quite snappy.



* Re: [9fans] Nagle algorithm
@ 2001-11-26 12:07 Fco.J.Ballesteros
  0 siblings, 0 replies; 37+ messages in thread
From: Fco.J.Ballesteros @ 2001-11-26 12:07 UTC (permalink / raw)
  To: 9fans

[-- Attachment #1: Type: text/plain, Size: 353 bytes --]

It just doesn't know. You'll have to choose a reasonable
value.

The difference is that the tcp code would just
send the message, without trying to guess what the
application needs. For example, your shell may
be happy using `line buffering', while your cat would
probably use a big buffer. A debugger may
use no buffer at all.


[-- Attachment #2: Type: message/rfc822, Size: 1866 bytes --]

From: smd@clock.org (Sean M. Doran)
To: 9fans@cse.psu.edu
Subject: Re: [9fans] Nagle algorithm
Date: Mon, 26 Nov 2001 03:23:12 -0800 (PST)
Message-ID: <20011126112312.80AC9C7901@cesium.clock.org>

nigel@9fs.org writes:

| Depends on your implementation of write().

and Fco.J.Ballesteros <nemo@plan9.escet.urjc.es> writes:

| If that's a problem for your cat, you could convince
| your cat to use Bio; and retain your abstraction as well.

And how does your write(2) or wrapper around write(2)
know what the present maximum segment size is, given
that it can be altered at any time via a local interface
MTU change or a change in the path MTU?

How does pulling that into a wrapper which is used
almost universally fundamentally differ from retaining
in the TCP code with a system call which allows it to
be turned off on those rare occasions you want to do so?

	Sean.

* Re: [9fans] Nagle algorithm
@ 2001-11-26 11:52 Fco.J.Ballesteros
  0 siblings, 0 replies; 37+ messages in thread
From: Fco.J.Ballesteros @ 2001-11-26 11:52 UTC (permalink / raw)
  To: 9fans

[-- Attachment #1: Type: text/plain, Size: 113 bytes --]

If that's a problem for your cat, you could convince
your cat to use Bio; and retain your abstraction as well.

[-- Attachment #2: Type: message/rfc822, Size: 2277 bytes --]

From: "Sean M. Doran" <smd@cesium.clock.org>
To: 9fans@cse.psu.edu
Subject: Re: [9fans] Nagle algorithm
Date: Mon, 26 Nov 2001 12:07:28 +0100
Message-ID: <ytu1vhu6xb.fsf@cesium.clock.org>

forsyth@caldo.demon.co.uk writes:

> ...  Nagle's algorithm for coalescing short segments,
> which is the one that seems ill-advised to me

So, how many packets should the following code segment generate?

    #include <unistd.h>
    /* for int usleep(useconds_t microseconds); */

    /* tcpfd: an already-connected TCP socket */
    for(int i=0; i < 200; i++) {
             write(tcpfd, "a", 1);
             usleep(100);
    }

> i do find that quite a lot of this smacks of trying to
> compensate for inadequate data provided by the protocol or the network.

Abstraction and information-hiding are usually considered helpful.
Who wants to muck around with RTT and pMTU in an application
which just might happen to write to a TCP connection?
Perhaps cat(1) isn't complicated enough yet?

     Sean.

* Re: [9fans] Nagle algorithm
@ 2001-11-26 11:23 Sean M. Doran
  2001-11-26 19:28 ` Dan Cross
  0 siblings, 1 reply; 37+ messages in thread
From: Sean M. Doran @ 2001-11-26 11:23 UTC (permalink / raw)
  To: 9fans

nigel@9fs.org writes:

| Depends on your implementation of write().

and Fco.J.Ballesteros <nemo@plan9.escet.urjc.es> writes:

| If that's a problem for your cat, you could convince
| your cat to use Bio; and retain your abstraction as well.

And how does your write(2) or wrapper around write(2)
know what the present maximum segment size is, given
that it can be altered at any time via a local interface
MTU change or a change in the path MTU?

How does pulling that into a wrapper which is used
almost universally fundamentally differ from retaining
in the TCP code with a system call which allows it to
be turned off on those rare occasions you want to do so?

	Sean.


* Re: [9fans] Nagle algorithm
@ 2001-11-26 11:09 nigel
  0 siblings, 0 replies; 37+ messages in thread
From: nigel @ 2001-11-26 11:09 UTC (permalink / raw)
  To: 9fans

[-- Attachment #1: Type: text/plain, Size: 46 bytes --]

Depends on your implementation of write().


[-- Attachment #2: Type: message/rfc822, Size: 2274 bytes --]

From: "Sean M. Doran" <smd@cesium.clock.org>
To: 9fans@cse.psu.edu
Subject: Re: [9fans] Nagle algorithm
Date: Mon, 26 Nov 2001 12:07:28 +0100
Message-ID: <ytu1vhu6xb.fsf@cesium.clock.org>

forsyth@caldo.demon.co.uk writes:

> ...  Nagle's algorithm for coalescing short segments,
> which is the one that seems ill-advised to me

So, how many packets should the following code segment generate?

    #include <unistd.h>
    /* for int usleep(useconds_t microseconds); */

    /* tcpfd: an already-connected TCP socket */
    for(int i=0; i < 200; i++) {
             write(tcpfd, "a", 1);
             usleep(100);
    }

> i do find that quite a lot of this smacks of trying to
> compensate for inadequate data provided by the protocol or the network.

Abstraction and information-hiding are usually considered helpful.
Who wants to muck around with RTT and pMTU in an application
which just might happen to write to a TCP connection?
Perhaps cat(1) isn't complicated enough yet?

     Sean.

* Re: [9fans] Nagle algorithm
@ 2001-11-24 10:48 forsyth
  0 siblings, 0 replies; 37+ messages in thread
From: forsyth @ 2001-11-24 10:48 UTC (permalink / raw)
  To: 9fans

[-- Attachment #1: Type: text/plain, Size: 556 bytes --]

not only that, but the Nagle algorithm explicitly ignores the PSH bit
anyway:

                 The Nagle algorithm is generally as follows:

                      If there is unacknowledged data (i.e., SND.NXT >
                      SND.UNA), then the sending TCP buffers all user
                      data (regardless of the PSH bit), until the
                      outstanding data has been acknowledged or until
                      the TCP can send a full-sized segment (Eff.snd.MSS
                      bytes; see Section 4.2.2.6).


[-- Attachment #2: Type: message/rfc822, Size: 1733 bytes --]

To: 9fans@cse.psu.edu
Subject: Re: [9fans] Nagle algorithm
Date: Sat, 24 Nov 2001 00:32:31 -0500
Message-ID: <20011124053233.2D3F019A57@mail.cse.psu.edu>

> > i claim it's not the TCP/IP subsystem's responsibility to delay
> > sending something so that it can buffer up writes to make larger
> > packets.  that's for stdio or bio (or local OS equivalent).
>
> If there was a way for the user to say when to set TCP's PUSH bit,
> would that do the job?

Why would you want this?  I thought the above
was a very effective argument.

Russ

* Re: [9fans] Nagle algorithm
@ 2001-11-24  3:26 Scott Schwartz
  0 siblings, 0 replies; 37+ messages in thread
From: Scott Schwartz @ 2001-11-24  3:26 UTC (permalink / raw)
  To: 9fans

> i claim it's not the TCP/IP subsystem's responsibility to delay
> sending something so that it can buffer up writes to make larger
> packets.  that's for stdio or bio (or local OS equivalent).

If there was a way for the user to say when to set TCP's PUSH bit,
would that do the job?



* Re: [9fans] Nagle algorithm
@ 2001-11-23 11:58 forsyth
  2001-11-26 11:07 ` Sean M. Doran
  0 siblings, 1 reply; 37+ messages in thread
From: forsyth @ 2001-11-23 11:58 UTC (permalink / raw)
  To: 9fans

>>Well, this is overstated.  The Nagle algorithm (which we should be
>>calling "slow start") is actually hugely important on the net as a

rfc1122 correctly distinguishes `slow start' (by Van Jacobson), which
again seems a reasonable responsibility for TCP/IP, from Nagle's algorithm
for coalescing short segments, which is the one that seems ill-advised
to me, however effective in benchmarks.  the same section of the rfc
discusses avoiding silly windows which also seems fine
to me, and perhaps that was due to Nagle too; i haven't checked
a copy of the original paper.  (mind you, checking the original is quite often
sensible because you discover that some idea has taken on
a life of its own out of the original context.)

i do find that quite a lot of this smacks of trying to
compensate for inadequate data provided by the protocol or the network.


* Re: [9fans] Nagle algorithm
@ 2001-11-23  9:44 forsyth
  2001-11-26  9:59 ` Thomas Bushnell, BSG
  0 siblings, 1 reply; 37+ messages in thread
From: forsyth @ 2001-11-23  9:44 UTC (permalink / raw)
  To: 9fans

>>And, incidentally, it's not optional; the Host Requirements RFC's,
>>IIRC, require it.

no, it says SHOULD, not MUST, and the justification isn't particularly
compelling.

>>But for the *vast* majority of IP packets, it's completely right.

and what would those packets be?


* Re: [9fans] Nagle algorithm
@ 2001-11-21 23:38 David Gordon Hogan
  2001-11-21 23:59 ` Andrew Simmons
  0 siblings, 1 reply; 37+ messages in thread
From: David Gordon Hogan @ 2001-11-21 23:38 UTC (permalink / raw)
  To: 9fans

[-- Attachment #1: Type: text/plain, Size: 1194 bytes --]

It's inappropriate for RPC protocols.  Nagle's algorithm
attempts to buffer up multiple writes and send them as
a single TCP packet -- where the writes are small.  As
such, it's fine for telnet conversations where you don't
really want each keystroke generating a separate packet
if it can be avoided.  The mechanism is to delay the
transmission of small packets in the hope that more
data will show up.

Consider 9P over TCP.  Many of the messages are quite
small, and will be delayed.  9P is RPC-based, so latency
is a big concern.  The sender of the small message is
typically blocked waiting for a response, so Nagle's
hope that more data will be sent is (in this case) invalid.
All it succeeds in doing is drastically increasing the
effective round trip time.

This was a big issue for drawterm.  Use of the mouse
in drawterm results in small Rread messages being
sent, in response to reads of /dev/mouse at the
other end.  These were being delayed by a significant
fraction of a second, resulting in really sluggish
interactive performance, until we figured out that
Nagle's algorithm was responsible and inserted the
appropriate setsockopt call into drawterm.


[-- Attachment #2: Type: message/rfc822, Size: 1835 bytes --]

From: Andrew Simmons <andrew@mbmnz.co.nz>
To: 9fans@cse.psu.edu
Subject: [9fans] Nagle algorithm
Date: Thu, 22 Nov 2001 12:20:50 +1300
Message-ID: <3.0.6.32.20011122122050.00974e88@pop3.clear.net.nz>

Slightly off-topic, but I seem to detect a certain hostility to the Nagle
algorithm in this group, whereas the conventional wisdom in the Winsock
world where I currently spend my days is that the Nagle algorithm should
almost never be disabled. What's wrong with Nagle? (and please, no cracks
about wisdom and Windows programming being mutually exclusive).

* [9fans] Nagle algorithm
@ 2001-11-21 23:20 Andrew Simmons
  2001-11-26 10:57 ` Sean M. Doran
  0 siblings, 1 reply; 37+ messages in thread
From: Andrew Simmons @ 2001-11-21 23:20 UTC (permalink / raw)
  To: 9fans

Slightly off-topic, but I seem to detect a certain hostility to the Nagle
algorithm in this group, whereas the conventional wisdom in the Winsock
world where I currently spend my days is that the Nagle algorithm should
almost never be disabled. What's wrong with Nagle? (and please, no cracks
about wisdom and Windows programming being mutually exclusive).




Thread overview: 37+ messages
2001-11-22 13:24 [9fans] Nagle algorithm forsyth
2001-11-22 13:29 ` Boyd Roberts
2001-11-23  9:34 ` Thomas Bushnell, BSG
2001-11-26 19:13   ` Dan Cross
  -- strict thread matches above, loose matches on Subject: below --
2001-11-29 17:33 jmk
2001-11-27 11:28 forsyth
2001-11-27 11:00 forsyth
2001-11-27  5:31 David Gordon Hogan
2001-11-26 12:07 Fco.J.Ballesteros
2001-11-26 11:52 Fco.J.Ballesteros
2001-11-26 11:23 Sean M. Doran
2001-11-26 19:28 ` Dan Cross
2001-11-27  8:57   ` Steve Kilbane
2001-11-27 14:39     ` Boyd Roberts
2001-11-27 19:56       ` Steve Kilbane
2001-11-27 22:26         ` Boyd Roberts
2001-11-29 16:55           ` Matt
2001-11-27 10:16   ` Thomas Bushnell, BSG
2001-11-27 18:59     ` Dan Cross
2001-11-26 11:09 nigel
2001-11-24 10:48 forsyth
     [not found] <rsc@plan9.bell-labs.com>
2001-11-24  5:32 ` Russ Cox
2001-11-24 20:04   ` Scott Schwartz
2001-11-24  3:26 Scott Schwartz
2001-11-23 11:58 forsyth
2001-11-26 11:07 ` Sean M. Doran
2001-11-26 19:22   ` Dan Cross
2001-11-27 10:16     ` Thomas Bushnell, BSG
2001-11-27 18:55       ` Dan Cross
2001-11-23  9:44 forsyth
2001-11-26  9:59 ` Thomas Bushnell, BSG
2001-11-21 23:38 David Gordon Hogan
2001-11-21 23:59 ` Andrew Simmons
2001-11-22  9:57   ` Thomas Bushnell, BSG
2001-11-21 23:20 Andrew Simmons
2001-11-26 10:57 ` Sean M. Doran
2001-11-26 19:11   ` Dan Cross
