Re: [9fans] Nagle algorithm

9fans - fans of the OS Plan 9 from Bell Labs

* Re: [9fans] Nagle algorithm
@ 2001-11-27  5:31 David Gordon Hogan
  0 siblings, 0 replies; 37+ messages in thread
From: David Gordon Hogan @ 2001-11-27  5:31 UTC (permalink / raw)
  To: 9fans

> > That makes sense, thanks. Not a problem for my application, but it's good
> > to be aware of a potential problem.
>
> (It should be noted that the same thing proved true of X applications;
> Nagle isn't just bad for Plan 9 RPCs; it's bad for *any* RPC
> application.)

Sorry to quibble...  technically, the X protocol isn't an RPC
protocol.  Events and errors are asynchronous; many of
the requests do not require a response.  There are some
so-called ``round trip requests'' which is where the RPC
behaviour comes in.

One such request is the one that fetches a property from
a window; this accounts for the often excruciatingly slow
startup times of X applications, which have to communicate
with the window manager through a whole slew of GetProperty
requests and their ilk.  Motif is particularly bad in this regard: the
ICCCM wasn't complex enough for them, so they added more
properties, which basically only MWM knows about.

But, anyway, X was deliberately engineered to be as
insensitive to round trip delays as the designers thought
possible, by the liberal use of its asynchronous model.  Plan
9 achieves much the same effect by allowing multiple draw(3)
requests to be sent in a single write.  They are buffered up by
the draw library.  I run rio and acme over my cpu or drawterm
connection from home; they are quite snappy.
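The batching idea can be sketched in portable C. This is a hypothetical illustration, not the draw library's code: `Batch`, `batch_add` and `batch_flush` are invented names, and the 8 KB buffer is an arbitrary choice. Small requests are appended to a buffer and handed to the kernel together, so many requests cost one system call instead of one each.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical request batcher: small messages accumulate in buf
   and are handed to the kernel together by batch_flush. */
typedef struct {
	int	fd;
	size_t	n;		/* bytes currently buffered */
	int	nflush;		/* non-empty flushes performed so far */
	char	buf[8192];
} Batch;

void
batch_init(Batch *b, int fd)
{
	b->fd = fd;
	b->n = 0;
	b->nflush = 0;
}

/* Send everything buffered, retrying if write(2) returns short. */
int
batch_flush(Batch *b)
{
	size_t off = 0;

	if (b->n == 0)
		return 0;
	b->nflush++;
	while (off < b->n) {
		ssize_t w = write(b->fd, b->buf + off, b->n - off);
		if (w < 0)
			return -1;
		off += (size_t)w;
	}
	b->n = 0;
	return 0;
}

/* Queue one request, flushing first if it would not fit. */
int
batch_add(Batch *b, const void *msg, size_t len)
{
	assert(len <= sizeof b->buf);
	if (b->n + len > sizeof b->buf && batch_flush(b) < 0)
		return -1;
	memcpy(b->buf + b->n, msg, len);
	b->n += len;
	return 0;
}
```

Queueing 100 three-byte requests and then calling `batch_flush` once produces a single 300-byte write rather than 100 tiny ones.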


* Re: [9fans] Nagle algorithm
@ 2001-11-29 17:33 jmk
  0 siblings, 0 replies; 37+ messages in thread
From: jmk @ 2001-11-29 17:33 UTC (permalink / raw)
  To: 9fans

On Thu Nov 29 11:56:19 EST 2001, matt@proweb.co.uk wrote:
> XP tries tcp managing :
> http://theregister.co.uk/content/4/23090.html

http://www.theregister.co.uk/content/28/23081.html

is much more useful.



* Re: [9fans] Nagle algorithm
  2001-11-27 22:26         ` Boyd Roberts
@ 2001-11-29 16:55           ` Matt
  0 siblings, 0 replies; 37+ messages in thread
From: Matt @ 2001-11-29 16:55 UTC (permalink / raw)
  To: 9fans

XP tries tcp managing :
http://theregister.co.uk/content/4/23090.html



* Re: [9fans] Nagle algorithm
  2001-11-27 19:56       ` Steve Kilbane
@ 2001-11-27 22:26         ` Boyd Roberts
  2001-11-29 16:55           ` Matt
  0 siblings, 1 reply; 37+ messages in thread
From: Boyd Roberts @ 2001-11-27 22:26 UTC (permalink / raw)
  To: 9fans

> All true. I was thinking more of just nailing the ones that happen
> to cause the most hassle and ignoring the rest, including when you
> get those protocols on other ports. Very little work, easily
> configurable. Whether it would have any effect is a different matter.

Yes and I would just turn off Nagle -- end of problem :)

This ignores 'the rest' where 'the rest' == all.

A lot of work has gone into 'fixing' TCP.  As Living Colour said,
we should just:

    Leave It Alone

    http://www.masadsign.nl/llama/Lyrics/stain.html#03





* Re: [9fans] Nagle algorithm
  2001-11-27 14:39     ` Boyd Roberts
@ 2001-11-27 19:56       ` Steve Kilbane
  2001-11-27 22:26         ` Boyd Roberts
  0 siblings, 1 reply; 37+ messages in thread
From: Steve Kilbane @ 2001-11-27 19:56 UTC (permalink / raw)
  To: 9fans

Boyd:
> Yes that is too simplistic.  Potentially you have large N protocol
> rules and some of those protocols do not behave in ways you can
> predict or monitor without carrying around a huge amount of baggage.

All true. I was thinking more of just nailing the ones that happen
to cause the most hassle and ignoring the rest, including when you
get those protocols on other ports. Very little work, easily
configurable. Whether it would have any effect is a different matter.


* Re: [9fans] Nagle algorithm
  2001-11-27 10:16   ` Thomas Bushnell, BSG
@ 2001-11-27 18:59     ` Dan Cross
  0 siblings, 0 replies; 37+ messages in thread
From: Dan Cross @ 2001-11-27 18:59 UTC (permalink / raw)
  To: 9fans

In article <87bshowweg.fsf@becket.becket.net> you write:
>cross@math.psu.edu (Dan Cross) writes:
>> It has to do with abstraction, on one hand, and proper separation
>> of function.  The TCP has no business trying to guess what my
>> application is doing; that's for my application to decide.  TCP
>> should be abstracted away from that stuff.
>
>But now, under your model, in order to get adequate network behavior,
>your application needs to know things like the MSS!

No, it doesn't.  It just has to pick a reasonable default.  I have
yet to hear a justification of why buffering in the user land libraries
or applications has to know about the MSS to get ``adequate network
behavior.''

	- Dan C.




* Re: [9fans] Nagle algorithm
  2001-11-27 10:16     ` Thomas Bushnell, BSG
@ 2001-11-27 18:55       ` Dan Cross
  0 siblings, 0 replies; 37+ messages in thread
From: Dan Cross @ 2001-11-27 18:55 UTC (permalink / raw)
  To: 9fans

In article <87g070wwg7.fsf@becket.becket.net> you write:
>cross@math.psu.edu (Dan Cross) writes:
>
>> If you wanted to buffer 200 ``a's'' before sending them, why not do that
>> explicitly?
>
>Because the only way to know the correct number to buffer involves
>knowing transport level information like the MSS and so forth.

Uhh, why?  The only reason I can think of is that sizeof buffer % MSS
might be a rather small number, but if sizeof buffer is large enough,
then that gets amortized over the lifetime of the connection.  What's
more, if coalescing can occur without any additional latency (which is
likely over a long haul network, which is where you care anyway), even
that goes away.  And buffering in userland cuts down on OS overhead
as well.
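The amortization argument can be checked with simple arithmetic. A minimal sketch, assuming each buffer goes out as full MSS-sized segments plus at most one short "runt" at the end (it ignores Nagle, windows and retransmission); `segments_for` is a hypothetical name, and 1460 bytes is just the common Ethernet-derived MSS.

```c
#include <assert.h>
#include <stddef.h>

/* Segments needed to send one buffer of len bytes with the given MSS.
   *runt receives the size of the short final segment, or 0 if len is
   an exact multiple of mss. */
size_t
segments_for(size_t len, size_t mss, size_t *runt)
{
	assert(mss > 0);
	*runt = len % mss;
	return len / mss + (*runt != 0);
}
```

With an MSS of 1460, a 64 KB (65536-byte) buffer needs 45 segments, of which only the last (1296 bytes) is short; a 4 KB block, like ftp's, needs 3 segments with a 1176-byte runt.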

	- Dan C.


* Re: [9fans] Nagle algorithm
  2001-11-27  8:57   ` Steve Kilbane
@ 2001-11-27 14:39     ` Boyd Roberts
  2001-11-27 19:56       ` Steve Kilbane
  0 siblings, 1 reply; 37+ messages in thread
From: Boyd Roberts @ 2001-11-27 14:39 UTC (permalink / raw)
  To: 9fans

> I'm being a little simplistic here, but since there are assigned
> port numbers, why not have the bind/connect subsystems make the
> decision based on port, with a default for anything else?

Yes that is too simplistic.  Potentially you have large N protocol
rules and some of those protocols do not behave in ways you can
predict or monitor without carrying around a huge amount of baggage.

A silly example is HTTP:  It should be on port 80, but lots of
people like using 8080 or other strange port #s.

As others have said, it should be left to the application.

So far, only TCP/IP has been discussed, but do you build all
those rules etc into all the other protocols as well?

I don't think so.


* Re: [9fans] Nagle algorithm
@ 2001-11-27 11:28 forsyth
  0 siblings, 0 replies; 37+ messages in thread
From: forsyth @ 2001-11-27 11:28 UTC (permalink / raw)
  To: 9fans

>>nagle delay will do sod all.   as the rfc defines it, it isn't invoked,

sorry, that got posted before i could add this: ``provided the data continues
to arrive in a timely way, as is normal, because the MSS is often
not an exact multiple of the block size''.  actually monitoring various
protocols here i don't see much of a problem.


* Re: [9fans] Nagle algorithm
@ 2001-11-27 11:00 forsyth
  0 siblings, 0 replies; 37+ messages in thread
From: forsyth @ 2001-11-27 11:00 UTC (permalink / raw)
  To: 9fans

[-- Attachment #1: Type: text/plain, Size: 1417 bytes --]

and that's what happened even with nagle enabled?

that's one reason i asked about the source of the ``vast majority''
of packets on the net.  if they are as you say, then i can't see nagle
being invoked by any of those applications.   they already do their
own buffering (i was looking at the code for them just now).
ftp for instance writes in blocks of 4k (it's fairly old code) and
others do better than that.  that's much larger than the
typical MSS (which is indeed rather smaller than 4k), so the
nagle delay will do sod all.   as the rfc defines it, it isn't invoked,
except for the FTP requests, which are small,
and then all it does is delay them pointlessly, since nothing
will be added to those!

there is buffering in the tcp/ip implementation,
so concurrency isn't affected by the mismatch
between MSS and write buffer size.   (ie, ftp can dump its 4k and read
the next block of the file whilst the previous ones are going out.)
same for http.  does apache dribble the bytes out in titchy quantities?

the second point is that the MSS provided by TCP/IP, if based only on
the host interface's MTU configuration, isn't sufficient anyway.  path
MTU discovery could be used i suppose except that it was added in retrospect,
relies too much on routers, gateways and firewalls all behaving well,
all the way there, and even then there are non-trivial problems
with it and tcp/ip.

[-- Attachment #2: Type: message/rfc822, Size: 1925 bytes --]

To: 9fans@cse.psu.edu
Subject: Re: [9fans] Nagle algorithm
Date: Tue, 27 Nov 2001 10:16:18 GMT
Message-ID: <87g070wwg7.fsf@becket.becket.net>

cross@math.psu.edu (Dan Cross) writes:

> If you wanted to buffer 200 ``a's'' before sending them, why not do that
> explicitly?

Because the only way to know the correct number to buffer involves
knowing transport level information like the MSS and so forth.


* Re: [9fans] Nagle algorithm
  2001-11-26 19:22   ` Dan Cross
@ 2001-11-27 10:16     ` Thomas Bushnell, BSG
  2001-11-27 18:55       ` Dan Cross
  0 siblings, 1 reply; 37+ messages in thread
From: Thomas Bushnell, BSG @ 2001-11-27 10:16 UTC (permalink / raw)
  To: 9fans

cross@math.psu.edu (Dan Cross) writes:

> If you wanted to buffer 200 ``a's'' before sending them, why not do that
> explicitly?

Because the only way to know the correct number to buffer involves
knowing transport level information like the MSS and so forth.



* Re: [9fans] Nagle algorithm
  2001-11-26 19:28 ` Dan Cross
  2001-11-27  8:57   ` Steve Kilbane
@ 2001-11-27 10:16   ` Thomas Bushnell, BSG
  2001-11-27 18:59     ` Dan Cross
  1 sibling, 1 reply; 37+ messages in thread
From: Thomas Bushnell, BSG @ 2001-11-27 10:16 UTC (permalink / raw)
  To: 9fans

cross@math.psu.edu (Dan Cross) writes:

> It has to do with abstraction, on one hand, and proper separation
> of function.  The TCP has no business trying to guess what my
> application is doing; that's for my application to decide.  TCP
> should be abstracted away from that stuff.

But now, under your model, in order to get adequate network behavior,
your application needs to know things like the MSS!



* Re: [9fans] Nagle algorithm
  2001-11-26 19:28 ` Dan Cross
@ 2001-11-27  8:57   ` Steve Kilbane
  2001-11-27 14:39     ` Boyd Roberts
  2001-11-27 10:16   ` Thomas Bushnell, BSG
  1 sibling, 1 reply; 37+ messages in thread
From: Steve Kilbane @ 2001-11-27  8:57 UTC (permalink / raw)
  To: 9fans

Dan wrote:
> It has to do with abstraction, on one hand, and proper separation
> of function.  The TCP has no business trying to guess what my
> application is doing; that's for my application to decide.  TCP
> should be abstracted away from that stuff.

I'm being a little simplistic here, but since there are assigned
port numbers, why not have the bind/connect subsystems make the
decision based on port, with a default for anything else?
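The port-based suggestion amounts to a lookup table consulted at connect time. A hypothetical sketch (the table entries are illustrative, not any system's actual defaults): listed interactive services get Nagle off, and everything else keeps the default. It also shows the weakness raised in reply: a web server moved to port 8080 silently falls through to the default.

```c
#include <stddef.h>

/* Hypothetical per-port policy: nodelay == 1 means "disable Nagle". */
struct portpolicy {
	unsigned short	port;
	int		nodelay;
};

static const struct portpolicy policies[] = {
	{ 22, 1 },	/* ssh: interactive keystrokes */
	{ 23, 1 },	/* telnet */
	{ 80, 0 },	/* http */
};

/* Look up the policy for a port, falling back to deflt. */
int
nodelay_for_port(unsigned short port, int deflt)
{
	size_t i;

	for (i = 0; i < sizeof policies / sizeof policies[0]; i++)
		if (policies[i].port == port)
			return policies[i].nodelay;
	return deflt;
}
```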

steve





* Re: [9fans] Nagle algorithm
  2001-11-26 11:23 Sean M. Doran
@ 2001-11-26 19:28 ` Dan Cross
  2001-11-27  8:57   ` Steve Kilbane
  2001-11-27 10:16   ` Thomas Bushnell, BSG
  0 siblings, 2 replies; 37+ messages in thread
From: Dan Cross @ 2001-11-26 19:28 UTC (permalink / raw)
  To: 9fans

In article <20011126112312.80AC9C7901@cesium.clock.org> you write:
>| If that's a problem for your cat, you could convince
>| your cat to use Bio; and retain your abstraction as well.
>
>And how does your write(2) or wrapper around write(2)
>know what the present maximum segment size is, given
>that it can be altered at any time via a local interface
>MTU change or a change in the path MTU?

Why does my write(2) care?  If my write-wrapper picks a decent
default, does it matter much?  I suppose one could argue that
there might always be a runt packet on the end when the buffer
is written, depending on, eg, the MTU and size of the buffer,
but I claim that expense will be amortized over the lifetime
of the connection.

>How does pulling that into a wrapper which is used
>almost universally fundamentally differ from retaining
>in the TCP code with a system call which allows it to
>be turned off on those rare occasions you want to do so?

It has to do with abstraction, on one hand, and proper separation
of function.  The TCP has no business trying to guess what my
application is doing; that's for my application to decide.  TCP
should be abstracted away from that stuff.

	- Dan C.


* Re: [9fans] Nagle algorithm
  2001-11-26 11:07 ` Sean M. Doran
@ 2001-11-26 19:22   ` Dan Cross
  2001-11-27 10:16     ` Thomas Bushnell, BSG
  0 siblings, 1 reply; 37+ messages in thread
From: Dan Cross @ 2001-11-26 19:22 UTC (permalink / raw)
  To: 9fans

In article <ytu1vhu6xb.fsf@cesium.clock.org> you write:
>forsyth@caldo.demon.co.uk writes:
>> ...  Nagle's algorithm for coalescing short segments,
>> which is the one that seems ill-advised to me
>
>So, how many packets should the following code segment generate?
>
>    #include <unistd.h>
>    /* for int usleep(useconds_t microseconds); */
>
>    for(i=0; i < 200; i++) {
>             write(tcpfd, "a", 1);
>             usleep(100);
>    }

Err, well, 200.  Or fewer if the TCP implementation can coalesce packets
without incurring any additional latency.  But that's what you wanted,
right?  It's what you wrote.

If you wanted to buffer 200 ``a's'' before sending them, why not do that
explicitly?

	#include <stdio.h>
	#include <unistd.h>

	int i;
	FILE *fp = fdopen(tcpfd, "w");	/* stdio buffers the writes */

	for (i = 0; i < 200; i++) {
		fputc('a', fp);
		usleep(100);
	}
	fflush(fp);	/* with the usual BUFSIZ, one write(2) of 200 bytes */

If I'm not mistaken, that achieves the same effect, with far less
overhead.  The Plan 9 way might be to write a file system that overlays
/net/tcp and introduces a buffering layer, kind of like what ssl(3)
does.

>> i do find that quite a lot of this smacks of trying to
>> compensate for inadequate data provided by the protocol or the network.
>
>Abstraction and information-hiding are usually considered helpful.

Right!  So why do I want my TCP trying to figure out what my application
is doing?  All of a sudden, I'm no longer hiding information from the TCP;
my abstraction is gone.

>Who wants to muck around with RTT and pMTU in an application
>which just might happen to write to a TCP connection?

Why does the application care?  I can pick a reasonable default; say,
8 or 16 or 32 or even 64 KB.

>Perhaps cat(1) isn't complicated enough yet?

cat(1) picks a reasonable default.  :-)

	- Dan C.


* Re: [9fans] Nagle algorithm
  2001-11-23  9:34 ` Thomas Bushnell, BSG
@ 2001-11-26 19:13   ` Dan Cross
  0 siblings, 0 replies; 37+ messages in thread
From: Dan Cross @ 2001-11-26 19:13 UTC (permalink / raw)
  To: 9fans

In article <87oflu5gez.fsf@becket.becket.net> you write:
>Well, this is overstated.  The Nagle algorithm (which we should be
>calling "slow start") is actually hugely important on the net as a
>whole; it's the Right Thing, albeit it's wrong for certain kinds of
>rapid response RPC transactions.  But for the *vast* majority of IP
>packets, it's completely right.

Well, Nagle is a localized optimization, essentially.  Slow start is
very different.  I fail to see why it's the right thing to put it in
the TCP if the host can handle essentially the same thing at a different
layer.  For instance, bio would accomplish most of what I'd want Nagle
to do, but not force the policy on me.

>And, incidentally, it's not optional; the Host Requirements RFCs,
>IIRC, require it.

As Forsyth said, it's recommended, but not required.

>> i claim it's not the TCP/IP subsystem's responsibility to delay
>> sending something so that it can buffer up writes to make larger
>> packets.  that's for stdio or bio (or local OS equivalent).
>
>Sadly, that's not adequate for lots of good reasons, too much to go
>into here, but it has been considered, and it's just not enough.

I'm sorry, but I'm afraid that explanation is not adequate.

	- Dan C.




* Re: [9fans] Nagle algorithm
  2001-11-26 10:57 ` Sean M. Doran
@ 2001-11-26 19:11   ` Dan Cross
  0 siblings, 0 replies; 37+ messages in thread
From: Dan Cross @ 2001-11-26 19:11 UTC (permalink / raw)
  To: 9fans

In article <yty9ktu7er.fsf@cesium.clock.org> you write:
>Moving Nagle into the application, as is suggested later
>in the thread, is awkward: it needs to know the segment
>size, which is affected by the receiver's advertised MSS,
>the local MTU and the path MTU.  The latter two of these can
>change during the lifetime of a TCP conversation.

Err, why does it need to know those things?  All it cares about is
sending as much data as possible in a single write() call, right?  It
seems it could pack something like a 64 KB buffer as full as possible,
and then write that, and achieve the same effect.

>The usual complaint about Nagle is that it punishes
>applications which deliberately exchange short packets.
>However, it is almost always possible to turn Nagle off on
>a per-connection basis, for those (relatively rare)
>applications which are upset by the increased transmit delay.
>
>The problem with simply "eliminating" Nagle is that a
>bulk-transfer application which DELIBERATELY schedules
>lots of small writes that do not queue at the TCP send
>level, will generate lots of small packets.  This causes
>the TCP congestion window to inflate faster, and gives such
>applications an unfair advantage over applications which
>make larger writes that queue at the TCP send level, where
>they can be coalesced.   The inflated congestion window
>means that when congestion does happen (TCP deliberately
>provokes it), the small-write/small-segment application
>will get a larger share of the bottleneck's bandwidth.
>This is perverse behaviour, because the application in
>question is a less efficient bandwidth consumer: it
>generates more header data per user datum.

Err, if you can turn off Nagle on a per-connection basis, and the
application you describe is being purposely malicious, what's
preventing that application from turning off Nagle and then attempting
the rather antisocial behavior you describe?  A far better solution for
the application writer, if s/he really cares so much about it, is to
write into a buffer, and then send the entire buffer when it's full.
This is what, eg, stdio and bio do.  If nothing else, this will cut
down on operating system overhead incurred from making lots of system
calls, and moving data between user and kernel space, etc, etc, etc.
It seems like a good thing all around, and it really does seem
misguided to have the TCP trying to guess what I'm doing at the
application level.  I (presumably) know what I want my application to
do; I'd rather the OS's network implementation not get in my way.

>The generation of large amounts of small packets should
>always be seen as antisocial.

It seems, then, that certain classes of applications are inherently
antisocial?

	- Dan C.


* Re: [9fans] Nagle algorithm
@ 2001-11-26 12:07 Fco.J.Ballesteros
  0 siblings, 0 replies; 37+ messages in thread
From: Fco.J.Ballesteros @ 2001-11-26 12:07 UTC (permalink / raw)
  To: 9fans

[-- Attachment #1: Type: text/plain, Size: 353 bytes --]

It just doesn't know. You'll have to choose a reasonable
value.

The difference is that the tcp code would just
send the message, without trying to guess what the
application needs. For example, your shell may
be happy using `line buffering' rather than the probably
big buffer your cat would use. A debugger may
use no buffer at all.

[-- Attachment #2: Type: message/rfc822, Size: 1866 bytes --]

From: smd@clock.org (Sean M. Doran)
To: 9fans@cse.psu.edu
Subject: Re: [9fans] Nagle algorithm
Date: Mon, 26 Nov 2001 03:23:12 -0800 (PST)
Message-ID: <20011126112312.80AC9C7901@cesium.clock.org>

nigel@9fs.org writes:

| Depends on your implementation of write().

and Fco.J.Ballesteros <nemo@plan9.escet.urjc.es> writes:

| If that's a problem for your cat, you could convince
| your cat to use Bio; and retain your abstraction as well.

And how does your write(2) or wrapper around write(2)
know what the present maximum segment size is, given
that it can be altered at any time via a local interface
MTU change or a change in the path MTU?

How does pulling that into a wrapper which is used
almost universally fundamentally differ from retaining
in the TCP code with a system call which allows it to
be turned off on those rare occasions you want to do so?

	Sean.


* Re: [9fans] Nagle algorithm
@ 2001-11-26 11:52 Fco.J.Ballesteros
  0 siblings, 0 replies; 37+ messages in thread
From: Fco.J.Ballesteros @ 2001-11-26 11:52 UTC (permalink / raw)
  To: 9fans

[-- Attachment #1: Type: text/plain, Size: 113 bytes --]

If that's a problem for your cat, you could convince
your cat to use Bio; and retain your abstraction as well.

[-- Attachment #2: Type: message/rfc822, Size: 2277 bytes --]

From: "Sean M. Doran" <smd@cesium.clock.org>
To: 9fans@cse.psu.edu
Subject: Re: [9fans] Nagle algorithm
Date: Mon, 26 Nov 2001 12:07:28 +0100
Message-ID: <ytu1vhu6xb.fsf@cesium.clock.org>

forsyth@caldo.demon.co.uk writes:

> ...  Nagle's algorithm for coalescing short segments,
> which is the one that seems ill-advised to me

So, how many packets should the following code segment generate?

    #include <unistd.h>
    /* for int usleep(useconds_t microseconds); */

    for(i=0; i < 200; i++) {
             write(tcpfd, "a", 1);
             usleep(100);
    }

> i do find that quite a lot of this smacks of trying to
> compensate for inadequate data provided by the protocol or the network.

Abstraction and information-hiding are usually considered helpful.
Who wants to muck around with RTT and pMTU in an application
which just might happen to write to a TCP connection?
Perhaps cat(1) isn't complicated enough yet?

     Sean.


* Re: [9fans] Nagle algorithm
@ 2001-11-26 11:23 Sean M. Doran
  2001-11-26 19:28 ` Dan Cross
  0 siblings, 1 reply; 37+ messages in thread
From: Sean M. Doran @ 2001-11-26 11:23 UTC (permalink / raw)
  To: 9fans

nigel@9fs.org writes:

| Depends on your implementation of write().

and Fco.J.Ballesteros <nemo@plan9.escet.urjc.es> writes:

| If that's a problem for your cat, you could convince
| your cat to use Bio; and retain your abstraction as well.

And how does your write(2) or wrapper around write(2)
know what the present maximum segment size is, given
that it can be altered at any time via a local interface
MTU change or a change in the path MTU?

How does pulling that into a wrapper which is used
almost universally fundamentally differ from retaining
in the TCP code with a system call which allows it to
be turned off on those rare occasions you want to do so?

	Sean.


* Re: [9fans] Nagle algorithm
@ 2001-11-26 11:09 nigel
  0 siblings, 0 replies; 37+ messages in thread
From: nigel @ 2001-11-26 11:09 UTC (permalink / raw)
  To: 9fans

[-- Attachment #1: Type: text/plain, Size: 46 bytes --]

Depends on your implementation of write().

[-- Attachment #2: Type: message/rfc822, Size: 2274 bytes --]

From: "Sean M. Doran" <smd@cesium.clock.org>
To: 9fans@cse.psu.edu
Subject: Re: [9fans] Nagle algorithm
Date: Mon, 26 Nov 2001 12:07:28 +0100
Message-ID: <ytu1vhu6xb.fsf@cesium.clock.org>

forsyth@caldo.demon.co.uk writes:

> ...  Nagle's algorithm for coalescing short segments,
> which is the one that seems ill-advised to me

So, how many packets should the following code segment generate?

    #include <unistd.h>
    /* for int usleep(useconds_t microseconds); */

    for(i=0; i < 200; i++) {
             write(tcpfd, "a", 1);
             usleep(100);
    }

> i do find that quite a lot of this smacks of trying to
> compensate for inadequate data provided by the protocol or the network.

Abstraction and information-hiding are usually considered helpful.
Who wants to muck around with RTT and pMTU in an application
which just might happen to write to a TCP connection?
Perhaps cat(1) isn't complicated enough yet?

     Sean.


* Re: [9fans] Nagle algorithm
  2001-11-23 11:58 forsyth
@ 2001-11-26 11:07 ` Sean M. Doran
  2001-11-26 19:22   ` Dan Cross
  0 siblings, 1 reply; 37+ messages in thread
From: Sean M. Doran @ 2001-11-26 11:07 UTC (permalink / raw)
  To: 9fans

forsyth@caldo.demon.co.uk writes:

> ...  Nagle's algorithm for coalescing short segments,
> which is the one that seems ill-advised to me

So, how many packets should the following code segment generate?

    #include <unistd.h>
    /* for int usleep(useconds_t microseconds); */

    for(i=0; i < 200; i++) {
             write(tcpfd, "a", 1);
             usleep(100);
    }

> i do find that quite a lot of this smacks of trying to
> compensate for inadequate data provided by the protocol or the network.

Abstraction and information-hiding are usually considered helpful.
Who wants to muck around with RTT and pMTU in an application
which just might happen to write to a TCP connection?
Perhaps cat(1) isn't complicated enough yet?

     Sean.



* Re: [9fans] Nagle algorithm
  2001-11-21 23:20 Andrew Simmons
@ 2001-11-26 10:57 ` Sean M. Doran
  2001-11-26 19:11   ` Dan Cross
  0 siblings, 1 reply; 37+ messages in thread
From: Sean M. Doran @ 2001-11-26 10:57 UTC (permalink / raw)
  To: 9fans

Andrew Simmons <andrew@mbmnz.co.nz> writes:

> What's wrong with Nagle?

What's RIGHT with Nagle:

       - bigger packets, on average
         - more bytes/packet -> less wasted header overhead
         - more bytes/packet -> fewer packets per bit per second

         both of these are very good for the network,
         since the cost of modern day high-speed
         networking is driven by header processing and is
         pps limited, while the cost of practical
         networking (last mile, and in places like
         New Zealand) is driven by total amount of
         traffic, and you don't want to pay for headers
         rather than data useful to the end user

       - Nagle does not inflate the congestion window or
         delay during a bulk transfer: if there are lots
         of bytes being transmitted, the Nagle delay
         doesn't happen
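The header-overhead point is simple arithmetic. A sketch assuming a 40-byte header per segment (20 bytes IPv4 plus 20 bytes TCP, no options) and ignoring link-layer framing; the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define HDR 40	/* assumed: IPv4 (20) + TCP (20), no options */

/* Bytes on the wire needed to carry total bytes of user data in
   segments of at most seg bytes each. */
size_t
wire_bytes(size_t total, size_t seg)
{
	size_t nseg;

	assert(seg > 0);
	nseg = (total + seg - 1) / seg;
	return total + nseg * HDR;
}

/* Fraction of a single segment's bytes that are user data. */
double
efficiency(size_t payload)
{
	return (double)payload / (double)(payload + HDR);
}
```

Under these assumptions, 200 bytes sent as 200 one-byte segments cost 8200 bytes on the wire, against 240 bytes as one segment; a one-byte segment is under 2.5% data, a 1460-byte one over 97%.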

Moving Nagle into the application, as is suggested later
in the thread, is awkward: it needs to know the segment
size, which is affected by the receiver's advertised MSS,
the local MTU and the path MTU.  The latter two of these can
change during the lifetime of a TCP conversation.

The usual complaint about Nagle is that it punishes
applications which deliberately exchange short packets.
However, it is almost always possible to turn Nagle off on
a per-connection basis, for those (relatively rare)
applications which are upset by the increased transmit delay.
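On systems with the BSD sockets API (an assumption; Plan 9's /net interface is different and is not shown here), the per-connection switch is the TCP_NODELAY socket option. A minimal sketch; `set_nodelay` is a hypothetical helper name.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Disable Nagle (on != 0) or re-enable it (on == 0) for one TCP
   socket.  Returns 0 on success, -1 with errno set on failure. */
int
set_nodelay(int fd, int on)
{
	return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &on, sizeof on);
}
```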

The problem with simply "eliminating" Nagle is that a
bulk-transfer application which DELIBERATELY schedules
lots of small writes that do not queue at the TCP send
level, will generate lots of small packets.  This causes
the TCP congestion window to inflate faster, and gives such
applications an unfair advantage over applications which
make larger writes that queue at the TCP send level, where
they can be coalesced.   The inflated congestion window
means that when congestion does happen (TCP deliberately
provokes it), the small-write/small-segment application
will get a larger share of the bottleneck's bandwidth.
This is perverse behaviour, because the application in
question is a less efficient bandwidth consumer: it
generates more header data per user datum.
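The congestion-window argument can be made concrete with a deliberately crude model. It assumes classic slow start in which every ACK for new data grows cwnd by one SMSS (RFC 2581 says "at most" SMSS), one ACK per segment (no delayed ACKs), and no loss; `cwnd_growth` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Growth of cwnd, in bytes, after sending total bytes in segments of
   seg bytes, if every segment is ACKed separately and every ACK adds
   smss bytes to the window. */
size_t
cwnd_growth(size_t total, size_t seg, size_t smss)
{
	size_t acks;

	assert(seg > 0);
	acks = (total + seg - 1) / seg;
	return acks * smss;
}
```

In this model, 200 bytes sent as one-byte segments grow the window by 292000 bytes, while the same 200 bytes in one segment grow it by 1460, although the data delivered is identical.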

The generation of large amounts of small packets should
always be seen as antisocial.

      Sean.


* Re: [9fans] Nagle algorithm
  2001-11-23  9:44 forsyth
@ 2001-11-26  9:59 ` Thomas Bushnell, BSG
  0 siblings, 0 replies; 37+ messages in thread
From: Thomas Bushnell, BSG @ 2001-11-26  9:59 UTC (permalink / raw)
  To: 9fans

forsyth@caldo.demon.co.uk writes:

> and what would those packets be?

Bulk data transfer, web transactions, SMTP connections, and the like.



* Re: [9fans] Nagle algorithm
  2001-11-24  5:32 ` Russ Cox
@ 2001-11-24 20:04   ` Scott Schwartz
  0 siblings, 0 replies; 37+ messages in thread
From: Scott Schwartz @ 2001-11-24 20:04 UTC (permalink / raw)
  To: 9fans

| > > i claim it's not the TCP/IP subsystem's responsibility to delay
| > > sending something so that it can buffer up writes to make larger
| > > packets.  that's for stdio or bio (or local OS equivalent).
| >
| > If there was a way for the user to say when to set TCP's PUSH bit,
| > would that do the job?
|
| Why would you want this?  I thought the above
| was a very effective argument.

If user level buffering isn't needed, then why use it?  TCP is what it
is; we might as well use what it offers.  The normal Unix API doesn't
give the user a way to tell TCP when pending data should be flushed,
but maybe it should have.  I believe that Windows and Linux have ways
to do that, so apparently some other people think so too.  (But Charles
says that Nagle turns off PSH.  Oh well.)
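On Linux, one such mechanism is the TCP_CORK socket option: while it is set the kernel holds back partial frames, and clearing it sends whatever is queued, which works as an explicit "flush now". A Linux-only sketch (elsewhere it just returns an error); `cork` is a hypothetical helper name, and the Windows mechanism is not shown.

```c
#include <errno.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Linux only: cork (on != 0) holds back partial frames; uncork
   (on == 0) sends everything queued.  Returns 0 on success, -1 with
   errno set on failure. */
int
cork(int fd, int on)
{
#ifdef TCP_CORK
	return setsockopt(fd, IPPROTO_TCP, TCP_CORK, &on, sizeof on);
#else
	(void)fd;
	(void)on;
	errno = ENOPROTOOPT;
	return -1;
#endif
}
```

A writer would cork, issue its small writes, then uncork at a message boundary.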


* Re: [9fans] Nagle algorithm
@ 2001-11-24 10:48 forsyth
  0 siblings, 0 replies; 37+ messages in thread
From: forsyth @ 2001-11-24 10:48 UTC (permalink / raw)
  To: 9fans

[-- Attachment #1: Type: text/plain, Size: 556 bytes --]

not only that, but the Nagle algorithm explicitly ignores the PSH bit
anyway:

                 The Nagle algorithm is generally as follows:

                      If there is unacknowledged data (i.e., SND.NXT >
                      SND.UNA), then the sending TCP buffers all user
                      data (regardless of the PSH bit), until the
                      outstanding data has been acknowledged or until
                      the TCP can send a full-sized segment (Eff.snd.MSS
                      bytes; see Section 4.2.2.6).
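The quoted rule can be sketched in C and run against Sean's loop of 200 one-byte writes 100 microseconds apart. This is a simplified model, not any real TCP: the function names are hypothetical, time advances in 1-microsecond steps, all in-flight data is acknowledged at once one RTT after the most recent segment was sent, and there are no delayed ACKs, losses or window limits.

```c
#include <assert.h>
#include <stddef.h>

/* The quoted rule, simplified: with data in flight, hold small
   segments until the data is ACKed or a full MSS is queued. */
int
nagle_may_send(size_t inflight, size_t queued, size_t mss)
{
	return queued > 0 && (inflight == 0 || queued >= mss);
}

/* Simulate nwrites one-byte writes, one every gap_us microseconds.
   Returns the number of segments sent. */
int
simulate(int nwrites, long gap_us, long rtt_us, size_t mss, int nagle)
{
	size_t queued = 0, inflight = 0;
	long ack_at = -1;	/* when all in-flight data is ACKed */
	int written = 0, segs = 0;
	long t;

	assert(mss > 0 && gap_us > 0 && rtt_us > 0);
	for (t = 0; written < nwrites || queued > 0 || inflight > 0; t++) {
		if (inflight > 0 && t == ack_at)
			inflight = 0;
		if (written < nwrites && t == (long)written * gap_us) {
			queued++;	/* the application's 1-byte write */
			written++;
		}
		if (nagle ? nagle_may_send(inflight, queued, mss) : queued > 0) {
			size_t n = queued < mss ? queued : mss;
			queued -= n;
			inflight += n;
			ack_at = t + rtt_us;
			segs++;
		}
	}
	return segs;
}
```

Under these assumptions, with a 50 ms RTT the loop yields 2 segments with Nagle (the first byte, then the other 199 once it is ACKed) and 200 without; with a 50-microsecond RTT, shorter than the gap between writes, Nagle has nothing to hold back and the count is 200 either way.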


* Re: [9fans] Nagle algorithm
@ 2001-11-24  5:32 ` Russ Cox
  2001-11-24 20:04   ` Scott Schwartz
  0 siblings, 1 reply; 37+ messages in thread
From: Russ Cox @ 2001-11-24  5:32 UTC (permalink / raw)
  To: 9fans

> > i claim it's not the TCP/IP subsystem's responsibility to delay
> > sending something so that it can buffer up writes to make larger
> > packets.  that's for stdio or bio (or local OS equivalent).
>
> If there was a way for the user to say when to set TCP's PUSH bit,
> would that do the job?

Why would you want this?  I thought the above
was a very effective argument.

Russ


* Re: [9fans] Nagle algorithm
@ 2001-11-24  3:26 Scott Schwartz
  0 siblings, 0 replies; 37+ messages in thread
From: Scott Schwartz @ 2001-11-24  3:26 UTC (permalink / raw)
  To: 9fans

> i claim it's not the TCP/IP subsystem's responsibility to delay
> sending something so that it can buffer up writes to make larger
> packets.  that's for stdio or bio (or local OS equivalent).

If there was a way for the user to say when to set TCP's PUSH bit,
would that do the job?



* Re: [9fans] Nagle algorithm
@ 2001-11-23 11:58 forsyth
  2001-11-26 11:07 ` Sean M. Doran
  0 siblings, 1 reply; 37+ messages in thread
From: forsyth @ 2001-11-23 11:58 UTC (permalink / raw)
  To: 9fans

>>Well, this is overstated.  The Nagle algorithm (which we should be
>>calling "slow start") is actually hugely important on the net as a

rfc1122 correctly distinguishes `slow start' (by Van Jacobson), which
again seems a reasonable responsibility for TCP/IP, from Nagle's algorithm
for coalescing short segments, which is the one that seems ill-advised
to me, however effective in benchmarks.  the same section of the rfc
discusses avoiding silly windows which also seems fine
to me, and perhaps that was due to Nagle too; i haven't checked
a copy of the original paper.  (mind you, checking the original is quite often
sensible because you discover that some idea has taken on
a life of its own out of the original context.)
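To make the distinction concrete: slow start governs how many segments may be in flight, not how small they may be. The sketch below is a deliberately simplified model (whole segments, no loss, one update per round trip, growth clamped at ssthresh); `cwnd_after` is a hypothetical name, and real stacks count bytes and update on each ACK.

```c
#include <assert.h>

/* Simplified slow start followed by congestion avoidance.
 * Slow start: the congestion window roughly doubles every round trip
 * (one extra segment per ACK) until it reaches ssthresh; after that it
 * grows by one segment per round trip.
 * Returns cwnd, in segments, after `rtts` loss-free round trips. */
static unsigned cwnd_after(unsigned initial, unsigned ssthresh, unsigned rtts)
{
    unsigned cwnd = initial;

    assert(initial > 0);
    for (unsigned i = 0; i < rtts; i++) {
        if (cwnd < ssthresh) {
            cwnd *= 2;              /* slow start: exponential growth */
            if (cwnd > ssthresh)
                cwnd = ssthresh;    /* simplification: stop at the threshold */
        } else {
            cwnd += 1;              /* congestion avoidance: linear growth */
        }
    }
    return cwnd;
}
```

Nothing in this model looks at segment size, which is the part Nagle's algorithm is about.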

i do find that quite a lot of this smacks of trying to
compensate for inadequate data provided by the protocol or the network.

* Re: [9fans] Nagle algorithm
@ 2001-11-23  9:44 forsyth
  2001-11-26  9:59 ` Thomas Bushnell, BSG
  0 siblings, 1 reply; 37+ messages in thread
From: forsyth @ 2001-11-23  9:44 UTC (permalink / raw)
  To: 9fans

>>And, incidentally, it's not optional; the Host Requirements RFC's,
>>IIRC, require it.

no, it says SHOULD, not MUST, and the justification isn't particularly
compelling.

>>But for the *vast* majority of IP packets, it's completely right.

and what would those packets be?


* Re: [9fans] Nagle algorithm
  2001-11-22 13:24 forsyth
  2001-11-22 13:29 ` Boyd Roberts
@ 2001-11-23  9:34 ` Thomas Bushnell, BSG
  2001-11-26 19:13   ` Dan Cross
  1 sibling, 1 reply; 37+ messages in thread
From: Thomas Bushnell, BSG @ 2001-11-23  9:34 UTC (permalink / raw)
  To: 9fans

forsyth@caldo.demon.co.uk writes:

> i think it's fundamentally misconceived and should at least never be
> on by default.

Well, this is overstated.  The Nagle algorithm (which we should be
calling "slow start") is actually hugely important on the net as a
whole; it's the Right Thing, albeit it's wrong for certain kinds of
rapid response RPC transactions.  But for the *vast* majority of IP
packets, it's completely right.

And, incidentally, it's not optional; the Host Requirements RFC's,
IIRC, require it.

> i claim it's not the TCP/IP subsystem's responsibility to delay
> sending something so that it can buffer up writes to make larger
> packets.  that's for stdio or bio (or local OS equivalent).

Sadly, that's not adequate for lots of good reasons, too much to go
into here, but it has been considered, and it's just not enough.

* Re: [9fans] Nagle algorithm
  2001-11-22 13:24 forsyth
@ 2001-11-22 13:29 ` Boyd Roberts
  2001-11-23  9:34 ` Thomas Bushnell, BSG
  1 sibling, 0 replies; 37+ messages in thread
From: Boyd Roberts @ 2001-11-22 13:29 UTC (permalink / raw)
  To: 9fans

> i claim it's not the TCP/IP subsystem's responsibility to delay
> sending something so that it can buffer up writes to make larger
> packets.  that's for stdio or bio (or local OS equivalent).  (an
> interesting variant was in a version of the Unix streams subsystem
> where there was a buffering module that could be pushed onto a tcp/ip
> stream.)

I'd totally agree.  That was the 8th Edition bufld (sp?).


* Re: [9fans] Nagle algorithm
@ 2001-11-22 13:24 forsyth
  2001-11-22 13:29 ` Boyd Roberts
  2001-11-23  9:34 ` Thomas Bushnell, BSG
  0 siblings, 2 replies; 37+ messages in thread
From: forsyth @ 2001-11-22 13:24 UTC (permalink / raw)
  To: 9fans

i think it's fundamentally misconceived and should at least never be
on by default.  dhog's description was ``[it] attempts to buffer up
multiple writes and send them as a single TCP packet -- where the
writes are small''.  what that actually means is that if the writes
are small it typically delays sending out anything at all (subject to
certain specific conditions) until a given interval has elapsed or
its idea of enough data has arrived, and that interval can be large.  in other
words, give it a few bytes you wish to send now and it demands more,
delaying if that demand is not met.  it's not just RPC that's affected.

i claim it's not the TCP/IP subsystem's responsibility to delay
sending something so that it can buffer up writes to make larger
packets.  that's for stdio or bio (or local OS equivalent).  (an
interesting variant was in a version of the Unix streams subsystem
where there was a buffering module that could be pushed onto a tcp/ip
stream.)  it's fine for TCP/IP to adapt to the network and adjust the
rate at which bytes are shuffled in and out, and adapt retransmission
efforts to apparent error rates, and since TCP/IP doesn't preserve
delimiters, coalesce new data (from writes large or small) into already
queued data to make larger output packets.  those are all within its
responsibility, i think, and any delays that result are arguably
inherent in the network.

if applications are stuttering bytes needlessly, deal with it there.
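A minimal sketch of the kind of user-level buffering meant here, using POSIX write(2). `struct outbuf`, `ob_write` and `ob_flush` are hypothetical names; on Plan 9, bio's Bwrite and Bflush (or stdio's fwrite and fflush elsewhere) play the same role. Small writes are copied into a buffer and reach the kernel, and so TCP, only when the buffer fills or the program decides to flush.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <string.h>
#include <unistd.h>

enum { OB_SIZE = 8192 };

/* Hypothetical output buffer in the spirit of stdio/bio. */
struct outbuf {
    int fd;
    size_t len;               /* bytes currently buffered */
    int nwrites;              /* write(2) calls made, for illustration */
    char buf[OB_SIZE];
};

/* Hand everything buffered to the kernel; the program chooses when. */
static int ob_flush(struct outbuf *b)
{
    size_t off = 0;

    while (off < b->len) {
        ssize_t n = write(b->fd, b->buf + off, b->len - off);
        if (n < 0)
            return -1;
        b->nwrites++;
        off += (size_t)n;     /* handle short writes */
    }
    b->len = 0;
    return 0;
}

/* Copy n bytes into the buffer, flushing only when it is full. */
static int ob_write(struct outbuf *b, const void *p, size_t n)
{
    const char *s = p;

    while (n > 0) {
        if (b->len == OB_SIZE && ob_flush(b) < 0)
            return -1;
        size_t room = OB_SIZE - b->len;
        size_t k = n < room ? n : room;
        memcpy(b->buf + b->len, s, k);
        b->len += k;
        s += k;
        n -= k;
    }
    return 0;
}
```

Writing 100 one-byte pieces and then calling ob_flush results in a single write(2) call, and the delay, if any, is one the application asked for.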

* Re: [9fans] Nagle algorithm
  2001-11-21 23:59 ` Andrew Simmons
@ 2001-11-22  9:57   ` Thomas Bushnell, BSG
  0 siblings, 0 replies; 37+ messages in thread
From: Thomas Bushnell, BSG @ 2001-11-22  9:57 UTC (permalink / raw)
  To: 9fans

andrew@mbmnz.co.nz (Andrew Simmons) writes:

> That makes sense, thanks. Not a problem for my application, but it's good
> to be aware of a potential problem.

(It should be noted that the same thing proved true of X applications;
Nagle isn't just bad for Plan 9 RPC's; it's bad for *any* RPC
application.)


* Re: [9fans] Nagle algorithm
  2001-11-21 23:38 David Gordon Hogan
@ 2001-11-21 23:59 ` Andrew Simmons
  2001-11-22  9:57   ` Thomas Bushnell, BSG
  0 siblings, 1 reply; 37+ messages in thread
From: Andrew Simmons @ 2001-11-21 23:59 UTC (permalink / raw)
  To: 9fans

That makes sense, thanks. Not a problem for my application, but it's good
to be aware of a potential problem.




* Re: [9fans] Nagle algorithm
@ 2001-11-21 23:38 David Gordon Hogan
  2001-11-21 23:59 ` Andrew Simmons
  0 siblings, 1 reply; 37+ messages in thread
From: David Gordon Hogan @ 2001-11-21 23:38 UTC (permalink / raw)
  To: 9fans

It's inappropriate for RPC protocols.  Nagle's algorithm
attempts to buffer up multiple writes and send them as
a single TCP packet -- where the writes are small.  As
such, it's fine for telnet conversations where you don't
really want each keystroke generating a separate packet
if it can be avoided.  The mechanism is to delay the
transmission of small packets in the hope that more
data will show up.
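A toy model of that behaviour for 1-byte keystrokes. The assumptions are mine, not the poster's: a constant round-trip time, the peer acknowledges each segment immediately (no delayed ACKs), no loss, and keystrokes never add up to a full-sized segment. `nagle_segments` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Count the segments Nagle's algorithm sends for 1-byte writes made at
 * times t[0..n-1] (ascending), under the assumptions stated above.
 * With Nagle off, the count would simply be n. */
static int nagle_segments(const long *t, size_t n, long rtt)
{
    int segs = 0;
    long ack_at = -1;         /* arrival time of the awaited ACK; -1 if none */
    size_t held = 0;          /* bytes buffered behind unacked data */

    for (size_t i = 0; i < n; i++) {
        /* deliver every ACK arriving by now; each releases what is held */
        while (ack_at >= 0 && ack_at <= t[i]) {
            if (held > 0) {
                segs++;           /* all held bytes leave as one segment */
                held = 0;
                ack_at += rtt;    /* sent at the old ack_at, acked rtt later */
            } else {
                ack_at = -1;      /* nothing outstanding any more */
            }
        }
        if (ack_at < 0) {
            segs++;               /* idle connection: send immediately */
            ack_at = t[i] + rtt;
        } else {
            held++;               /* small write delayed */
        }
    }
    if (held > 0)
        segs++;                   /* released by the final ACK */
    return segs;
}
```

With keystrokes every 10 ms and a 50 ms round trip, ten keystrokes go out in 3 segments instead of 10; with a 5 ms round trip each keystroke still gets its own segment, because the ACK always arrives before the next key.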

Consider 9P over TCP.  Many of the messages are quite
small, and will be delayed.  9P is RPC based, so latency
is a big concern.  The sender of the small message is
typically blocked waiting for a response, so Nagle's
hope that more data will be sent is (in this case) invalid.
All it succeeds in doing is drastically increasing the
effective round trip time.

This was a big issue for drawterm.  Use of the mouse
in drawterm results in small Rread messages being
sent, in response to reads of /dev/mouse at the
other end.  These were being delayed by a significant
fraction of a second, resulting in really sluggish
interactive performance, until we figured out that
Nagle's algorithm was responsible and inserted the
appropriate setsockopt call into drawterm.
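The thread doesn't show drawterm's actual code, but the standard sockets call for turning Nagle off is setsockopt with TCP_NODELAY; Winsock accepts the same option. A minimal sketch; `disable_nagle` and `nodelay_enabled` are hypothetical helper names.

```c
#define _DEFAULT_SOURCE 1
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Turn off Nagle's algorithm on TCP socket fd so that small writes,
 * such as an Rread carrying a mouse event, are sent at once.
 * Returns 0 on success, -1 on error (errno set by setsockopt). */
static int disable_nagle(int fd)
{
    int one = 1;
    return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &one, sizeof one);
}

/* 1 if Nagle is off for fd, 0 if it is on, -1 on error.  Some stacks
 * may report a nonzero value other than 1, hence the != 0. */
static int nodelay_enabled(int fd)
{
    int val = 0;
    socklen_t len = sizeof val;
    if (getsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &val, &len) < 0)
        return -1;
    return val != 0;
}
```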

* [9fans] Nagle algorithm
@ 2001-11-21 23:20 Andrew Simmons
  2001-11-26 10:57 ` Sean M. Doran
  0 siblings, 1 reply; 37+ messages in thread
From: Andrew Simmons @ 2001-11-21 23:20 UTC (permalink / raw)
  To: 9fans

Slightly off-topic, but I seem to detect a certain hostility to the Nagle
algorithm in this group, whereas the conventional wisdom in the Winsock
world where I currently spend my days is that the Nagle algorithm should
almost never be disabled. What's wrong with Nagle? (and please, no cracks
about wisdom and Windows programming being mutually exclusive).
